Cool, I'm going to take a look. Thanks Niels.

On Thu, Mar 18, 2010 at 1:34 PM, Niels Hoogeveen <pd_aficion...@hotmail.com> wrote:
>
> Github project is up and running:
> http://github.com/NielsHoogeveen/Scala-Neo4j-utils
>
> For now it contains only one file with my first attempt writing a traverser
> in Scala.
>
> Scala version 2.8 is required.
>
> Kind regards,
> Niels Hoogeveen
>
> > From: linxbet...@gmail.com
> > Date: Wed, 17 Mar 2010 22:21:49 -0500
> > To: user@lists.neo4j.org
> > Subject: Re: [Neo] basic questions
> >
> > That sounds pretty cool. Do you have a github project?
> >
> > On Wed, Mar 17, 2010 at 7:23 PM, Niels Hoogeveen
> > <pd_aficion...@hotmail.com> wrote:
> >
> > >
> > > The example I provided is in effect nothing but a couple of nested
> > > iterations, so nothing like a real traversal, but certainly comparable to
> > > the Java pipes or the Gremlin path expressions.
> > >
> > > I am working on a Scala based traverser; having higher order functions
> > > makes it possible to supply filter expressions to the traverser. The result
> > > of a traversal can then be used in a for comprehension to further navigate
> > > the results or to compose different traversals.
> > >
> > > Kind regards,
> > > Niels Hoogeveen
> > >
> > > > From: linxbet...@gmail.com
> > > > Date: Wed, 17 Mar 2010 19:39:07 -0400
> > > > To: user@lists.neo4j.org
> > > > Subject: Re: [Neo] basic questions
> > > >
> > > > I really like that syntax, however I'm concerned that although the intent is
> > > > expressed concisely, behind the scenes it's still going to have to traverse
> > > > the entire relationship graph. The guards in the for comprehension will
> > > > translate to filter() calls but it still has to examine everything in the
> > > > Iterable, right?
> > > >
> > > > I've seen some very cool scala wrappers for SQL and MongoDB that
> > > > re-implement map(), filter(), and flatMap() so that you can use for
> > > > comprehensions even though it's doing custom work behind the scenes.
> > > > It would be really cool to do that for Neo4j and have it generate efficient
> > > > traversals instead of doing naive iteration.
> > > >
> > > > Thanks,
> > > > Lincoln
> > > >
> > > > On Wed, Mar 17, 2010 at 6:32 PM, Niels Hoogeveen
> > > > <pd_aficion...@hotmail.com> wrote:
> > > >
> > > > >
> > > > > The original questioner works in Scala where the pipe concept can easily be
> > > > > expressed using for comprehensions.
> > > > >
> > > > > // assumes: import scala.collection.JavaConversions._ (Scala 2.8)
> > > > > val node = index.getSingleNode("id", 1)
> > > > > for {
> > > > >   i <- node.getRelationships(FOLLOWS, OUTGOING) // binds the iteration over the relations to i
> > > > >   if i.getProperty("follow_code") != "not important" // filters the properties of the relations bound to i
> > > > >   j = i.getEndNode // binds the end node of the relation to j
> > > > >   if j.getProperty("name") == "something" // filters the properties of the node bound to j
> > > > >   k <- j.getRelationships(RefersTo, OUTGOING) // binds the iteration over the relations of j
> > > > >   if k.getProperty("CREATED").toString <= "2010-03-17" // filters the properties of the relation bound to k
> > > > > } {
> > > > >   println(k.getProperty("name"))
> > > > > }
> > > > >
> > > > > Kind regards,
> > > > > Niels Hoogeveen
> > > > >
> > > > > > From: okramma...@gmail.com
> > > > > > Date: Wed, 17 Mar 2010 14:37:32 -0700
> > > > > > To: user@lists.neo4j.org
> > > > > > Subject: Re: [Neo] basic questions
> > > > > >
> > > > > > Hey,
> > > > > >
> > > > > > You might want to consider Blueprints Pipes for a more controlled
> > > > > > traverser framework that doesn't require the use of for-loops and allows you
> > > > > > to specify arbitrary paths through a graph.
> > > > > >
> > > > > > http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
> > > > > >
> > > > > > For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
> > > > > >
> > > > > > //////////////////////////////////////////
> > > > > > Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> > > > > > Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"), false);
> > > > > > Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> > > > > > Pipe<Vertex,Edge> pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> > > > > > Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"), false);
> > > > > > Pipe<Edge,Vertex> pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> > > > > > Pipeline<Vertex,Vertex> pipeline = new
> > > > > >     Pipeline<Vertex,Vertex>(Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6));
> > > > > >
> > > > > > Neo4jGraph graph = new Neo4jGraph("/dir/neo");
> > > > > > graph.startTransaction();
> > > > > > pipeline.setStarts(Arrays.asList(viewer).iterator());
> > > > > > while(pipeline.hasNext()) {
> > > > > >   System.out.println("message: " + pipeline.next());
> > > > > > }
> > > > > > graph.stopTransaction(true);
> > > > > > //////////////////////////////////////////
> > > > > >
> > > > > > NOTE: I hand typed this from memory so there might be some errors here or
> > > > > > there....
> > > > > >
> > > > > > A Pipe/Pipeline implements Iterator so you can just page out as many
> > > > > > items as you want that can legally flow through pipeline...
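
The "page out as many items as you want" point can be illustrated without Blueprints at all. The following is only a minimal Scala sketch using plain `Iterator`s over a hypothetical in-memory edge list; `PipeSketch`, `Edge`, `out`, `messagesFor` and `firstPage` are made-up names for illustration and are not part of the Pipes or Neo4j APIs.

```scala
object PipeSketch {
  // Hypothetical stand-in for a labelled graph edge (not a Blueprints type).
  final case class Edge(label: String, from: String, to: String)

  val edges: List[Edge] = List(
    Edge("FOLLOWS", "viewer", "alice"),
    Edge("FOLLOWS", "viewer", "bob"),
    Edge("CREATED", "alice", "msg1"),
    Edge("CREATED", "alice", "msg2"),
    Edge("CREATED", "bob", "msg3"),
    Edge("LIKES", "viewer", "msg3")
  )

  // One step: follow outgoing edges with the given label, lazily.
  def out(label: String)(starts: Iterator[String]): Iterator[String] =
    starts.flatMap(v => edges.iterator.filter(e => e.from == v && e.label == label).map(_.to))

  // viewer --FOLLOWS--> user --CREATED--> message, composed from two steps.
  // Nothing is evaluated until the resulting iterator is consumed.
  def messagesFor(viewer: String): Iterator[String] =
    out("CREATED")(out("FOLLOWS")(Iterator(viewer)))

  // Pull only the first `size` items through the composed steps.
  def firstPage(viewer: String, size: Int): List[String] =
    messagesFor(viewer).take(size).toList
}
```

Because each step is an `Iterator`, `firstPage("viewer", 2)` stops pulling from the underlying steps once two messages have come out, which is the behaviour described for a Pipeline.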
> > > > > >
> > > > > > If this is interesting to you and if you use MVN and Git, you may want to
> > > > > > build the latest and greatest of Blueprints [
> > > > > > http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new
> > > > > > things [ see
> > > > > > http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api ].
> > > > > >
> > > > > > Also, a non-iterator based mechanism is provided by Gremlin [
> > > > > > http://gremlin.tinkerpop.com ] which would express the same thing as:
> > > > > >
> > > > > > $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
> > > > > >
> > > > > > Take care,
> > > > > > Marko.
> > > > > >
> > > > > > http://markorodriguez.com
> > > > > > http://gremlin.tinkerpop.com
> > > > > >
> > > > > >
> > > > > > On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
> > > > > >
> > > > > > > Wow dude, this is blowing my mind just a little.
> > > > > > >
> > > > > > > Ok, sticking with the twitter example, I'm concerned about the edge cases.
> > > > > > > I'd say it's easy to optimize with a relational db or any other storage for
> > > > > > > that matter if I make the assumption that people only follow a few hundred
> > > > > > > people and only want recent messages. However some people follow hundreds
> > > > > > > of thousands of people. If Guy Kawasaki uses my app, I'd run into a problem
> > > > > > > quickly.
> > > > > > >
> > > > > > > However I see your point that I don't have to limit myself to just the
> > > > > > > obvious relationships, but can create relationships that serve specific
> > > > > > > purposes and use-cases such as your day example. I'm not sure how I would
> > > > > > > want to model my use-case to allow for Guy Kawasaki, I'll have to think more
> > > > > > > about it.
> > > > > > > Is there a threshold beyond which adding relationships between
> > > > > > > nodes causes problems? If not, or if it's high, you could create custom
> > > > > > > relationships for every type of query you'd want to do.
> > > > > > >
> > > > > > > However, a secondary question comes up. If we continue with the twitter
> > > > > > > example, and I want to be able to page through results, is that directly
> > > > > > > supported through Neo4j's API? Coming from a more traditional storage
> > > > > > > background I tend to think of what I'd want as a sort by time and then a
> > > > > > > skip and limit on the results (so I could say give me messages 1-100 sorted
> > > > > > > by time descending). Is there anything equivalent in Neo4j or is the
> > > > > > > approach totally different?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Lincoln
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com> wrote:
> > > > > > >
> > > > > > >> Hi Lincoln,
> > > > > > >>
> > > > > > >> So it sounds like you don't need the IS_VISIBLE relations after all. The
> > > > > > >> traverser works by following all relationships of the specified types and
> > > > > > >> directions from each current node (as you traverse, or walk the graph). You
> > > > > > >> can have a complex graph and traverse to high depth very fast (thousands of
> > > > > > >> relationships per second). The traverser will also automatically check that
> > > > > > >> the same node is not returned twice. The test for the relationship type is
> > > > > > >> efficient.
> > > > > > >> Still reasonable, but less efficient is the custom test you might
> > > > > > >> put in the returnable evaluator, but if the limiting factor is usually the
> > > > > > >> number of relationships traversed, and if that is kept manageable, the
> > > > > > >> evaluator test is no concern.
> > > > > > >>
> > > > > > >> I think twitter is a good case in point, even with many millions of users,
> > > > > > >> you will still only follow perhaps a hundred and they will tweet perhaps a
> > > > > > >> hundred, or a thousand times, so your traverser will find the 10k-100k
> > > > > > >> messages quite quickly. This can be speeded up further, but the right
> > > > > > >> approach depends again on your use case. The idea with using a graph
> > > > > > >> database is that the actual usage probably maps very well to the graph
> > > > > > >> structure, so when deciding how to speed up your search, consider how it
> > > > > > >> will be used. In twitter one normally only cares about recent messages, so
> > > > > > >> how about not linking directly from the user to the message, but link to an
> > > > > > >> intermediate node representing time, for example, a day-node. Then each new
> > > > > > >> message is added to the day node for that day, and that will automatically
> > > > > > >> become yesterday the next day. Then your traversal can have a stop evaluator
> > > > > > >> to not follow old messages (unless your query is looking for old messages,
> > > > > > >> of course). So the 100k messages might drop to only a few hundred, or even
> > > > > > >> just a few dozen. Certainly that will be a query of the order of
> > > > > > >> milliseconds!
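
The day-node idea can be sketched with ordinary Scala collections. This is only an illustration under stated assumptions: a map keyed by (author, day) plays the role of user --> day node --> message, and `DayBucketSketch`, `Message`, `add` and `recent` are hypothetical names, not Neo4j API.

```scala
object DayBucketSketch {
  final case class Message(author: String, day: String, text: String)

  // (author, day) -> messages; stands in for user --> day node --> message.
  type Buckets = Map[(String, String), List[Message]]

  // Attach a new message to its author's bucket for that day.
  def add(buckets: Buckets, m: Message): Buckets = {
    val key = (m.author, m.day)
    buckets.updated(key, m :: buckets.getOrElse(key, Nil))
  }

  // Look only at the buckets for the requested days; buckets for older days
  // are never visited, which mirrors a stop evaluator pruning old day nodes.
  def recent(buckets: Buckets, followed: List[String], days: List[String]): List[Message] =
    for {
      user <- followed
      day  <- days
      msg  <- buckets.getOrElse((user, day), Nil)
    } yield msg
}
```

The work done by `recent` grows with the number of followed users times the number of requested days, not with the total number of messages ever written, which is the reduction described above.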
> > > > > > >>
> > > > > > >> Moving away from the traverser, you also have the option to call directly
> > > > > > >> the getRelationships() methods from the node. If your structure is
> > > > > > >> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two
> > > > > > >> nested for loops would work, the outer iterating over the followers and the
> > > > > > >> inner iterating over the messages. If you changed to add a time-based
> > > > > > >> interim node (which is a kind of graph-index), then you need to have three
> > > > > > >> loops. If you made your time index a deeper tree (months->days->hours,
> > > > > > >> etc.), then you would need to further refactor the code. However, if you
> > > > > > >> stuck with a traverser, you might not need to change the traverser even if
> > > > > > >> the graph structure changed, as long as the same relationship types were
> > > > > > >> maintained. Does that make sense?
> > > > > > >>
> > > > > > >> Cheers, Craig
> > > > > > >>
> > > > > > >> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote:
> > > > > > >>
> > > > > > >>> Thanks Craig,
> > > > > > >>>
> > > > > > >>> I'd like to clarify my question (I don't think it changes your answer
> > > > > > >>> though).
> > > > > > >>>
> > > > > > >>> I wanted all messages visible to me created by users I follow. Thus, the
> > > > > > >>> FOLLOWS relationship is not enough. I'd need to see messages that are
> > > > > > >>> visible to me and then check if they were created by users I follow, or I'd
> > > > > > >>> need to see messages created by users I follow and then see if they're
> > > > > > >>> visible to me.
> > > > > > >>>
> > > > > > >>> I assume your last example still yields the result I'm looking for.
> > > > > > >>> Could you describe what actually happens here though? I'm unclear on what the
> > > > > > >>> traversal looks like. Would it first traverse every outgoing FOLLOWS
> > > > > > >>> relationship from the viewer, yielding other users, and then traverse all
> > > > > > >>> the CREATED relationships to get to messages?
> > > > > > >>>
> > > > > > >>> Also, given very large numbers of FOLLOWS and CREATED relationships (with
> > > > > > >>> say, a twitter graph), how is this made efficient?
> > > > > > >>>
> > > > > > >>> Sorry for all the basic questions but I couldn't find this information in
> > > > > > >>> the docs. If there's something I should be reading before posting these
> > > > > > >>> questions, please point me to it.
> > > > > > >>>
> > > > > > >>> Thanks!
> > > > > > >>>
> > > > > > >>> Lincoln
> > > > > > >>>
> > > > > > >>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com> wrote:
> > > > > > >>>
> > > > > > >>>> I'm uncertain about one ambiguity in your model, you are able to find
> > > > > > >>>> messages through FOLLOWS and IS_VISIBLE_BY. These will give two different
> > > > > > >>>> sets, and my first impression was that FOLLOWS gives you the right answer.
> > > > > > >>>> In other words you want to query for 'all messages by users I follow'? In
> > > > > > >>>> that case you do not need IS_VISIBLE_BY. However, if there are messages by
> > > > > > >>>> people you follow, but are not allowed to see, then you also need the
> > > > > > >>>> IS_VISIBLE_BY. But I would still reconsider linking directly from the
> > > > > > >>>> viewer to the message for that case.
> > > > > > >>>> I'd rather have the messages linked to some
> > > > > > >>>> categorization structure for things like 'public', 'private', etc.
> > > > > > >>>>
> > > > > > >>>> Anyway, here are some suggestions for the various approaches above:
> > > > > > >>>> *'all messages by users I follow'*
> > > > > > >>>> // on Scala 2.8 this assumes an implicit conversion from
> > > > > > >>>> // (TraversalPosition => Boolean) to ReturnableEvaluator is in scope
> > > > > > >>>> val msgs = viewer.traverse(
> > > > > > >>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > > > > > >>>>   (tp: TraversalPosition) => IsMessage(tp.currentNode()),
> > > > > > >>>>   Rels.FOLLOWS, Direction.OUTGOING,
> > > > > > >>>>   Rels.CREATED, Direction.OUTGOING)
> > > > > > >>>>
> > > > > > >>>> *'all messages visible to me'*
> > > > > > >>>> val msgs = viewer.traverse(
> > > > > > >>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > > > > > >>>>   ReturnableEvaluator.ALL_BUT_START_NODE,
> > > > > > >>>>   Rels.IS_VISIBLE_BY, Direction.INCOMING)
> > > > > > >>>>
> > > > > > >>>> *'all messages, visible to me, by people I follow'*
> > > > > > >>>> val msgs = viewer.traverse(
> > > > > > >>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > > > > > >>>>   (tp: TraversalPosition) => {
> > > > > > >>>>     val msg = tp.currentNode()
> > > > > > >>>>     IsMessage(msg) && IsVisibleBy(msg,viewer)
> > > > > > >>>>   },
> > > > > > >>>>   Rels.FOLLOWS, Direction.OUTGOING,
> > > > > > >>>>   Rels.CREATED, Direction.OUTGOING)
> > > > > > >>>>
> > > > > > >>>> Of course I assume you make the utility functions IsMessage(node: Node) and
> > > > > > >>>> IsVisibleBy(msg: Node, user: Node), and these will test the existence of
> > > > > > >>>> properties and relations as appropriate to make the decision.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com> wrote:
> > > > > > >>>>
> > > > > > >>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
> > > > > > >>>>> However,
> > > > > > >>>>> the cognitive dissonance that I've grown so used to in modeling storage is
> > > > > > >>>>> proving to be a bit difficult to let go at this early stage :)
> > > > > > >>>>>
> > > > > > >>>>> I was hoping that if someone could help me through an example I would be
> > > > > > >>>>> able to grok how to properly structure my data and query it in Neo4j.
> > > > > > >>>>>
> > > > > > >>>>> Nodes:
> > > > > > >>>>> Message( text: String )
> > > > > > >>>>> User( id: Long )
> > > > > > >>>>>
> > > > > > >>>>> Relationships:
> > > > > > >>>>> CREATED
> > > > > > >>>>> FOLLOWS
> > > > > > >>>>> IS_VISIBLE_BY
> > > > > > >>>>>
> > > > > > >>>>> So I might have a graph with entries like so:
> > > > > > >>>>>
> > > > > > >>>>> User(1) --> CREATED --> Message("i woke up late today")
> > > > > > >>>>> User(2) --> CREATED --> Message("hello")
> > > > > > >>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
> > > > > > >>>>>
> > > > > > >>>>> User(1) --> FOLLOWS --> User(2)
> > > > > > >>>>>
> > > > > > >>>>> Let's also say all messages are visible to User 1.
> > > > > > >>>>>
> > > > > > >>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
> > > > > > >>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
> > > > > > >>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
> > > > > > >>>>>
> > > > > > >>>>> So, I can do a simple traversal for visible:
> > > > > > >>>>>
> > > > > > >>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
> > > > > > >>>>> val index = new LuceneIndexService( graphDb )
> > > > > > >>>>> val viewer = index.getSingleNode("id", 1)
> > > > > > >>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
> > > > > > >>>>>   StopEvaluator.END_OF_GRAPH,
> > > > > > >>>>>   ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
> > > > > > >>>>>   Direction.INCOMING)
> > > > > > >>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }") // assuming i
> > > > > > >>>>> // have the relevant functions
> > > > > > >>>>>
> > > > > > >>>>> But let's say that this is going to return too many messages. Just because
> > > > > > >>>>> all the messages are possibly visible to me, doesn't mean I want to see
> > > > > > >>>>> them all. So, I'd like to additionally filter by the FOLLOWS relationship.
> > > > > > >>>>> I'd like to express "get all messages that are visible and were created by a
> > > > > > >>>>> user that I follow." Can someone show me an example of how to do that?
> > > > > > >>>>>
> > > > > > >>>>> I'm guessing that you need to implement a custom ReturnableEvaluator, but I
> > > > > > >>>>> don't understand how you traverse multiple relationships at the same time.
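
To make the shape of that query concrete, here is a minimal in-memory sketch in Scala that uses the example data from this message. It is not Neo4j code: `VisibleFollowedSketch`, `Rel`, `out` and `visibleFromFollowed` are hypothetical names, and users and messages are represented as plain strings for brevity.

```scala
object VisibleFollowedSketch {
  // Hypothetical stand-in for a typed relationship between two nodes.
  final case class Rel(from: String, kind: String, to: String)

  val rels: Set[Rel] = Set(
    Rel("user1", "CREATED", "i woke up late today"),
    Rel("user2", "CREATED", "hello"),
    Rel("user3", "CREATED", "ugh, i hate mondays"),
    Rel("user1", "FOLLOWS", "user2"),
    Rel("i woke up late today", "IS_VISIBLE_BY", "user1"),
    Rel("hello", "IS_VISIBLE_BY", "user1"),
    Rel("ugh, i hate mondays", "IS_VISIBLE_BY", "user1")
  )

  // End nodes of all outgoing relationships of one kind from a node.
  def out(from: String, kind: String): Set[String] =
    rels.filter(r => r.from == from && r.kind == kind).map(_.to)

  // Walk FOLLOWS, then CREATED, and keep only messages visible to the viewer.
  def visibleFromFollowed(viewer: String): Set[String] =
    for {
      user <- out(viewer, "FOLLOWS")
      msg  <- out(user, "CREATED")
      if rels.contains(Rel(msg, "IS_VISIBLE_BY", viewer))
    } yield msg
}
```

Two relationship types are walked in sequence and the third is used purely as a filter on the candidates, so the visibility check runs only on messages reached through followed users.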
> > > > > > >>>>>
> > > > > > >>>>> Thanks,
> > > > > > >>>>> Lincoln
> > > > > > >>>>> _______________________________________________
> > > > > > >>>>> Neo mailing list
> > > > > > >>>>> User@lists.neo4j.org
> > > > > > >>>>> https://lists.neo4j.org/mailman/listinfo/user
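
On the paging question raised in the thread (sort by time, then skip and limit), the requested semantics can be written down over plain Scala collections. This is only a sketch of what the question asks for, not a Neo4j feature: `PagingSketch`, `Message`, `createdAt` and `page` are hypothetical names, and the whole result set is sorted in memory before the page is cut out.

```scala
object PagingSketch {
  // createdAt is assumed to be a timestamp such as epoch milliseconds.
  final case class Message(text: String, createdAt: Long)

  // Sort newest first, then apply skip and limit.
  def page(messages: Seq[Message], skip: Int, limit: Int): Seq[Message] =
    messages.sortBy(m => -m.createdAt).slice(skip, skip + limit)
}
```

Since the sort needs every candidate message up front, keeping the candidate set small (for instance with the day-node structure discussed in the thread) matters more than the paging step itself.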