Re: [Neo] basic questions

Niels Hoogeveen Wed, 17 Mar 2010 17:23:29 -0700

The example I provided is in effect nothing but a couple of nested iterations,
so nothing like a real traversal, but certainly comparable to the Java
pipes or the Gremlin path expressions.


I am working on a Scala-based traverser. Having higher-order functions makes it
possible to supply filter expressions to the traverser. The result of a
traversal can then be used in a for comprehension to further navigate the
results or to compose different traversals.
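
To give a rough idea of what such a traverser could look like, here is a
minimal sketch over a toy in-memory graph. Node, Rel, Traverser and the
out() method are illustrative stand-ins invented for this example; they are
not Neo4j classes and not the actual traverser I am working on.

```scala
// Toy in-memory graph. Node, Rel and Traverser are hypothetical stand-ins,
// not part of the Neo4j API.
object TraverserSketch {
  final case class Node(name: String, props: Map[String, Any] = Map.empty)
  final case class Rel(start: Node, label: String, end: Node)

  // A traversal is a function from a start node to the nodes it reaches.
  final class Traverser(rels: Seq[Rel], step: Node => Iterator[Node]) {
    def from(start: Node): Iterator[Node] = step(start)

    // Follow outgoing relationships with the given label and keep only the
    // end nodes accepted by the filter expression `keep`.
    def out(label: String, keep: Node => Boolean = _ => true): Traverser =
      new Traverser(rels, n =>
        step(n).flatMap(m =>
          rels.iterator
            .filter(r => r.start == m && r.label == label)
            .map(_.end)
            .filter(keep)))
  }

  object Traverser {
    // The empty traversal just returns its start node.
    def apply(rels: Seq[Rel]): Traverser = new Traverser(rels, n => Iterator(n))
  }

  val viewer = Node("viewer")
  val alice  = Node("alice")
  val bob    = Node("bob")
  val m1     = Node("m1", Map("text" -> "hello"))
  val m2     = Node("m2", Map("text" -> "ugh, i hate mondays"))
  val rels = Seq(
    Rel(viewer, "FOLLOWS", alice), Rel(viewer, "FOLLOWS", bob),
    Rel(alice, "CREATED", m1), Rel(bob, "CREATED", m2))

  // Two composed steps, with a filter expression supplied to the first one.
  val messagesOfAlice: Traverser =
    Traverser(rels).out("FOLLOWS", _.name == "alice").out("CREATED")

  // The result is an Iterator, so it can feed a for comprehension.
  def texts: List[String] =
    (for {
      msg <- messagesOfAlice.from(viewer)
      if msg.props.contains("text")
    } yield msg.props("text").toString).toList
}
```

Because each step returns a new Traverser, traversals compose by chaining,
and the filter expressions are ordinary functions.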

Kind regards,
Niels Hoogeveen

> From: linxbet...@gmail.com
> Date: Wed, 17 Mar 2010 19:39:07 -0400
> To: user@lists.neo4j.org
> Subject: Re: [Neo] basic questions
> 
> I really like that syntax, however I'm concerned that although the intent is
> expressed concisely, behind the scenes it's still going to have to traverse
> the entire relationship graph.  The guards in the for comprehension will
> translate to filter() calls but it still has to examine everything in the
> Iterable, right?
> 
> I've seen some very cool scala wrappers for SQL and MongoDB that
> re-implement map(), filter(), and flatMap() so that you can use for
> comprehensions even though it's doing custom work behind the scenes.  It
> would be really cool to do that for Neo4j and have it generate efficient
> traversals instead of doing naive iteration.
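
As an aside on the point above: the sketch below shows roughly how a for
comprehension with a guard is rewritten into withFilter() and map() calls,
and that over a lazy Iterator only as many elements are examined as the
consumer asks for. The names DesugarSketch, source and examined are made up
for this illustration. A full scan is still needed if every result is
consumed.

```scala
object DesugarSketch {
  // Counts how many source elements have been produced and handed to the guard.
  var examined = 0

  def source: Iterator[Int] =
    Iterator.range(0, 1000).map { i => examined += 1; i }

  // The for comprehension ...
  def sugared: Iterator[Int] =
    for (i <- source if i % 2 == 0) yield i * 10

  // ... is rewritten by the compiler into roughly this chain of calls.
  def desugared: Iterator[Int] =
    source.withFilter(_ % 2 == 0).map(_ * 10)

  // Take only the first three results and report how much was examined.
  def firstThree(): (List[Int], Int) = {
    examined = 0
    val out = sugared.take(3).toList
    (out, examined)
  }
}
```

A type that supplies its own withFilter(), map() and flatMap() can be used in
the same syntax, which is what makes the custom wrappers mentioned above
possible.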
> 
> Thanks,
> Lincoln
> 
> On Wed, Mar 17, 2010 at 6:32 PM, Niels Hoogeveen
> <pd_aficion...@hotmail.com> wrote:
> 
> >
> > The original questioner works in Scala where the pipe concept can easily be
> > expressed using for comprehensions.
> >
> > val node = index.getSingleNode("id")
> > // assumes an implicit conversion from java.lang.Iterable to a Scala
> > // collection is in scope (e.g. scala.collection.JavaConversions._)
> > for {
> >   // binds each outgoing FOLLOWS relationship to i
> >   i <- node.getRelationships(FOLLOWS, OUTGOING)
> >   // filters on a property of the relationship bound to i
> >   if i.getProperty("follow_code") != "not important"
> >   // binds the end node of the relationship to j
> >   j = i.getEndNode
> >   // filters on a property of the node bound to j
> >   if j.getProperty("name") == "something"
> >   // binds each outgoing RefersTo relationship of j to k
> >   k <- j.getRelationships(RefersTo, OUTGOING)
> >   // filters on a property of the relationship bound to k
> >   if k.getProperty("CREATED").toString <= "2010-03-17"
> > } println(k.getProperty("name"))
> >
> > Kind regards,
> > Niels Hoogeveen
> >
> > > From: okramma...@gmail.com
> > > Date: Wed, 17 Mar 2010 14:37:32 -0700
> > > To: user@lists.neo4j.org
> > > Subject: Re: [Neo] basic questions
> > >
> > > Hey,
> > >
> > > You might want to consider Blueprints Pipes for a more controlled
> > > traverser framework that doesn't require the use of for-loops and
> > > allows you to specify arbitrary paths through a graph.
> > >
> > > http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
> > >
> > > For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
> > >
> > > //////////////////////////////////////////
> > > Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> > > Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"), false);
> > > Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> > > Pipe<Vertex,Edge> pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> > > Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"), false);
> > > Pipe<Edge,Vertex> pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> > > Pipeline<Vertex,Vertex> pipeline = new Pipeline<Vertex,Vertex>(
> > >     Arrays.asList(pipe1, pipe2, pipe3, pipe4, pipe5, pipe6));
> > >
> > > Neo4jGraph graph = new Neo4jGraph("/dir/neo");
> > > graph.startTransaction();
> > > pipeline.setStarts(Arrays.asList(viewer).iterator());
> > > while (pipeline.hasNext()) {
> > >     System.out.println("message: " + pipeline.next());
> > > }
> > > graph.stopTransaction(true);
> > > //////////////////////////////////////////
> > >
> > > NOTE: I hand typed this from memory so there might be some errors here
> > > or there....
> > >
> > > A Pipe/Pipeline implements Iterator so you can just page out as many
> > > items as you want that can legally flow through pipeline...
> > >
> > > If this is interesting to you and if you use MVN and Git, you may want
> > > to build the latest and greatest of Blueprints
> > > [ http://blueprints.tinkerpop.com ] as I continually add new Pipes to do
> > > new things
> > > [ see http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api ].
> > >
> > > Also, a non-iterator based mechanism is provided by Gremlin
> > > [ http://gremlin.tinkerpop.com ] which would express the same thing as:
> > >
> > > $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
> > >
> > > Take care,
> > > Marko.
> > >
> > > http://markorodriguez.com
> > > http://gremlin.tinkerpop.com
> > >
> > >
> > > On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
> > >
> > > > Wow dude, this is blowing my mind just a little.
> > > >
> > > > Ok, sticking with the twitter example, I'm concerned about the edge
> > > > cases. I'd say it's easy to optimize with a relational db or any other
> > > > storage for that matter if I make the assumption that people only
> > > > follow a few hundred people and only want recent messages.  However
> > > > some people follow hundreds of thousands of people.  If Guy Kawasaki
> > > > uses my app, I'd run into a problem quickly.
> > > >
> > > > However I see your point that I don't have to limit myself to just the
> > > > obvious relationships, but can create relationships that serve specific
> > > > purposes and use-cases such as your day example.  I'm not sure how I
> > > > would want to model my use-case to allow for Guy Kawasaki; I'll have to
> > > > think more about it.  Is there a threshold beyond which adding
> > > > relationships between nodes causes problems?  If not, or if it's high,
> > > > you could create custom relationships for every type of query you'd
> > > > want to do.
> > > >
> > > > However, a secondary question comes up.  If we continue with the
> > > > twitter example, and I want to be able to page through results, is that
> > > > directly supported through Neo4j's API?  Coming from a more traditional
> > > > storage background I tend to think of what I'd want as a sort by time
> > > > and then a skip and limit on the results (so I could say give me
> > > > messages 1-100 sorted by time descending).  Is there anything
> > > > equivalent in Neo4j or is the approach totally different?
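
For reference, the "sort by time, then skip and limit" idea described above
can be sketched on the application side as follows. This is plain Scala over
an in-memory collection; Message, createdAt and page() are hypothetical names
for this illustration, and nothing here is a Neo4j API.

```scala
object PagingSketch {
  // createdAt is assumed to be a timestamp, e.g. milliseconds since the epoch.
  final case class Message(text: String, createdAt: Long)

  // Sort newest first, then skip `offset` results and keep at most `limit`.
  // Sorting forces the whole result set to be materialised first.
  def page(msgs: Iterable[Message], offset: Int, limit: Int): List[Message] =
    msgs.toList
      .sortBy(_.createdAt)(Ordering[Long].reverse)
      .slice(offset, offset + limit)

  val sample = List(Message("a", 1L), Message("b", 3L), Message("c", 2L))
}
```

With the sample data, page(sample, 0, 2) gives the two newest messages and
page(sample, 2, 2) gives the remaining one.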
> > > >
> > > > Thanks,
> > > > Lincoln
> > > >
> > > >
> > > > On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com>
> > wrote:
> > > >
> > > >> Hi Lincoln,
> > > >>
> > > >> So it sounds like you don't need the IS_VISIBLE relations after all.
> > > >> The traverser works by following all relationships of the specified
> > > >> types and directions from each current node (as you traverse, or walk
> > > >> the graph). You can have a complex graph and traverse to high depth
> > > >> very fast (thousands of relationships per second). The traverser will
> > > >> also automatically check that the same node is not returned twice. The
> > > >> test for the relationship type is efficient. The custom test you might
> > > >> put in the returnable evaluator is less efficient, though still
> > > >> reasonable. Since the limiting factor is usually the number of
> > > >> relationships traversed, the evaluator test is no concern as long as
> > > >> that number is kept manageable.
> > > >>
> > > >> I think twitter is a good case in point: even with many millions of
> > > >> users, you will still only follow perhaps a hundred and they will
> > > >> tweet perhaps a hundred, or a thousand times, so your traverser will
> > > >> find the 10k-100k messages quite quickly. This can be sped up further,
> > > >> but the right approach depends again on your use case. The idea with
> > > >> using a graph database is that the actual usage probably maps very
> > > >> well to the graph structure, so when deciding how to speed up your
> > > >> search, consider how it will be used. In twitter one normally only
> > > >> cares about recent messages, so how about not linking directly from
> > > >> the user to the message, but link to an intermediate node representing
> > > >> time, for example, a day-node. Then each new message is added to the
> > > >> day node for that day, and that will automatically become yesterday
> > > >> the next day. Then your traversal can have a stop evaluator to not
> > > >> follow old messages (unless your query is looking for old messages, of
> > > >> course). So the 100k messages might drop to only a few hundred, or
> > > >> even just a few dozen. Certainly that will be a query of the order of
> > > >> milliseconds!
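
The day-node idea above can be illustrated with a small in-memory model. This
is only a sketch under assumptions: User, DayNode, Message and
recentMessages() are made-up names, days are assumed to be ISO date strings
so that they compare chronologically, and a real implementation would use
nodes and relationships plus a stop evaluator instead of case classes.

```scala
object DayIndexSketch {
  final case class Message(text: String)
  // One day node per user per day on which they posted; day is "yyyy-MM-dd".
  final case class DayNode(day: String, messages: List[Message])
  final case class User(name: String, days: List[DayNode])

  // Only descend into day nodes on or after `since`. Older day nodes are
  // skipped without looking at any of their messages.
  def recentMessages(followed: Seq[User], since: String): Seq[Message] =
    for {
      user <- followed
      day  <- user.days
      if day.day >= since
      msg  <- day.messages
    } yield msg

  val alice = User("alice", List(
    DayNode("2010-03-16", List(Message("old news"))),
    DayNode("2010-03-17", List(Message("hello"), Message("again")))))
  val bob = User("bob", List(
    DayNode("2010-03-10", List(Message("ugh, i hate mondays")))))
}
```

The work done is proportional to the number of day nodes per followed user
plus the recent messages, rather than to the total number of messages.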
> > > >>
> > > >> Moving away from the traverser, you also have the option to call the
> > > >> getRelationships() methods directly on the node. If your structure is
> > > >> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then
> > > >> two nested for loops would work, the outer iterating over the followed
> > > >> users and the inner iterating over the messages. If you changed to add
> > > >> a time-based interim node (which is a kind of graph-index), then you
> > > >> need to have three loops. If you made your time index a deeper tree
> > > >> (months->days->hours, etc.), then you would need to further refactor
> > > >> the code. However, if you stuck with a traverser, you might not need
> > > >> to change the traverser even if the graph structure changed, as long
> > > >> as the same relationship types were maintained. Does that make sense?
> > > >>
> > > >> Cheers, Craig
> > > >>
> > > >> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com>
> > wrote:
> > > >>
> > > >>> Thanks Craig,
> > > >>>
> > > >>> I'd like to clarify my question (I don't think it changes your
> > > >>> answer though).
> > > >>>
> > > >>> I wanted all messages visible to me created by users I follow.  Thus,
> > > >>> the FOLLOWS relationship is not enough.  I'd need to see messages
> > > >>> that are visible to me and then check if they were created by users I
> > > >>> follow, or I'd need to see messages created by users I follow and
> > > >>> then see if they're visible to me.
> > > >>>
> > > >>> I assume your last example still yields the result I'm looking for.
> > > >>> Could you describe what actually happens here though?  I'm unclear on
> > > >>> what the traversal looks like.  Would it first traverse every
> > > >>> outgoing FOLLOWS relationship from the viewer, yielding other users,
> > > >>> and then traverse all the CREATED relationships to get to messages?
> > > >>>
> > > >>> Also, given very large numbers of FOLLOWS and CREATED relationships
> > > >>> (with say, a twitter graph), how is this made efficient?
> > > >>>
> > > >>> Sorry for all the basic questions but I couldn't find this
> > > >>> information in the docs.  If there's something I should be reading
> > > >>> before posting these questions, please point me to it.
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>> Lincoln
> > > >>>
> > > >>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com>
> > > >> wrote:
> > > >>>
> > > >>>> I'm uncertain about one ambiguity in your model: you are able to
> > > >>>> find messages through FOLLOWS and IS_VISIBLE_BY. These will give two
> > > >>>> different sets, and my first impression was that FOLLOWS gives you
> > > >>>> the right answer. In other words you want to query for 'all messages
> > > >>>> by users I follow'? In that case you do not need IS_VISIBLE_BY.
> > > >>>> However, if there are messages by people you follow that you are not
> > > >>>> allowed to see, then you also need the IS_VISIBLE_BY. But I would
> > > >>>> still reconsider linking directly from the viewer to the message for
> > > >>>> that case. I'd rather have the messages linked to some
> > > >>>> categorization structure for things like 'public', 'private', etc.
> > > >>>>
> > > >>>> Anyway, here are some suggestions for the various approaches above:
> > > >>>> *'all messages by users I follow'*
> > > >>>> val msgs = viewer.traverse(
> > > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > > >>>> (tp: TraversalPosition) => IsMessage(tp.currentNode()),
> > > >>>> Rels.FOLLOWS, Direction.OUTGOING,
> > > >>>> Rels.CREATED, Direction.OUTGOING)
> > > >>>>
> > > >>>> *'all messages visible to me'*
> > > >>>> val msgs = viewer.traverse(
> > > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > > >>>> ReturnableEvaluator.ALL_BUT_START_NODE,
> > > >>>> Rels.IS_VISIBLE_BY, Direction.INCOMING)
> > > >>>>
> > > >>>> *'all messages, visible to me, by people I follow'*
> > > >>>> val msgs = viewer.traverse(
> > > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> > > >>>> (tp: TraversalPosition) => {
> > > >>>> val msg = tp.currentNode()
> > > >>>> IsMessage(msg) && IsVisibleBy(msg,viewer)
> > > >>>> },
> > > >>>> Rels.FOLLOWS, Direction.OUTGOING,
> > > >>>> Rels.CREATED, Direction.OUTGOING)
> > > >>>>
> > > >>>> Of course I assume you make the utility functions IsMessage(node:
> > > >>>> Node) and IsVisibleBy(msg: Node, user: Node), and these will test
> > > >>>> the existence of properties and relations as appropriate to make the
> > > >>>> decision.
> > > >>>>
> > > >>>>
> > > >>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com>
> > wrote:
> > > >>>>
> > > >>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
> > > >>>>> However, the cognitive dissonance that I've grown so used to in
> > > >>>>> modeling storage is proving to be a bit difficult to let go at this
> > > >>>>> early stage :)
> > > >>>>>
> > > >>>>> I was hoping that if someone could help me through an example I
> > > >>>>> would be able to grok how to properly structure my data and query
> > > >>>>> it in Neo4j.
> > > >>>>>
> > > >>>>> Nodes:
> > > >>>>> Message( text: String )
> > > >>>>> User( id: Long )
> > > >>>>>
> > > >>>>> Relationships:
> > > >>>>> CREATED
> > > >>>>> FOLLOWS
> > > >>>>> IS_VISIBLE_BY
> > > >>>>>
> > > >>>>> So I might have a graph with entries like so:
> > > >>>>>
> > > >>>>> User(1) --> CREATED --> Message("i woke up late today")
> > > >>>>> User(2) --> CREATED --> Message("hello")
> > > >>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
> > > >>>>>
> > > >>>>> User(1) --> FOLLOWS --> User(2)
> > > >>>>>
> > > >>>>> Let's also say all messages are visible to User 1.
> > > >>>>>
> > > >>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
> > > >>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
> > > >>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
> > > >>>>>
> > > >>>>> So, I can do a simple traversal for visible:
> > > >>>>>
> > > >>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
> > > >>>>> val index = new LuceneIndexService( graphDb )
> > > >>>>> val viewer = index.getSingleNode("id", 1)
> > > >>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
> > > >>>>> StopEvaluator.END_OF_GRAPH,
> > > >>>>> ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
> > > >>>>> Direction.INCOMING)
> > > >>>>> // assuming i have the relevant functions
> > > >>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }")
> > > >>>>>
> > > >>>>> But let's say that this is going to return too many messages.  Just
> > > >>>>> because all the messages are possibly visible to me, doesn't mean I
> > > >>>>> want to see them all.  So, I'd like to additionally filter by the
> > > >>>>> FOLLOWS relationship.  I'd like to express "get all messages that
> > > >>>>> are visible and were created by a user that I follow."  Can someone
> > > >>>>> show me an example of how to do that?
> > > >>>>>
> > > >>>>> I'm guessing that you need to implement a custom
> > > >>>>> ReturnableEvaluator, but I don't understand how you traverse
> > > >>>>> multiple relationships at the same time.
> > > >>>>>
> > > >>>>> Thanks,
> > > >>>>> Lincoln
> > > >>>>> _______________________________________________
> > > >>>>> Neo mailing list
> > > >>>>> User@lists.neo4j.org
> > > >>>>> https://lists.neo4j.org/mailman/listinfo/user
> > > >>>>>
