Re: [Neo] basic questions

Lincoln Wed, 17 Mar 2010 14:49:30 -0700

Thanks Marko, looks very interesting.  I'll check it out.

On Wed, Mar 17, 2010 at 5:37 PM, Marko Rodriguez <okramma...@gmail.com>wrote:


> Hey,
>
> You might want to consider Blueprints Pipes for a more controlled traverser
> framework that doesn't require the use of for-loops and allows you to
> specify arbitrary paths through a graph.
>
> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
>
> For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
>
> //////////////////////////////////////////
> Pipe<Vertex,Edge> pipe1 = new
> VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"),
> false);
> Pipe<Edge,Vertex> pipe3 = new
> EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> Pipe<Vertex,Edge> pipe4 = new
> VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"),
> false);
> Pipe<Edge,Vertex> pipe6 = new
> EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> Pipeline<Vertex,Vertex> pipeline = new
> Pipeline<Vertex,Vertex>(Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6));
>
> Neo4jGraph graph = new Neo4jGraph('/dir/neo')
> graph.startTransaction();
> pipeline.setStarts(Arrays.asList(viewer).iterator());
> while(pipeline.hasNext()) {
>        System.out.println("message: " + pipeline.next());
> }
> graph.stopTransaction(true);
> //////////////////////////////////////////
>
> NOTE: I hand typed this from memory so there might be some errors here or
> there....
>
> A Pipe/Pipeline implements Iterator so you can just page out as many items
> as you want that can legally flow through pipeline...
>
> If this is interesting to you and if you use MVN and Git, you may want to
> build the latest and greatest of Blueprints [
> http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new
> things [ see
> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api].
>
> Also, a non-iterator based mechanism is provided by Gremlin [
> http://gremlin.tinkerpop.com ] which would express the same thing as:
>
> $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
>
> Take care,
> Marko.
>
> http://markorodriguez.com
> http://gremlin.tinkerpop.com
>
>
> On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
>
> > Wow dude, this is blowing my mind just a little.
> >
> > Ok, sticking with the twitter example, I'm concerned about the edge
> cases.
> > I'd say it's easy to optimize with a relational db or any other storage
> for
> > that matter if I make the assumption that people only follow a few
> hundred
> > people and only want recent messages.  However some people follow
> hundreds
> > of thousands of people.  If Guy Kawasaki uses my app, I'd run into a
> problem
> > quickly.
> >
> > However I see your point that I don't have to limit myself to just the
> > obvious relationships, but can create relationships that serve specific
> > purposes and use-cases such as your day example.  I'm not sure how I
> would
> > want to model my use-case to allow for Guy Kawaski, I'll have to think
> more
> > about it.  Is there a threshold beyond which adding relationships between
> > nodes causes problems?  If not, or if it's high, you could create custom
> > relationships for every type of query you'd want to do.
> >
> > However, a secondary question comes up.  If we continue with the twitter
> > example, and I want to be able to page through results, is that directly
> > supported through Neo4j's API?  Coming from a more traditional storage
> > background I tend to think of what I'd want as a sort by time and then a
> > skip and limit on the results (so I could say give me messages 1-100
> sorted
> > by time descending).  Is there anything equivalent in Neo4j or is the
> > approach totally different?
> >
> > Thanks,
> > Lincoln
> >
> >
> > On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com>
> wrote:
> >
> >> Hi Lincoln,
> >>
> >> So it sounds like you don't need the IS_VISIBLE relations after all. The
> >> traverser works by following all relationships of the specified types
> and
> >> directions from each current node (as you traverse, or walk the graph).
> You
> >> can have a complex graph and traverse to high depth very fast (thousands
> of
> >> relationships per second). The traverser will also automatically check
> that
> >> the same node is not returned twice. The test for the relationship type
> is
> >> efficient. Still reasonable, but less efficient is the custom test you
> >> might
> >> put in the returnable evaluator, but if the limiting factor is usually
> the
> >> number of relationships traversed, and if that is kept managable, the
> >> evaluator test is no concern.
> >>
> >> I think twitter is a good case in point, even with many millions of
> users,
> >> you will still only follow perhaps a hundred and they will tweet perhaps
> a
> >> hundred, or a thousand times, so your traverser will find the 10k-100k
> >> messages quite quickly. This can be speeded up further, but the right
> >> approach depends again on your use case. The idea with using a graph
> >> database is that the actual usage probably maps very well to the graph
> >> structure, so when deciding how to speed up your search, consider how it
> >> will be used. In twitter one normally only cares about recent messages,
> so
> >> how about not linking directly from the user to the message, but link to
> an
> >> intermediate node representing time, for example, a day-node. Then each
> new
> >> message is added to the day node for that day, and that will
> automatically
> >> become yesterday the next day. Then your traversal can have a stop
> >> evaluator
> >> to not follow old messages (unless your query is looking for old
> messages,
> >> of course). So the 100k messages might drop to only a few hundred, or
> even
> >> just a few dozen. Certainly that will be a query of the order of
> >> milliseconds!
> >>
> >> Moving away from the traverser, you also have the option to call
> directly
> >> the getRelationships() methods from the node. If you structure is
> >> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two
> >> nested for loops would work, the outer iterating over the followers and
> the
> >> inner iterating over the messages. If you changed to add a time-based
> >> interim node (which is a kind of graph-index), then you need to have
> three
> >> loops. If you made your time index a deeper tree (months->days->hours,
> >> etc.), then you would need to further refactor the code. However, if you
> >> stuck with a traverser, you might not need to change the traverser even
> of
> >> the graph structure changed, as long as the same relationship types were
> >> maintained. Does that make sense?
> >>
> >> Cheers, Craig
> >>
> >> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote:
> >>
> >>> Thanks Craig,
> >>>
> >>> I'd like to clarify my question (I don't think it changes your answer
> >>> though).
> >>>
> >>> I wanted all messages visible to me created by users I follow.  Thus,
> the
> >>> FOLLOWS relationship is not enough.  I'd need to see messages that are
> >>> visible to me and then check if they were created by users I follow, or
> >> I'd
> >>> need to see messages created by users I follow and then see if they're
> >>> visible to me.
> >>>
> >>> I assume your last example still yields the result I'm looking for.
> >> Could
> >>> you describe what actually happens here though?  I'm unclear on what
> the
> >>> traversal looks like.  Would it first traverse every outgoing FOLLOWS
> >>> relationship from the viewer, yielding other users, and then traverse
> all
> >>> the CREATED relationships to get to messages?
> >>>
> >>> Also, given very large numbers of FOLLOWS and CREATED relationships
> (with
> >>> say, a twitter graph), how is this made efficient?
> >>>
> >>> Sorry for all the basic questions but I couldn't find this information
> in
> >>> the docs.  If there's something I should be reading before posting
> these
> >>> questions, please point me to it.
> >>>
> >>> Thanks!
> >>>
> >>> Lincoln
> >>>
> >>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com>
> >> wrote:
> >>>
> >>>> I'm uncertain about one ambiguity in your model, you are able to find
> >>>> messages through FOLLOWS and IS_VISIBLE_BY. These will give two
> >> different
> >>>> sets, and my first impression was that FOLLOWS gives you the right
> >>> answer.
> >>>> In other words you want to query for 'all messages by users I follow'?
> >> In
> >>>> that case you do not need IS_VISIBLE_BY. However, if there are
> messages
> >>> by
> >>>> people you follow, but are not allowed to see, then you also need the
> >>>> IS_VISIBLE_BY. But I would still reconsider linking directly from the
> >>>> viewer
> >>>> to the message for that case. I'd rather have the messages linked to
> >> some
> >>>> categorization structure for things like 'public', 'private', etc.
> >>>>
> >>>> Anyway, here are some suggestions for the various approaches above:
> >>>> *'all messages by users I follow'*
> >>>> val msgs = viewer.traverse(
> >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>> (tp: TraversalPosition) => IsMessage(tp.currentNode()),
> >>>> Rels.FOLLOWS, Direction.OUTGOING,
> >>>> Rels.CREATED, Direction.OUTGOING)
> >>>>
> >>>> *'all messages visible to me'*
> >>>> val msgs = viewer.traverse(
> >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>> ReturnableEvaluator.ALL_BUT_START_NODE,
> >>>> Rels.IS_VISIBLE_BY, Direction.INCOMING)
> >>>>
> >>>> *'all messages, visible to me, by people I follow'*
> >>>> val msgs = viewer.traverse(
> >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>> (tp: TraversalPosition) => {
> >>>> val msg = tp.currentNode()
> >>>> IsMessage(msg) && IsVisibleBy(msg,viewer)
> >>>> },
> >>>> Rels.FOLLOWS, Direction.OUTGOING,
> >>>> Rels.CREATED, Direction.OUTGOING)
> >>>>
> >>>> Of course I assume you make the utility functions IsMessage(node:
> Node)
> >>> and
> >>>> IsVisibleBy(msg: Node, user: Node), and these will test the existance
> >> of
> >>>> properties and relations as appropriate to make the decision.
> >>>>
> >>>>
> >>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com>
> wrote:
> >>>>
> >>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
> >>> However,
> >>>>> the cognitive dissonance that I've grown so used to in modeling
> >> storage
> >>>> is
> >>>>> proving to be a bit difficult to let go at this early stage :)
> >>>>>
> >>>>> I was hoping that if someone could help me through an example I would
> >>> be
> >>>>> able to grok how to properly structure my data and query it in Neo4j.
> >>>>>
> >>>>> Nodes:
> >>>>> Message( text: String )
> >>>>> User( id: Long )
> >>>>>
> >>>>> Relationships:
> >>>>> CREATED
> >>>>> FOLLOWS
> >>>>> IS_VISIBLE_BY
> >>>>>
> >>>>> So I might have a graph with entries like so:
> >>>>>
> >>>>> User(1) --> CREATED --> Message("i woke up late today")
> >>>>> User(2) --> CREATED --> Message("hello")
> >>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
> >>>>>
> >>>>> User(1) --> FOLLOWS --> User(2)
> >>>>>
> >>>>> Let's also say all messages are visible to User 1.
> >>>>>
> >>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
> >>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
> >>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
> >>>>>
> >>>>> So, I can do a simple traversal for visible:
> >>>>>
> >>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
> >>>>> val index = new LuceneIndexService( graphDb )
> >>>>> val viewer = index.getSingleNode("id", 1)
> >>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
> >>>>> StopEvaluator.END_OF_GRAPH,
> >>>>> ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
> >>>>> Direction.INCOMING)
> >>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }")  //
> >>> assuming
> >>>> i
> >>>>> have the relevant functions
> >>>>>
> >>>>> But let's say that this is going to return too many messages.  Just
> >>>> because
> >>>>> all the messages are possibly visible to me, doesn't mean I want to
> >> see
> >>>>> them
> >>>>> all.  So, I'd like to additionally filter by the FOLLOWS
> >> relationship.
> >>>> I'd
> >>>>> like to express "get all messages that are visible and were created
> >> by
> >>> a
> >>>>> user that I follow."  Can someone show me an example of how to do
> >> that?
> >>>>>
> >>>>> I'm guessing that you need to implement a custom ReturnableEvaluator,
> >>> but
> >>>> I
> >>>>> don't understand how you traverse multiple relationships at the same
> >>>> time.
> >>>>>
> >>>>> Thanks,
> >>>>> Lincoln
> >>>>> _______________________________________________
> >>>>> Neo mailing list
> >>>>> User@lists.neo4j.org
> >>>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>>
> >>>> _______________________________________________
> >>>> Neo mailing list
> >>>> User@lists.neo4j.org
> >>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>
> >>> _______________________________________________
> >>> Neo mailing list
> >>> User@lists.neo4j.org
> >>> https://lists.neo4j.org/mailman/listinfo/user
> >>>
> >> _______________________________________________
> >> Neo mailing list
> >> User@lists.neo4j.org
> >> https://lists.neo4j.org/mailman/listinfo/user
> >>
> > _______________________________________________
> > Neo mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
>
> _______________________________________________
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo] basic questions

Reply via email to