Thanks Marko, looks very interesting. I'll check it out. On Wed, Mar 17, 2010 at 5:37 PM, Marko Rodriguez <okramma...@gmail.com>wrote:
> Hey, > > You might want to consider Blueprints Pipes for a more controlled traverser > framework that doesn't require the use of for-loops and allows you to > specify arbitrary paths through a graph. > > http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework > > For the example viewer-->FOLLOWS-->user-->CREATED-->message do, > > ////////////////////////////////////////// > Pipe<Vertex,Edge> pipe1 = new > VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES); > Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"), > false); > Pipe<Edge,Vertex> pipe3 = new > EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX); > Pipe<Vertex,Edge> pipe4 = new > VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES); > Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"), > false); > Pipe<Edge,Vertex> pipe6 = new > EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX); > Pipeline<Vertex,Vertex> pipeline = new > Pipeline<Vertex,Vertex>(Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6)); > > Neo4jGraph graph = new Neo4jGraph('/dir/neo') > graph.startTransaction(); > pipeline.setStarts(Arrays.asList(viewer).iterator()); > while(pipeline.hasNext()) { > System.out.println("message: " + pipeline.next()); > } > graph.stopTransaction(true); > ////////////////////////////////////////// > > NOTE: I hand typed this from memory so there might be some errors here or > there.... > > A Pipe/Pipeline implements Iterator so you can just page out as many items > as you want that can legally flow through pipeline... > > If this is interesting to you and if you use MVN and Git, you may want to > build the latest and greatest of Blueprints [ > http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new > things [ see > http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api]. > > Also, a non-iterator based mechanism is provided by Gremlin [ > http://gremlin.tinkerpop.com ] which would express the same thing as: > > $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV > > Take care, > Marko. > > http://markorodriguez.com > http://gremlin.tinkerpop.com > > > On Mar 17, 2010, at 2:19 PM, Lincoln wrote: > > > Wow dude, this is blowing my mind just a little. > > > > Ok, sticking with the twitter example, I'm concerned about the edge > cases. > > I'd say it's easy to optimize with a relational db or any other storage > for > > that matter if I make the assumption that people only follow a few > hundred > > people and only want recent messages. However some people follow > hundreds > > of thousands of people. If Guy Kawasaki uses my app, I'd run into a > problem > > quickly. > > > > However I see your point that I don't have to limit myself to just the > > obvious relationships, but can create relationships that serve specific > > purposes and use-cases such as your day example. I'm not sure how I > would > > want to model my use-case to allow for Guy Kawaski, I'll have to think > more > > about it. Is there a threshold beyond which adding relationships between > > nodes causes problems? If not, or if it's high, you could create custom > > relationships for every type of query you'd want to do. > > > > However, a secondary question comes up. If we continue with the twitter > > example, and I want to be able to page through results, is that directly > > supported through Neo4j's API? Coming from a more traditional storage > > background I tend to think of what I'd want as a sort by time and then a > > skip and limit on the results (so I could say give me messages 1-100 > sorted > > by time descending). Is there anything equivalent in Neo4j or is the > > approach totally different? > > > > Thanks, > > Lincoln > > > > > > On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com> > wrote: > > > >> Hi Lincoln, > >> > >> So it sounds like you don't need the IS_VISIBLE relations after all. The > >> traverser works by following all relationships of the specified types > and > >> directions from each current node (as you traverse, or walk the graph). > You > >> can have a complex graph and traverse to high depth very fast (thousands > of > >> relationships per second). The traverser will also automatically check > that > >> the same node is not returned twice. The test for the relationship type > is > >> efficient. Still reasonable, but less efficient is the custom test you > >> might > >> put in the returnable evaluator, but if the limiting factor is usually > the > >> number of relationships traversed, and if that is kept managable, the > >> evaluator test is no concern. > >> > >> I think twitter is a good case in point, even with many millions of > users, > >> you will still only follow perhaps a hundred and they will tweet perhaps > a > >> hundred, or a thousand times, so your traverser will find the 10k-100k > >> messages quite quickly. This can be speeded up further, but the right > >> approach depends again on your use case. The idea with using a graph > >> database is that the actual usage probably maps very well to the graph > >> structure, so when deciding how to speed up your search, consider how it > >> will be used. In twitter one normally only cares about recent messages, > so > >> how about not linking directly from the user to the message, but link to > an > >> intermediate node representing time, for example, a day-node. Then each > new > >> message is added to the day node for that day, and that will > automatically > >> become yesterday the next day. Then your traversal can have a stop > >> evaluator > >> to not follow old messages (unless your query is looking for old > messages, > >> of course). So the 100k messages might drop to only a few hundred, or > even > >> just a few dozen. Certainly that will be a query of the order of > >> milliseconds! > >> > >> Moving away from the traverser, you also have the option to call > directly > >> the getRelationships() methods from the node. If you structure is > >> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two > >> nested for loops would work, the outer iterating over the followers and > the > >> inner iterating over the messages. If you changed to add a time-based > >> interim node (which is a kind of graph-index), then you need to have > three > >> loops. If you made your time index a deeper tree (months->days->hours, > >> etc.), then you would need to further refactor the code. However, if you > >> stuck with a traverser, you might not need to change the traverser even > of > >> the graph structure changed, as long as the same relationship types were > >> maintained. Does that make sense? > >> > >> Cheers, Craig > >> > >> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote: > >> > >>> Thanks Craig, > >>> > >>> I'd like to clarify my question (I don't think it changes your answer > >>> though). > >>> > >>> I wanted all messages visible to me created by users I follow. Thus, > the > >>> FOLLOWS relationship is not enough. I'd need to see messages that are > >>> visible to me and then check if they were created by users I follow, or > >> I'd > >>> need to see messages created by users I follow and then see if they're > >>> visible to me. > >>> > >>> I assume your last example still yields the result I'm looking for. > >> Could > >>> you describe what actually happens here though? I'm unclear on what > the > >>> traversal looks like. Would it first traverse every outgoing FOLLOWS > >>> relationship from the viewer, yielding other users, and then traverse > all > >>> the CREATED relationships to get to messages? > >>> > >>> Also, given very large numbers of FOLLOWS and CREATED relationships > (with > >>> say, a twitter graph), how is this made efficient? > >>> > >>> Sorry for all the basic questions but I couldn't find this information > in > >>> the docs. If there's something I should be reading before posting > these > >>> questions, please point me to it. > >>> > >>> Thanks! > >>> > >>> Lincoln > >>> > >>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com> > >> wrote: > >>> > >>>> I'm uncertain about one ambiguity in your model, you are able to find > >>>> messages through FOLLOWS and IS_VISIBLE_BY. These will give two > >> different > >>>> sets, and my first impression was that FOLLOWS gives you the right > >>> answer. > >>>> In other words you want to query for 'all messages by users I follow'? > >> In > >>>> that case you do not need IS_VISIBLE_BY. However, if there are > messages > >>> by > >>>> people you follow, but are not allowed to see, then you also need the > >>>> IS_VISIBLE_BY. But I would still reconsider linking directly from the > >>>> viewer > >>>> to the message for that case. I'd rather have the messages linked to > >> some > >>>> categorization structure for things like 'public', 'private', etc. > >>>> > >>>> Anyway, here are some suggestions for the various approaches above: > >>>> *'all messages by users I follow'* > >>>> val msgs = viewer.traverse( > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH, > >>>> (tp: TraversalPosition) => IsMessage(tp.currentNode()), > >>>> Rels.FOLLOWS, Direction.OUTGOING, > >>>> Rels.CREATED, Direction.OUTGOING) > >>>> > >>>> *'all messages visible to me'* > >>>> val msgs = viewer.traverse( > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH, > >>>> ReturnableEvaluator.ALL_BUT_START_NODE, > >>>> Rels.IS_VISIBLE_BY, Direction.INCOMING) > >>>> > >>>> *'all messages, visible to me, by people I follow'* > >>>> val msgs = viewer.traverse( > >>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH, > >>>> (tp: TraversalPosition) => { > >>>> val msg = tp.currentNode() > >>>> IsMessage(msg) && IsVisibleBy(msg,viewer) > >>>> }, > >>>> Rels.FOLLOWS, Direction.OUTGOING, > >>>> Rels.CREATED, Direction.OUTGOING) > >>>> > >>>> Of course I assume you make the utility functions IsMessage(node: > Node) > >>> and > >>>> IsVisibleBy(msg: Node, user: Node), and these will test the existance > >> of > >>>> properties and relations as appropriate to make the decision. > >>>> > >>>> > >>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com> > wrote: > >>>> > >>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued. > >>> However, > >>>>> the cognitive dissonance that I've grown so used to in modeling > >> storage > >>>> is > >>>>> proving to be a bit difficult to let go at this early stage :) > >>>>> > >>>>> I was hoping that if someone could help me through an example I would > >>> be > >>>>> able to grok how to properly structure my data and query it in Neo4j. > >>>>> > >>>>> Nodes: > >>>>> Message( text: String ) > >>>>> User( id: Long ) > >>>>> > >>>>> Relationships: > >>>>> CREATED > >>>>> FOLLOWS > >>>>> IS_VISIBLE_BY > >>>>> > >>>>> So I might have a graph with entries like so: > >>>>> > >>>>> User(1) --> CREATED --> Message("i woke up late today") > >>>>> User(2) --> CREATED --> Message("hello") > >>>>> User(3) --> CREATED --> Message("ugh, i hate mondays") > >>>>> > >>>>> User(1) --> FOLLOWS --> User(2) > >>>>> > >>>>> Let's also say all messages are visible to User 1. > >>>>> > >>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1) > >>>>> Message("hello") --> IS_VISIBLE_BY --> User(1) > >>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1) > >>>>> > >>>>> So, I can do a simple traversal for visible: > >>>>> > >>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" ) > >>>>> val index = new LuceneIndexService( graphDb ) > >>>>> val viewer = index.getSingleNode("id", 1) > >>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST, > >>>>> StopEvaluator.END_OF_GRAPH, > >>>>> ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY, > >>>>> Direction.INCOMING) > >>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }") // > >>> assuming > >>>> i > >>>>> have the relevant functions > >>>>> > >>>>> But let's say that this is going to return too many messages. Just > >>>> because > >>>>> all the messages are possibly visible to me, doesn't mean I want to > >> see > >>>>> them > >>>>> all. So, I'd like to additionally filter by the FOLLOWS > >> relationship. > >>>> I'd > >>>>> like to express "get all messages that are visible and were created > >> by > >>> a > >>>>> user that I follow." Can someone show me an example of how to do > >> that? > >>>>> > >>>>> I'm guessing that you need to implement a custom ReturnableEvaluator, > >>> but > >>>> I > >>>>> don't understand how you traverse multiple relationships at the same > >>>> time. > >>>>> > >>>>> Thanks, > >>>>> Lincoln > >>>>> _______________________________________________ > >>>>> Neo mailing list > >>>>> User@lists.neo4j.org > >>>>> https://lists.neo4j.org/mailman/listinfo/user > >>>>> > >>>> _______________________________________________ > >>>> Neo mailing list > >>>> User@lists.neo4j.org > >>>> https://lists.neo4j.org/mailman/listinfo/user > >>>> > >>> _______________________________________________ > >>> Neo mailing list > >>> User@lists.neo4j.org > >>> https://lists.neo4j.org/mailman/listinfo/user > >>> > >> _______________________________________________ > >> Neo mailing list > >> User@lists.neo4j.org > >> https://lists.neo4j.org/mailman/listinfo/user > >> > > _______________________________________________ > > Neo mailing list > > User@lists.neo4j.org > > https://lists.neo4j.org/mailman/listinfo/user > > _______________________________________________ > Neo mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > _______________________________________________ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user