Hey Craig,

That looks like this thing called LINQ (some Microsoft .NET thing --
http://en.wikipedia.org/wiki/Language_Integrated_Query ). It allows you to
"talk all SQL-like" using dot notation. I don't know much about it, but it
seems super useful for those who like that type of graph searching.
However, is that just "typical" Ruby?

Take care,
Marko.

http://markorodriguez.com

On Mar 17, 2010, at 3:00 PM, Craig Taverner wrote:

> This is a cool idea. Seems a bit like the pattern matching stuff in neo4j,
> except you set up a traversal pattern. We have done a similar thing in Ruby
> with a set of nested closures that each define the starting node for the
> traversal of the outer closure, allowing a kind of multi-step traversal (or
> chain of traversers). Here is an example we used to find the data required
> for a specific bar-chart deep in a project:
>
> chart 'Distribution analysis' do
>   self.domain_axis='categories'
>   self.range_axis='values'
>   select 'First dataset',:categories=>'name',:values=>'value' do
>     from {
>       from {
>         traverse(:outgoing,:CHILD,1)
>         where {type=='gis' and name=='network.csv'}
>       }
>       traverse(:outgoing,:AGGREGATION,1)
>       where {name=='azimuth' and get_property(:select)=='max' and distribute=='auto'}
>     }
>     traverse(:outgoing,:CHILD,:all)
>   end
> end
>
>
>
> On Wed, Mar 17, 2010 at 10:37 PM, Marko Rodriguez <okramma...@gmail.com> wrote:
>
>> Hey,
>>
>> You might want to consider Blueprints Pipes for a more controlled traverser
>> framework that doesn't require the use of for-loops and allows you to
>> specify arbitrary paths through a graph.
>>
>> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
>>
>> For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
>>
>> //////////////////////////////////////////
>> Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
>> Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"), false);
>> Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
>> Pipe<Vertex,Edge> pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
>> Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"), false);
>> Pipe<Edge,Vertex> pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
>> Pipeline<Vertex,Vertex> pipeline = new Pipeline<Vertex,Vertex>(
>>     Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6));
>>
>> Neo4jGraph graph = new Neo4jGraph("/dir/neo");
>> graph.startTransaction();
>> pipeline.setStarts(Arrays.asList(viewer).iterator());
>> while(pipeline.hasNext()) {
>>   System.out.println("message: " + pipeline.next());
>> }
>> graph.stopTransaction(true);
>> //////////////////////////////////////////
>>
>> NOTE: I hand typed this from memory so there might be some errors here or
>> there....
>>
>> A Pipe/Pipeline implements Iterator so you can just page out as many items
>> as you want that can legally flow through pipeline...
>>
>> If this is interesting to you and if you use MVN and Git, you may want to
>> build the latest and greatest of Blueprints [
>> http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new
>> things [ see
>> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api].
>>
>> Also, a non-iterator based mechanism is provided by Gremlin [
>> http://gremlin.tinkerpop.com ] which would express the same thing as:
>>
>> $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
>>
>> Take care,
>> Marko.
>>
>> http://markorodriguez.com
>> http://gremlin.tinkerpop.com
>>
>>
>> On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
>>
>>> Wow dude, this is blowing my mind just a little.
>>>
>>> Ok, sticking with the twitter example, I'm concerned about the edge cases.
>>> I'd say it's easy to optimize with a relational db or any other storage for
>>> that matter if I make the assumption that people only follow a few hundred
>>> people and only want recent messages. However some people follow hundreds
>>> of thousands of people. If Guy Kawasaki uses my app, I'd run into a problem
>>> quickly.
>>>
>>> However I see your point that I don't have to limit myself to just the
>>> obvious relationships, but can create relationships that serve specific
>>> purposes and use-cases such as your day example. I'm not sure how I would
>>> want to model my use-case to allow for Guy Kawasaki, I'll have to think more
>>> about it. Is there a threshold beyond which adding relationships between
>>> nodes causes problems? If not, or if it's high, you could create custom
>>> relationships for every type of query you'd want to do.
>>>
>>> However, a secondary question comes up. If we continue with the twitter
>>> example, and I want to be able to page through results, is that directly
>>> supported through Neo4j's API? Coming from a more traditional storage
>>> background I tend to think of what I'd want as a sort by time and then a
>>> skip and limit on the results (so I could say give me messages 1-100 sorted
>>> by time descending). Is there anything equivalent in Neo4j or is the
>>> approach totally different?
>>>
>>> Thanks,
>>> Lincoln
>>>
>>>
>>> On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com> wrote:
>>>
>>>> Hi Lincoln,
>>>>
>>>> So it sounds like you don't need the IS_VISIBLE_BY relations after all.
>>>> The traverser works by following all relationships of the specified types and
>>>> directions from each current node (as you traverse, or walk the graph). You
>>>> can have a complex graph and traverse to high depth very fast (thousands of
>>>> relationships per second). The traverser will also automatically check that
>>>> the same node is not returned twice. The test for the relationship type is
>>>> efficient. Less efficient, though still reasonable, is the custom test you
>>>> might put in the returnable evaluator; but if the limiting factor is usually
>>>> the number of relationships traversed, and that is kept manageable, the
>>>> evaluator test is no concern.
>>>>
>>>> I think twitter is a good case in point: even with many millions of users,
>>>> you will still only follow perhaps a hundred and they will tweet perhaps a
>>>> hundred, or a thousand times, so your traverser will find the 10k-100k
>>>> messages quite quickly. This can be sped up further, but the right
>>>> approach depends again on your use case. The idea with using a graph
>>>> database is that the actual usage probably maps very well to the graph
>>>> structure, so when deciding how to speed up your search, consider how it
>>>> will be used. In twitter one normally only cares about recent messages, so
>>>> how about not linking directly from the user to the message, but linking to
>>>> an intermediate node representing time, for example, a day-node. Then each
>>>> new message is added to the day node for that day, and that will
>>>> automatically become yesterday the next day. Then your traversal can have a
>>>> stop evaluator to not follow old messages (unless your query is looking for
>>>> old messages, of course). So the 100k messages might drop to only a few
>>>> hundred, or even just a few dozen. Certainly that will be a query of the
>>>> order of milliseconds!
>>>>
>>>> Moving away from the traverser, you also have the option to call the
>>>> getRelationships() methods directly on the node. If your structure is
>>>> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two
>>>> nested for loops would work, the outer iterating over the followed users and
>>>> the inner iterating over the messages. If you changed to add a time-based
>>>> interim node (which is a kind of graph-index), then you need to have three
>>>> loops. If you made your time index a deeper tree (months->days->hours,
>>>> etc.), then you would need to further refactor the code. However, if you
>>>> stuck with a traverser, you might not need to change the traverser even if
>>>> the graph structure changed, as long as the same relationship types were
>>>> maintained. Does that make sense?
>>>>
>>>> Cheers, Craig
>>>>
>>>> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote:
>>>>
>>>>> Thanks Craig,
>>>>>
>>>>> I'd like to clarify my question (I don't think it changes your answer
>>>>> though).
>>>>>
>>>>> I wanted all messages visible to me created by users I follow. Thus, the
>>>>> FOLLOWS relationship is not enough. I'd need to see messages that are
>>>>> visible to me and then check if they were created by users I follow, or I'd
>>>>> need to see messages created by users I follow and then see if they're
>>>>> visible to me.
>>>>>
>>>>> I assume your last example still yields the result I'm looking for. Could
>>>>> you describe what actually happens here though? I'm unclear on what the
>>>>> traversal looks like. Would it first traverse every outgoing FOLLOWS
>>>>> relationship from the viewer, yielding other users, and then traverse all
>>>>> the CREATED relationships to get to messages?
>>>>>
>>>>> Also, given very large numbers of FOLLOWS and CREATED relationships (with
>>>>> say, a twitter graph), how is this made efficient?
>>>>>
>>>>> Sorry for all the basic questions but I couldn't find this information in
>>>>> the docs. If there's something I should be reading before posting these
>>>>> questions, please point me to it.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Lincoln
>>>>>
>>>>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com> wrote:
>>>>>
>>>>>> I'm uncertain about one ambiguity in your model, you are able to find
>>>>>> messages through FOLLOWS and IS_VISIBLE_BY. These will give two different
>>>>>> sets, and my first impression was that FOLLOWS gives you the right answer.
>>>>>> In other words you want to query for 'all messages by users I follow'? In
>>>>>> that case you do not need IS_VISIBLE_BY. However, if there are messages by
>>>>>> people you follow, but are not allowed to see, then you also need the
>>>>>> IS_VISIBLE_BY. But I would still reconsider linking directly from the
>>>>>> viewer to the message for that case. I'd rather have the messages linked to
>>>>>> some categorization structure for things like 'public', 'private', etc.
>>>>>>
>>>>>> Anyway, here are some suggestions for the various approaches above:
>>>>>> *'all messages by users I follow'*
>>>>>> val msgs = viewer.traverse(
>>>>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
>>>>>>   (tp: TraversalPosition) => IsMessage(tp.currentNode()),
>>>>>>   Rels.FOLLOWS, Direction.OUTGOING,
>>>>>>   Rels.CREATED, Direction.OUTGOING)
>>>>>>
>>>>>> *'all messages visible to me'*
>>>>>> val msgs = viewer.traverse(
>>>>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
>>>>>>   ReturnableEvaluator.ALL_BUT_START_NODE,
>>>>>>   Rels.IS_VISIBLE_BY, Direction.INCOMING)
>>>>>>
>>>>>> *'all messages, visible to me, by people I follow'*
>>>>>> val msgs = viewer.traverse(
>>>>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
>>>>>>   (tp: TraversalPosition) => {
>>>>>>     val msg = tp.currentNode()
>>>>>>     IsMessage(msg) && IsVisibleBy(msg,viewer)
>>>>>>   },
>>>>>>   Rels.FOLLOWS, Direction.OUTGOING,
>>>>>>   Rels.CREATED, Direction.OUTGOING)
>>>>>>
>>>>>> Of course I assume you make the utility functions IsMessage(node: Node) and
>>>>>> IsVisibleBy(msg: Node, user: Node), and these will test the existence of
>>>>>> properties and relations as appropriate to make the decision.
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued. However,
>>>>>>> the cognitive dissonance that I've grown so used to in modeling storage is
>>>>>>> proving to be a bit difficult to let go at this early stage :)
>>>>>>>
>>>>>>> I was hoping that if someone could help me through an example I would be
>>>>>>> able to grok how to properly structure my data and query it in Neo4j.
>>>>>>>
>>>>>>> Nodes:
>>>>>>> Message( text: String )
>>>>>>> User( id: Long )
>>>>>>>
>>>>>>> Relationships:
>>>>>>> CREATED
>>>>>>> FOLLOWS
>>>>>>> IS_VISIBLE_BY
>>>>>>>
>>>>>>> So I might have a graph with entries like so:
>>>>>>>
>>>>>>> User(1) --> CREATED --> Message("i woke up late today")
>>>>>>> User(2) --> CREATED --> Message("hello")
>>>>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
>>>>>>>
>>>>>>> User(1) --> FOLLOWS --> User(2)
>>>>>>>
>>>>>>> Let's also say all messages are visible to User 1.
>>>>>>>
>>>>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
>>>>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
>>>>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
>>>>>>>
>>>>>>> So, I can do a simple traversal for visible:
>>>>>>>
>>>>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
>>>>>>> val index = new LuceneIndexService( graphDb )
>>>>>>> val viewer = index.getSingleNode("id", 1)
>>>>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
>>>>>>>   StopEvaluator.END_OF_GRAPH,
>>>>>>>   ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
>>>>>>>   Direction.INCOMING)
>>>>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }") // assuming i have the relevant functions
>>>>>>>
>>>>>>> But let's say that this is going to return too many messages. Just because
>>>>>>> all the messages are possibly visible to me, doesn't mean I want to see
>>>>>>> them all. So, I'd like to additionally filter by the FOLLOWS relationship.
>>>>>>> I'd like to express "get all messages that are visible and were created by
>>>>>>> a user that I follow." Can someone show me an example of how to do that?
>>>>>>>
>>>>>>> I'm guessing that you need to implement a custom ReturnableEvaluator, but I
>>>>>>> don't understand how you traverse multiple relationships at the same time.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lincoln
>>>>>>> _______________________________________________
>>>>>>> Neo mailing list
>>>>>>> User@lists.neo4j.org
>>>>>>> https://lists.neo4j.org/mailman/listinfo/user
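
Editorial sketch for readers of this thread: the query discussed above (viewer-->FOLLOWS-->user-->CREATED-->message, kept only if the message IS_VISIBLE_BY the viewer, then paged newest-first with skip and limit) can be illustrated without any graph database. The Java below is a toy in-memory model under stated assumptions, not the Neo4j or Blueprints API. The class FollowFeedSketch, its methods, the createdAt timestamp, and the two extra messages by user 2 are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Toy in-memory stand-in for the graph in the thread. Every name here is
// hypothetical; nothing below is Neo4j or Blueprints API.
public class FollowFeedSketch {

    // A message node. The createdAt timestamp is an assumed property,
    // added only so that results can be sorted by time.
    static final class Message {
        final String text;
        final long createdAt;

        Message(String text, long createdAt) {
            this.text = text;
            this.createdAt = createdAt;
        }
    }

    // user id -> ids of the users that user FOLLOWS
    private final Map<Long, Set<Long>> follows = new HashMap<>();
    // user id -> messages that user CREATED
    private final Map<Long, List<Message>> created = new HashMap<>();
    // message -> ids of the users it IS_VISIBLE_BY
    private final Map<Message, Set<Long>> visibleBy = new HashMap<>();

    public void follow(long follower, long followed) {
        follows.computeIfAbsent(follower, k -> new HashSet<>()).add(followed);
    }

    public void post(long author, String text, long createdAt, long... viewers) {
        Message message = new Message(text, createdAt);
        created.computeIfAbsent(author, k -> new ArrayList<>()).add(message);
        Set<Long> allowed = new HashSet<>();
        for (long viewer : viewers) {
            allowed.add(viewer);
        }
        visibleBy.put(message, allowed);
    }

    // Two hops (FOLLOWS, then CREATED), a visibility filter, a sort by time
    // descending, and finally skip/limit for paging.
    public List<String> feedTexts(long viewer, int skip, int limit) {
        return follows.getOrDefault(viewer, new HashSet<>()).stream()
                .flatMap(user -> created.getOrDefault(user, new ArrayList<>()).stream())
                .filter(m -> visibleBy.get(m).contains(viewer))
                .sorted(Comparator.comparingLong((Message m) -> m.createdAt).reversed())
                .skip(skip)
                .limit(limit)
                .map(m -> m.text)
                .collect(Collectors.toList());
    }

    // The graph from the thread, plus two invented messages by user 2 so the
    // visibility filter and the paging have something to do.
    public static FollowFeedSketch sample() {
        FollowFeedSketch g = new FollowFeedSketch();
        g.follow(1L, 2L);
        g.post(1L, "i woke up late today", 10L, 1L);
        g.post(2L, "hello", 20L, 1L);
        g.post(3L, "ugh, i hate mondays", 30L, 1L);
        g.post(2L, "lunch time", 40L, 1L); // invented, visible to user 1
        g.post(2L, "private note", 50L);   // invented, visible to nobody
        return g;
    }

    public static void main(String[] args) {
        FollowFeedSketch g = sample();
        System.out.println(g.feedTexts(1L, 0, 10)); // [lunch time, hello]
        System.out.println(g.feedTexts(1L, 1, 1));  // [hello]
    }
}
```

Note that this sketch collects and sorts every candidate message before skipping, so its cost grows with the number of followed users and their messages. That is the concern raised in the thread about viewers who follow very many people, and a time-based intermediate node such as the day-node described above is one way to bound the set before sorting.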