Hey Marko,

I've not looked at LINQ, and I'm not really a fan of using SQL-like syntax for non-SQL things. I did take a peek at GQL and did not really like it. Way too SQL-ish.
This syntax was devised by a developer new to Ruby, so this style is not specifically Ruby-esque either. However, once I convince her to change the words select, from, where to something else, I actually think the idea she has is really great. I think Gremlin is also cool, but again that is a very different-looking syntax from what most Java developers are used to. I think the Ruby one will have an easier learning curve (at least I hope so; it is too new and untested to be sure). I also need to do a more careful comparison with Andreas Ronge's neo4j.rb traverser DSL, to be sure I'm not reinventing the wheel. Although, even if they are functionally equivalent, I personally prefer nested closures to method chaining.

Cheers, Craig

On Wed, Mar 17, 2010 at 11:04 PM, Marko Rodriguez <okramma...@gmail.com> wrote:

> Hey Craig,
>
> That looks like this thing called LINQ (some Microsoft .NET thing -- http://en.wikipedia.org/wiki/Language_Integrated_Query ). It allows you to "talk all SQL-like" using dot notation. I don't know much about it, but it seems super useful for those who like that type of graph searching. However, is that just "typical" Ruby?
>
> Take care,
> Marko.
>
> http://markorodriguez.com
>
> On Mar 17, 2010, at 3:00 PM, Craig Taverner wrote:
>
> > This is a cool idea. Seems a bit like the pattern matching stuff in Neo4j, except you set up a traversal pattern. We have done a similar thing in Ruby with a set of nested closures that each define the starting node for the traversal of the outer closure, allowing a kind of multi-step traversal (or chain of traversers).
> > Here is an example we used to find the data required for a specific bar-chart deep in a project:
> >
> > chart 'Distribution analysis' do
> >   self.domain_axis='categories'
> >   self.range_axis='values'
> >   select 'First dataset',:categories=>'name',:values=>'value' do
> >     from {
> >       from {
> >         traverse(:outgoing,:CHILD,1)
> >         where {type=='gis' and name=='network.csv'}
> >       }
> >       traverse(:outgoing,:AGGREGATION,1)
> >       where {name=='azimuth' and get_property(:select)=='max' and distribute=='auto'}
> >     }
> >     traverse(:outgoing,:CHILD,:all)
> >   end
> > end
> >
> > On Wed, Mar 17, 2010 at 10:37 PM, Marko Rodriguez <okramma...@gmail.com> wrote:
> >
> >> Hey,
> >>
> >> You might want to consider Blueprints Pipes for a more controlled traverser framework that doesn't require the use of for-loops and allows you to specify arbitrary paths through a graph.
> >>
> >> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
> >>
> >> For the example viewer-->FOLLOWS-->user-->CREATED-->message, do:
> >>
> >> //////////////////////////////////////////
> >> Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> >> Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"), false);
> >> Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> >> Pipe<Vertex,Edge> pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> >> Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"), false);
> >> Pipe<Edge,Vertex> pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> >> Pipeline<Vertex,Vertex> pipeline = new Pipeline<Vertex,Vertex>(Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6));
> >>
> >> Neo4jGraph graph = new Neo4jGraph("/dir/neo");
> >> graph.startTransaction();
> >> pipeline.setStarts(Arrays.asList(viewer).iterator());
> >> while(pipeline.hasNext()) {
> >>   System.out.println("message: " + pipeline.next());
> >> }
> >> graph.stopTransaction(true);
> >> //////////////////////////////////////////
> >>
> >> NOTE: I hand typed this from memory so there might be some errors here or there....
> >>
> >> A Pipe/Pipeline implements Iterator so you can just page out as many items as you want that can legally flow through the pipeline...
> >>
> >> If this is interesting to you and if you use MVN and Git, you may want to build the latest and greatest of Blueprints [ http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new things [ see http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api ].
> >>
> >> Also, a non-iterator based mechanism is provided by Gremlin [ http://gremlin.tinkerpop.com ] which would express the same thing as:
> >>
> >> $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
> >>
> >> Take care,
> >> Marko.
> >>
> >> http://markorodriguez.com
> >> http://gremlin.tinkerpop.com
> >>
> >> On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
> >>
> >>> Wow dude, this is blowing my mind just a little.
> >>>
> >>> Ok, sticking with the twitter example, I'm concerned about the edge cases. I'd say it's easy to optimize with a relational db, or any other storage for that matter, if I make the assumption that people only follow a few hundred people and only want recent messages. However, some people follow hundreds of thousands of people. If Guy Kawasaki uses my app, I'd run into a problem quickly.
> >>>
> >>> However, I see your point that I don't have to limit myself to just the obvious relationships, but can create relationships that serve specific purposes and use-cases, such as your day example. I'm not sure how I would want to model my use-case to allow for Guy Kawasaki; I'll have to think more about it. Is there a threshold beyond which adding relationships between nodes causes problems?
> >>> If not, or if it's high, you could create custom relationships for every type of query you'd want to do.
> >>>
> >>> However, a secondary question comes up. If we continue with the twitter example, and I want to be able to page through results, is that directly supported through Neo4j's API? Coming from a more traditional storage background, I tend to think of what I'd want as a sort by time and then a skip and limit on the results (so I could say give me messages 1-100 sorted by time descending). Is there anything equivalent in Neo4j or is the approach totally different?
> >>>
> >>> Thanks,
> >>> Lincoln
> >>>
> >>> On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com> wrote:
> >>>
> >>>> Hi Lincoln,
> >>>>
> >>>> So it sounds like you don't need the IS_VISIBLE_BY relationships after all. The traverser works by following all relationships of the specified types and directions from each current node (as you traverse, or walk the graph). You can have a complex graph and traverse to high depth very fast (thousands of relationships per second). The traverser will also automatically check that the same node is not returned twice. The test for the relationship type is efficient. Still reasonable, but less efficient, is the custom test you might put in the returnable evaluator, but if the limiting factor is usually the number of relationships traversed, and if that is kept manageable, the evaluator test is no concern.
> >>>>
> >>>> I think twitter is a good case in point: even with many millions of users, you will still only follow perhaps a hundred, and they will tweet perhaps a hundred or a thousand times, so your traverser will find the 10k-100k messages quite quickly. This can be sped up further, but the right approach depends again on your use case.
> >>>> The idea with using a graph database is that the actual usage probably maps very well to the graph structure, so when deciding how to speed up your search, consider how it will be used. In twitter one normally only cares about recent messages, so how about not linking directly from the user to the message, but linking to an intermediate node representing time, for example, a day-node. Then each new message is added to the day node for that day, and that will automatically become yesterday the next day. Then your traversal can have a stop evaluator to not follow old messages (unless your query is looking for old messages, of course). So the 100k messages might drop to only a few hundred, or even just a few dozen. Certainly that will be a query of the order of milliseconds!
> >>>>
> >>>> Moving away from the traverser, you also have the option to call the getRelationships() methods directly from the node. If your structure is predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two nested for loops would work, the outer iterating over the followed users and the inner iterating over the messages. If you changed to add a time-based interim node (which is a kind of graph-index), then you need to have three loops. If you made your time index a deeper tree (months->days->hours, etc.), then you would need to further refactor the code. However, if you stuck with a traverser, you might not need to change the traverser even if the graph structure changed, as long as the same relationship types were maintained. Does that make sense?
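
The nested-loop alternative described above can be sketched roughly as follows. This is only an illustration in Java: the graph is a hypothetical in-memory map rather than the Neo4j API, the targets() helper merely stands in for the role getRelationships() plays in the text, and the HAS_DAY relationship type and the "user@date" day-node naming are assumptions invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory stand-in for a graph; this is NOT the Neo4j API.
 * Nodes are plain strings and relationships are stored per (node, type).
 */
public class NestedLoopSketch {
    // node -> relationship type -> target nodes, in insertion order
    private final Map<String, Map<String, List<String>>> outgoing = new HashMap<>();

    /** Adds a directed relationship of the given type. */
    public void relate(String from, String type, String to) {
        outgoing.computeIfAbsent(from, k -> new HashMap<>())
                .computeIfAbsent(type, k -> new ArrayList<>())
                .add(to);
    }

    /** Returns the targets of all outgoing relationships of one type. */
    public List<String> targets(String node, String type) {
        return outgoing
                .getOrDefault(node, Collections.<String, List<String>>emptyMap())
                .getOrDefault(type, Collections.<String>emptyList());
    }

    /** Two loops: viewer-->FOLLOWS-->user-->CREATED-->message. */
    public List<String> messagesDirect(String viewer) {
        List<String> messages = new ArrayList<>();
        for (String user : targets(viewer, "FOLLOWS")) {
            for (String message : targets(user, "CREATED")) {
                messages.add(message);
            }
        }
        return messages;
    }

    /**
     * Three loops once a day node sits between user and message:
     * viewer-->FOLLOWS-->user-->HAS_DAY-->day-->CREATED-->message.
     * HAS_DAY and the "user@date" naming are assumptions of this sketch.
     */
    public List<String> messagesForDay(String viewer, String day) {
        List<String> messages = new ArrayList<>();
        for (String user : targets(viewer, "FOLLOWS")) {
            for (String dayNode : targets(user, "HAS_DAY")) {
                if (!dayNode.endsWith("@" + day)) {
                    continue; // ignore other days instead of walking their messages
                }
                for (String message : targets(dayNode, "CREATED")) {
                    messages.add(message);
                }
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        NestedLoopSketch g = new NestedLoopSketch();
        g.relate("viewer", "FOLLOWS", "alice");
        // Both layouts are loaded into one graph here purely for brevity.
        g.relate("alice", "CREATED", "hello");
        g.relate("alice", "HAS_DAY", "alice@2010-03-17");
        g.relate("alice@2010-03-17", "CREATED", "hello");
        System.out.println(g.messagesDirect("viewer"));
        System.out.println(g.messagesForDay("viewer", "2010-03-17"));
    }
}
```

Adding the day node forces a third loop and a new method, which is the refactoring cost mentioned above; a traverser keyed on relationship types might not need that change.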
> >>>>
> >>>> Cheers, Craig
> >>>>
> >>>> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote:
> >>>>
> >>>>> Thanks Craig,
> >>>>>
> >>>>> I'd like to clarify my question (I don't think it changes your answer though).
> >>>>>
> >>>>> I wanted all messages visible to me created by users I follow. Thus, the FOLLOWS relationship is not enough. I'd need to see messages that are visible to me and then check if they were created by users I follow, or I'd need to see messages created by users I follow and then see if they're visible to me.
> >>>>>
> >>>>> I assume your last example still yields the result I'm looking for. Could you describe what actually happens here though? I'm unclear on what the traversal looks like. Would it first traverse every outgoing FOLLOWS relationship from the viewer, yielding other users, and then traverse all the CREATED relationships to get to messages?
> >>>>>
> >>>>> Also, given very large numbers of FOLLOWS and CREATED relationships (with say, a twitter graph), how is this made efficient?
> >>>>>
> >>>>> Sorry for all the basic questions but I couldn't find this information in the docs. If there's something I should be reading before posting these questions, please point me to it.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Lincoln
> >>>>>
> >>>>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com> wrote:
> >>>>>
> >>>>>> I'm uncertain about one ambiguity in your model: you are able to find messages through both FOLLOWS and IS_VISIBLE_BY. These will give two different sets, and my first impression was that FOLLOWS gives you the right answer. In other words, you want to query for 'all messages by users I follow'? In that case you do not need IS_VISIBLE_BY.
> >>>>>> However, if there are messages by people you follow that you are not allowed to see, then you also need the IS_VISIBLE_BY. But I would still reconsider linking directly from the viewer to the message for that case. I'd rather have the messages linked to some categorization structure for things like 'public', 'private', etc.
> >>>>>>
> >>>>>> Anyway, here are some suggestions for the various approaches above:
> >>>>>>
> >>>>>> *'all messages by users I follow'*
> >>>>>> val msgs = viewer.traverse(
> >>>>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>>>>   (tp: TraversalPosition) => IsMessage(tp.currentNode()),
> >>>>>>   Rels.FOLLOWS, Direction.OUTGOING,
> >>>>>>   Rels.CREATED, Direction.OUTGOING)
> >>>>>>
> >>>>>> *'all messages visible to me'*
> >>>>>> val msgs = viewer.traverse(
> >>>>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>>>>   ReturnableEvaluator.ALL_BUT_START_NODE,
> >>>>>>   Rels.IS_VISIBLE_BY, Direction.INCOMING)
> >>>>>>
> >>>>>> *'all messages, visible to me, by people I follow'*
> >>>>>> val msgs = viewer.traverse(
> >>>>>>   Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>>>>   (tp: TraversalPosition) => {
> >>>>>>     val msg = tp.currentNode()
> >>>>>>     IsMessage(msg) && IsVisibleBy(msg,viewer)
> >>>>>>   },
> >>>>>>   Rels.FOLLOWS, Direction.OUTGOING,
> >>>>>>   Rels.CREATED, Direction.OUTGOING)
> >>>>>>
> >>>>>> Of course I assume you make the utility functions IsMessage(node: Node) and IsVisibleBy(msg: Node, user: Node), and these will test the existence of properties and relations as appropriate to make the decision.
> >>>>>>
> >>>>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
> >>>>>>> However, the cognitive dissonance that I've grown so used to in modeling storage is proving to be a bit difficult to let go at this early stage :)
> >>>>>>>
> >>>>>>> I was hoping that if someone could help me through an example I would be able to grok how to properly structure my data and query it in Neo4j.
> >>>>>>>
> >>>>>>> Nodes:
> >>>>>>> Message( text: String )
> >>>>>>> User( id: Long )
> >>>>>>>
> >>>>>>> Relationships:
> >>>>>>> CREATED
> >>>>>>> FOLLOWS
> >>>>>>> IS_VISIBLE_BY
> >>>>>>>
> >>>>>>> So I might have a graph with entries like so:
> >>>>>>>
> >>>>>>> User(1) --> CREATED --> Message("i woke up late today")
> >>>>>>> User(2) --> CREATED --> Message("hello")
> >>>>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
> >>>>>>>
> >>>>>>> User(1) --> FOLLOWS --> User(2)
> >>>>>>>
> >>>>>>> Let's also say all messages are visible to User 1.
> >>>>>>>
> >>>>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
> >>>>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
> >>>>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
> >>>>>>>
> >>>>>>> So, I can do a simple traversal for visible:
> >>>>>>>
> >>>>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
> >>>>>>> val index = new LuceneIndexService( graphDb )
> >>>>>>> val viewer = index.getSingleNode("id", 1)
> >>>>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
> >>>>>>>   StopEvaluator.END_OF_GRAPH,
> >>>>>>>   ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
> >>>>>>>   Direction.INCOMING)
> >>>>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }") // assuming i have the relevant functions
> >>>>>>>
> >>>>>>> But let's say that this is going to return too many messages. Just because all the messages are possibly visible to me, doesn't mean I want to see them all.
> >>>>>>> So, I'd like to additionally filter by the FOLLOWS relationship. I'd like to express "get all messages that are visible and were created by a user that I follow." Can someone show me an example of how to do that?
> >>>>>>>
> >>>>>>> I'm guessing that you need to implement a custom ReturnableEvaluator, but I don't understand how you traverse multiple relationships at the same time.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Lincoln
> >>>>>>> _______________________________________________
> >>>>>>> Neo mailing list
> >>>>>>> User@lists.neo4j.org
> >>>>>>> https://lists.neo4j.org/mailman/listinfo/user