Re: [Neo] basic questions

Craig Taverner Wed, 17 Mar 2010 15:17:17 -0700

Hey Marko,

I've not looked at LINQ, and am not really a fan of using SQL-like syntax
for non-SQL things. I did take a peek at GQL and did not really like it. Way
too SQL-ish.


This syntax was devised by a developer new to Ruby, so this style is not
specifically Ruby-esque either. However, once I convince her to change the
words select, from, where to something else, I actually think the idea she
has is really great. I think Gremlin is also cool, but again its syntax
looks very different from what most Java developers are used to. I think the
Ruby one will have a gentler learning curve (at least I hope so; it is too
new and untested to be sure).

I also need to do a more careful comparison with Andreas Ronge's neo4j.rb
traverser DSL, to be sure I'm not re-inventing the wheel. Although, even if
they are functionally equivalent, I personally prefer nested closures to
method chaining.

Cheers, Craig

On Wed, Mar 17, 2010 at 11:04 PM, Marko Rodriguez <okramma...@gmail.com> wrote:

> Hey Craig,
>
> That looks like this thing called Linq (some Microsoft .NET thing --
> http://en.wikipedia.org/wiki/Language_Integrated_Query ). It allows you to
> "talk all SQL-like" using dot notation. I don't know much about it, but
> seems super useful for those who like that type of graph searching. However,
> is that just "typical" Ruby?
>
> Take care,
> Marko.
>
> http://markorodriguez.com
>
> On Mar 17, 2010, at 3:00 PM, Craig Taverner wrote:
>
> > This is a cool idea. Seems a bit like the pattern matching stuff in neo4j,
> > except you set up a traversal pattern. We have done a similar thing in Ruby
> > with a set of nested closures that each define the starting node for the
> > traversal of the outer closure, allowing a kind of multi-step traversal (or
> > chain of traversers). Here is an example we used to find the data required
> > for a specific bar-chart deep in a project:
> >
> > chart 'Distribution analysis' do
> >    self.domain_axis='categories'
> >    self.range_axis='values'
> >    select 'First dataset',:categories=>'name',:values=>'value' do
> >      from {
> >        from {
> >          traverse(:outgoing,:CHILD,1)
> >          where {type=='gis' and name=='network.csv'}
> >        }
> >        traverse(:outgoing,:AGGREGATION,1)
> >        where {name=='azimuth' and get_property(:select)=='max' and distribute=='auto'}
> >      }
> >      traverse(:outgoing,:CHILD,:all)
> >    end
> >  end
> >
> >
> >
> > On Wed, Mar 17, 2010 at 10:37 PM, Marko Rodriguez <okramma...@gmail.com> wrote:
> >
> >> Hey,
> >>
> >> You might want to consider Blueprints Pipes for a more controlled traverser
> >> framework that doesn't require the use of for-loops and allows you to
> >> specify arbitrary paths through a graph.
> >>
> >> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
> >>
> >> For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
> >>
> >> //////////////////////////////////////////
> >> // Step 1: viewer --FOLLOWS--> user
> >> Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> >> Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"), false);
> >> Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> >> // Step 2: user --CREATED--> message
> >> Pipe<Vertex,Edge> pipe4 = new VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
> >> Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"), false);
> >> Pipe<Edge,Vertex> pipe6 = new EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
> >> Pipeline<Vertex,Vertex> pipeline = new Pipeline<Vertex,Vertex>(
> >>     Arrays.asList(pipe1, pipe2, pipe3, pipe4, pipe5, pipe6));
> >>
> >> Neo4jGraph graph = new Neo4jGraph("/dir/neo");
> >> graph.startTransaction();
> >> pipeline.setStarts(Arrays.asList(viewer).iterator());
> >> while (pipeline.hasNext()) {
> >>     System.out.println("message: " + pipeline.next());
> >> }
> >> graph.stopTransaction(true);
> >> //////////////////////////////////////////
> >>
> >> NOTE: I hand typed this from memory so there might be some errors here or
> >> there....
> >>
> >> A Pipe/Pipeline implements Iterator so you can just page out as many items
> >> as you want that can legally flow through the pipeline...
> >>
> >> If this is interesting to you and if you use Maven and Git, you may want to
> >> build the latest and greatest of Blueprints [
> >> http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new
> >> things [ see
> >> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api
> >> ].
> >>
> >> Also, a non-iterator based mechanism is provided by Gremlin [
> >> http://gremlin.tinkerpop.com ] which would express the same thing as:
> >>
> >> $messages := ./outE[@label='FOLLOWS']/inV/outE[@label='CREATED']/inV
> >>
> >> Take care,
> >> Marko.
> >>
> >> http://markorodriguez.com
> >> http://gremlin.tinkerpop.com
> >>
> >>
> >> On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
> >>
> >>> Wow dude, this is blowing my mind just a little.
> >>>
> >>> Ok, sticking with the Twitter example, I'm concerned about the edge
> >>> cases.  I'd say it's easy to optimize with a relational db or any other
> >>> storage for that matter if I make the assumption that people only follow
> >>> a few hundred people and only want recent messages.  However some people
> >>> follow hundreds of thousands of people.  If Guy Kawasaki uses my app, I'd
> >>> run into a problem quickly.
> >>>
> >>> However I see your point that I don't have to limit myself to just the
> >>> obvious relationships, but can create relationships that serve specific
> >>> purposes and use-cases such as your day example.  I'm not sure how I
> >>> would want to model my use-case to allow for Guy Kawasaki, I'll have to
> >>> think more about it.  Is there a threshold beyond which adding
> >>> relationships between nodes causes problems?  If not, or if it's high,
> >>> you could create custom relationships for every type of query you'd want
> >>> to do.
> >>>
> >>> However, a secondary question comes up.  If we continue with the Twitter
> >>> example, and I want to be able to page through results, is that directly
> >>> supported through Neo4j's API?  Coming from a more traditional storage
> >>> background I tend to think of what I'd want as a sort by time and then a
> >>> skip and limit on the results (so I could say give me messages 1-100
> >>> sorted by time descending).  Is there anything equivalent in Neo4j or is
> >>> the approach totally different?
> >>>
> >>> Thanks,
> >>> Lincoln
> >>>
> >>>
> >>> On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com> wrote:
> >>>
> >>>> Hi Lincoln,
> >>>>
> >>>> So it sounds like you don't need the IS_VISIBLE_BY relations after all.
> >>>> The traverser works by following all relationships of the specified
> >>>> types and directions from each current node (as you traverse, or walk
> >>>> the graph). You can have a complex graph and traverse to high depth very
> >>>> fast (thousands of relationships per second). The traverser will also
> >>>> automatically check that the same node is not returned twice. The test
> >>>> for the relationship type is efficient. The custom test you might put in
> >>>> the returnable evaluator is less efficient, though still reasonable. But
> >>>> the limiting factor is usually the number of relationships traversed,
> >>>> and if that is kept manageable, the evaluator test is no concern.
> >>>>
> >>>> I think Twitter is a good case in point: even with many millions of
> >>>> users, you will still only follow perhaps a hundred and they will each
> >>>> tweet perhaps a hundred, or a thousand times, so your traverser will
> >>>> find the 10k-100k messages quite quickly. This can be sped up further,
> >>>> but the right approach depends again on your use case. The idea with
> >>>> using a graph database is that the actual usage probably maps very well
> >>>> to the graph structure, so when deciding how to speed up your search,
> >>>> consider how it will be used. In Twitter one normally only cares about
> >>>> recent messages, so how about not linking directly from the user to the
> >>>> message, but linking to an intermediate node representing time, for
> >>>> example, a day-node. Then each new message is added to the day node for
> >>>> that day, and that will automatically become yesterday the next day.
> >>>> Then your traversal can have a stop evaluator to not follow old messages
> >>>> (unless your query is looking for old messages, of course). So the 100k
> >>>> messages might drop to only a few hundred, or even just a few dozen.
> >>>> Certainly that will be a query of the order of milliseconds!
> >>>>
> >>>> Moving away from the traverser, you also have the option to call the
> >>>> getRelationships() methods directly on the node. If your structure is
> >>>> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two
> >>>> nested for loops would work, the outer iterating over the followed users
> >>>> and the inner iterating over the messages. If you changed to add a
> >>>> time-based interim node (which is a kind of graph-index), then you need
> >>>> to have three loops. If you made your time index a deeper tree
> >>>> (months->days->hours, etc.), then you would need to further refactor the
> >>>> code. However, if you stuck with a traverser, you might not need to
> >>>> change the traverser even if the graph structure changed, as long as the
> >>>> same relationship types were maintained. Does that make sense?
> >>>>
> >>>> Cheers, Craig
> >>>>
> >>>> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote:
> >>>>
> >>>>> Thanks Craig,
> >>>>>
> >>>>> I'd like to clarify my question (I don't think it changes your answer
> >>>>> though).
> >>>>>
> >>>>> I wanted all messages visible to me created by users I follow.  Thus,
> >>>>> the FOLLOWS relationship is not enough.  I'd need to see messages that
> >>>>> are visible to me and then check if they were created by users I
> >>>>> follow, or I'd need to see messages created by users I follow and then
> >>>>> see if they're visible to me.
> >>>>>
> >>>>> I assume your last example still yields the result I'm looking for.
> >>>>> Could you describe what actually happens here though?  I'm unclear on
> >>>>> what the traversal looks like.  Would it first traverse every outgoing
> >>>>> FOLLOWS relationship from the viewer, yielding other users, and then
> >>>>> traverse all the CREATED relationships to get to messages?
> >>>>>
> >>>>> Also, given very large numbers of FOLLOWS and CREATED relationships
> >>>>> (with say, a twitter graph), how is this made efficient?
> >>>>>
> >>>>> Sorry for all the basic questions but I couldn't find this information
> >>>>> in the docs.  If there's something I should be reading before posting
> >>>>> these questions, please point me to it.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> Lincoln
> >>>>>
> >>>>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com> wrote:
> >>>>>
> >>>>>> I'm uncertain about one ambiguity in your model: you are able to find
> >>>>>> messages through both FOLLOWS and IS_VISIBLE_BY. These will give two
> >>>>>> different sets, and my first impression was that FOLLOWS gives you the
> >>>>>> right answer. In other words, you want to query for 'all messages by
> >>>>>> users I follow'? In that case you do not need IS_VISIBLE_BY. However,
> >>>>>> if there are messages by people you follow that you are not allowed to
> >>>>>> see, then you also need IS_VISIBLE_BY. But I would still reconsider
> >>>>>> linking directly from the viewer to the message for that case. I'd
> >>>>>> rather have the messages linked to some categorization structure for
> >>>>>> things like 'public', 'private', etc.
> >>>>>>
> >>>>>> Anyway, here are some suggestions for the various approaches above:
> >>>>>> *'all messages by users I follow'*
> >>>>>> val msgs = viewer.traverse(
> >>>>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>>>> (tp: TraversalPosition) => IsMessage(tp.currentNode()),
> >>>>>> Rels.FOLLOWS, Direction.OUTGOING,
> >>>>>> Rels.CREATED, Direction.OUTGOING)
> >>>>>>
> >>>>>> *'all messages visible to me'*
> >>>>>> val msgs = viewer.traverse(
> >>>>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>>>> ReturnableEvaluator.ALL_BUT_START_NODE,
> >>>>>> Rels.IS_VISIBLE_BY, Direction.INCOMING)
> >>>>>>
> >>>>>> *'all messages, visible to me, by people I follow'*
> >>>>>> val msgs = viewer.traverse(
> >>>>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
> >>>>>> (tp: TraversalPosition) => {
> >>>>>> val msg = tp.currentNode()
> >>>>>> IsMessage(msg) && IsVisibleBy(msg,viewer)
> >>>>>> },
> >>>>>> Rels.FOLLOWS, Direction.OUTGOING,
> >>>>>> Rels.CREATED, Direction.OUTGOING)
> >>>>>>
> >>>>>> Of course I assume you make the utility functions IsMessage(node: Node)
> >>>>>> and IsVisibleBy(msg: Node, user: Node), and these will test the
> >>>>>> existence of properties and relations as appropriate to make the
> >>>>>> decision.
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
> >>>>>>> However, the cognitive dissonance that I've grown so used to in
> >>>>>>> modeling storage is proving to be a bit difficult to let go at this
> >>>>>>> early stage :)
> >>>>>>>
> >>>>>>> I was hoping that if someone could help me through an example I would
> >>>>>>> be able to grok how to properly structure my data and query it in
> >>>>>>> Neo4j.
> >>>>>>>
> >>>>>>> Nodes:
> >>>>>>> Message( text: String )
> >>>>>>> User( id: Long )
> >>>>>>>
> >>>>>>> Relationships:
> >>>>>>> CREATED
> >>>>>>> FOLLOWS
> >>>>>>> IS_VISIBLE_BY
> >>>>>>>
> >>>>>>> So I might have a graph with entries like so:
> >>>>>>>
> >>>>>>> User(1) --> CREATED --> Message("i woke up late today")
> >>>>>>> User(2) --> CREATED --> Message("hello")
> >>>>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
> >>>>>>>
> >>>>>>> User(1) --> FOLLOWS --> User(2)
> >>>>>>>
> >>>>>>> Let's also say all messages are visible to User 1.
> >>>>>>>
> >>>>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
> >>>>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
> >>>>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
> >>>>>>>
> >>>>>>> So, I can do a simple traversal for visible:
> >>>>>>>
> >>>>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
> >>>>>>> val index = new LuceneIndexService( graphDb )
> >>>>>>> val viewer = index.getSingleNode("id", 1)
> >>>>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
> >>>>>>> StopEvaluator.END_OF_GRAPH,
> >>>>>>> ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
> >>>>>>> Direction.INCOMING)
> >>>>>>> // assuming i have the relevant functions
> >>>>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }")
> >>>>>>>
> >>>>>>> But let's say that this is going to return too many messages.  Just
> >>>>>>> because all the messages are possibly visible to me, doesn't mean I
> >>>>>>> want to see them all.  So, I'd like to additionally filter by the
> >>>>>>> FOLLOWS relationship.  I'd like to express "get all messages that are
> >>>>>>> visible and were created by a user that I follow."  Can someone show
> >>>>>>> me an example of how to do that?
> >>>>>>>
> >>>>>>> I'm guessing that you need to implement a custom ReturnableEvaluator,
> >>>>>>> but I don't understand how you traverse multiple relationships at the
> >>>>>>> same time.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Lincoln
> >>>>>>> _______________________________________________
> >>>>>>> Neo mailing list
> >>>>>>> User@lists.neo4j.org
> >>>>>>> https://lists.neo4j.org/mailman/listinfo/user
> >>>>>>>
_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
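
For readers following the thread, two ideas from it can be sketched in plain
Java: the two nested loops Craig describes for
viewer-->FOLLOWS-->user-->CREATED-->message, and the "sort by time, then skip
and limit" paging Lincoln asks about. This is only an illustration under
stated assumptions. It uses in-memory maps rather than the Neo4j API, and the
FollowedMessages class, its methods and the createdAt timestamp are
hypothetical names that do not appear in the thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph discussed in the thread.
// It does not use the Neo4j API.
class FollowedMessages {

    // A message plus an assumed creation timestamp (not in the original model).
    static final class Message {
        final String text;
        final long createdAt;

        Message(String text, long createdAt) {
            this.text = text;
            this.createdAt = createdAt;
        }
    }

    // viewer id -> ids of followed users (outgoing FOLLOWS)
    final Map<Long, List<Long>> follows = new HashMap<>();
    // user id -> messages written by that user (outgoing CREATED)
    final Map<Long, List<Message>> created = new HashMap<>();

    void follow(long viewer, long user) {
        follows.computeIfAbsent(viewer, k -> new ArrayList<>()).add(user);
    }

    void create(long user, String text, long createdAt) {
        created.computeIfAbsent(user, k -> new ArrayList<>()).add(new Message(text, createdAt));
    }

    // Two nested loops: the outer over followed users, the inner over their messages.
    List<Message> messagesFor(long viewer) {
        List<Message> result = new ArrayList<>();
        for (long user : follows.getOrDefault(viewer, Collections.emptyList())) {
            for (Message m : created.getOrDefault(user, Collections.emptyList())) {
                result.add(m);
            }
        }
        return result;
    }

    // Sort newest first on a copy, then apply skip and limit
    // (both assumed to be non-negative).
    static List<Message> page(List<Message> all, int skip, int limit) {
        List<Message> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong((Message m) -> m.createdAt).reversed());
        int from = Math.min(skip, sorted.size());
        int to = Math.min(from + limit, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        FollowedMessages graph = new FollowedMessages();
        graph.follow(1L, 2L);
        // Assumption: in the thread's example User(1) only follows User(2);
        // a second FOLLOWS is added here so that paging has something to sort.
        graph.follow(1L, 3L);
        graph.create(1L, "i woke up late today", 20L);
        graph.create(2L, "hello", 10L);
        graph.create(3L, "ugh, i hate mondays", 30L);

        // First page of up to two messages, newest first.
        for (Message m : page(graph.messagesFor(1L), 0, 2)) {
            System.out.println(m.createdAt + ": " + m.text);
        }
    }
}
```

Sorting the full result before paging is the simplest option, but it touches
every message the viewer can reach; that is the cost the day-node idea
discussed in the thread is meant to reduce.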
