Re: [Neo] basic questions

Marko Rodriguez Wed, 17 Mar 2010 15:04:37 -0700

Hey Craig,

That looks like this thing called Linq (some Microsoft .NET thing -- 
http://en.wikipedia.org/wiki/Language_Integrated_Query ). It allows you to 
"talk all SQL-like" using dot notation. I don't know much about it, but seems 
super useful for those who like that type of graph searching. However, is that 
just "typical" Ruby?


Take care,
Marko.

http://markorodriguez.com

On Mar 17, 2010, at 3:00 PM, Craig Taverner wrote:

> This is a cool idea. Seems a bit like the pattern matching stuff in neo4j,
> except you setup a traversal pattern. We have done a similar thing in Ruby
> with a set of nested closures that each define the starting node for the
> traversal of the outer closure, allowing a kind of multi-step traversal (or
> chain of traversers). Here is an example we used to find the data required
> for a specific bar-chart deep in a project:
> 
> chart 'Distribution analysis' do
>    self.domain_axis='categories'
>    self.range_axis='values'
>    select 'First dataset',:categories=>'name',:values=>'value' do
>      from {
>        from {
>          traverse(:outgoing,:CHILD,1)
>          where {type=='gis' and name=='network.csv'}
>        }
>        traverse(:outgoing,:AGGREGATION,1)
>        where {name=='azimuth' and get_property(:select)=='max' and
> distribute=='auto'}
>      }
>      traverse(:outgoing,:CHILD,:all)
>    end
>  end
> 
> 
> 
> On Wed, Mar 17, 2010 at 10:37 PM, Marko Rodriguez <okramma...@gmail.com>wrote:
> 
>> Hey,
>> 
>> You might want to consider Blueprints Pipes for a more controlled traverser
>> framework that doesn't require the use of for-loops and allows you to
>> specify arbitrary paths through a graph.
>> 
>> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework
>> 
>> For the example viewer-->FOLLOWS-->user-->CREATED-->message do,
>> 
>> //////////////////////////////////////////
>> Pipe<Vertex,Edge> pipe1 = new
>> VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
>> Pipe<Edge,Edge> pipe2 = new LabelFilterPipe(Arrays.asList("FOLLOWS"),
>> false);
>> Pipe<Edge,Vertex> pipe3 = new
>> EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
>> Pipe<Vertex,Edge> pipe4 = new
>> VertexEdgePipe(VertexEdgePipe.Step.OUT_EDGES);
>> Pipe<Edge,Edge> pipe5 = new LabelFilterPipe(Arrays.asList("CREATED"),
>> false);
>> Pipe<Edge,Vertex> pipe6 = new
>> EdgeVertexPipe(EdgeVertexPipe.Step.IN_VERTEX);
>> Pipeline<Vertex,Vertex> pipeline = new
>> Pipeline<Vertex,Vertex>(Arrays.asList(pipe1,pipe2,pipe3,pipe4,pipe5,pipe6));
>> 
>> Neo4jGraph graph = new Neo4jGraph('/dir/neo')
>> graph.startTransaction();
>> pipeline.setStarts(Arrays.asList(viewer).iterator());
>> while(pipeline.hasNext()) {
>>       System.out.println("message: " + pipeline.next());
>> }
>> graph.stopTransaction(true);
>> //////////////////////////////////////////
>> 
>> NOTE: I hand typed this from memory so there might be some errors here or
>> there....
>> 
>> A Pipe/Pipeline implements Iterator so you can just page out as many items
>> as you want that can legally flow through pipeline...
>> 
>> If this is interesting to you and if you use MVN and Git, you may want to
>> build the latest and greatest of Blueprints [
>> http://blueprints.tinkerpop.com ] as I continually add new Pipes to do new
>> things [ see
>> http://wiki.github.com/tinkerpop/blueprints/pipes-traversal-framework#pipes_api].
>> 
>> Also, a non-iterator based mechanism is provided by Gremlin [
>> http://gremlin.tinkerpop.com ] which would express the same thing as:
>> 
>> $messages := ./ou...@label='FOLLOWS']/inV/ou...@label='CREATED']/inV
>> 
>> Take care,
>> Marko.
>> 
>> http://markorodriguez.com
>> http://gremlin.tinkerpop.com
>> 
>> 
>> On Mar 17, 2010, at 2:19 PM, Lincoln wrote:
>> 
>>> Wow dude, this is blowing my mind just a little.
>>> 
>>> Ok, sticking with the twitter example, I'm concerned about the edge
>> cases.
>>> I'd say it's easy to optimize with a relational db or any other storage
>> for
>>> that matter if I make the assumption that people only follow a few
>> hundred
>>> people and only want recent messages.  However some people follow
>> hundreds
>>> of thousands of people.  If Guy Kawasaki uses my app, I'd run into a
>> problem
>>> quickly.
>>> 
>>> However I see your point that I don't have to limit myself to just the
>>> obvious relationships, but can create relationships that serve specific
>>> purposes and use-cases such as your day example.  I'm not sure how I
>> would
>>> want to model my use-case to allow for Guy Kawaski, I'll have to think
>> more
>>> about it.  Is there a threshold beyond which adding relationships between
>>> nodes causes problems?  If not, or if it's high, you could create custom
>>> relationships for every type of query you'd want to do.
>>> 
>>> However, a secondary question comes up.  If we continue with the twitter
>>> example, and I want to be able to page through results, is that directly
>>> supported through Neo4j's API?  Coming from a more traditional storage
>>> background I tend to think of what I'd want as a sort by time and then a
>>> skip and limit on the results (so I could say give me messages 1-100
>> sorted
>>> by time descending).  Is there anything equivalent in Neo4j or is the
>>> approach totally different?
>>> 
>>> Thanks,
>>> Lincoln
>>> 
>>> 
>>> On Wed, Mar 17, 2010 at 12:41 PM, Craig Taverner <cr...@amanzi.com>
>> wrote:
>>> 
>>>> Hi Lincoln,
>>>> 
>>>> So it sounds like you don't need the IS_VISIBLE relations after all. The
>>>> traverser works by following all relationships of the specified types
>> and
>>>> directions from each current node (as you traverse, or walk the graph).
>> You
>>>> can have a complex graph and traverse to high depth very fast (thousands
>> of
>>>> relationships per second). The traverser will also automatically check
>> that
>>>> the same node is not returned twice. The test for the relationship type
>> is
>>>> efficient. Still reasonable, but less efficient is the custom test you
>>>> might
>>>> put in the returnable evaluator, but if the limiting factor is usually
>> the
>>>> number of relationships traversed, and if that is kept managable, the
>>>> evaluator test is no concern.
>>>> 
>>>> I think twitter is a good case in point, even with many millions of
>> users,
>>>> you will still only follow perhaps a hundred and they will tweet perhaps
>> a
>>>> hundred, or a thousand times, so your traverser will find the 10k-100k
>>>> messages quite quickly. This can be speeded up further, but the right
>>>> approach depends again on your use case. The idea with using a graph
>>>> database is that the actual usage probably maps very well to the graph
>>>> structure, so when deciding how to speed up your search, consider how it
>>>> will be used. In twitter one normally only cares about recent messages,
>> so
>>>> how about not linking directly from the user to the message, but link to
>> an
>>>> intermediate node representing time, for example, a day-node. Then each
>> new
>>>> message is added to the day node for that day, and that will
>> automatically
>>>> become yesterday the next day. Then your traversal can have a stop
>>>> evaluator
>>>> to not follow old messages (unless your query is looking for old
>> messages,
>>>> of course). So the 100k messages might drop to only a few hundred, or
>> even
>>>> just a few dozen. Certainly that will be a query of the order of
>>>> milliseconds!
>>>> 
>>>> Moving away from the traverser, you also have the option to call
>> directly
>>>> the getRelationships() methods from the node. If you structure is
>>>> predictable, like viewer-->FOLLOWS-->user-->CREATED-->message, then two
>>>> nested for loops would work, the outer iterating over the followers and
>> the
>>>> inner iterating over the messages. If you changed to add a time-based
>>>> interim node (which is a kind of graph-index), then you need to have
>> three
>>>> loops. If you made your time index a deeper tree (months->days->hours,
>>>> etc.), then you would need to further refactor the code. However, if you
>>>> stuck with a traverser, you might not need to change the traverser even
>> of
>>>> the graph structure changed, as long as the same relationship types were
>>>> maintained. Does that make sense?
>>>> 
>>>> Cheers, Craig
>>>> 
>>>> On Wed, Mar 17, 2010 at 4:00 PM, Lincoln <linxbet...@gmail.com> wrote:
>>>> 
>>>>> Thanks Craig,
>>>>> 
>>>>> I'd like to clarify my question (I don't think it changes your answer
>>>>> though).
>>>>> 
>>>>> I wanted all messages visible to me created by users I follow.  Thus,
>> the
>>>>> FOLLOWS relationship is not enough.  I'd need to see messages that are
>>>>> visible to me and then check if they were created by users I follow, or
>>>> I'd
>>>>> need to see messages created by users I follow and then see if they're
>>>>> visible to me.
>>>>> 
>>>>> I assume your last example still yields the result I'm looking for.
>>>> Could
>>>>> you describe what actually happens here though?  I'm unclear on what
>> the
>>>>> traversal looks like.  Would it first traverse every outgoing FOLLOWS
>>>>> relationship from the viewer, yielding other users, and then traverse
>> all
>>>>> the CREATED relationships to get to messages?
>>>>> 
>>>>> Also, given very large numbers of FOLLOWS and CREATED relationships
>> (with
>>>>> say, a twitter graph), how is this made efficient?
>>>>> 
>>>>> Sorry for all the basic questions but I couldn't find this information
>> in
>>>>> the docs.  If there's something I should be reading before posting
>> these
>>>>> questions, please point me to it.
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> Lincoln
>>>>> 
>>>>> On Wed, Mar 17, 2010 at 7:06 AM, Craig Taverner <cr...@amanzi.com>
>>>> wrote:
>>>>> 
>>>>>> I'm uncertain about one ambiguity in your model, you are able to find
>>>>>> messages through FOLLOWS and IS_VISIBLE_BY. These will give two
>>>> different
>>>>>> sets, and my first impression was that FOLLOWS gives you the right
>>>>> answer.
>>>>>> In other words you want to query for 'all messages by users I follow'?
>>>> In
>>>>>> that case you do not need IS_VISIBLE_BY. However, if there are
>> messages
>>>>> by
>>>>>> people you follow, but are not allowed to see, then you also need the
>>>>>> IS_VISIBLE_BY. But I would still reconsider linking directly from the
>>>>>> viewer
>>>>>> to the message for that case. I'd rather have the messages linked to
>>>> some
>>>>>> categorization structure for things like 'public', 'private', etc.
>>>>>> 
>>>>>> Anyway, here are some suggestions for the various approaches above:
>>>>>> *'all messages by users I follow'*
>>>>>> val msgs = viewer.traverse(
>>>>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
>>>>>> (tp: TraversalPosition) => IsMessage(tp.currentNode()),
>>>>>> Rels.FOLLOWS, Direction.OUTGOING,
>>>>>> Rels.CREATED, Direction.OUTGOING)
>>>>>> 
>>>>>> *'all messages visible to me'*
>>>>>> val msgs = viewer.traverse(
>>>>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
>>>>>> ReturnableEvaluator.ALL_BUT_START_NODE,
>>>>>> Rels.IS_VISIBLE_BY, Direction.INCOMING)
>>>>>> 
>>>>>> *'all messages, visible to me, by people I follow'*
>>>>>> val msgs = viewer.traverse(
>>>>>> Order.BREADTH_FIRST, StopEvaluator.END_OF_GRAPH,
>>>>>> (tp: TraversalPosition) => {
>>>>>> val msg = tp.currentNode()
>>>>>> IsMessage(msg) && IsVisibleBy(msg,viewer)
>>>>>> },
>>>>>> Rels.FOLLOWS, Direction.OUTGOING,
>>>>>> Rels.CREATED, Direction.OUTGOING)
>>>>>> 
>>>>>> Of course I assume you make the utility functions IsMessage(node:
>> Node)
>>>>> and
>>>>>> IsVisibleBy(msg: Node, user: Node), and these will test the existance
>>>> of
>>>>>> properties and relations as appropriate to make the decision.
>>>>>> 
>>>>>> 
>>>>>> On Wed, Mar 17, 2010 at 6:32 AM, Lincoln <linxbet...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> Hi, I've just started looking at Neo4j and I'm quite intrigued.
>>>>> However,
>>>>>>> the cognitive dissonance that I've grown so used to in modeling
>>>> storage
>>>>>> is
>>>>>>> proving to be a bit difficult to let go at this early stage :)
>>>>>>> 
>>>>>>> I was hoping that if someone could help me through an example I would
>>>>> be
>>>>>>> able to grok how to properly structure my data and query it in Neo4j.
>>>>>>> 
>>>>>>> Nodes:
>>>>>>> Message( text: String )
>>>>>>> User( id: Long )
>>>>>>> 
>>>>>>> Relationships:
>>>>>>> CREATED
>>>>>>> FOLLOWS
>>>>>>> IS_VISIBLE_BY
>>>>>>> 
>>>>>>> So I might have a graph with entries like so:
>>>>>>> 
>>>>>>> User(1) --> CREATED --> Message("i woke up late today")
>>>>>>> User(2) --> CREATED --> Message("hello")
>>>>>>> User(3) --> CREATED --> Message("ugh, i hate mondays")
>>>>>>> 
>>>>>>> User(1) --> FOLLOWS --> User(2)
>>>>>>> 
>>>>>>> Let's also say all messages are visible to User 1.
>>>>>>> 
>>>>>>> Message("i woke up late today") --> IS_VISIBLE_BY --> User(1)
>>>>>>> Message("hello") --> IS_VISIBLE_BY --> User(1)
>>>>>>> Message("ugh, i hate mondays") --> IS_VISIBLE_BY --> User(1)
>>>>>>> 
>>>>>>> So, I can do a simple traversal for visible:
>>>>>>> 
>>>>>>> val graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" )
>>>>>>> val index = new LuceneIndexService( graphDb )
>>>>>>> val viewer = index.getSingleNode("id", 1)
>>>>>>> val msgs = viewer.traverse( Order.BREADTH_FIRST,
>>>>>>> StopEvaluator.END_OF_GRAPH,
>>>>>>> ReturnableEvaluator.ALL_BUT_START_NODE, Rels.IS_VISIBLE_BY,
>>>>>>> Direction.INCOMING)
>>>>>>> msgs.toList.map(_.toJson).mkString("{ msgs : [", ",", "] }")  //
>>>>> assuming
>>>>>> i
>>>>>>> have the relevant functions
>>>>>>> 
>>>>>>> But let's say that this is going to return too many messages.  Just
>>>>>> because
>>>>>>> all the messages are possibly visible to me, doesn't mean I want to
>>>> see
>>>>>>> them
>>>>>>> all.  So, I'd like to additionally filter by the FOLLOWS
>>>> relationship.
>>>>>> I'd
>>>>>>> like to express "get all messages that are visible and were created
>>>> by
>>>>> a
>>>>>>> user that I follow."  Can someone show me an example of how to do
>>>> that?
>>>>>>> 
>>>>>>> I'm guessing that you need to implement a custom ReturnableEvaluator,
>>>>> but
>>>>>> I
>>>>>>> don't understand how you traverse multiple relationships at the same
>>>>>> time.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Lincoln
>>>>>>> _______________________________________________
>>>>>>> Neo mailing list
>>>>>>> User@lists.neo4j.org
>>>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>>>> 
>>>>>> _______________________________________________
>>>>>> Neo mailing list
>>>>>> User@lists.neo4j.org
>>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>>> 
>>>>> _______________________________________________
>>>>> Neo mailing list
>>>>> User@lists.neo4j.org
>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>> 
>>>> _______________________________________________
>>>> Neo mailing list
>>>> User@lists.neo4j.org
>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>> 
>>> _______________________________________________
>>> Neo mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>> 
>> _______________________________________________
>> Neo mailing list
>> User@lists.neo4j.org
>> https://lists.neo4j.org/mailman/listinfo/user
>> 
> _______________________________________________
> Neo mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user

_______________________________________________
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo] basic questions

Reply via email to