Re: [Neo4j] shortestPath slower than it could be

Niels Hoogeveen Sun, 24 Jul 2011 07:45:19 -0700
Are you sure this is true, Mattias?The response time of a getRelationship call 
depends on the total number of relationships on the node. So it makes a 
difference which side of the relationship makes the call. It is always faster 
to ask it from the side that has the lowest total number of relationships 
attached. This is even true, if for both sides of the relationship there is 
only one relationship of that particular relationship type.Niels
> Date: Sun, 24 Jul 2011 16:10:42 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: [Neo4j] shortestPath slower than it could be
> 
> What I'm saying is that:
> 
>   one --> two    (one,two) for OUTGOING
> 
> and
> 
>   two <-- one   (two,one) for INCOMING
> 
> should yield the same timing.
> 
> 2011/7/24 John cyuczieekc <[email protected]>
> 
> > Thanks Mattias. The way I understand what you said is, that swapping `one`
> > and `two` in *finder.findSinglePath( one, two );*  should yield the same
> > timing when Direction.BOTH is used; this would be great. But I don't see
> > how
> > this is possible (due to not knowing how it's stored too) unless you know
> > how many rels each node has, OR even better storage is using BTree-s (?!)
> > Btw, is it possible/practical that neo4j could store IDs in BTrees ?
> > (someone said that for rels it's a double linked list instead- I'll need to
> > recheck)
> >
> > irrelevant stuff follows (ie. don't read)
> > ----
> > Until then , I lame-tested something (with both neo4j and berkeleydb):
> > a node `one` having 1mil rels to 1million unique nodes, and one more to a
> > node `two`, and node `two` having 10 incoming rels from other unique nodes,
> > +1 the one rel which was already from `one`, trying shortestPath between
> > `one` and `two` (not `two` and `one`) with Direction.BOTH
> > output:
> > just created 1,000,000 rels, took=58,381 ms  (1)
> > Path from one to two: (one)-->(two) 121,310 ms (2)
> > Path from one to two: (one)-->(two) timedelta=1,066 ms
> > Path from one to two: (one)-->(two) timedelta=795 ms
> > Path from one to two: (one)-->(two) timedelta=772 ms
> >
> > (1) and (2) happened in the same transaction (ie. before tx.finish()), the
> > others after transaction finished (an in no new transaction - since they
> > were only reads) - this also means some caching must've happened from
> > before.
> > Because it happened in same transaction, adding 1 mil relationships is
> > slower than adding them in bursts of x, I am aware of this (and I'll try to
> > make a bench with that). For ie. 100k nodes neo4j is actually faster.
> > I even got this once (but in another modified bench):
> > first time creating the relationships...
> > just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of
> > the usual 77% limit)
> > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> > Environment is about to shut down the neo4j database
> > Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC overhead
> > limit
> > exceeded
> >    at java.util.logging.LogManager.reset(LogManager.java:835)
> >    at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
> > Environment shutting down complete 23,962 ms
> >
> > Exception: java.lang.OutOfMemoryError thrown from the
> > UncaughtExceptionHandler in thread "main"
> >
> >
> > this is the output when running cold, (ie. the data already existed):
> > search started
> > Path from one to two: (one)-->(two) timedelta=7,934 ms
> > search started
> > Path from one to two: (one)-->(two) timedelta=2,027 ms
> > search started
> > Path from one to two: (one)-->(two) timedelta=875 ms
> > Environment is about to shut down the neo4j database
> > Environment shutting down complete 2,971 ms
> >
> > and the exact same thing when using berkeleydb (not using transactions
> > though):
> > just created 1,000,000 rels, took=29,418 ms
> > Path from one to two: true 0 ms
> > Path from one to two: true 0 ms
> > Path from one to two: true 0 ms
> > class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms here)
> >
> > and when cold:
> > Path from one to two: true 10 ms
> > Path from one to two: true 0 ms
> > Path from one to two: true 0 ms
> > class org.bdb.BerkEnv$1 shutting down complete 0 ms
> > (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on
> > all)
> >
> > in bdb, this is actually how bdb does the search (I don't need to make sure
> > I iterate or something similar, on the node `two` which has least incoming
> > rels), in fact I execute a search on both nodes, one after the other and
> > make sure one->two and two<-one (since I am using two databases), anyway,
> > by
> > using db.getSearchBoth(...) the search is done *that *fast. I hope to see
> > this kind of fastness with neo4j too xD (unless I'm doing it wrong)
> >
> > if I put neo4j to find shortest path on incoming on `two` then:
> > search started
> > Path from one to two: (two)<--(one) timedelta=20 ms
> > search started
> > Path from one to two: (two)<--(one) timedelta=0 ms
> > search started
> > Path from one to two: (two)<--(one) timedelta=0 ms
> > Environment is about to shut down the neo4j database
> > Environment shutting down complete 2,590 ms
> > (most was 20ms, least was 20ms on first)
> >
> > TestLinkage.java (both progs) were used from here:
> >
> > https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4
> >
> > --------------
> >
> > Here's a lame benchmark for neo4j when I added another node `three` which
> > is
> > the first rel `one`->`three`, then added 1mil `one`->(random nodes) , and
> > then(as before) `one`->`two`
> > `three` has no other relationships
> > Also, creating nodes 10k per transaction until 1mil are reached.
> >
> > first time creating the relationships...
> > just created 1,000,000 rels, took=28,889 ms  (btw my cpu is limited to 77%
> > of its normal speed; usually!)
> > closed transaction
> > search started
> > (one)-->(three) 3,234 ms
> > search started
> > (one)-->(two) 661 ms
> > search started
> > (three)<--(one) 10 ms
> > search started
> > (two)<--(one) 0 ms
> > search started
> > (one)-->(three) 631 ms
> > search started
> > (one)-->(two) 913 ms
> > search started
> > (three)<--(one) 0 ms
> > search started
> > (two)<--(one) 0 ms
> > search started
> > (one)-->(three) 610 ms
> > search started
> > (one)-->(two) 773 ms
> > search started
> > (three)<--(one) 0 ms
> > search started
> > (two)<--(one) 0 ms
> > Environment is about to shut down the neo4j database
> > Environment shutting down complete 3,801 ms
> >
> > and on cold run:
> > search started
> > (one)-->(three) 8,454 ms
> > search started
> > (one)-->(two) 885 ms
> > search started
> > (three)<--(one) 0 ms
> > search started
> > (two)<--(one) 0 ms
> > search started
> > (one)-->(three) 752 ms
> > search started
> > (one)-->(two) 755 ms
> > search started
> > (three)<--(one) 0 ms
> > search started
> > (two)<--(one) 0 ms
> > search started
> > (one)-->(three) 804 ms
> > search started
> > (one)-->(two) 621 ms
> > search started
> > (three)<--(one) 0 ms
> > search started
> > (two)<--(one) 0 ms
> > Environment is about to shut down the neo4j database
> > Environment shutting down complete 3,141 ms
> > =====
> > and same thing in berkeleydb (no transactions tho):
> > first time creating the relationships...
> > just created 1,000,000 rels, took=29,658 ms
> > Path from one to three: true 10 ms
> > Path from one to two: true 0 ms
> > Path from three to one: false 0 ms
> > Path from two to one: false 0 ms
> > Path from one to three: true 0 ms
> > Path from one to two: true 0 ms
> > Path from three to one: false 0 ms
> > Path from two to one: false 0 ms
> > Path from one to three: true 0 ms
> > Path from one to two: true 0 ms
> > Path from three to one: false 0 ms
> > Path from two to one: false 0 ms
> > class org.bdb.BerkEnv$1 shutting down complete 152 ms
> >
> > and on cold:
> > Path from one to three: true 10 ms
> > Path from one to two: true 10 ms
> > Path from three to one: false 0 ms
> > Path from two to one: false 0 ms
> > Path from one to three: true 0 ms
> > Path from one to two: true 0 ms
> > Path from three to one: false 0 ms
> > Path from two to one: false 0 ms
> > Path from one to three: true 0 ms
> > Path from one to two: true 0 ms
> > Path from three to one: false 0 ms
> > Path from two to one: false 0 ms
> > class org.bdb.BerkEnv$1 shutting down complete 0 ms
> >
> >
> > these progs are on:
> >
> > https://github.com/13th-floor/neo4john/commit/d2fd5b27dcde5560e6ff980fe9320aedc4421ab7
> >
> > Cheerios,
> > John.
> >
> > Song of the day: Bic Runga - She Left On A Monday
> > PS: asserts were enable ie. vm arg "-ea" (no quotes)
> >
> > On Sat, Jul 23, 2011 at 4:31 PM, Mattias Persson
> > <[email protected]>wrote:
> >
> > > Hi John,
> > >
> > > the algorithm is written to dodge these kinds of pitfalls. Maybe there's
> > > some issue with the implementation, but in principal it should make no
> > > difference. I'll look at it when I get the time (I wrote that
> > > implementation).
> > >
> > > 2011/7/23 John cyuczieekc <[email protected]>
> > >
> > > > Hey guys, me bugging you again :)
> > > >
> > > > (This whole thing is kind of based on the lack of being able to get the
> > > > number of relationships a node has)
> > > >
> > > >  If I have two nodes, and the first one has 1 million outgoing
> > > > relationships
> > > > of the type X to 1 million unique/different nodes,
> > > > and the second node has 10 incoming relationships of type X (same type)
> > > of
> > > > which one is from the first node,
> > > > then using GraphAlgoFactory.shortestPath  (or suggest a better way?)
> > > >  How can I tell neo4j to iterate the search on the second node's
> > incoming
> > > > rels simply because it has 10 relationships instead of 1 million, in
> > > order
> > > > to check if each relationship is in the form of firstNode-->secondNode
> > ?
> > > >
> > > > For the case when first node has 100,000 relationships and second node
> > > has
> > > > 10,
> > > > it takes *1.7 seconds* for shortestPath to find the only one link
> > between
> > > > them using:
> > > >
> > > > final PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
> > > > Traversal.expanderForTypes( rel, Direction.OUTGOING  ), 1 );
> > > > final Path foundPath = finder.findSinglePath( *one, two* );
> > > >
> > > > I can put Direction.*BOTH *and get the same amount of time
> > > > *Path from one to two: (one)-->(two) timedelta=1,862,726,634 ns*
> > > >
> > > > *BUT*, get this: if I swap the nodes:
> > > > finder.findSinglePath(* two, one*);
> > > > and i use either Direction.INCOMING or Direction.*BOTH  *(which makes
> > > sense
> > > > for the second node ,right) then I get *20ms* the time until it
> > > finishes...
> > > > *Path from one to two: (two)<--(one) timedelta=20,830,111 ns*
> > > >
> > > > (both cases are without data being priorly cached)
> > > >
> > > > I was expecting it to act like this: (but only when using
> > Direction.BOTH)
> > > >  see which node has the least number of relationships and iterate on
> > > those,
> > > > but this would work if findSinglePath would be made for depth 1 (aka
> > > > particular case), but as I read "Tries to find a single path between
> > > > startand
> > > > end nodes." then it makes sense to me why it works like it does... that
> > > is,
> > > > iterate on relationships from start node, rather than from end node...
> > > but
> > > > I'm not sure if it would *not *make sense to iterate on the end node
> > > > instead
> > > > of start node, when knowing that end node has less relationships, for
> > > make
> > > > the search faster (well at least if depth is one) - I didn't look into
> > > how
> > > > neo4j actually does stuff yet :D
> > > >
> > > > anyway, it's fairly clear to me that I could make a simple wrapper
> > method
> > > > to
> > > > make this kind of search faster, *IF* I had the ability to know how
> > many
> > > > relationships each node has, so I can call findSinglePath  with the
> > first
> > > > param being the node with the least relationship count :) But as I
> > > > understood it, it's not possible to find how many rels a node has...
> > > gimme
> > > > feat! :)) [by not possible I mean, without having to iterate thru all
> > and
> > > > count them, which would make the use case here obsolete]
> > > >
> > > > PS: clearly all the text I wrote here would benefit from being
> > > represented
> > > > by a graph, just think about all those grouping with autohiding the ie.
> > > > "[]"
> > > > and all kinds of stuff... heh
> > > > _______________________________________________
> > > > Neo4j mailing list
> > > > [email protected]
> > > > https://lists.neo4j.org/mailman/listinfo/user
> > > >
> > >
> > >
> > >
> > > --
> > > Mattias Persson, [[email protected]]
> > > Hacker, Neo Technology
> > > www.neotechnology.com
> > > _______________________________________________
> > > Neo4j mailing list
> > > [email protected]
> > > https://lists.neo4j.org/mailman/listinfo/user
> > >
> > _______________________________________________
> > Neo4j mailing list
> > [email protected]
> > https://lists.neo4j.org/mailman/listinfo/user
> >
> 
> 
> 
> -- 
> Mattias Persson, [[email protected]]
> Hacker, Neo Technology
> www.neotechnology.com
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
                                          
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] shortestPath slower than it could be

Reply via email to