Re: [Neo4j] shortestPath slower than it could be

John cyuczieekc Sun, 24 Jul 2011 06:43:51 -0700

Thanks Mattias. The way I understand what you said is, that swapping `one`
and `two` in *finder.findSinglePath( one, two );*  should yield the same
timing when Direction.BOTH is used; this would be great. But I don't see how
this is possible (due to not knowing how it's stored too) unless you know
how many rels each node has, OR even better storage is using BTree-s (?!)
Btw, is it possible/practical that neo4j could store IDs in BTrees ?
(someone said that for rels it's a double linked list instead- I'll need to
recheck)


irrelevant stuff follows (ie. don't read)
----
Until then , I lame-tested something (with both neo4j and berkeleydb):
a node `one` having 1mil rels to 1million unique nodes, and one more to a
node `two`, and node `two` having 10 incoming rels from other unique nodes,
+1 the one rel which was already from `one`, trying shortestPath between
`one` and `two` (not `two` and `one`) with Direction.BOTH
output:
just created 1,000,000 rels, took=58,381 ms  (1)
Path from one to two: (one)-->(two) 121,310 ms (2)
Path from one to two: (one)-->(two) timedelta=1,066 ms
Path from one to two: (one)-->(two) timedelta=795 ms
Path from one to two: (one)-->(two) timedelta=772 ms

(1) and (2) happened in the same transaction (ie. before tx.finish()), the
others after transaction finished (an in no new transaction - since they
were only reads) - this also means some caching must've happened from
before.
Because it happened in same transaction, adding 1 mil relationships is
slower than adding them in bursts of x, I am aware of this (and I'll try to
make a bench with that). For ie. 100k nodes neo4j is actually faster.
I even got this once (but in another modified bench):
first time creating the relationships...
just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of
the usual 77% limit)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
Environment is about to shut down the neo4j database
Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC overhead limit
exceeded
    at java.util.logging.LogManager.reset(LogManager.java:835)
    at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
Environment shutting down complete 23,962 ms

Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread "main"


this is the output when running cold, (ie. the data already existed):
search started
Path from one to two: (one)-->(two) timedelta=7,934 ms
search started
Path from one to two: (one)-->(two) timedelta=2,027 ms
search started
Path from one to two: (one)-->(two) timedelta=875 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 2,971 ms

and the exact same thing when using berkeleydb (not using transactions
though):
just created 1,000,000 rels, took=29,418 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms here)

and when cold:
Path from one to two: true 10 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms
(the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all)

in bdb, this is actually how bdb does the search (I don't need to make sure
I iterate or something similar, on the node `two` which has least incoming
rels), in fact I execute a search on both nodes, one after the other and
make sure one->two and two<-one (since I am using two databases), anyway, by
using db.getSearchBoth(...) the search is done *that *fast. I hope to see
this kind of fastness with neo4j too xD (unless I'm doing it wrong)

if I put neo4j to find shortest path on incoming on `two` then:
search started
Path from one to two: (two)<--(one) timedelta=20 ms
search started
Path from one to two: (two)<--(one) timedelta=0 ms
search started
Path from one to two: (two)<--(one) timedelta=0 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 2,590 ms
(most was 20ms, least was 20ms on first)

TestLinkage.java (both progs) were used from here:
https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4

--------------

Here's a lame benchmark for neo4j when I added another node `three` which is
the first rel `one`->`three`, then added 1mil `one`->(random nodes) , and
then(as before) `one`->`two`
`three` has no other relationships
Also, creating nodes 10k per transaction until 1mil are reached.

first time creating the relationships...
just created 1,000,000 rels, took=28,889 ms  (btw my cpu is limited to 77%
of its normal speed; usually!)
closed transaction
search started
(one)-->(three) 3,234 ms
search started
(one)-->(two) 661 ms
search started
(three)<--(one) 10 ms
search started
(two)<--(one) 0 ms
search started
(one)-->(three) 631 ms
search started
(one)-->(two) 913 ms
search started
(three)<--(one) 0 ms
search started
(two)<--(one) 0 ms
search started
(one)-->(three) 610 ms
search started
(one)-->(two) 773 ms
search started
(three)<--(one) 0 ms
search started
(two)<--(one) 0 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 3,801 ms

and on cold run:
search started
(one)-->(three) 8,454 ms
search started
(one)-->(two) 885 ms
search started
(three)<--(one) 0 ms
search started
(two)<--(one) 0 ms
search started
(one)-->(three) 752 ms
search started
(one)-->(two) 755 ms
search started
(three)<--(one) 0 ms
search started
(two)<--(one) 0 ms
search started
(one)-->(three) 804 ms
search started
(one)-->(two) 621 ms
search started
(three)<--(one) 0 ms
search started
(two)<--(one) 0 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 3,141 ms
=====
and same thing in berkeleydb (no transactions tho):
first time creating the relationships...
just created 1,000,000 rels, took=29,658 ms
Path from one to three: true 10 ms
Path from one to two: true 0 ms
Path from three to one: false 0 ms
Path from two to one: false 0 ms
Path from one to three: true 0 ms
Path from one to two: true 0 ms
Path from three to one: false 0 ms
Path from two to one: false 0 ms
Path from one to three: true 0 ms
Path from one to two: true 0 ms
Path from three to one: false 0 ms
Path from two to one: false 0 ms
class org.bdb.BerkEnv$1 shutting down complete 152 ms

and on cold:
Path from one to three: true 10 ms
Path from one to two: true 10 ms
Path from three to one: false 0 ms
Path from two to one: false 0 ms
Path from one to three: true 0 ms
Path from one to two: true 0 ms
Path from three to one: false 0 ms
Path from two to one: false 0 ms
Path from one to three: true 0 ms
Path from one to two: true 0 ms
Path from three to one: false 0 ms
Path from two to one: false 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms


these progs are on:
https://github.com/13th-floor/neo4john/commit/d2fd5b27dcde5560e6ff980fe9320aedc4421ab7

Cheerios,
John.

Song of the day: Bic Runga - She Left On A Monday
PS: asserts were enable ie. vm arg "-ea" (no quotes)

On Sat, Jul 23, 2011 at 4:31 PM, Mattias Persson
<[email protected]>wrote:

> Hi John,
>
> the algorithm is written to dodge these kinds of pitfalls. Maybe there's
> some issue with the implementation, but in principal it should make no
> difference. I'll look at it when I get the time (I wrote that
> implementation).
>
> 2011/7/23 John cyuczieekc <[email protected]>
>
> > Hey guys, me bugging you again :)
> >
> > (This whole thing is kind of based on the lack of being able to get the
> > number of relationships a node has)
> >
> >  If I have two nodes, and the first one has 1 million outgoing
> > relationships
> > of the type X to 1 million unique/different nodes,
> > and the second node has 10 incoming relationships of type X (same type)
> of
> > which one is from the first node,
> > then using GraphAlgoFactory.shortestPath  (or suggest a better way?)
> >  How can I tell neo4j to iterate the search on the second node's incoming
> > rels simply because it has 10 relationships instead of 1 million, in
> order
> > to check if each relationship is in the form of firstNode-->secondNode ?
> >
> > For the case when first node has 100,000 relationships and second node
> has
> > 10,
> > it takes *1.7 seconds* for shortestPath to find the only one link between
> > them using:
> >
> > final PathFinder<Path> finder = GraphAlgoFactory.shortestPath(
> > Traversal.expanderForTypes( rel, Direction.OUTGOING  ), 1 );
> > final Path foundPath = finder.findSinglePath( *one, two* );
> >
> > I can put Direction.*BOTH *and get the same amount of time
> > *Path from one to two: (one)-->(two) timedelta=1,862,726,634 ns*
> >
> > *BUT*, get this: if I swap the nodes:
> > finder.findSinglePath(* two, one*);
> > and i use either Direction.INCOMING or Direction.*BOTH  *(which makes
> sense
> > for the second node ,right) then I get *20ms* the time until it
> finishes...
> > *Path from one to two: (two)<--(one) timedelta=20,830,111 ns*
> >
> > (both cases are without data being priorly cached)
> >
> > I was expecting it to act like this: (but only when using Direction.BOTH)
> >  see which node has the least number of relationships and iterate on
> those,
> > but this would work if findSinglePath would be made for depth 1 (aka
> > particular case), but as I read "Tries to find a single path between
> > startand
> > end nodes." then it makes sense to me why it works like it does... that
> is,
> > iterate on relationships from start node, rather than from end node...
> but
> > I'm not sure if it would *not *make sense to iterate on the end node
> > instead
> > of start node, when knowing that end node has less relationships, for
> make
> > the search faster (well at least if depth is one) - I didn't look into
> how
> > neo4j actually does stuff yet :D
> >
> > anyway, it's fairly clear to me that I could make a simple wrapper method
> > to
> > make this kind of search faster, *IF* I had the ability to know how many
> > relationships each node has, so I can call findSinglePath  with the first
> > param being the node with the least relationship count :) But as I
> > understood it, it's not possible to find how many rels a node has...
> gimme
> > feat! :)) [by not possible I mean, without having to iterate thru all and
> > count them, which would make the use case here obsolete]
> >
> > PS: clearly all the text I wrote here would benefit from being
> represented
> > by a graph, just think about all those grouping with autohiding the ie.
> > "[]"
> > and all kinds of stuff... heh
> > _______________________________________________
> > Neo4j mailing list
> > [email protected]
> > https://lists.neo4j.org/mailman/listinfo/user
> >
>
>
>
> --
> Mattias Persson, [[email protected]]
> Hacker, Neo Technology
> www.neotechnology.com
> _______________________________________________
> Neo4j mailing list
> [email protected]
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

Re: [Neo4j] shortestPath slower than it could be

Reply via email to