Re: [Neo4j] shortestPath slower than it could be
Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I iterate or something similar, on the node `two` which has least incoming rels), in fact I execute a search on both nodes, one after the other and make sure one-two and two-one (since I am using two databases), anyway, by using db.getSearchBoth(...) the search is done *that *fast. I hope to see this kind of fastness with neo4j too xD (unless I'm doing it wrong) if I put neo4j to find shortest path on incoming on `two` then: search started Path from one to two: (two)--(one) timedelta=20 ms search started Path from one to two: (two)--(one) timedelta=0 ms search started Path from one to two: (two)--(one) timedelta=0 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,590 ms (most was 20ms, least was 20ms on first) TestLinkage.java (both progs) were used from here: https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4 -- Here's a lame benchmark for neo4j when I added another node `three` which is the first rel `one`-`three`, then added 1mil `one`-(random nodes) , and then(as before) `one`-`two` `three` has no other relationships Also, creating nodes 10k per transaction until 1mil are reached. first time creating the relationships... just created 1,000,000 rels, took=28,889 ms (btw my cpu is limited to 77% of its normal speed; usually!) closed transaction search started (one)--(three) 3,234 ms search started (one)--(two) 661 ms search started (three)--(one) 10 ms search started (two)--(one) 0 ms search started (one)--(three) 631 ms search started (one)--(two) 913 ms search started (three)--(one) 0 ms search started (two)--(one) 0 ms search started (one)--(three) 610 ms search started
Re: [Neo4j] shortestPath slower than it could be
What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I iterate or something similar, on the node `two` which has least incoming rels), in fact I execute a search on both nodes, one after the other and make sure one-two and two-one (since I am using two databases), anyway, by using db.getSearchBoth(...) the search is done *that *fast. I hope to see this kind of fastness with neo4j too xD (unless I'm doing it wrong) if I put neo4j to find shortest path on incoming on `two` then: search started Path from one to two: (two)--(one) timedelta=20 ms search started Path from one to two: (two)--(one) timedelta=0 ms search started Path from one to two: (two)--(one) timedelta=0 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,590 ms (most was 20ms, least was 20ms on first) TestLinkage.java (both progs) were used from here: https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4 -- Here's a lame benchmark for neo4j when I added another node `three` which is the first rel `one`-`three`, then added 1mil `one`-(random nodes) , and then(as before) `one`-`two` `three` has no other relationships Also, creating nodes 10k per transaction until 1mil are reached. first time creating the relationships... just created 1,000,000 rels, took=28,889 ms (btw my cpu is limited to 77% of its normal speed; usually!) closed transaction search started (one)--(three) 3,234 ms search started
Re: [Neo4j] shortestPath slower than it could be
but, even when `one` has 1 million outgoing rels and `two` has 10 or 11 incoming rels cool then ;) I was trying to say the same thing ... Have a nice day, John. On Sun, Jul 24, 2011 at 4:10 PM, Mattias Persson matt...@neotechnology.comwrote: What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I iterate or something similar, on the node `two` which has least incoming rels), in fact I execute a search on both nodes, one after the other and make sure one-two and two-one (since I am using two databases), anyway, by using db.getSearchBoth(...) the search is done *that *fast. I hope to see this kind of fastness with neo4j too xD (unless I'm doing it wrong) if I put neo4j to find shortest path on incoming on `two` then: search started Path from one to two: (two)--(one) timedelta=20 ms search started Path from one to two: (two)--(one) timedelta=0 ms search started Path from one to two: (two)--(one) timedelta=0 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,590 ms (most was 20ms, least was 20ms on first) TestLinkage.java (both progs) were used from here: https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4 -- Here's a lame benchmark for neo4j when I added another node `three` which is the first rel `one`-`three`, then added 1mil `one`-(random nodes) ,
Re: [Neo4j] shortestPath slower than it could be
Are you sure this is true, Mattias?The response time of a getRelationship call depends on the total number of relationships on the node. So it makes a difference which side of the relationship makes the call. It is always faster to ask it from the side that has the lowest total number of relationships attached. This is even true, if for both sides of the relationship there is only one relationship of that particular relationship type.Niels Date: Sun, 24 Jul 2011 16:10:42 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] shortestPath slower than it could be What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I iterate or something similar, on the node `two` which has least incoming rels), in fact I execute a search on both nodes, one after the other and make sure one-two and two-one (since I am using two databases), anyway, by using db.getSearchBoth(...) the search is done *that *fast. I hope to see this kind of fastness with neo4j too xD (unless I'm doing it wrong) if I put neo4j to find shortest path on incoming on `two` then: search started Path from one to two: (two)--(one) timedelta=20 ms search started Path from one to two: (two)--(one) timedelta=0 ms search started Path from one to two: (two)--(one) timedelta=0 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,590 ms
Re: [Neo4j] shortestPath slower than it could be
It doesn't matter since the algorithm is bi-directional... so: one -- two will start from one OUTGOING and two INCOMING, whereas two -- one will start from two INCOMING and one OUTGOING see, no difference. It alternates side for each relationship. It will, however depend on where the INCOMING/OUTGOING relationships reside in the relationship chain, but sticking to this discussion: those two calls will yield the exact same speed. Though I just discovered a little thingie where findSinglePath didn't stop right away when after finding the first one, but now it does! 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com Are you sure this is true, Mattias?The response time of a getRelationship call depends on the total number of relationships on the node. So it makes a difference which side of the relationship makes the call. It is always faster to ask it from the side that has the lowest total number of relationships attached. This is even true, if for both sides of the relationship there is only one relationship of that particular relationship type.Niels Date: Sun, 24 Jul 2011 16:10:42 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] shortestPath slower than it could be What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I
Re: [Neo4j] shortestPath slower than it could be
updated to latest from github, Stops the algo as soon as possible in findSinglePath graphdb contains: one--three one--{ a million other random nodes } one--two { 10 random nodes } -- two (added in that order, except that `one--two` is somewhere between those 10 random nodes) output: with Direction.BOTH: (one)--(three) 7,660 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 2,195 ms (one)--(two) 0 ms *(three)--(one) 0 ms* (two)--(one) 0 ms *(one)--(three) 875 ms* (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 630 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 723 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 802 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 631 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 732 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 873 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 630 ms Environment is about to shut down the neo4j database (one)--(two) 0 ms Environment shutting down complete 3,171 ms For some reason, one--three takes more than 0ms, even though three has only 1 incoming rel, and it's the first rel of `one` that outgoes to `three` (for whoever cares)the test program used is in this commit here: https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee I tried to revert that commit, and for some reason I'm getting the same results [one--three is the only one over 0ms] (maybe I failed in reverting it, though the 2 sources seem reverted) Peace :) On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson matt...@neotechnology.comwrote: It doesn't matter since the algorithm is bi-directional... so: one -- two will start from one OUTGOING and two INCOMING, whereas two -- one will start from two INCOMING and one OUTGOING see, no difference. It alternates side for each relationship. It will, however depend on where the INCOMING/OUTGOING relationships reside in the relationship chain, but sticking to this discussion: those two calls will yield the exact same speed. Though I just discovered a little thingie where findSinglePath didn't stop right away when after finding the first one, but now it does! 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com Are you sure this is true, Mattias?The response time of a getRelationship call depends on the total number of relationships on the node. So it makes a difference which side of the relationship makes the call. It is always faster to ask it from the side that has the lowest total number of relationships attached. This is even true, if for both sides of the relationship there is only one relationship of that particular relationship type.Niels Date: Sun, 24 Jul 2011 16:10:42 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] shortestPath slower than it could be What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because
Re: [Neo4j] shortestPath slower than it could be
a difference which side of the relationship makes the call. It is always faster to ask it from the side that has the lowest total number of relationships attached. This is even true, if for both sides of the relationship there is only one relationship of that particular relationship type.Niels Date: Sun, 24 Jul 2011 16:10:42 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] shortestPath slower than it could be What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one to two: (one)--(two) timedelta=795 ms Path from one to two: (one)--(two) timedelta=772 ms (1) and (2) happened in the same transaction (ie. before tx.finish()), the others after transaction finished (an in no new transaction - since they were only reads) - this also means some caching must've happened from before. Because it happened in same transaction, adding 1 mil relationships is slower than adding them in bursts of x, I am aware of this (and I'll try to make a bench with that). For ie. 100k nodes neo4j is actually faster. I even got this once (but in another modified bench): first time creating the relationships... just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of the usual 77% limit) Exception in thread main java.lang.OutOfMemoryError: Java heap space Environment is about to shut down the neo4j database Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.logging.LogManager.reset(LogManager.java:835) at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) Environment shutting down complete 23,962 ms Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread main this is the output when running cold, (ie. the data already existed): search started Path from one to two: (one)--(two) timedelta=7,934 ms search started Path from one to two: (one)--(two) timedelta=2,027 ms search started Path from one to two: (one)--(two) timedelta=875 ms Environment is about to shut down the neo4j database Environment shutting down complete 2,971 ms and the exact same thing when using berkeleydb (not using transactions though): just created 1,000,000 rels, took=29,418 ms Path from one to two: true 0 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms here) and when cold: Path from one to two: true 10 ms Path from one to two: true 0 ms Path from one to two: true 0 ms class org.bdb.BerkEnv$1 shutting down complete 0 ms (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all) in bdb, this is actually how bdb does the search (I don't need to make sure I iterate or something similar, on the node `two` which has least incoming rels), in fact I execute a search on both nodes, one after the other and make sure one-two and two-one (since I am using two databases), anyway, by using db.getSearchBoth(...) the search is done *that *fast. I hope to see this kind of fastness with neo4j too xD (unless I'm doing it wrong) if I put neo4j to find shortest path on incoming on `two` then: search started Path from one to two: (two)--(one) timedelta=20 ms search started Path from one to two: (two)--(one) timedelta=0 ms search started Path from one to two: (two)--(one) timedelta=0
Re: [Neo4j] shortestPath slower than it could be
Environment is about to shut down the neo4j database (one)--(two) 0 ms Environment shutting down complete 2,711 ms So yeah, one--two got fixed, what say you about one--three though ? :D *whistle* :- On Sun, Jul 24, 2011 at 5:28 PM, John cyuczieekc cyuczie...@gmail.comwrote: updated to latest from github, Stops the algo as soon as possible in findSinglePath graphdb contains: one--three one--{ a million other random nodes } one--two { 10 random nodes } -- two (added in that order, except that `one--two` is somewhere between those 10 random nodes) output: with Direction.BOTH: (one)--(three) 7,660 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 2,195 ms (one)--(two) 0 ms *(three)--(one) 0 ms* (two)--(one) 0 ms *(one)--(three) 875 ms* (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 630 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (one)--(three) 723 ms (one)--(two) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.INCOMING: (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms (three)--(one) 0 ms (two)--(one) 0 ms with Direction.OUTGOING: (one)--(three) 802 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 631 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 732 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 873 ms (one)--(two) 0 ms with Direction.OUTGOING: (one)--(three) 630 ms Environment is about to shut down the neo4j database (one)--(two) 0 ms Environment shutting down complete 3,171 ms For some reason, one--three takes more than 0ms, even though three has only 1 incoming rel, and it's the first rel of `one` that outgoes to `three` (for whoever cares)the test program used is in this commit here: https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee I tried to revert that commit, and for some reason I'm getting the same results [one--three is the only one over 0ms] (maybe I failed in reverting it, though the 2 sources seem reverted) Peace :) On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson matt...@neotechnology.com wrote: It doesn't matter since the algorithm is bi-directional... so: one -- two will start from one OUTGOING and two INCOMING, whereas two -- one will start from two INCOMING and one OUTGOING see, no difference. It alternates side for each relationship. It will, however depend on where the INCOMING/OUTGOING relationships reside in the relationship chain, but sticking to this discussion: those two calls will yield the exact same speed. Though I just discovered a little thingie where findSinglePath didn't stop right away when after finding the first one, but now it does! 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com Are you sure this is true, Mattias?The response time of a getRelationship call depends on the total number of relationships on the node. So it makes a difference which side of the relationship makes the call. It is always faster to ask it from the side that has the lowest total number of relationships attached. This is even true, if for both sides of the relationship there is only one relationship of that particular relationship type.Niels Date: Sun, 24 Jul 2011 16:10:42 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] shortestPath slower than it could be What I'm saying is that: one -- two(one,two) for OUTGOING and two -- one (two,one) for INCOMING should yield the same timing. 2011/7/24 John cyuczieekc cyuczie...@gmail.com Thanks Mattias. The way I understand what you said is, that swapping `one` and `two` in *finder.findSinglePath( one, two );* should yield the same timing when Direction.BOTH is used; this would be great. But I don't see how this is possible (due to not knowing how it's stored too) unless you know how many rels each node has, OR even better storage is using BTree-s (?!) Btw, is it possible/practical that neo4j could store IDs in BTrees ? (someone said that for rels it's a double linked list instead- I'll need to recheck) irrelevant stuff follows (ie. don't read) Until then , I lame-tested something (with both neo4j and berkeleydb): a node `one` having 1mil rels to 1million unique nodes, and one more to a node `two`, and node `two` having 10 incoming rels from other unique nodes, +1 the one rel which was already from `one`, trying shortestPath between `one` and `two` (not `two` and `one`) with Direction.BOTH output: just created 1,000,000 rels, took=58,381 ms (1) Path from one to two: (one)--(two) 121,310 ms (2) Path from one to two: (one)--(two) timedelta=1,066 ms Path from one
Re: [Neo4j] shortestPath slower than it could be
Hail :) Beware, lot of noise follows (ie. webnoise kinda noise), while I thank anyone in advance for reading, I must also apologize for failing to be concise :- On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger michael.hun...@neotechnology.com wrote: John, no problem :) You pointed out both problems: - cold caches - lots of rels on the one side There are some performance implications of loading millions of rels from disk. We'll be working on improving those in 1.5. The easiest way to solve that is to switch start and end-node which you already did. It is much easier for you _with domain knowledge_ about the graph than for the algorithm. Yeah I would probably apply that manually, that is: call a particular method when I know which one of the nodes has least rels. (as a ' `temporary` or `acceptable so far` workaround ' [u see the graph here?! xD]* I will for now use findSinglePath as it is*, simply because I don't want the overhead of using a particular method for some use case scenario and the generic one for others - besides it's fast enough, for now) My problem here is that I might not know (in the future) which node (first or second) will have the least nodes. So I was hoping to create a generic method for such find. For now, I could say, I can know. For example: AllLists-A AllLists-B AllLists-C where A,B,C are considered nodes representing lists, in this case, I could maybe say that while AllLists can have lots of outgoing rels (aka lots of lists), list A may not have that many nodes pointing to it (aka list is not used in as many places as there are lists), so I could consider A as being node with least nodes. Though I would want a generic method that would always know which node has least nodes and parse that one, especially in my 1 hop case. I am, as seen here, basically tagging the nodes A,B,C with AllLists, by having AllLists point to them, and this way, I know that node A is of AllLists type (so to speak). This was my old way of thinking while using bdb je, I might need to revise it since then I was completely ignoring relationships and not considering them as a single and accessible object. (I am currently trying to port that old way of thinking to neo4j) In this old way, I was using the node AllLists pointing/outgoing to A as being (in neo4j) a relationship of the type AllLists incoming to A. Especially if you have 1 hop paths. In my particular use case, I will probably not (yet!) have cases with 1 hop. (I can't really visualize that far in the future xD ) but for sure I will have lots of ==1 hops cases where I just want to see if thisnode is tagged with thatnode by checking thatnode--thisnode There might be ways in improving the algorithm, e.g. by iterating both sides at the same time, which would lead to the end with the fewer relationships being exhausted and resolved first. for now, I totally avoided getting into the findSinglePath 's source and considering doing changes there or even seeing how it works:) But that algo sounds like something I'd do, in like 2 (reusable) threads even xD (I remember having considered this in the past, when using bdb je but I opted to parse the node with least rels instead) How large is your graph at all? And how is it structured, e.g. how many rels do the other 9 nodes have that your second node points to? I was merely trying this as an example(but this should be a practical use case in my project), the thing is, the other 9 nodes can have any number or rels, in fact *any node could have any number of rels incoming and outgoing from it*,... (but in this case, the 9 nodes each had a new unused node point(outgoing) to them; I'd post the source but i'm already using wrappers and the whole example would be more than 1 java file, but I can link to it: https://github.com/13th-floor/neo4john/blob/52e2d75a39a28200b34032aaf45a5a09c1e1b22c/src/org/benchtests/neo4j/TestLinkage.java that's the main file to run, tho you don't have to and you prolly don't have the time; just putting it here for consistency/historical-reasons I guess xD ) In my project, I am trying something generic... something like building sets and lists of nodes based on graphs. Imagine having no metadata key-value properties, and instead *trying to use only nodes to represent *those properties (well actually I do have one key aka name so i could reference the same node id across application restarts). What are both times in a second or third run (cached) ? Path from one to two: (one)--(two) timedelta=2,151,018,862 ns Path from one to two: (one)--(two) timedelta=131,458,292 ns Path from one to two: (one)--(two) timedelta=119,015,019 ns another run: Path from one to two: (one)--(two) timedelta=1,824,099,688 ns Path from one to two: (one)--(two) timedelta=275,772,087 ns Path from one to two: (one)--(two) timedelta=352,562,121 ns and when swapped: Path from one to two: (two)--(one) timedelta=20,601,181 ns Path from one to two: (two)--(one) timedelta=284,212
Re: [Neo4j] shortestPath slower than it could be
Perhaps you can catch up with Niels Hoogeven who's currently implementing Collection data structures, B- and R-Trees to be transparently mapped into graph structures at: https://github.com/peterneubauer/graph-collections Cheers Michael Perhaps you should describe the intent / idea behind your project. Dropping too early into implementation details might remove some possibilities that are able with graph databases by loosing the big picture. And I wouldn't care about performance right now (btw. System.currentTimeMillis() is mostly good enough for measurement). But rather on modeling your domain to the graph and specify which concepts and operations/use-cases you'd like to support. You can optimize afterwards anyway. Am 23.07.2011 um 14:29 schrieb John cyuczieekc: Hail :) Beware, lot of noise follows (ie. webnoise kinda noise), while I thank anyone in advance for reading, I must also apologize for failing to be concise :- On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger michael.hun...@neotechnology.com wrote: John, no problem :) You pointed out both problems: - cold caches - lots of rels on the one side There are some performance implications of loading millions of rels from disk. We'll be working on improving those in 1.5. The easiest way to solve that is to switch start and end-node which you already did. It is much easier for you _with domain knowledge_ about the graph than for the algorithm. Yeah I would probably apply that manually, that is: call a particular method when I know which one of the nodes has least rels. (as a ' `temporary` or `acceptable so far` workaround ' [u see the graph here?! xD]* I will for now use findSinglePath as it is*, simply because I don't want the overhead of using a particular method for some use case scenario and the generic one for others - besides it's fast enough, for now) My problem here is that I might not know (in the future) which node (first or second) will have the least nodes. So I was hoping to create a generic method for such find. For now, I could say, I can know. For example: AllLists-A AllLists-B AllLists-C where A,B,C are considered nodes representing lists, in this case, I could maybe say that while AllLists can have lots of outgoing rels (aka lots of lists), list A may not have that many nodes pointing to it (aka list is not used in as many places as there are lists), so I could consider A as being node with least nodes. Though I would want a generic method that would always know which node has least nodes and parse that one, especially in my 1 hop case. I am, as seen here, basically tagging the nodes A,B,C with AllLists, by having AllLists point to them, and this way, I know that node A is of AllLists type (so to speak). This was my old way of thinking while using bdb je, I might need to revise it since then I was completely ignoring relationships and not considering them as a single and accessible object. (I am currently trying to port that old way of thinking to neo4j) In this old way, I was using the node AllLists pointing/outgoing to A as being (in neo4j) a relationship of the type AllLists incoming to A. Especially if you have 1 hop paths. In my particular use case, I will probably not (yet!) have cases with 1 hop. (I can't really visualize that far in the future xD ) but for sure I will have lots of ==1 hops cases where I just want to see if thisnode is tagged with thatnode by checking thatnode--thisnode There might be ways in improving the algorithm, e.g. by iterating both sides at the same time, which would lead to the end with the fewer relationships being exhausted and resolved first. for now, I totally avoided getting into the findSinglePath 's source and considering doing changes there or even seeing how it works:) But that algo sounds like something I'd do, in like 2 (reusable) threads even xD (I remember having considered this in the past, when using bdb je but I opted to parse the node with least rels instead) How large is your graph at all? And how is it structured, e.g. how many rels do the other 9 nodes have that your second node points to? I was merely trying this as an example(but this should be a practical use case in my project), the thing is, the other 9 nodes can have any number or rels, in fact *any node could have any number of rels incoming and outgoing from it*,... (but in this case, the 9 nodes each had a new unused node point(outgoing) to them; I'd post the source but i'm already using wrappers and the whole example would be more than 1 java file, but I can link to it: https://github.com/13th-floor/neo4john/blob/52e2d75a39a28200b34032aaf45a5a09c1e1b22c/src/org/benchtests/neo4j/TestLinkage.java that's the main file to run, tho you don't have to and you prolly don't have the time; just putting it here for consistency/historical-reasons I guess xD ) In my project, I am trying something generic... something
Re: [Neo4j] shortestPath slower than it could be
I would gladly try to catch up with Niels but from what I've read (from his prev. emails) I hardly understand what he was talking about (implying I am no way near as smart to understand that; which might mean I am simply missing the background/unstated-assumptions or focusing fail :)) [also I don't know if we're doing similar things? probably because I don't yet understand what he's doing, or I don't fully understand which causes me to drop even what I did understand, flagging it as `didn't understand anything` xD] I would describe the idea behind my project, but I'm not sure that I can envision that far (due to failure to keep my focus perhaps), hence I was hoping I could work my way bottom-up implementing (though this would never really work). And you're right about performance, I shouldn't care, but I lose the sight sometimes and get stuck into small details like that, effectively preventing me to continue. Anyway, enough about what is, let's change that heh. I really appreciate what you said btw and I find it helpful. Ok so idea would be something like, building systems on top of other systems with the main goal being accessibility. It's kind of generic really. I don't want something very specific. I just want to be able to access information and have the computer also access this information in an AI-friendly kinda way. That is, if I don't know anything, but I know how to navigate links (supposedly most basic task), then I(as an AI or intelligence) would be able to parse and understand how this (part of the) system works, and even change it (evolve it, ie. multiversioning? / git-like). Specific examples, let's see, when done, I should be able to do programming or storing random thoughts/phrases have them linked to the timestamp of their creation even, create a small random monitor (think 2D screen in 3D) and have it show something and starting from any pixel on it, I would be able to track things down and see/understand how that pixel ended up on that particular spot. So basically a right way of organizing and accessing anything, be it offline data or live programs, such that an intelligence (human or otherwise AI/computer) would have no trouble understanding/learning how it works). I think the key to building such a thing is simply links/relationships. I just know this can be done (on top of this lowest level:) by simply creating Nodes (which are basically random unique IDs) and having nothing but those nodes and the relationships between them (no real metadata would really be needed), just the ability to know that this node X is different than this node Y (which is done via their IDs really; but if the java level requires accessing the same node ID then it would require a layer for itself only to have a name associated with that ID, so it can access it by name, across app restarts), and knowing if X points to Y or Y is pointed-to by X. (btw, I don't really know or use BTrees but I was hoping them being use at the lower than Node/Relationship level, would benefit fast searches ie. is X-Y or find x such that {A-x-B; F-C-x-K; P-x}; but at the high level I am referring with Nodes/Relationships I don't see the need for a BTree altho it could be implemented on top of whatever exists; don't know about Rtree though, I am wikipedia-ing it right now, looks yummy xD) Anyway, the point is, like in a graph I guess, start from any point and be able to parse/understand the entire system (except if some grouped nodes are simply disconnected from the grouped ones I started in, you'd say; but really there would/should probably be no such nodes, at least the reference node would be pointing to them, but most likely they would be grouped together by a parent node pointing to each of them) So basically, I lost it :) There will be issues that need to be tackled, like not using locks, using multiversioning instead (or something)... I don't even know anymore - I know sometimes it seemed impossible even theoretically but sometimes it seemed those impossibilities vanished. What more should I say? (really, 'cause I don't know) I just want a system you know, where stuff can be accessed to the very core, to all level depths, such that nothing would really be inaccessible, so that I wouldn't need to go search for how something is made by looking on a 3rd party site or doc file just because there's no tool/way to allow me the kind of accessibility I know I want, sort of like going around it and hoping the doc/src is in sync with the binary I'm just running (just an example), but rather bin+src+doc would be one, and even allow me to change them on the fly with changes propagating to all dependencies... erm, yeah feel free to stop me from blabbing xD Peace off :) On Sat, Jul 23, 2011 at 3:16 PM, Michael Hunger michael.hun...@neotechnology.com wrote: Perhaps you can catch up with Niels Hoogeven who's currently implementing Collection data structures, B- and R-Trees to be transparently mapped into graph structures at:
Re: [Neo4j] shortestPath slower than it could be
Hi John, the algorithm is written to dodge these kinds of pitfalls. Maybe there's some issue with the implementation, but in principal it should make no difference. I'll look at it when I get the time (I wrote that implementation). 2011/7/23 John cyuczieekc cyuczie...@gmail.com Hey guys, me bugging you again :) (This whole thing is kind of based on the lack of being able to get the number of relationships a node has) If I have two nodes, and the first one has 1 million outgoing relationships of the type X to 1 million unique/different nodes, and the second node has 10 incoming relationships of type X (same type) of which one is from the first node, then using GraphAlgoFactory.shortestPath (or suggest a better way?) How can I tell neo4j to iterate the search on the second node's incoming rels simply because it has 10 relationships instead of 1 million, in order to check if each relationship is in the form of firstNode--secondNode ? For the case when first node has 100,000 relationships and second node has 10, it takes *1.7 seconds* for shortestPath to find the only one link between them using: final PathFinderPath finder = GraphAlgoFactory.shortestPath( Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); final Path foundPath = finder.findSinglePath( *one, two* ); I can put Direction.*BOTH *and get the same amount of time *Path from one to two: (one)--(two) timedelta=1,862,726,634 ns* *BUT*, get this: if I swap the nodes: finder.findSinglePath(* two, one*); and i use either Direction.INCOMING or Direction.*BOTH *(which makes sense for the second node ,right) then I get *20ms* the time until it finishes... *Path from one to two: (two)--(one) timedelta=20,830,111 ns* (both cases are without data being priorly cached) I was expecting it to act like this: (but only when using Direction.BOTH) see which node has the least number of relationships and iterate on those, but this would work if findSinglePath would be made for depth 1 (aka particular case), but as I read Tries to find a single path between startand end nodes. then it makes sense to me why it works like it does... that is, iterate on relationships from start node, rather than from end node... but I'm not sure if it would *not *make sense to iterate on the end node instead of start node, when knowing that end node has less relationships, for make the search faster (well at least if depth is one) - I didn't look into how neo4j actually does stuff yet :D anyway, it's fairly clear to me that I could make a simple wrapper method to make this kind of search faster, *IF* I had the ability to know how many relationships each node has, so I can call findSinglePath with the first param being the node with the least relationship count :) But as I understood it, it's not possible to find how many rels a node has... gimme feat! :)) [by not possible I mean, without having to iterate thru all and count them, which would make the use case here obsolete] PS: clearly all the text I wrote here would benefit from being represented by a graph, just think about all those grouping with autohiding the ie. [] and all kinds of stuff... heh ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] shortestPath slower than it could be
Hey guys, me bugging you again :) (This whole thing is kind of based on the lack of being able to get the number of relationships a node has) If I have two nodes, and the first one has 1 million outgoing relationships of the type X to 1 million unique/different nodes, and the second node has 10 incoming relationships of type X (same type) of which one is from the first node, then using GraphAlgoFactory.shortestPath (or suggest a better way?) How can I tell neo4j to iterate the search on the second node's incoming rels simply because it has 10 relationships instead of 1 million, in order to check if each relationship is in the form of firstNode--secondNode ? For the case when first node has 100,000 relationships and second node has 10, it takes *1.7 seconds* for shortestPath to find the only one link between them using: final PathFinderPath finder = GraphAlgoFactory.shortestPath( Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); final Path foundPath = finder.findSinglePath( *one, two* ); I can put Direction.*BOTH *and get the same amount of time *Path from one to two: (one)--(two) timedelta=1,862,726,634 ns* *BUT*, get this: if I swap the nodes: finder.findSinglePath(* two, one*); and i use either Direction.INCOMING or Direction.*BOTH *(which makes sense for the second node ,right) then I get *20ms* the time until it finishes... *Path from one to two: (two)--(one) timedelta=20,830,111 ns* (both cases are without data being priorly cached) I was expecting it to act like this: (but only when using Direction.BOTH) see which node has the least number of relationships and iterate on those, but this would work if findSinglePath would be made for depth 1 (aka particular case), but as I read Tries to find a single path between startand end nodes. then it makes sense to me why it works like it does... that is, iterate on relationships from start node, rather than from end node... but I'm not sure if it would *not *make sense to iterate on the end node instead of start node, when knowing that end node has less relationships, for make the search faster (well at least if depth is one) - I didn't look into how neo4j actually does stuff yet :D anyway, it's fairly clear to me that I could make a simple wrapper method to make this kind of search faster, *IF* I had the ability to know how many relationships each node has, so I can call findSinglePath with the first param being the node with the least relationship count :) But as I understood it, it's not possible to find how many rels a node has... gimme feat! :)) [by not possible I mean, without having to iterate thru all and count them, which would make the use case here obsolete] PS: clearly all the text I wrote here would benefit from being represented by a graph, just think about all those grouping with autohiding the ie. [] and all kinds of stuff... heh ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user