Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread John cyuczieekc
Thanks Mattias. The way I understand what you said is, that swapping `one`
and `two` in *finder.findSinglePath( one, two );*  should yield the same
timing when Direction.BOTH is used; this would be great. But I don't see how
this is possible (due to not knowing how it's stored too) unless you know
how many rels each node has, OR even better storage is using BTree-s (?!)
Btw, is it possible/practical that neo4j could store IDs in BTrees ?
(someone said that for rels it's a double linked list instead- I'll need to
recheck)

irrelevant stuff follows (ie. don't read)

Until then , I lame-tested something (with both neo4j and berkeleydb):
a node `one` having 1mil rels to 1million unique nodes, and one more to a
node `two`, and node `two` having 10 incoming rels from other unique nodes,
+1 the one rel which was already from `one`, trying shortestPath between
`one` and `two` (not `two` and `one`) with Direction.BOTH
output:
just created 1,000,000 rels, took=58,381 ms  (1)
Path from one to two: (one)--(two) 121,310 ms (2)
Path from one to two: (one)--(two) timedelta=1,066 ms
Path from one to two: (one)--(two) timedelta=795 ms
Path from one to two: (one)--(two) timedelta=772 ms

(1) and (2) happened in the same transaction (ie. before tx.finish()), the
others after transaction finished (an in no new transaction - since they
were only reads) - this also means some caching must've happened from
before.
Because it happened in same transaction, adding 1 mil relationships is
slower than adding them in bursts of x, I am aware of this (and I'll try to
make a bench with that). For ie. 100k nodes neo4j is actually faster.
I even got this once (but in another modified bench):
first time creating the relationships...
just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of
the usual 77% limit)
Exception in thread main java.lang.OutOfMemoryError: Java heap space
Environment is about to shut down the neo4j database
Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead limit
exceeded
at java.util.logging.LogManager.reset(LogManager.java:835)
at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
Environment shutting down complete 23,962 ms

Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread main


this is the output when running cold, (ie. the data already existed):
search started
Path from one to two: (one)--(two) timedelta=7,934 ms
search started
Path from one to two: (one)--(two) timedelta=2,027 ms
search started
Path from one to two: (one)--(two) timedelta=875 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 2,971 ms

and the exact same thing when using berkeleydb (not using transactions
though):
just created 1,000,000 rels, took=29,418 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms here)

and when cold:
Path from one to two: true 10 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms
(the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on all)

in bdb, this is actually how bdb does the search (I don't need to make sure
I iterate or something similar, on the node `two` which has least incoming
rels), in fact I execute a search on both nodes, one after the other and
make sure one-two and two-one (since I am using two databases), anyway, by
using db.getSearchBoth(...) the search is done *that *fast. I hope to see
this kind of fastness with neo4j too xD (unless I'm doing it wrong)

if I put neo4j to find shortest path on incoming on `two` then:
search started
Path from one to two: (two)--(one) timedelta=20 ms
search started
Path from one to two: (two)--(one) timedelta=0 ms
search started
Path from one to two: (two)--(one) timedelta=0 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 2,590 ms
(most was 20ms, least was 20ms on first)

TestLinkage.java (both progs) were used from here:
https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4

--

Here's a lame benchmark for neo4j when I added another node `three` which is
the first rel `one`-`three`, then added 1mil `one`-(random nodes) , and
then(as before) `one`-`two`
`three` has no other relationships
Also, creating nodes 10k per transaction until 1mil are reached.

first time creating the relationships...
just created 1,000,000 rels, took=28,889 ms  (btw my cpu is limited to 77%
of its normal speed; usually!)
closed transaction
search started
(one)--(three) 3,234 ms
search started
(one)--(two) 661 ms
search started
(three)--(one) 10 ms
search started
(two)--(one) 0 ms
search started
(one)--(three) 631 ms
search started
(one)--(two) 913 ms
search started
(three)--(one) 0 ms
search started
(two)--(one) 0 ms
search started
(one)--(three) 610 ms
search started

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread Mattias Persson
What I'm saying is that:

  one -- two(one,two) for OUTGOING

and

  two -- one   (two,one) for INCOMING

should yield the same timing.

2011/7/24 John cyuczieekc cyuczie...@gmail.com

 Thanks Mattias. The way I understand what you said is, that swapping `one`
 and `two` in *finder.findSinglePath( one, two );*  should yield the same
 timing when Direction.BOTH is used; this would be great. But I don't see
 how
 this is possible (due to not knowing how it's stored too) unless you know
 how many rels each node has, OR even better storage is using BTree-s (?!)
 Btw, is it possible/practical that neo4j could store IDs in BTrees ?
 (someone said that for rels it's a double linked list instead- I'll need to
 recheck)

 irrelevant stuff follows (ie. don't read)
 
 Until then , I lame-tested something (with both neo4j and berkeleydb):
 a node `one` having 1mil rels to 1million unique nodes, and one more to a
 node `two`, and node `two` having 10 incoming rels from other unique nodes,
 +1 the one rel which was already from `one`, trying shortestPath between
 `one` and `two` (not `two` and `one`) with Direction.BOTH
 output:
 just created 1,000,000 rels, took=58,381 ms  (1)
 Path from one to two: (one)--(two) 121,310 ms (2)
 Path from one to two: (one)--(two) timedelta=1,066 ms
 Path from one to two: (one)--(two) timedelta=795 ms
 Path from one to two: (one)--(two) timedelta=772 ms

 (1) and (2) happened in the same transaction (ie. before tx.finish()), the
 others after transaction finished (an in no new transaction - since they
 were only reads) - this also means some caching must've happened from
 before.
 Because it happened in same transaction, adding 1 mil relationships is
 slower than adding them in bursts of x, I am aware of this (and I'll try to
 make a bench with that). For ie. 100k nodes neo4j is actually faster.
 I even got this once (but in another modified bench):
 first time creating the relationships...
 just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of
 the usual 77% limit)
 Exception in thread main java.lang.OutOfMemoryError: Java heap space
 Environment is about to shut down the neo4j database
 Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead
 limit
 exceeded
at java.util.logging.LogManager.reset(LogManager.java:835)
at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
 Environment shutting down complete 23,962 ms

 Exception: java.lang.OutOfMemoryError thrown from the
 UncaughtExceptionHandler in thread main


 this is the output when running cold, (ie. the data already existed):
 search started
 Path from one to two: (one)--(two) timedelta=7,934 ms
 search started
 Path from one to two: (one)--(two) timedelta=2,027 ms
 search started
 Path from one to two: (one)--(two) timedelta=875 ms
 Environment is about to shut down the neo4j database
 Environment shutting down complete 2,971 ms

 and the exact same thing when using berkeleydb (not using transactions
 though):
 just created 1,000,000 rels, took=29,418 ms
 Path from one to two: true 0 ms
 Path from one to two: true 0 ms
 Path from one to two: true 0 ms
 class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms here)

 and when cold:
 Path from one to two: true 10 ms
 Path from one to two: true 0 ms
 Path from one to two: true 0 ms
 class org.bdb.BerkEnv$1 shutting down complete 0 ms
 (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on
 all)

 in bdb, this is actually how bdb does the search (I don't need to make sure
 I iterate or something similar, on the node `two` which has least incoming
 rels), in fact I execute a search on both nodes, one after the other and
 make sure one-two and two-one (since I am using two databases), anyway,
 by
 using db.getSearchBoth(...) the search is done *that *fast. I hope to see
 this kind of fastness with neo4j too xD (unless I'm doing it wrong)

 if I put neo4j to find shortest path on incoming on `two` then:
 search started
 Path from one to two: (two)--(one) timedelta=20 ms
 search started
 Path from one to two: (two)--(one) timedelta=0 ms
 search started
 Path from one to two: (two)--(one) timedelta=0 ms
 Environment is about to shut down the neo4j database
 Environment shutting down complete 2,590 ms
 (most was 20ms, least was 20ms on first)

 TestLinkage.java (both progs) were used from here:

 https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4

 --

 Here's a lame benchmark for neo4j when I added another node `three` which
 is
 the first rel `one`-`three`, then added 1mil `one`-(random nodes) , and
 then(as before) `one`-`two`
 `three` has no other relationships
 Also, creating nodes 10k per transaction until 1mil are reached.

 first time creating the relationships...
 just created 1,000,000 rels, took=28,889 ms  (btw my cpu is limited to 77%
 of its normal speed; usually!)
 closed transaction
 search started
 (one)--(three) 3,234 ms
 search started

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread John cyuczieekc
but, even when `one` has 1 million outgoing rels
and `two` has 10 or 11 incoming rels

cool then ;) I was trying to say the same thing ...

Have a nice day,
John.

On Sun, Jul 24, 2011 at 4:10 PM, Mattias Persson
matt...@neotechnology.comwrote:

 What I'm saying is that:

  one -- two(one,two) for OUTGOING

 and

  two -- one   (two,one) for INCOMING

 should yield the same timing.

 2011/7/24 John cyuczieekc cyuczie...@gmail.com

  Thanks Mattias. The way I understand what you said is, that swapping
 `one`
  and `two` in *finder.findSinglePath( one, two );*  should yield the same
  timing when Direction.BOTH is used; this would be great. But I don't see
  how
  this is possible (due to not knowing how it's stored too) unless you know
  how many rels each node has, OR even better storage is using BTree-s (?!)
  Btw, is it possible/practical that neo4j could store IDs in BTrees ?
  (someone said that for rels it's a double linked list instead- I'll need
 to
  recheck)
 
  irrelevant stuff follows (ie. don't read)
  
  Until then , I lame-tested something (with both neo4j and berkeleydb):
  a node `one` having 1mil rels to 1million unique nodes, and one more to a
  node `two`, and node `two` having 10 incoming rels from other unique
 nodes,
  +1 the one rel which was already from `one`, trying shortestPath between
  `one` and `two` (not `two` and `one`) with Direction.BOTH
  output:
  just created 1,000,000 rels, took=58,381 ms  (1)
  Path from one to two: (one)--(two) 121,310 ms (2)
  Path from one to two: (one)--(two) timedelta=1,066 ms
  Path from one to two: (one)--(two) timedelta=795 ms
  Path from one to two: (one)--(two) timedelta=772 ms
 
  (1) and (2) happened in the same transaction (ie. before tx.finish()),
 the
  others after transaction finished (an in no new transaction - since they
  were only reads) - this also means some caching must've happened from
  before.
  Because it happened in same transaction, adding 1 mil relationships is
  slower than adding them in bursts of x, I am aware of this (and I'll try
 to
  make a bench with that). For ie. 100k nodes neo4j is actually faster.
  I even got this once (but in another modified bench):
  first time creating the relationships...
  just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead
 of
  the usual 77% limit)
  Exception in thread main java.lang.OutOfMemoryError: Java heap space
  Environment is about to shut down the neo4j database
  Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead
  limit
  exceeded
 at java.util.logging.LogManager.reset(LogManager.java:835)
 at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
  Environment shutting down complete 23,962 ms
 
  Exception: java.lang.OutOfMemoryError thrown from the
  UncaughtExceptionHandler in thread main
 
 
  this is the output when running cold, (ie. the data already existed):
  search started
  Path from one to two: (one)--(two) timedelta=7,934 ms
  search started
  Path from one to two: (one)--(two) timedelta=2,027 ms
  search started
  Path from one to two: (one)--(two) timedelta=875 ms
  Environment is about to shut down the neo4j database
  Environment shutting down complete 2,971 ms
 
  and the exact same thing when using berkeleydb (not using transactions
  though):
  just created 1,000,000 rels, took=29,418 ms
  Path from one to two: true 0 ms
  Path from one to two: true 0 ms
  Path from one to two: true 0 ms
  class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms
 here)
 
  and when cold:
  Path from one to two: true 10 ms
  Path from one to two: true 0 ms
  Path from one to two: true 0 ms
  class org.bdb.BerkEnv$1 shutting down complete 0 ms
  (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on
  all)
 
  in bdb, this is actually how bdb does the search (I don't need to make
 sure
  I iterate or something similar, on the node `two` which has least
 incoming
  rels), in fact I execute a search on both nodes, one after the other and
  make sure one-two and two-one (since I am using two databases), anyway,
  by
  using db.getSearchBoth(...) the search is done *that *fast. I hope to see
  this kind of fastness with neo4j too xD (unless I'm doing it wrong)
 
  if I put neo4j to find shortest path on incoming on `two` then:
  search started
  Path from one to two: (two)--(one) timedelta=20 ms
  search started
  Path from one to two: (two)--(one) timedelta=0 ms
  search started
  Path from one to two: (two)--(one) timedelta=0 ms
  Environment is about to shut down the neo4j database
  Environment shutting down complete 2,590 ms
  (most was 20ms, least was 20ms on first)
 
  TestLinkage.java (both progs) were used from here:
 
 
 https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4
 
  --
 
  Here's a lame benchmark for neo4j when I added another node `three` which
  is
  the first rel `one`-`three`, then added 1mil `one`-(random nodes) , 

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread Niels Hoogeveen

Are you sure this is true, Mattias?The response time of a getRelationship call 
depends on the total number of relationships on the node. So it makes a 
difference which side of the relationship makes the call. It is always faster 
to ask it from the side that has the lowest total number of relationships 
attached. This is even true, if for both sides of the relationship there is 
only one relationship of that particular relationship type.Niels
 Date: Sun, 24 Jul 2011 16:10:42 +0200
 From: matt...@neotechnology.com
 To: user@lists.neo4j.org
 Subject: Re: [Neo4j] shortestPath slower than it could be
 
 What I'm saying is that:
 
   one -- two(one,two) for OUTGOING
 
 and
 
   two -- one   (two,one) for INCOMING
 
 should yield the same timing.
 
 2011/7/24 John cyuczieekc cyuczie...@gmail.com
 
  Thanks Mattias. The way I understand what you said is, that swapping `one`
  and `two` in *finder.findSinglePath( one, two );*  should yield the same
  timing when Direction.BOTH is used; this would be great. But I don't see
  how
  this is possible (due to not knowing how it's stored too) unless you know
  how many rels each node has, OR even better storage is using BTree-s (?!)
  Btw, is it possible/practical that neo4j could store IDs in BTrees ?
  (someone said that for rels it's a double linked list instead- I'll need to
  recheck)
 
  irrelevant stuff follows (ie. don't read)
  
  Until then , I lame-tested something (with both neo4j and berkeleydb):
  a node `one` having 1mil rels to 1million unique nodes, and one more to a
  node `two`, and node `two` having 10 incoming rels from other unique nodes,
  +1 the one rel which was already from `one`, trying shortestPath between
  `one` and `two` (not `two` and `one`) with Direction.BOTH
  output:
  just created 1,000,000 rels, took=58,381 ms  (1)
  Path from one to two: (one)--(two) 121,310 ms (2)
  Path from one to two: (one)--(two) timedelta=1,066 ms
  Path from one to two: (one)--(two) timedelta=795 ms
  Path from one to two: (one)--(two) timedelta=772 ms
 
  (1) and (2) happened in the same transaction (ie. before tx.finish()), the
  others after transaction finished (an in no new transaction - since they
  were only reads) - this also means some caching must've happened from
  before.
  Because it happened in same transaction, adding 1 mil relationships is
  slower than adding them in bursts of x, I am aware of this (and I'll try to
  make a bench with that). For ie. 100k nodes neo4j is actually faster.
  I even got this once (but in another modified bench):
  first time creating the relationships...
  just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead of
  the usual 77% limit)
  Exception in thread main java.lang.OutOfMemoryError: Java heap space
  Environment is about to shut down the neo4j database
  Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead
  limit
  exceeded
 at java.util.logging.LogManager.reset(LogManager.java:835)
 at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
  Environment shutting down complete 23,962 ms
 
  Exception: java.lang.OutOfMemoryError thrown from the
  UncaughtExceptionHandler in thread main
 
 
  this is the output when running cold, (ie. the data already existed):
  search started
  Path from one to two: (one)--(two) timedelta=7,934 ms
  search started
  Path from one to two: (one)--(two) timedelta=2,027 ms
  search started
  Path from one to two: (one)--(two) timedelta=875 ms
  Environment is about to shut down the neo4j database
  Environment shutting down complete 2,971 ms
 
  and the exact same thing when using berkeleydb (not using transactions
  though):
  just created 1,000,000 rels, took=29,418 ms
  Path from one to two: true 0 ms
  Path from one to two: true 0 ms
  Path from one to two: true 0 ms
  class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms here)
 
  and when cold:
  Path from one to two: true 10 ms
  Path from one to two: true 0 ms
  Path from one to two: true 0 ms
  class org.bdb.BerkEnv$1 shutting down complete 0 ms
  (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on
  all)
 
  in bdb, this is actually how bdb does the search (I don't need to make sure
  I iterate or something similar, on the node `two` which has least incoming
  rels), in fact I execute a search on both nodes, one after the other and
  make sure one-two and two-one (since I am using two databases), anyway,
  by
  using db.getSearchBoth(...) the search is done *that *fast. I hope to see
  this kind of fastness with neo4j too xD (unless I'm doing it wrong)
 
  if I put neo4j to find shortest path on incoming on `two` then:
  search started
  Path from one to two: (two)--(one) timedelta=20 ms
  search started
  Path from one to two: (two)--(one) timedelta=0 ms
  search started
  Path from one to two: (two)--(one) timedelta=0 ms
  Environment is about to shut down the neo4j database
  Environment shutting down complete 2,590 ms

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread Mattias Persson
It doesn't matter since the algorithm is bi-directional... so:

one -- two

will start from one OUTGOING and two INCOMING, whereas

two -- one

will start from two INCOMING and one OUTGOING see, no difference.

It alternates side for each relationship. It will, however depend on where
the INCOMING/OUTGOING relationships reside in the relationship chain, but
sticking to this discussion: those two calls will yield the exact same
speed. Though I just discovered a little thingie where findSinglePath didn't
stop right away when after finding the first one, but now it does!

2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com


 Are you sure this is true, Mattias?The response time of a getRelationship
 call depends on the total number of relationships on the node. So it makes a
 difference which side of the relationship makes the call. It is always
 faster to ask it from the side that has the lowest total number of
 relationships attached. This is even true, if for both sides of the
 relationship there is only one relationship of that particular relationship
 type.Niels
  Date: Sun, 24 Jul 2011 16:10:42 +0200
  From: matt...@neotechnology.com
  To: user@lists.neo4j.org
  Subject: Re: [Neo4j] shortestPath slower than it could be
 
  What I'm saying is that:
 
one -- two(one,two) for OUTGOING
 
  and
 
two -- one   (two,one) for INCOMING
 
  should yield the same timing.
 
  2011/7/24 John cyuczieekc cyuczie...@gmail.com
 
   Thanks Mattias. The way I understand what you said is, that swapping
 `one`
   and `two` in *finder.findSinglePath( one, two );*  should yield the
 same
   timing when Direction.BOTH is used; this would be great. But I don't
 see
   how
   this is possible (due to not knowing how it's stored too) unless you
 know
   how many rels each node has, OR even better storage is using BTree-s
 (?!)
   Btw, is it possible/practical that neo4j could store IDs in BTrees ?
   (someone said that for rels it's a double linked list instead- I'll
 need to
   recheck)
  
   irrelevant stuff follows (ie. don't read)
   
   Until then , I lame-tested something (with both neo4j and berkeleydb):
   a node `one` having 1mil rels to 1million unique nodes, and one more to
 a
   node `two`, and node `two` having 10 incoming rels from other unique
 nodes,
   +1 the one rel which was already from `one`, trying shortestPath
 between
   `one` and `two` (not `two` and `one`) with Direction.BOTH
   output:
   just created 1,000,000 rels, took=58,381 ms  (1)
   Path from one to two: (one)--(two) 121,310 ms (2)
   Path from one to two: (one)--(two) timedelta=1,066 ms
   Path from one to two: (one)--(two) timedelta=795 ms
   Path from one to two: (one)--(two) timedelta=772 ms
  
   (1) and (2) happened in the same transaction (ie. before tx.finish()),
 the
   others after transaction finished (an in no new transaction - since
 they
   were only reads) - this also means some caching must've happened from
   before.
   Because it happened in same transaction, adding 1 mil relationships is
   slower than adding them in bursts of x, I am aware of this (and I'll
 try to
   make a bench with that). For ie. 100k nodes neo4j is actually faster.
   I even got this once (but in another modified bench):
   first time creating the relationships...
   just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead
 of
   the usual 77% limit)
   Exception in thread main java.lang.OutOfMemoryError: Java heap space
   Environment is about to shut down the neo4j database
   Exception in thread Thread-0 java.lang.OutOfMemoryError: GC overhead
   limit
   exceeded
  at java.util.logging.LogManager.reset(LogManager.java:835)
  at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
   Environment shutting down complete 23,962 ms
  
   Exception: java.lang.OutOfMemoryError thrown from the
   UncaughtExceptionHandler in thread main
  
  
   this is the output when running cold, (ie. the data already existed):
   search started
   Path from one to two: (one)--(two) timedelta=7,934 ms
   search started
   Path from one to two: (one)--(two) timedelta=2,027 ms
   search started
   Path from one to two: (one)--(two) timedelta=875 ms
   Environment is about to shut down the neo4j database
   Environment shutting down complete 2,971 ms
  
   and the exact same thing when using berkeleydb (not using transactions
   though):
   just created 1,000,000 rels, took=29,418 ms
   Path from one to two: true 0 ms
   Path from one to two: true 0 ms
   Path from one to two: true 0 ms
   class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms
 here)
  
   and when cold:
   Path from one to two: true 10 ms
   Path from one to two: true 0 ms
   Path from one to two: true 0 ms
   class org.bdb.BerkEnv$1 shutting down complete 0 ms
   (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on
   all)
  
   in bdb, this is actually how bdb does the search (I don't need to make
 sure
   I

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread John cyuczieekc
updated to latest from github, Stops the algo as soon as possible in
findSinglePath

graphdb contains:
one--three
one--{ a million other random nodes }
one--two
{ 10 random nodes } -- two

(added in that order, except that `one--two` is somewhere between those 10
random nodes)

output:

with Direction.BOTH:
(one)--(three) 7,660 ms
(one)--(two) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
(one)--(three) 2,195 ms
(one)--(two) 0 ms
*(three)--(one) 0 ms*
(two)--(one) 0 ms
*(one)--(three) 875 ms*
(one)--(two) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
(one)--(three) 630 ms
(one)--(two) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
(one)--(three) 723 ms
(one)--(two) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
with Direction.INCOMING:
(three)--(one) 0 ms
(two)--(one) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
(three)--(one) 0 ms
(two)--(one) 0 ms
with Direction.OUTGOING:
(one)--(three) 802 ms
(one)--(two) 0 ms
with Direction.OUTGOING:
(one)--(three) 631 ms
(one)--(two) 0 ms
with Direction.OUTGOING:
(one)--(three) 732 ms
(one)--(two) 0 ms
with Direction.OUTGOING:
(one)--(three) 873 ms
(one)--(two) 0 ms
with Direction.OUTGOING:
(one)--(three) 630 ms
Environment is about to shut down the neo4j database
(one)--(two) 0 ms
Environment shutting down complete 3,171 ms

For some reason, one--three takes more than 0ms, even though three has only
1 incoming rel, and it's the first rel of `one` that outgoes to `three`

(for whoever cares)the test program used is in this commit here:
https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee

I tried to revert that commit, and for some reason I'm getting the same
results [one--three is the only one over 0ms] (maybe I failed in reverting
it, though the 2 sources seem reverted)

Peace :)

On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson
matt...@neotechnology.comwrote:

 It doesn't matter since the algorithm is bi-directional... so:

one -- two

 will start from one OUTGOING and two INCOMING, whereas

two -- one

 will start from two INCOMING and one OUTGOING see, no difference.

 It alternates side for each relationship. It will, however depend on where
 the INCOMING/OUTGOING relationships reside in the relationship chain, but
 sticking to this discussion: those two calls will yield the exact same
 speed. Though I just discovered a little thingie where findSinglePath
 didn't
 stop right away when after finding the first one, but now it does!

 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com

 
  Are you sure this is true, Mattias?The response time of a getRelationship
  call depends on the total number of relationships on the node. So it
 makes a
  difference which side of the relationship makes the call. It is always
  faster to ask it from the side that has the lowest total number of
  relationships attached. This is even true, if for both sides of the
  relationship there is only one relationship of that particular
 relationship
  type.Niels
   Date: Sun, 24 Jul 2011 16:10:42 +0200
   From: matt...@neotechnology.com
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] shortestPath slower than it could be
  
   What I'm saying is that:
  
 one -- two(one,two) for OUTGOING
  
   and
  
 two -- one   (two,one) for INCOMING
  
   should yield the same timing.
  
   2011/7/24 John cyuczieekc cyuczie...@gmail.com
  
Thanks Mattias. The way I understand what you said is, that swapping
  `one`
and `two` in *finder.findSinglePath( one, two );*  should yield the
  same
timing when Direction.BOTH is used; this would be great. But I don't
  see
how
this is possible (due to not knowing how it's stored too) unless you
  know
how many rels each node has, OR even better storage is using BTree-s
  (?!)
Btw, is it possible/practical that neo4j could store IDs in BTrees ?
(someone said that for rels it's a double linked list instead- I'll
  need to
recheck)
   
irrelevant stuff follows (ie. don't read)

Until then , I lame-tested something (with both neo4j and
 berkeleydb):
a node `one` having 1mil rels to 1million unique nodes, and one more
 to
  a
node `two`, and node `two` having 10 incoming rels from other unique
  nodes,
+1 the one rel which was already from `one`, trying shortestPath
  between
`one` and `two` (not `two` and `one`) with Direction.BOTH
output:
just created 1,000,000 rels, took=58,381 ms  (1)
Path from one to two: (one)--(two) 121,310 ms (2)
Path from one to two: (one)--(two) timedelta=1,066 ms
Path from one to two: (one)--(two) timedelta=795 ms
Path from one to two: (one)--(two) timedelta=772 ms
   
(1) and (2) happened in the same transaction (ie. before
 tx.finish()),
  the
others after transaction finished (an in no new transaction - since
  they
were only reads) - this also means some caching must've happened from
before.
Because

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread John cyuczieekc
 a
  difference which side of the relationship makes the call. It is always
  faster to ask it from the side that has the lowest total number of
  relationships attached. This is even true, if for both sides of the
  relationship there is only one relationship of that particular
 relationship
  type.Niels
   Date: Sun, 24 Jul 2011 16:10:42 +0200
   From: matt...@neotechnology.com
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] shortestPath slower than it could be
  
   What I'm saying is that:
  
 one -- two(one,two) for OUTGOING
  
   and
  
 two -- one   (two,one) for INCOMING
  
   should yield the same timing.
  
   2011/7/24 John cyuczieekc cyuczie...@gmail.com
  
Thanks Mattias. The way I understand what you said is, that swapping
  `one`
and `two` in *finder.findSinglePath( one, two );*  should yield the
  same
timing when Direction.BOTH is used; this would be great. But I don't
  see
how
this is possible (due to not knowing how it's stored too) unless you
  know
how many rels each node has, OR even better storage is using BTree-s
  (?!)
Btw, is it possible/practical that neo4j could store IDs in BTrees ?
(someone said that for rels it's a double linked list instead- I'll
  need to
recheck)
   
irrelevant stuff follows (ie. don't read)

Until then , I lame-tested something (with both neo4j and
 berkeleydb):
a node `one` having 1mil rels to 1million unique nodes, and one more
 to
  a
node `two`, and node `two` having 10 incoming rels from other unique
  nodes,
+1 the one rel which was already from `one`, trying shortestPath
  between
`one` and `two` (not `two` and `one`) with Direction.BOTH
output:
just created 1,000,000 rels, took=58,381 ms  (1)
Path from one to two: (one)--(two) 121,310 ms (2)
Path from one to two: (one)--(two) timedelta=1,066 ms
Path from one to two: (one)--(two) timedelta=795 ms
Path from one to two: (one)--(two) timedelta=772 ms
   
(1) and (2) happened in the same transaction (ie. before
 tx.finish()),
  the
others after transaction finished (an in no new transaction - since
  they
were only reads) - this also means some caching must've happened
 from
before.
Because it happened in same transaction, adding 1 mil relationships
 is
slower than adding them in bursts of x, I am aware of this (and I'll
  try to
make a bench with that). For ie. 100k nodes neo4j is actually
 faster.
I even got this once (but in another modified bench):
first time creating the relationships...
just created 1,000,000 rels, took=49,307 ms (cpu was 100% here,
 instead
  of
the usual 77% limit)
Exception in thread main java.lang.OutOfMemoryError: Java heap
 space
Environment is about to shut down the neo4j database
Exception in thread Thread-0 java.lang.OutOfMemoryError: GC
 overhead
limit
exceeded
   at java.util.logging.LogManager.reset(LogManager.java:835)
   at java.util.logging.LogManager$Cleaner.run(LogManager.java:240)
Environment shutting down complete 23,962 ms
   
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread main
   
   
this is the output when running cold, (ie. the data already
 existed):
search started
Path from one to two: (one)--(two) timedelta=7,934 ms
search started
Path from one to two: (one)--(two) timedelta=2,027 ms
search started
Path from one to two: (one)--(two) timedelta=875 ms
Environment is about to shut down the neo4j database
Environment shutting down complete 2,971 ms
   
and the exact same thing when using berkeleydb (not using
 transactions
though):
just created 1,000,000 rels, took=29,418 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms  (most was 10 ms
  here)
   
and when cold:
Path from one to two: true 10 ms
Path from one to two: true 0 ms
Path from one to two: true 0 ms
class org.bdb.BerkEnv$1 shutting down complete 0 ms
(the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms
 on
all)
   
in bdb, this is actually how bdb does the search (I don't need to
 make
  sure
I iterate or something similar, on the node `two` which has least
  incoming
rels), in fact I execute a search on both nodes, one after the other
  and
make sure one-two and two-one (since I am using two databases),
  anyway,
by
using db.getSearchBoth(...) the search is done *that *fast. I hope
 to
  see
this kind of fastness with neo4j too xD (unless I'm doing it wrong)
   
if I put neo4j to find shortest path on incoming on `two` then:
search started
Path from one to two: (two)--(one) timedelta=20 ms
search started
Path from one to two: (two)--(one) timedelta=0 ms
search started
Path from one to two: (two)--(one) timedelta=0

Re: [Neo4j] shortestPath slower than it could be

2011-07-24 Thread John cyuczieekc

 Environment is about to shut down the neo4j database
 (one)--(two) 0 ms
 Environment shutting down complete 2,711 ms


 So yeah, one--two got fixed, what say you about one--three though ? :D
 *whistle* :-


 On Sun, Jul 24, 2011 at 5:28 PM, John cyuczieekc cyuczie...@gmail.comwrote:

 updated to latest from github, Stops the algo as soon as possible in
 findSinglePath

 graphdb contains:
 one--three
 one--{ a million other random nodes }
 one--two
 { 10 random nodes } -- two

 (added in that order, except that `one--two` is somewhere between those
 10 random nodes)

 output:

 with Direction.BOTH:
 (one)--(three) 7,660 ms
 (one)--(two) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (one)--(three) 2,195 ms
 (one)--(two) 0 ms
 *(three)--(one) 0 ms*
 (two)--(one) 0 ms
 *(one)--(three) 875 ms*
 (one)--(two) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (one)--(three) 630 ms
 (one)--(two) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (one)--(three) 723 ms
 (one)--(two) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 with Direction.INCOMING:
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 (three)--(one) 0 ms
 (two)--(one) 0 ms
 with Direction.OUTGOING:
 (one)--(three) 802 ms
 (one)--(two) 0 ms
 with Direction.OUTGOING:
 (one)--(three) 631 ms
 (one)--(two) 0 ms
 with Direction.OUTGOING:
 (one)--(three) 732 ms
 (one)--(two) 0 ms
 with Direction.OUTGOING:
 (one)--(three) 873 ms
 (one)--(two) 0 ms
 with Direction.OUTGOING:
 (one)--(three) 630 ms

 Environment is about to shut down the neo4j database
  (one)--(two) 0 ms
 Environment shutting down complete 3,171 ms

 For some reason, one--three takes more than 0ms, even though three has
 only 1 incoming rel, and it's the first rel of `one` that outgoes to `three`

 (for whoever cares)the test program used is in this commit here:

 https://github.com/13th-floor/neo4john/commit/b819a0d418d953a675aaa74749f284b88e4f47ee

 I tried to revert that commit, and for some reason I'm getting the same
 results [one--three is the only one over 0ms] (maybe I failed in reverting
 it, though the 2 sources seem reverted)

 Peace :)


 On Sun, Jul 24, 2011 at 4:49 PM, Mattias Persson 
 matt...@neotechnology.com wrote:

 It doesn't matter since the algorithm is bi-directional... so:

one -- two

 will start from one OUTGOING and two INCOMING, whereas

two -- one

 will start from two INCOMING and one OUTGOING see, no difference.

 It alternates side for each relationship. It will, however depend on
 where
 the INCOMING/OUTGOING relationships reside in the relationship chain, but
 sticking to this discussion: those two calls will yield the exact same
 speed. Though I just discovered a little thingie where findSinglePath
 didn't
 stop right away when after finding the first one, but now it does!

 2011/7/24 Niels Hoogeveen pd_aficion...@hotmail.com

 
  Are you sure this is true, Mattias?The response time of a
 getRelationship
  call depends on the total number of relationships on the node. So it
 makes a
  difference which side of the relationship makes the call. It is always
  faster to ask it from the side that has the lowest total number of
  relationships attached. This is even true, if for both sides of the
  relationship there is only one relationship of that particular
 relationship
  type.Niels
   Date: Sun, 24 Jul 2011 16:10:42 +0200
   From: matt...@neotechnology.com
   To: user@lists.neo4j.org
   Subject: Re: [Neo4j] shortestPath slower than it could be
  
   What I'm saying is that:
  
 one -- two(one,two) for OUTGOING
  
   and
  
 two -- one   (two,one) for INCOMING
  
   should yield the same timing.
  
   2011/7/24 John cyuczieekc cyuczie...@gmail.com
  
Thanks Mattias. The way I understand what you said is, that
 swapping
  `one`
and `two` in *finder.findSinglePath( one, two );*  should yield the
  same
timing when Direction.BOTH is used; this would be great. But I
 don't
  see
how
this is possible (due to not knowing how it's stored too) unless
 you
  know
how many rels each node has, OR even better storage is using
 BTree-s
  (?!)
Btw, is it possible/practical that neo4j could store IDs in BTrees
 ?
(someone said that for rels it's a double linked list instead- I'll
  need to
recheck)
   
irrelevant stuff follows (ie. don't read)

Until then , I lame-tested something (with both neo4j and
 berkeleydb):
a node `one` having 1mil rels to 1million unique nodes, and one
 more to
  a
node `two`, and node `two` having 10 incoming rels from other
 unique
  nodes,
+1 the one rel which was already from `one`, trying shortestPath
  between
`one` and `two` (not `two` and `one`) with Direction.BOTH
output:
just created 1,000,000 rels, took=58,381 ms  (1)
Path from one to two: (one)--(two) 121,310 ms (2)
Path from one to two: (one)--(two) timedelta=1,066 ms
Path from one

Re: [Neo4j] shortestPath slower than it could be

2011-07-23 Thread John cyuczieekc
Hail :)
Beware, lot of noise follows (ie. webnoise kinda noise), while I thank
anyone in advance for reading, I must also apologize for failing to be
concise :-
On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 John,

 no problem :)

 You pointed out both problems:
 - cold caches
 - lots of rels on the one side

 There are some performance implications of loading millions of rels from
 disk. We'll be working on improving those in 1.5.

 The easiest way to solve that is to switch start and end-node which you
 already did. It is much easier for you _with domain knowledge_ about the
 graph than for the algorithm.

Yeah I would probably apply that manually, that is: call a particular method
when I know which one of the nodes has least rels. (as a ' `temporary` or
`acceptable so far` workaround ' [u see the graph here?! xD]* I will for now
use findSinglePath as it is*, simply because I don't want the overhead of
using a particular method for some use case scenario and the generic one for
others - besides it's fast enough, for now)
 My problem here is that I might not know (in the future) which node (first
or second) will have the least nodes. So I was hoping to create a generic
method for such find.
 For now, I could say, I can know.
For example:
AllLists-A
AllLists-B
AllLists-C
where A,B,C are considered nodes representing lists, in this case, I could
maybe say that while AllLists can have lots of outgoing rels (aka lots of
lists), list A may not have that many nodes pointing to it (aka list is not
used in as many places as there are lists), so I could consider A as being
node with least nodes.
Though I would want a generic method that would always know which node has
least nodes and parse that one, especially in my 1 hop case.
  I am, as seen here, basically tagging the nodes A,B,C with AllLists, by
having AllLists point to them, and this way, I know that node A is of
AllLists type (so to speak). This was my old way of thinking while using bdb
je, I might need to revise it since then I was completely ignoring
relationships and not considering them as a single and accessible object. (I
am currently trying to port that old way of thinking to neo4j)
  In this old way, I was using the node AllLists pointing/outgoing to A as
being (in neo4j) a relationship of the type AllLists incoming to A.


 Especially if you have  1 hop paths.

In my particular use case, I will probably not (yet!) have cases with 1
hop. (I can't really visualize that far in the future xD ) but for sure I
will have lots of ==1 hops cases where I just want to see if thisnode is
tagged with thatnode by checking thatnode--thisnode


 There might be ways in improving the algorithm, e.g. by iterating both
 sides at the same time, which would lead to the end with the fewer
 relationships being exhausted and resolved first.

for now, I totally avoided getting into the findSinglePath 's source and
considering doing changes there or even seeing how it works:)
But that algo sounds like something I'd do, in like 2 (reusable) threads
even xD (I remember having considered this in the past, when using bdb je
but I opted to parse the node with least rels instead)


 How large is your graph at all? And how is it structured, e.g. how many
 rels do the other 9 nodes have that your second node points to?

I was merely trying this as an example(but this should be a practical use
case in my project), the thing is, the other 9 nodes can have any number or
rels, in fact *any node could have any number of rels incoming and outgoing
from it*,... (but in this case, the 9 nodes each had a new unused node
point(outgoing) to them; I'd post the source but i'm already using wrappers
and the whole example would be more than 1 java file, but I can link to it:
https://github.com/13th-floor/neo4john/blob/52e2d75a39a28200b34032aaf45a5a09c1e1b22c/src/org/benchtests/neo4j/TestLinkage.java
that's the main file to run, tho you don't have to and you prolly don't have
the time; just putting it here for consistency/historical-reasons I guess xD
)
  In my project, I am trying something generic... something like building
sets and lists of nodes based on graphs. Imagine having no metadata
key-value properties, and instead *trying to use only nodes to represent *those
properties (well actually I do have one key aka name so i could reference
the same node id across application restarts).



 What are both times in a second or third run (cached) ?

Path from one to two: (one)--(two) timedelta=2,151,018,862 ns
Path from one to two: (one)--(two) timedelta=131,458,292 ns
Path from one to two: (one)--(two) timedelta=119,015,019 ns
another run:
Path from one to two: (one)--(two) timedelta=1,824,099,688 ns
Path from one to two: (one)--(two) timedelta=275,772,087 ns
Path from one to two: (one)--(two) timedelta=352,562,121 ns
 and when swapped:
Path from one to two: (two)--(one) timedelta=20,601,181 ns
Path from one to two: (two)--(one) timedelta=284,212 

Re: [Neo4j] shortestPath slower than it could be

2011-07-23 Thread Michael Hunger
Perhaps you can catch up with Niels Hoogeven who's currently implementing 
Collection data structures, B- and R-Trees to be transparently mapped into 
graph structures at:

https://github.com/peterneubauer/graph-collections

Cheers

Michael

Perhaps you should describe the intent / idea behind your project. Dropping too 
early into implementation details might remove some possibilities that are able 
with graph databases by loosing the big picture.

And I wouldn't care about performance right now (btw. 
System.currentTimeMillis() is mostly good enough for measurement). But rather 
on modeling your domain to the graph and specify which concepts and 
operations/use-cases you'd like to support.

You can optimize afterwards anyway.

Am 23.07.2011 um 14:29 schrieb John cyuczieekc:

 Hail :)
 Beware, lot of noise follows (ie. webnoise kinda noise), while I thank
 anyone in advance for reading, I must also apologize for failing to be
 concise :-
 On Sat, Jul 23, 2011 at 10:42 AM, Michael Hunger 
 michael.hun...@neotechnology.com wrote:
 
 John,
 
 no problem :)
 
 You pointed out both problems:
 - cold caches
 - lots of rels on the one side
 
 There are some performance implications of loading millions of rels from
 disk. We'll be working on improving those in 1.5.
 
 The easiest way to solve that is to switch start and end-node which you
 already did. It is much easier for you _with domain knowledge_ about the
 graph than for the algorithm.
 
 Yeah I would probably apply that manually, that is: call a particular method
 when I know which one of the nodes has least rels. (as a ' `temporary` or
 `acceptable so far` workaround ' [u see the graph here?! xD]* I will for now
 use findSinglePath as it is*, simply because I don't want the overhead of
 using a particular method for some use case scenario and the generic one for
 others - besides it's fast enough, for now)
 My problem here is that I might not know (in the future) which node (first
 or second) will have the least nodes. So I was hoping to create a generic
 method for such find.
 For now, I could say, I can know.
 For example:
 AllLists-A
 AllLists-B
 AllLists-C
 where A,B,C are considered nodes representing lists, in this case, I could
 maybe say that while AllLists can have lots of outgoing rels (aka lots of
 lists), list A may not have that many nodes pointing to it (aka list is not
 used in as many places as there are lists), so I could consider A as being
 node with least nodes.
 Though I would want a generic method that would always know which node has
 least nodes and parse that one, especially in my 1 hop case.
  I am, as seen here, basically tagging the nodes A,B,C with AllLists, by
 having AllLists point to them, and this way, I know that node A is of
 AllLists type (so to speak). This was my old way of thinking while using bdb
 je, I might need to revise it since then I was completely ignoring
 relationships and not considering them as a single and accessible object. (I
 am currently trying to port that old way of thinking to neo4j)
  In this old way, I was using the node AllLists pointing/outgoing to A as
 being (in neo4j) a relationship of the type AllLists incoming to A.
 
 
 Especially if you have  1 hop paths.
 
 In my particular use case, I will probably not (yet!) have cases with 1
 hop. (I can't really visualize that far in the future xD ) but for sure I
 will have lots of ==1 hops cases where I just want to see if thisnode is
 tagged with thatnode by checking thatnode--thisnode
 
 
 There might be ways in improving the algorithm, e.g. by iterating both
 sides at the same time, which would lead to the end with the fewer
 relationships being exhausted and resolved first.
 
 for now, I totally avoided getting into the findSinglePath 's source and
 considering doing changes there or even seeing how it works:)
 But that algo sounds like something I'd do, in like 2 (reusable) threads
 even xD (I remember having considered this in the past, when using bdb je
 but I opted to parse the node with least rels instead)
 
 
 How large is your graph at all? And how is it structured, e.g. how many
 rels do the other 9 nodes have that your second node points to?
 
 I was merely trying this as an example(but this should be a practical use
 case in my project), the thing is, the other 9 nodes can have any number or
 rels, in fact *any node could have any number of rels incoming and outgoing
 from it*,... (but in this case, the 9 nodes each had a new unused node
 point(outgoing) to them; I'd post the source but i'm already using wrappers
 and the whole example would be more than 1 java file, but I can link to it:
 https://github.com/13th-floor/neo4john/blob/52e2d75a39a28200b34032aaf45a5a09c1e1b22c/src/org/benchtests/neo4j/TestLinkage.java
 that's the main file to run, tho you don't have to and you prolly don't have
 the time; just putting it here for consistency/historical-reasons I guess xD
 )
  In my project, I am trying something generic... something 

Re: [Neo4j] shortestPath slower than it could be

2011-07-23 Thread John cyuczieekc
I would gladly try to catch up with Niels but from what I've read (from his
prev. emails) I hardly understand what he was talking about (implying I am
no way near as smart to understand that; which might mean I am simply
missing the background/unstated-assumptions or focusing fail :)) [also I
don't know if we're doing similar things? probably because I don't yet
understand what he's doing, or I don't fully understand which causes me to
drop even what I did understand, flagging it as `didn't understand anything`
xD]

I would describe the idea behind my project, but I'm not sure that I can
envision that far (due to failure to keep my focus perhaps), hence I was
hoping I could work my way bottom-up implementing (though this would never
really work). And you're right about performance, I shouldn't care, but I
lose the sight sometimes and get stuck into small details like that,
effectively preventing me to continue. Anyway, enough about what is, let's
change that heh. I really appreciate what you said btw and I find it
helpful.

Ok so idea would be something like, building systems on top of other systems
with the main goal being accessibility. It's kind of generic really. I don't
want something very specific. I just want to be able to access information
and have the computer also access this information in an AI-friendly kinda
way. That is, if I don't know anything, but I know how to navigate links
(supposedly most basic task), then I(as an AI or intelligence) would be able
to parse and understand how this (part of the) system works, and even change
it (evolve it, ie. multiversioning? / git-like).
 Specific examples, let's see, when done, I should be able to do programming
or storing random thoughts/phrases have them linked to the timestamp of
their creation even, create a small random monitor (think 2D screen in 3D)
and have it show something and starting from any pixel on it, I would be
able to track things down and see/understand how that pixel ended up on that
particular spot. So basically a right way of organizing and accessing
anything, be it offline data or live programs, such that an intelligence
(human or otherwise AI/computer) would have no trouble
understanding/learning how it works). I think the key to building such a
thing is simply links/relationships. I just know this can be done (on top of
this lowest level:) by simply creating Nodes (which are basically random
unique IDs) and having nothing but those nodes and the relationships between
them (no real metadata would really be needed), just the ability to know
that this node X is different than this node Y (which is done via their IDs
really; but if the java level requires accessing the same node ID then it
would require a layer for itself only to have a name associated with that
ID, so it can access it by name, across app restarts), and knowing if X
points to Y or Y is pointed-to by X.
(btw, I don't really know or use BTrees but I was hoping them being use at
the lower than Node/Relationship level, would benefit fast searches ie. is
X-Y or find x such that {A-x-B; F-C-x-K; P-x}; but at the high level
I am referring with Nodes/Relationships I don't see the need for a BTree
altho it could be implemented on top of whatever exists; don't know about
Rtree though, I am wikipedia-ing it right now, looks yummy xD)

Anyway, the point is, like in a graph I guess, start from any point and be
able to parse/understand the entire system (except if some grouped nodes are
simply disconnected from the grouped ones I started in, you'd say; but
really there would/should probably be no such nodes, at least the reference
node would be pointing to them, but most likely they would be grouped
together by a parent node pointing to each of them)

So basically, I lost it :)
There will be issues that need to be tackled, like not using locks, using
multiversioning instead (or something)... I don't even know anymore - I know
sometimes it seemed impossible even theoretically but sometimes it seemed
those impossibilities vanished.

What more should I say? (really, 'cause I don't know)
I just want a system you know, where stuff can be accessed to the very core,
to all level depths, such that nothing would really be inaccessible, so that
I wouldn't need to go search for how something is made by looking on a 3rd
party site or doc file just because there's no tool/way to allow me the kind
of accessibility I know I want, sort of like going around it and hoping the
doc/src is in sync with the binary I'm just running (just an example), but
rather bin+src+doc would be one, and even allow me to change them on the fly
with changes propagating to all dependencies... erm, yeah feel free to stop
me from blabbing xD

Peace off :)

On Sat, Jul 23, 2011 at 3:16 PM, Michael Hunger 
michael.hun...@neotechnology.com wrote:

 Perhaps you can catch up with Niels Hoogeven who's currently implementing
 Collection data structures, B- and R-Trees to be transparently mapped into
 graph structures at:

 

Re: [Neo4j] shortestPath slower than it could be

2011-07-23 Thread Mattias Persson
Hi John,

the algorithm is written to dodge these kinds of pitfalls. Maybe there's
some issue with the implementation, but in principal it should make no
difference. I'll look at it when I get the time (I wrote that
implementation).

2011/7/23 John cyuczieekc cyuczie...@gmail.com

 Hey guys, me bugging you again :)

 (This whole thing is kind of based on the lack of being able to get the
 number of relationships a node has)

  If I have two nodes, and the first one has 1 million outgoing
 relationships
 of the type X to 1 million unique/different nodes,
 and the second node has 10 incoming relationships of type X (same type) of
 which one is from the first node,
 then using GraphAlgoFactory.shortestPath  (or suggest a better way?)
  How can I tell neo4j to iterate the search on the second node's incoming
 rels simply because it has 10 relationships instead of 1 million, in order
 to check if each relationship is in the form of firstNode--secondNode ?

 For the case when first node has 100,000 relationships and second node has
 10,
 it takes *1.7 seconds* for shortestPath to find the only one link between
 them using:

 final PathFinderPath finder = GraphAlgoFactory.shortestPath(
 Traversal.expanderForTypes( rel, Direction.OUTGOING  ), 1 );
 final Path foundPath = finder.findSinglePath( *one, two* );

 I can put Direction.*BOTH *and get the same amount of time
 *Path from one to two: (one)--(two) timedelta=1,862,726,634 ns*

 *BUT*, get this: if I swap the nodes:
 finder.findSinglePath(* two, one*);
 and i use either Direction.INCOMING or Direction.*BOTH  *(which makes sense
 for the second node ,right) then I get *20ms* the time until it finishes...
 *Path from one to two: (two)--(one) timedelta=20,830,111 ns*

 (both cases are without data being priorly cached)

 I was expecting it to act like this: (but only when using Direction.BOTH)
  see which node has the least number of relationships and iterate on those,
 but this would work if findSinglePath would be made for depth 1 (aka
 particular case), but as I read Tries to find a single path between
 startand
 end nodes. then it makes sense to me why it works like it does... that is,
 iterate on relationships from start node, rather than from end node... but
 I'm not sure if it would *not *make sense to iterate on the end node
 instead
 of start node, when knowing that end node has less relationships, for make
 the search faster (well at least if depth is one) - I didn't look into how
 neo4j actually does stuff yet :D

 anyway, it's fairly clear to me that I could make a simple wrapper method
 to
 make this kind of search faster, *IF* I had the ability to know how many
 relationships each node has, so I can call findSinglePath  with the first
 param being the node with the least relationship count :) But as I
 understood it, it's not possible to find how many rels a node has... gimme
 feat! :)) [by not possible I mean, without having to iterate thru all and
 count them, which would make the use case here obsolete]

 PS: clearly all the text I wrote here would benefit from being represented
 by a graph, just think about all those grouping with autohiding the ie.
 []
 and all kinds of stuff... heh
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


[Neo4j] shortestPath slower than it could be

2011-07-22 Thread John cyuczieekc
Hey guys, me bugging you again :)

(This whole thing is kind of based on the lack of being able to get the
number of relationships a node has)

 If I have two nodes, and the first one has 1 million outgoing relationships
of the type X to 1 million unique/different nodes,
and the second node has 10 incoming relationships of type X (same type) of
which one is from the first node,
then using GraphAlgoFactory.shortestPath  (or suggest a better way?)
 How can I tell neo4j to iterate the search on the second node's incoming
rels simply because it has 10 relationships instead of 1 million, in order
to check if each relationship is in the form of firstNode--secondNode ?

For the case when first node has 100,000 relationships and second node has
10,
it takes *1.7 seconds* for shortestPath to find the only one link between
them using:

final PathFinderPath finder = GraphAlgoFactory.shortestPath(
Traversal.expanderForTypes( rel, Direction.OUTGOING  ), 1 );
final Path foundPath = finder.findSinglePath( *one, two* );

I can put Direction.*BOTH *and get the same amount of time
*Path from one to two: (one)--(two) timedelta=1,862,726,634 ns*

*BUT*, get this: if I swap the nodes:
finder.findSinglePath(* two, one*);
and i use either Direction.INCOMING or Direction.*BOTH  *(which makes sense
for the second node ,right) then I get *20ms* the time until it finishes...
*Path from one to two: (two)--(one) timedelta=20,830,111 ns*

(both cases are without data being priorly cached)

I was expecting it to act like this: (but only when using Direction.BOTH)
  see which node has the least number of relationships and iterate on those,
but this would work if findSinglePath would be made for depth 1 (aka
particular case), but as I read Tries to find a single path between startand
end nodes. then it makes sense to me why it works like it does... that is,
iterate on relationships from start node, rather than from end node... but
I'm not sure if it would *not *make sense to iterate on the end node instead
of start node, when knowing that end node has less relationships, for make
the search faster (well at least if depth is one) - I didn't look into how
neo4j actually does stuff yet :D

anyway, it's fairly clear to me that I could make a simple wrapper method to
make this kind of search faster, *IF* I had the ability to know how many
relationships each node has, so I can call findSinglePath  with the first
param being the node with the least relationship count :) But as I
understood it, it's not possible to find how many rels a node has... gimme
feat! :)) [by not possible I mean, without having to iterate thru all and
count them, which would make the use case here obsolete]

PS: clearly all the text I wrote here would benefit from being represented
by a graph, just think about all those grouping with autohiding the ie. []
and all kinds of stuff... heh
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user