but, even when `one` has 1 million outgoing rels and `two` has 10 or 11 incoming rels
cool then ;) I was trying to say the same thing ... Have a nice day, John. On Sun, Jul 24, 2011 at 4:10 PM, Mattias Persson <[email protected]>wrote: > What I'm saying is that: > > one --> two (one,two) for OUTGOING > > and > > two <-- one (two,one) for INCOMING > > should yield the same timing. > > 2011/7/24 John cyuczieekc <[email protected]> > > > Thanks Mattias. The way I understand what you said is, that swapping > `one` > > and `two` in *finder.findSinglePath( one, two );* should yield the same > > timing when Direction.BOTH is used; this would be great. But I don't see > > how > > this is possible (due to not knowing how it's stored too) unless you know > > how many rels each node has, OR even better storage is using BTree-s (?!) > > Btw, is it possible/practical that neo4j could store IDs in BTrees ? > > (someone said that for rels it's a double linked list instead- I'll need > to > > recheck) > > > > irrelevant stuff follows (ie. don't read) > > ---- > > Until then , I lame-tested something (with both neo4j and berkeleydb): > > a node `one` having 1mil rels to 1million unique nodes, and one more to a > > node `two`, and node `two` having 10 incoming rels from other unique > nodes, > > +1 the one rel which was already from `one`, trying shortestPath between > > `one` and `two` (not `two` and `one`) with Direction.BOTH > > output: > > just created 1,000,000 rels, took=58,381 ms (1) > > Path from one to two: (one)-->(two) 121,310 ms (2) > > Path from one to two: (one)-->(two) timedelta=1,066 ms > > Path from one to two: (one)-->(two) timedelta=795 ms > > Path from one to two: (one)-->(two) timedelta=772 ms > > > > (1) and (2) happened in the same transaction (ie. before tx.finish()), > the > > others after transaction finished (an in no new transaction - since they > > were only reads) - this also means some caching must've happened from > > before. > > Because it happened in same transaction, adding 1 mil relationships is > > slower than adding them in bursts of x, I am aware of this (and I'll try > to > > make a bench with that). For ie. 100k nodes neo4j is actually faster. > > I even got this once (but in another modified bench): > > first time creating the relationships... > > just created 1,000,000 rels, took=49,307 ms (cpu was 100% here, instead > of > > the usual 77% limit) > > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space > > Environment is about to shut down the neo4j database > > Exception in thread "Thread-0" java.lang.OutOfMemoryError: GC overhead > > limit > > exceeded > > at java.util.logging.LogManager.reset(LogManager.java:835) > > at java.util.logging.LogManager$Cleaner.run(LogManager.java:240) > > Environment shutting down complete 23,962 ms > > > > Exception: java.lang.OutOfMemoryError thrown from the > > UncaughtExceptionHandler in thread "main" > > > > > > this is the output when running cold, (ie. the data already existed): > > search started > > Path from one to two: (one)-->(two) timedelta=7,934 ms > > search started > > Path from one to two: (one)-->(two) timedelta=2,027 ms > > search started > > Path from one to two: (one)-->(two) timedelta=875 ms > > Environment is about to shut down the neo4j database > > Environment shutting down complete 2,971 ms > > > > and the exact same thing when using berkeleydb (not using transactions > > though): > > just created 1,000,000 rels, took=29,418 ms > > Path from one to two: true 0 ms > > Path from one to two: true 0 ms > > Path from one to two: true 0 ms > > class org.bdb.BerkEnv$1 shutting down complete 0 ms (most was 10 ms > here) > > > > and when cold: > > Path from one to two: true 10 ms > > Path from one to two: true 0 ms > > Path from one to two: true 0 ms > > class org.bdb.BerkEnv$1 shutting down complete 0 ms > > (the most was 10ms on first(sometimes 0ms, 10ms, 0ms), least was 0ms on > > all) > > > > in bdb, this is actually how bdb does the search (I don't need to make > sure > > I iterate or something similar, on the node `two` which has least > incoming > > rels), in fact I execute a search on both nodes, one after the other and > > make sure one->two and two<-one (since I am using two databases), anyway, > > by > > using db.getSearchBoth(...) the search is done *that *fast. I hope to see > > this kind of fastness with neo4j too xD (unless I'm doing it wrong) > > > > if I put neo4j to find shortest path on incoming on `two` then: > > search started > > Path from one to two: (two)<--(one) timedelta=20 ms > > search started > > Path from one to two: (two)<--(one) timedelta=0 ms > > search started > > Path from one to two: (two)<--(one) timedelta=0 ms > > Environment is about to shut down the neo4j database > > Environment shutting down complete 2,590 ms > > (most was 20ms, least was 20ms on first) > > > > TestLinkage.java (both progs) were used from here: > > > > > https://github.com/13th-floor/neo4john/commit/a9f4b274de1d6c9ec9f1ea4a338b5c42325f19a4 > > > > -------------- > > > > Here's a lame benchmark for neo4j when I added another node `three` which > > is > > the first rel `one`->`three`, then added 1mil `one`->(random nodes) , and > > then(as before) `one`->`two` > > `three` has no other relationships > > Also, creating nodes 10k per transaction until 1mil are reached. > > > > first time creating the relationships... > > just created 1,000,000 rels, took=28,889 ms (btw my cpu is limited to > 77% > > of its normal speed; usually!) > > closed transaction > > search started > > (one)-->(three) 3,234 ms > > search started > > (one)-->(two) 661 ms > > search started > > (three)<--(one) 10 ms > > search started > > (two)<--(one) 0 ms > > search started > > (one)-->(three) 631 ms > > search started > > (one)-->(two) 913 ms > > search started > > (three)<--(one) 0 ms > > search started > > (two)<--(one) 0 ms > > search started > > (one)-->(three) 610 ms > > search started > > (one)-->(two) 773 ms > > search started > > (three)<--(one) 0 ms > > search started > > (two)<--(one) 0 ms > > Environment is about to shut down the neo4j database > > Environment shutting down complete 3,801 ms > > > > and on cold run: > > search started > > (one)-->(three) 8,454 ms > > search started > > (one)-->(two) 885 ms > > search started > > (three)<--(one) 0 ms > > search started > > (two)<--(one) 0 ms > > search started > > (one)-->(three) 752 ms > > search started > > (one)-->(two) 755 ms > > search started > > (three)<--(one) 0 ms > > search started > > (two)<--(one) 0 ms > > search started > > (one)-->(three) 804 ms > > search started > > (one)-->(two) 621 ms > > search started > > (three)<--(one) 0 ms > > search started > > (two)<--(one) 0 ms > > Environment is about to shut down the neo4j database > > Environment shutting down complete 3,141 ms > > ===== > > and same thing in berkeleydb (no transactions tho): > > first time creating the relationships... > > just created 1,000,000 rels, took=29,658 ms > > Path from one to three: true 10 ms > > Path from one to two: true 0 ms > > Path from three to one: false 0 ms > > Path from two to one: false 0 ms > > Path from one to three: true 0 ms > > Path from one to two: true 0 ms > > Path from three to one: false 0 ms > > Path from two to one: false 0 ms > > Path from one to three: true 0 ms > > Path from one to two: true 0 ms > > Path from three to one: false 0 ms > > Path from two to one: false 0 ms > > class org.bdb.BerkEnv$1 shutting down complete 152 ms > > > > and on cold: > > Path from one to three: true 10 ms > > Path from one to two: true 10 ms > > Path from three to one: false 0 ms > > Path from two to one: false 0 ms > > Path from one to three: true 0 ms > > Path from one to two: true 0 ms > > Path from three to one: false 0 ms > > Path from two to one: false 0 ms > > Path from one to three: true 0 ms > > Path from one to two: true 0 ms > > Path from three to one: false 0 ms > > Path from two to one: false 0 ms > > class org.bdb.BerkEnv$1 shutting down complete 0 ms > > > > > > these progs are on: > > > > > https://github.com/13th-floor/neo4john/commit/d2fd5b27dcde5560e6ff980fe9320aedc4421ab7 > > > > Cheerios, > > John. > > > > Song of the day: Bic Runga - She Left On A Monday > > PS: asserts were enable ie. vm arg "-ea" (no quotes) > > > > On Sat, Jul 23, 2011 at 4:31 PM, Mattias Persson > > <[email protected]>wrote: > > > > > Hi John, > > > > > > the algorithm is written to dodge these kinds of pitfalls. Maybe > there's > > > some issue with the implementation, but in principal it should make no > > > difference. I'll look at it when I get the time (I wrote that > > > implementation). > > > > > > 2011/7/23 John cyuczieekc <[email protected]> > > > > > > > Hey guys, me bugging you again :) > > > > > > > > (This whole thing is kind of based on the lack of being able to get > the > > > > number of relationships a node has) > > > > > > > > If I have two nodes, and the first one has 1 million outgoing > > > > relationships > > > > of the type X to 1 million unique/different nodes, > > > > and the second node has 10 incoming relationships of type X (same > type) > > > of > > > > which one is from the first node, > > > > then using GraphAlgoFactory.shortestPath (or suggest a better way?) > > > > How can I tell neo4j to iterate the search on the second node's > > incoming > > > > rels simply because it has 10 relationships instead of 1 million, in > > > order > > > > to check if each relationship is in the form of > firstNode-->secondNode > > ? > > > > > > > > For the case when first node has 100,000 relationships and second > node > > > has > > > > 10, > > > > it takes *1.7 seconds* for shortestPath to find the only one link > > between > > > > them using: > > > > > > > > final PathFinder<Path> finder = GraphAlgoFactory.shortestPath( > > > > Traversal.expanderForTypes( rel, Direction.OUTGOING ), 1 ); > > > > final Path foundPath = finder.findSinglePath( *one, two* ); > > > > > > > > I can put Direction.*BOTH *and get the same amount of time > > > > *Path from one to two: (one)-->(two) timedelta=1,862,726,634 ns* > > > > > > > > *BUT*, get this: if I swap the nodes: > > > > finder.findSinglePath(* two, one*); > > > > and i use either Direction.INCOMING or Direction.*BOTH *(which makes > > > sense > > > > for the second node ,right) then I get *20ms* the time until it > > > finishes... > > > > *Path from one to two: (two)<--(one) timedelta=20,830,111 ns* > > > > > > > > (both cases are without data being priorly cached) > > > > > > > > I was expecting it to act like this: (but only when using > > Direction.BOTH) > > > > see which node has the least number of relationships and iterate on > > > those, > > > > but this would work if findSinglePath would be made for depth 1 (aka > > > > particular case), but as I read "Tries to find a single path between > > > > startand > > > > end nodes." then it makes sense to me why it works like it does... > that > > > is, > > > > iterate on relationships from start node, rather than from end > node... > > > but > > > > I'm not sure if it would *not *make sense to iterate on the end node > > > > instead > > > > of start node, when knowing that end node has less relationships, for > > > make > > > > the search faster (well at least if depth is one) - I didn't look > into > > > how > > > > neo4j actually does stuff yet :D > > > > > > > > anyway, it's fairly clear to me that I could make a simple wrapper > > method > > > > to > > > > make this kind of search faster, *IF* I had the ability to know how > > many > > > > relationships each node has, so I can call findSinglePath with the > > first > > > > param being the node with the least relationship count :) But as I > > > > understood it, it's not possible to find how many rels a node has... > > > gimme > > > > feat! :)) [by not possible I mean, without having to iterate thru all > > and > > > > count them, which would make the use case here obsolete] > > > > > > > > PS: clearly all the text I wrote here would benefit from being > > > represented > > > > by a graph, just think about all those grouping with autohiding the > ie. > > > > "[]" > > > > and all kinds of stuff... heh > > > > _______________________________________________ > > > > Neo4j mailing list > > > > [email protected] > > > > https://lists.neo4j.org/mailman/listinfo/user > > > > > > > > > > > > > > > > -- > > > Mattias Persson, [[email protected]] > > > Hacker, Neo Technology > > > www.neotechnology.com > > > _______________________________________________ > > > Neo4j mailing list > > > [email protected] > > > https://lists.neo4j.org/mailman/listinfo/user > > > > > _______________________________________________ > > Neo4j mailing list > > [email protected] > > https://lists.neo4j.org/mailman/listinfo/user > > > > > > -- > Mattias Persson, [[email protected]] > Hacker, Neo Technology > www.neotechnology.com > _______________________________________________ > Neo4j mailing list > [email protected] > https://lists.neo4j.org/mailman/listinfo/user > _______________________________________________ Neo4j mailing list [email protected] https://lists.neo4j.org/mailman/listinfo/user

