Niels, could you perhaps write up a blog post detailing the usage (also for your own scenario), and how that solution compares to the naive supernode with just millions of relationships?
Also, I'd like to see a performance comparison of both approaches. Thanks so much for your work.

Michael

On 07.07.2011 at 22:24, Niels Hoogeveen wrote:
>
> I am glad to see a solution will be provided at the core level.
> Today, I pushed IndexedRelationships and IndexedRelationshipExpander to Git, see:
> https://github.com/peterneubauer/graph-collections/tree/master/src/main/java/org/neo4j/collections/indexedrelationship
> This provides a solution to the issue, but is certainly not as fast as a solution in core would be.
> However, it does solve my issues and, as a bonus, indexed relationships can be traversed in sorted order. This is especially pleasant, since I usually want to know only the recent additions of dense relationships.
>
> Niels
>
>> Date: Thu, 7 Jul 2011 21:37:26 +0200
>> From: matt...@neotechnology.com
>> To: user@lists.neo4j.org
>> Subject: Re: [Neo4j] Performance issue on nodes with lots of relationships
>>
>> 2011/7/7 Agelos Pikoulas <agelos.pikou...@gmail.com>
>>
>>> I think it's the same problem pattern that's been in discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
>>>
>>> Michael Hunger has provided a quick solution to visiting the *few* RelationshipTypes on a node that has *millions* of others, utilizing a RelationshipExpander with an Index (check http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/).
>>>
>>> Ideally this would be abstracted & implemented in the core distribution, so that all APIs (including Cypher & Tinkerpop Pipes/Gremlin) can use it efficiently...
>>
>> Yes, I'm positive that something will be done on a core level to make getting relationships of a specific type fast, regardless of the total number of relationships. In the foreseeable future, hopefully.
>>
>>> Agelos
>>>
>>> On Thu, Jul 7, 2011 at 3:16 PM, Andrew White <li...@andrewewhite.net> wrote:
>>>
>>>> I use the shell as-is, but messages.log is reporting...
>>>>
>>>> Physical mem: 3962MB, Heap size: 881MB
>>>>
>>>> My point is that if you ignore caching altogether, why did one run take 17x longer with only 2.4x more data? Considering this is a rather iterative algorithm, I don't see why you would even read a node or relationship more than once, and thus a cache shouldn't matter at all.
>>>>
>>>> In this particular case, I can't imagine it taking 9+ minutes to read a mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an artifact of Cypher, in which it builds the full set of `r` rows before applying `count`, rather than making `count` accept an iterable stream.
>>>>
>>>> Andrew
>>>>
>>>> On 07/06/2011 11:33 PM, David Montag wrote:
>>>>> Hi Andrew,
>>>>>
>>>>> How big is your configured Java heap? It could be that all the nodes and relationships don't fit into the cache.
>>>>>
>>>>> David
>>>>>
>>>>> On Wed, Jul 6, 2011 at 8:03 PM, Andrew White <li...@andrewewhite.net> wrote:
>>>>>
>>>>>> Here are some interesting stats to consider. First, I split my nodes into two groups: one node with 1.4M children and the other with 3.4M children. While I do see some cache warm-up improvements, the traversal doesn't seem to scale linearly; i.e. the larger super-node has 2.4x more children but takes 17x longer to traverse.
>>>>>>
>>>>>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 1468486  |
>>>>>> +----------+
>>>>>> 1 rows, 25724 ms
>>>>>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 1468486  |
>>>>>> +----------+
>>>>>> 1 rows, 19763 ms
>>>>>>
>>>>>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 3472174  |
>>>>>> +----------+
>>>>>> 1 rows, 565448 ms
>>>>>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>>>>>> +----------+
>>>>>> | count(r) |
>>>>>> +----------+
>>>>>> | 3472174  |
>>>>>> +----------+
>>>>>> 1 rows, 337975 ms
>>>>>>
>>>>>> Any ideas on this?
>>>>>> Andrew
>>>>>>
>>>>>> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
>>>>>>> Andrew,
>>>>>>> if you upgrade to 1.4.M06, your shell should be able to use Cypher to count the relationships of a node without returning them:
>>>>>>>
>>>>>>> start n=(1) match (n)-[r]-(x) return count(r)
>>>>>>>
>>>>>>> and try that several times to see if cold caches are initially slowing things down, or something along these lines.
>>>>>>>
>>>>>>> In the shell's `ls` and in Neoclipse, the output and visualization will be slow for that amount of data.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> /peter neubauer
>>>>>>>
>>>>>>> GTalk: neubauer.peter
>>>>>>> Skype: peter.neubauer
>>>>>>> Phone: +46 704 106975
>>>>>>> LinkedIn: http://www.linkedin.com/in/neubauer
>>>>>>> Twitter: http://twitter.com/peterneubauer
>>>>>>>
>>>>>>> http://www.neo4j.org - Your high performance graph database.
>>>>>>> http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
>>>>>>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>>>>>>>
>>>>>>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White <li...@andrewewhite.net> wrote:
>>>>>>>> I have a graph with roughly 10M nodes. Some of these nodes are highly connected to other nodes. For example, I may have a single node with 1M+ relationships. A good analogy is a population that has a "lives-in" relationship to a state. Now the problem...
>>>>>>>>
>>>>>>>> Both Neoclipse and neo4j-shell are terribly slow when working with these nodes. In the shell I would expect a `cd <node-id>` to be very fast, much like selecting via a rowid in a standard DB. Instead, I usually see a delay of several seconds. Doing an `ls` takes so long that I usually have to just kill the process. In fact, `ls` never outputs anything, which is odd since I would expect it to "stream" the output as it found it. I have very similar performance issues with Neoclipse.
>>>>>>>>
>>>>>>>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM. Disclaimer: I am new to Neo4j.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Andrew
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Hacker, Neo Technology
>> www.neotechnology.com

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
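The indexed-relationship idea discussed in the thread (Michael's RelationshipExpander paste and Niels's graph-collections code) can be sketched in plain Java. This is a minimal sketch of the technique only, with illustrative names; it does not use the actual Neo4j or graph-collections API. The point is the access pattern: a naive node stores all relationships in one chain, so finding the few relationships of a sparse type among millions of a dense type means scanning everything, while an indexed approach buckets relationships by type so expansion touches only the matching bucket.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- these class and field names are NOT the
// graph-collections API. A naive node keeps all relationships in one
// list, so finding the few KNOWS relationships among a million
// LIVES_IN ones means scanning everything. An "indexed relationship"
// approach buckets relationships by type (in Neo4j's case via index
// nodes), so expansion reads only the matching bucket.
public class IndexedExpansionSketch {
    record Rel(String type, long targetId) {}

    /** Builds one dense node both ways and counts KNOWS via each. */
    static long[] demo() {
        List<Rel> naive = new ArrayList<>();
        Map<String, List<Rel>> indexed = new HashMap<>();
        for (long i = 0; i < 1_000_000; i++) {           // the dense type
            Rel r = new Rel("LIVES_IN", i);
            naive.add(r);
            indexed.computeIfAbsent("LIVES_IN", k -> new ArrayList<>()).add(r);
        }
        for (long id : new long[] {7, 42}) {             // the sparse type
            Rel r = new Rel("KNOWS", id);
            naive.add(r);
            indexed.computeIfAbsent("KNOWS", k -> new ArrayList<>()).add(r);
        }
        // Naive expansion: filter all 1,000,002 relationships.
        long viaScan = naive.stream().filter(r -> r.type().equals("KNOWS")).count();
        // Indexed expansion: read only the 2-entry KNOWS bucket.
        long viaIndex = indexed.getOrDefault("KNOWS", List.of()).size();
        return new long[] {viaScan, viaIndex};
    }

    public static void main(String[] args) {
        long[] counts = demo();
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```

Both paths find the same two relationships; the difference is that the indexed path does work proportional to the bucket size rather than to the node's total degree, which is why it helps exactly in the supernode case the thread describes.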
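The ratios Andrew cites can be checked directly against the shell timings in the thread, taking the warm second runs for the time ratio and the cold first run for the throughput figure:

```java
// Sanity-check of the numbers quoted in the thread: the second node has
// ~2.4x the relationships of the first, yet its warm-cache run takes
// ~17x longer, and the cold run reads only ~6k relationships per second.
public class ScalingCheck {
    static double dataRatio()      { return 3_472_174.0 / 1_468_486.0; } // rel counts
    static double warmTimeRatio()  { return 337_975.0 / 19_763.0; }      // warm run ms
    static double coldThroughput() { return 3_472_174.0 / 565.448; }     // rels/sec, cold

    public static void main(String[] args) {
        System.out.printf("data: %.2fx, warm time: %.1fx, cold: %.0f rels/sec%n",
                dataRatio(), warmTimeRatio(), coldThroughput());
    }
}
```

So the arithmetic supports Andrew's point: 2.4x the data costs roughly 17x the time, which is the non-linear behavior one would expect if every expansion over the dense node pays for its total degree.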