I think it's the same problem pattern that has been under discussion lately with dense nodes or supernodes (check http://lists.neo4j.org/pipermail/user/2011-July/009832.html).
Michael Hunger has provided a quick solution for visiting the *few*
RelationshipTypes on a node that has *millions* of others, utilizing a
RelationshipExpander with an Index (check
http://paste.pocoo.org/show/traM5oY1ng7dRQAaf1oV/).

Ideally this would be abstracted and implemented in the core
distribution so that all APIs (including Cypher and TinkerPop
Pipes/Gremlin) can use it efficiently...

Agelos

On Thu, Jul 7, 2011 at 3:16 PM, Andrew White <[email protected]> wrote:
> I use the shell as-is, but the messages.log is reporting...
>
> Physical mem: 3962MB, Heap size: 881MB
>
> My point is that if you ignore caching altogether, why did one run take
> 17x longer with only 2.4x more data? Considering this is a rather
> iterative algorithm, I don't see why you would even read a node or
> relationship more than once and thus a cache shouldn't matter at all.
>
> In this particular case, I can't imagine taking 9+ minutes to read a
> mere 3.4M nodes (that's only 6k nodes per sec). Perhaps this is just an
> artifact of Cypher in which it is building a set of Rs before applying
> `count` rather than making count accept an iterable stream.
>
> Andrew
>
> On 07/06/2011 11:33 PM, David Montag wrote:
> > Hi Andrew,
> >
> > How big is your configured Java heap? It could be that all the nodes and
> > relationships don't fit into the cache.
> >
> > David
> >
> > On Wed, Jul 6, 2011 at 8:03 PM, Andrew White <[email protected]> wrote:
> >
> >> Here are some interesting stats to consider. First, I split my nodes into
> >> two groups, one node with 1.4M children and the other with 3.4M
> >> children. While I do see some cache warm-up improvements, the
> >> traversal doesn't seem to scale linearly; i.e. the larger super-node has
> >> 2.4x more children but takes 17x longer to traverse.
> >>
> >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 1468486  |
> >> +----------+
> >> 1 rows, 25724 ms
> >> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 1468486  |
> >> +----------+
> >> 1 rows, 19763 ms
> >>
> >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 3472174  |
> >> +----------+
> >> 1 rows, 565448 ms
> >> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
> >> +----------+
> >> | count(r) |
> >> +----------+
> >> | 3472174  |
> >> +----------+
> >> 1 rows, 337975 ms
> >>
> >> Any ideas on this?
> >> Andrew
> >>
> >> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
> >>> Andrew,
> >>> if you upgrade to 1.4.M06, your shell should be able to do Cypher in
> >>> order to count the relationships of a node, not returning them:
> >>>
> >>> start n=(1) match (n)-[r]-(x) return count(r)
> >>>
> >>> and try that several times to see if cold caches are initially slowing
> >>> down things.
> >>>
> >>> or something along these lines. In the LS and Neoclipse the output and
> >>> visualization will be slow for that amount of data.
> >>>
> >>> Cheers,
> >>>
> >>> /peter neubauer
> >>>
> >>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White <[email protected]> wrote:
> >>>> I have a graph with roughly 10M nodes. Some of these nodes are highly
> >>>> connected to other nodes. For example I may have a single node with 1M+
> >>>> relationships. A good analogy is a population that has a "lives-in"
> >>>> relationship to a state. Now the problem...
> >>>>
> >>>> Both Neoclipse and neo4j-shell are terribly slow when working with these
> >>>> nodes. In the shell I would expect a `cd <node-id>` to be very fast,
> >>>> much like selecting via a rowid in a standard DB. Instead, I usually see
> >>>> several seconds' delay. Doing an `ls` takes so long that I usually have to
> >>>> just kill the process. In fact `ls` never outputs anything, which is odd
> >>>> since I would expect it to "stream" the output as it found it. I have
> >>>> very similar performance issues with Neoclipse.
> >>>>
> >>>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
> >>>> Disclaimer, I am new to Neo4j.
> >>>>
> >>>> Thanks,
> >>>> Andrew
> >>>> _______________________________________________
> >>>> Neo4j mailing list
> >>>> [email protected]
> >>>> https://lists.neo4j.org/mailman/listinfo/user
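
For anyone who wants to recheck the ratios discussed in this thread, here is a rough sketch in Python. The counts and timings are copied from the quoted neo4j-sh output; the dictionary labels and the helper name rels_per_second are made up for illustration, and reading the "17x" figure as second run vs. second run is an assumption on my part.

```python
# Timings copied from the quoted neo4j-sh output:
# label -> (relationships counted, elapsed milliseconds).
# The labels themselves are hypothetical, chosen only for readability.
runs = {
    "node 1, first run": (1468486, 25724),
    "node 1, second run": (1468486, 19763),
    "node 2, first run": (3472174, 565448),
    "node 2, second run": (3472174, 337975),
}


def rels_per_second(count, elapsed_ms):
    """Relationships counted per second for a single query run."""
    return count / (elapsed_ms / 1000.0)


# Throughput of each run.
throughput = {label: rels_per_second(*run) for label, run in runs.items()}

# How much bigger node 2 is than node 1, in relationships.
size_ratio = runs["node 2, first run"][0] / runs["node 1, first run"][0]

# How much longer node 2 took than node 1, comparing like with like.
time_ratio_first = runs["node 2, first run"][1] / runs["node 1, first run"][1]
time_ratio_second = runs["node 2, second run"][1] / runs["node 1, second run"][1]

if __name__ == "__main__":
    for label, value in throughput.items():
        print(f"{label}: {value:.0f} relationships/sec")
    print(f"size ratio: {size_ratio:.2f}x")
    print(f"time ratio, first runs: {time_ratio_first:.1f}x")
    print(f"time ratio, second runs: {time_ratio_second:.1f}x")
```

If the traversal scaled linearly, the time ratios would stay close to the size ratio; the gap between them is the effect being asked about.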

