I use the shell as-is, but messages.log reports:

     Physical mem: 3962MB, Heap size: 881MB
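To put that heap size next to the relationship counts reported further down the thread, here is a rough budget: divide the heap by the number of relationships hanging off each super-node. This is only a sketch under loudly stated assumptions. It treats "MB" as 2^20 bytes, pretends the whole heap is free for caching, and ignores the nodes on the other end. The real per-object cost of Neo4j's cache is not given anywhere in this thread.

```python
# Back-of-envelope heap budget for caching the relationships of each super-node.
# Assumptions (not stated in the thread): "MB" means 2**20 bytes, the whole heap
# is free for caching, and the nodes at the other end of each relationship are ignored.
HEAP_BYTES = 881 * 2**20

# Relationship counts returned by count(r) for the two super-nodes.
REL_COUNTS = {"node 1": 1_468_486, "node 2": 3_472_174}


def max_bytes_per_item(heap_bytes: int, items: int) -> int:
    """Largest per-item heap footprint that would still let every item fit."""
    return heap_bytes // items


for name, count in REL_COUNTS.items():
    print(f"{name}: <= {max_bytes_per_item(HEAP_BYTES, count)} bytes per cached relationship")
```

Under these assumptions, node 1's relationships fit if each cached one costs at most about 629 bytes, while node 2's fit only if each costs at most about 266 bytes. So there is a range of per-object sizes where the smaller super-node can stay cached and the larger one cannot.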

My point is this: if you ignore caching altogether, why did one run take
17x longer with only 2.4x more data? Since this is essentially a single
pass over the relationships, I don't see why you would even read a node or
relationship more than once, so a cache shouldn't matter at all.
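The ratios behind "17x longer with only 2.4x more data" can be checked directly from the shell output quoted below. The only inputs are the counts and millisecond timings already in the thread. The 17x figure corresponds to the second (warm) runs. The first (cold) runs differ by about 22x.

```python
# Timings (ms) and relationship counts copied from the neo4j-shell output in this thread.
small = {"count": 1_468_486, "cold_ms": 25_724, "warm_ms": 19_763}
large = {"count": 3_472_174, "cold_ms": 565_448, "warm_ms": 337_975}

data_ratio = large["count"] / small["count"]
cold_ratio = large["cold_ms"] / small["cold_ms"]
warm_ratio = large["warm_ms"] / small["warm_ms"]


def micros_per_rel(elapsed_ms: int, count: int) -> float:
    """Average cost of one relationship in microseconds."""
    return elapsed_ms * 1000 / count


print(f"data: {data_ratio:.2f}x, cold time: {cold_ratio:.1f}x, warm time: {warm_ratio:.1f}x")
print(
    f"warm cost per relationship: {micros_per_rel(small['warm_ms'], small['count']):.1f} us "
    f"vs {micros_per_rel(large['warm_ms'], large['count']):.1f} us"
)
```

On the warm runs, each relationship of the larger super-node costs roughly 7x more than one of the smaller super-node. That is the non-linearity being asked about.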

In this particular case, I can't imagine taking 9+ minutes to read a
mere 3.4M nodes (that's only about 6k nodes per second). Perhaps this is
just an artifact of Cypher building a collection of all the Rs before
applying `count`, rather than `count` consuming them as a stream.
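The two counting strategies contrasted above can be sketched in Python. `relationships()` is a hypothetical stand-in for a lazy relationship iterator, not a Neo4j API, and nothing here says which strategy Cypher in 1.4 actually used. The `per_second` helper reproduces the "about 6k per second" figure from the 565448 ms run.

```python
from typing import Iterable, Iterator


def relationships(n: int) -> Iterator[int]:
    """Hypothetical stand-in for a lazy relationship iterator; yields fake ids."""
    for rel_id in range(n):
        yield rel_id


def count_materialized(rels: Iterable[int]) -> int:
    # Collects every element first, so memory grows with the relationship count.
    collected = list(rels)
    return len(collected)


def count_streaming(rels: Iterable[int]) -> int:
    # Consumes one element at a time, so extra memory stays constant.
    return sum(1 for _ in rels)


def per_second(count: int, elapsed_ms: int) -> float:
    """Items processed per second, given the elapsed time in milliseconds."""
    return count / (elapsed_ms / 1000)


print(f"cold run: {per_second(3_472_174, 565_448):,.0f} relationships/s")
print(f"warm run: {per_second(3_472_174, 337_975):,.0f} relationships/s")
```

Both counting functions return the same number. Only the streaming one avoids holding every relationship at once, which matters when that collection competes with the cache for an 881MB heap.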

Andrew

On 07/06/2011 11:33 PM, David Montag wrote:
> Hi Andrew,
>
> How big is your configured Java heap? It could be that all the nodes and
> relationships don't fit into the cache.
>
> David
>
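David's point about the cache can be illustrated with a toy model. The model is a generic least-recently-used (LRU) cache, not Neo4j's actual cache policy, which this thread does not describe. It is scanned in the same order on every run, with the node sizes scaled down by 1,000. `CAPACITY` is a hypothetical cache size chosen to sit between the two working sets.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache that only tracks keys (a generic model, not Neo4j's cache)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key: int) -> bool:
        """Touch a key; return True on a hit, False on a miss (which loads it)."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return False


def hits_per_pass(n_items: int, capacity: int, passes: int = 2) -> list:
    """Scan keys 0..n_items-1 in order `passes` times and count hits per pass."""
    cache = LRUCache(capacity)
    return [sum(cache.access(k) for k in range(n_items)) for _ in range(passes)]


CAPACITY = 2_000  # hypothetical cache size, in items
print(hits_per_pass(1_400, CAPACITY))  # working set fits in the cache
print(hits_per_pass(3_400, CAPACITY))  # working set does not fit
```

In this model the first pass never hits in either case. That agrees with Andrew's point that a single pass reads each relationship once. The difference appears on repeated runs. A working set that fits is served entirely from cache, while one that is only somewhat larger than the cache gets no hits at all, because strict LRU evicts each key just before the next scan needs it.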
> On Wed, Jul 6, 2011 at 8:03 PM, Andrew White <[email protected]> wrote:
>
>> Here are some interesting stats to consider. First, I split my nodes into
>> two groups, one node with 1.4M children and the other with 3.4M
>> children. While I do see some cache warm-up improvements, the
>> traversal doesn't seem to scale linearly, i.e. the larger super-node has
>> 2.4x more children but takes 17x longer to traverse.
>>
>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>> +----------+
>> | count(r) |
>> +----------+
>> | 1468486  |
>> +----------+
>> 1 rows, 25724 ms
>> neo4j-sh (0)$ start n=(1) match (n)-[r]-(x) return count(r)
>> +----------+
>> | count(r) |
>> +----------+
>> | 1468486  |
>> +----------+
>> 1 rows, 19763 ms
>>
>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>> +----------+
>> | count(r) |
>> +----------+
>> | 3472174  |
>> +----------+
>> 1 rows, 565448 ms
>> neo4j-sh (0)$ start n=(2) match (n)-[r]-(x) return count(r)
>> +----------+
>> | count(r) |
>> +----------+
>> | 3472174  |
>> +----------+
>> 1 rows, 337975 ms
>>
>> Any ideas on this?
>> Andrew
>>
>> On 07/06/2011 09:55 AM, Peter Neubauer wrote:
>>> Andrew,
>>> if you upgrade to 1.4.M06, your shell should support Cypher, so you can
>>> count the relationships of a node instead of returning them:
>>>
>>> start n=(1) match (n)-[r]-(x) return count(r)
>>>
>>> and try that several times to see if cold caches are initially slowing
>>> things down.
>>>
>>> Or something along these lines. In `ls` and Neoclipse, the output and
>>> visualization will be slow for that amount of data.
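Peter's suggestion to run the same query several times and compare can be scripted as a small timing harness. `run_query` is a hypothetical callable wrapping whatever executes the query. It is not a Neo4j API.

```python
import time
from typing import Callable, List


def time_repeated(run_query: Callable[[], object], repeats: int = 3) -> List[float]:
    """Run the same workload several times and return each run's wall-clock time in ms.

    `run_query` is a hypothetical callable that executes the query (for example the
    count(r) query above); it is not part of any Neo4j API.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000)
    return timings


def cold_vs_warm(timings: List[float]) -> float:
    """Ratio of the first (cold) run to the fastest later (warm) run."""
    if len(timings) < 2:
        raise ValueError("need at least two runs to compare cold and warm")
    return timings[0] / min(timings[1:])


# Applied to the shell timings Andrew posted later in the thread:
print(f"node 1: {cold_vs_warm([25_724, 19_763]):.2f}x")
print(f"node 2: {cold_vs_warm([565_448, 337_975]):.2f}x")
```

With Andrew's numbers, warming up speeds up the first query by about 1.3x and the second by about 1.7x. A warm-up effect exists, but on its own it does not account for the 17x gap.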
>>>
>>> Cheers,
>>>
>>> /peter neubauer
>>>
>>> GTalk:      neubauer.peter
>>> Skype       peter.neubauer
>>> Phone       +46 704 106975
>>> LinkedIn   http://www.linkedin.com/in/neubauer
>>> Twitter      http://twitter.com/peterneubauer
>>>
>>> http://www.neo4j.org               - Your high performance graph database.
>>> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
>>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>>>
>>>
>>>
>>> On Wed, Jul 6, 2011 at 4:15 PM, Andrew White <[email protected]> wrote:
>>>> I have a graph with roughly 10M nodes. Some of these nodes are highly
>>>> connected to other nodes. For example, I may have a single node with 1M+
>>>> relationships. A good analogy is a population in which every person has a
>>>> "lives-in" relationship to a state. Now the problem...
>>>>
>>>> Both Neoclipse and neo4j-shell are terribly slow when working with these
>>>> nodes. In the shell I would expect a `cd <node-id>` to be very fast,
>>>> much like selecting via a rowid in a standard DB. Instead, I usually see
>>>> a delay of several seconds. Doing an `ls` takes so long that I usually
>>>> have to just kill the process. In fact, `ls` never outputs anything, which
>>>> is odd since I would expect it to "stream" the output as it finds it. I
>>>> have very similar performance issues with Neoclipse.
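The `ls` behaviour Andrew expected, with incremental output over a lazy sequence and ideally a cap on how much is shown, can be sketched in Python. This says nothing about how neo4j-shell's `ls` is actually implemented. `fake_relationships` and the `LIVES_IN` label are hypothetical, with the label borrowing the "lives-in" analogy above.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def fake_relationships(node_id: int, degree: int) -> Iterator[str]:
    """Hypothetical lazy source of relationship lines for one node (not a Neo4j API)."""
    for rel_id in range(degree):
        yield f"node {node_id} rel #{rel_id} (LIVES_IN)"


def list_streaming(rels: Iterable[str], limit: Optional[int] = None) -> Iterator[str]:
    """Pass lines through one at a time, stopping after `limit` lines if given."""
    return islice(rels, limit)


# Print the first five relationships of a node with a million of them;
# only five lines are ever generated because the source is consumed lazily.
for line in list_streaming(fake_relationships(1, 1_000_000), limit=5):
    print(line)
```

Because each line is printed as soon as it is produced, the first output appears immediately even for a super-node. With `limit=None` the same function streams everything.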
>>>>
>>>> I am using Neo4j 1.3 embedded on Ubuntu 10.04 with 4GB of RAM.
>>>> Disclaimer, I am new to Neo4j.
>>>>
>>>> Thanks,
>>>> Andrew
>>>> _______________________________________________
>>>> Neo4j mailing list
>>>> [email protected]
>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>>
>
>
