On Mon, Jun 22, 2009 at 10:21 AM, Johan Svensson<[email protected]> wrote:
> Hi Macarse,
>
> I had a look at this and here are some comments:
>
> Running your code with default parameters on a 1.6 JVM about 50% of
> the time will be spent in garbage collection. Giving the JVM "-Xms128M
> -Xmx128M" will cut the variation of the query times from 2500-3000 ms
> down to 1300-1600 ms (removing most of the GC time). I am sure it
> would be possible to get the times down even further since this is too
> much of a "micro benchmark" for HotSpot to kick in.
>

Nice. When trying more than 100,000 entries I got a heap exception;
this should solve it:
http://www.ln.go.cn/resin-doc/performance/jvm-tuning.xtp
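
For anyone reproducing this: the flags Johan mentions go on the java
command line (a sketch; classpath details omitted, and the main class
is the one from my repo):

    java -Xms128M -Xmx128M org.seminario.Seminario

Setting -Xms and -Xmx to the same value keeps the heap from resizing
during the run, which is what removes most of the GC noise.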

> The use of String.format to format the Customers/Order/Item objects to
> a string similar to the SQL select output also takes some time. In
> most use-cases there is no need to work with your data in a string
> representation.

True.
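
Since the data is already in node properties, the traversal can just
read the values it needs instead of building an SQL-like row. A sketch
(the property keys "name" and "title" are from my own model, nothing
official):

    // read properties directly instead of String.format(...)
    String name  = (String) customerNode.getProperty("name");
    String title = (String) itemNode.getProperty("title");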

>
> You also run everything in the same transaction. I tried to find out
> what transaction and isolation level the insert/query was running
> under on the relational database, but could not see any info about
> it. If you split the create/insert part into one transaction and run
> the query in a separate transaction, the query should drop down to
> about 350 ms (that is with a 128M heap and no String.format).

I thought keeping everything in one transaction was a benefit for neo4j
in the benchmark. Why does separating the inserts and the query into
two transactions improve speed?
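
Just to check I understand the split, it would be something like this
against the NeoService API (neo and the createData()/runQuery() helpers
are placeholders for my own code):

    // transaction 1: create/insert everything
    Transaction tx = neo.beginTx();
    try {
        createData(neo);  // the insert loop
        tx.success();
    } finally {
        tx.finish();
    }

    // transaction 2: only the query/traversal gets timed
    tx = neo.beginTx();
    try {
        runQuery(neo);
        tx.success();
    } finally {
        tx.finish();
    }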

>
> Finally, 80 ms for the select is quite impressive, almost too good to
> be true. I would say just transforming an in-memory representation of
> Item/Order/Customer objects to a string representation will take
> longer. Was the full result set from the SQL query traversed? Also,
> many relational databases keep an "exact query match cache", meaning
> that if you run the exact same query twice, the second run will not
> perform any real work. This can be avoided by running an update on
> the tables participating in the query.
>

I took care of that by creating the database and dropping it every time
I ran the benchmark. Creating it took about 2 minutes.

Unless MySQL keeps cached results for queries against a dropped
database, that should rule the cache out, but that wouldn't make sense.
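
In case dropping the database is not enough, Johan's trick of touching
a participating table between runs would look like this over JDBC (a
sketch; conn, bigSelect and the table name are guesses from my schema,
not verified):

    Statement st = conn.createStatement();
    // a no-op update invalidates any cached result for the tables
    // involved, so the next run does real work again
    st.executeUpdate("UPDATE item SET title = title");
    ResultSet rs = st.executeQuery(bigSelect);  // the timed run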


> Regards,
> Johan


Thanks for your excellent mail and the time you took to test everything, Johan!


>
> On Sun, Jun 21, 2009 at 6:04 AM, Macarse<[email protected]> wrote:
>> I gave up my count(*) test and tried something that hurts relational databases!
>>
>> //Data inserted
>> So I have 4 persons (Charly, maxi, juani, laura) who each buy 2
>> movies in a for loop, and that loop runs 1,000 times, so there are
>> 4 x 2 x 1,000 = 8,000 purchases in total.
>>
>> //SQL
>> Database schema
>> http://code.google.com/p/grafos2009/source/browse/trunk/mysql/video/create.sql
>>
>> Inserts are done with a python script that creates the inserts:
>> http://code.google.com/p/grafos2009/source/browse/trunk/mysql/video/test.py
>>
>> Query to be done:
>> http://code.google.com/p/grafos2009/source/browse/trunk/mysql/video/queries.sql
>>
>>
>> //neo4j
>> http://code.google.com/p/grafos2009/source/browse/trunk/src/main/java/org/seminario/Seminario.java
>>
>>
>> SQL big select:
>> 8000 rows in set (0.08 sec)
>>
>> neo4j 8000 lines:
>> 2950 ms.
>>
>> Then I tried SQL with a 10k loop, and it gave me (0.94 sec).
>>
>> I can't beat SQL :(
>> Comments are welcome!
>>
>>
>> PS: I know the select in the db is not the same as what I do with
>> neo4j, but that big difference catches my attention.
_______________________________________________
Neo mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
