On Mon, Jun 22, 2009 at 10:21 AM, Johan Svensson<[email protected]> wrote:
> Hi Macarse,
>
> I had a look at this and here are some comments:
>
> Running your code with default parameters on a 1.6 JVM, about 50% of
> the time will be spent in garbage collection. Giving the JVM "-Xms128M
> -Xmx128M" will cut the variation of the query times from 2500-3000 ms
> down to 1300-1600 ms (removing most of the GC time). I am sure it
> would be possible to get the times down even further since this is too
> much of a "micro benchmark" for Hotspot to kick in.
>
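(Side note on applying those flags: they go straight on the java launch line,
something like the command below. The classpath here is just an example from
my local setup, only the -Xms/-Xmx values come from Johan's mail.)

    java -Xms128M -Xmx128M -cp target/classes:lib/* org.seminario.Seminario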
Nice. When trying more than 100.000 entries I got a heap exception, so this
should solve it: http://www.ln.go.cn/resin-doc/performance/jvm-tuning.xtp

> The use of String.format to format the Customers/Order/Item objects to
> a string similar to the SQL select output also takes some time. In
> most use-cases there is no need to work with your data in a string
> representation.

True.

> You also run everything in the same transaction. I tried getting some
> information on transaction and what isolation level the insert/query
> you were running on the relational database was using but could not
> see any info about it. If you split the create/insert part in one
> transaction and run the query in a separate transaction the query
> should drop down to about 350ms (that is with 128M heap and no
> String.format).

I thought that was a benefit for neo4j in the benchmark. Why does separating
the inserts and the query into two transactions improve speed? I put a rough
sketch of how I understand the suggestion at the end of this mail.

> Finally 80ms for the select is quite impressive, almost too good to be
> true. I would say just transforming an in memory representation of
> Item/Order/Customer objects to a string representation will take
> longer. Was the full result set from the SQL query traversed? Also
> many relational databases keep an "exact query match cache" meaning if
> you run the exact same query twice the second run will not perform any
> real work. This can be avoided by running an update on the tables
> participating in the query.

I took care of it by creating the database and dropping it every time I run
the benchmark. It takes about 2 minutes to create. Unless MySQL keeps cached
results for queries against a dropped database, but that wouldn't make sense.

> Regards,
> Johan

Thanks for your excellent mail and the time you took to test everything, Johan!

> On Sun, Jun 21, 2009 at 6:04 AM, Macarse<[email protected]> wrote:
>> I gave up my count(*) test and tried something that hurts relational databases!
>>
>> //Data inserted
>> So I have 4 persons (Charly, maxi, juani, laura) who buy 2 movies
>> each in a for loop.
>> That for loop runs 1k times.
>>
>> //SQL
>> Database schema:
>> http://code.google.com/p/grafos2009/source/browse/trunk/mysql/video/create.sql
>>
>> Inserts are done with a python script that creates the inserts:
>> http://code.google.com/p/grafos2009/source/browse/trunk/mysql/video/test.py
>>
>> Query to be done:
>> http://code.google.com/p/grafos2009/source/browse/trunk/mysql/video/queries.sql
>>
>> //neo4j
>> http://code.google.com/p/grafos2009/source/browse/trunk/src/main/java/org/seminario/Seminario.java
>>
>> SQL big select:
>> 8000 rows in set (0.08 sec)
>>
>> neo4j 8000 lines:
>> 2950 ms.
>>
>> Then I tried sql with a 10k loop, and it gave me (0.94 sec).
>>
>> I can't beat sql :(
>> Comments are welcomed!
>>
>> PS: I know the select in the db is not the same as I do with neo4j,
>> but that big difference calls my attention.
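PS: this is the rough sketch I mentioned above of how I read the
two-transaction suggestion. It is not my actual Seminario code; I am assuming
the embedded org.neo4j.api.core API here, and the actual create/traverse
bodies are left out. My guess is that with one huge transaction all the
uncommitted write state is still around while the query runs, which is what
the split avoids.

import org.neo4j.api.core.EmbeddedNeo;
import org.neo4j.api.core.NeoService;
import org.neo4j.api.core.Transaction;

public class TwoTransactionsSketch {
    public static void main(String[] args) {
        NeoService neo = new EmbeddedNeo("var/benchmark-db");

        // First transaction: only the create/insert part.
        Transaction tx = neo.beginTx();
        try {
            // ... create customer/order/item nodes and relationships ...
            tx.success();
        } finally {
            tx.finish();
        }

        // Second transaction: only the query, timed on its own.
        long start = System.currentTimeMillis();
        tx = neo.beginTx();
        try {
            // ... traverse the graph and collect the 8000 result rows ...
            tx.success();
        } finally {
            tx.finish();
        }
        System.out.println("query: " + (System.currentTimeMillis() - start) + " ms");

        neo.shutdown();
    }
}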
_______________________________________________
Neo mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
