Re: [Neo4j] Neo4jPHP batch insert benchmarks
On Fri, Aug 26, 2011 at 2:46 PM, jadell josh.ad...@gmail.com wrote:

> Jim,
>
> Fair enough. For now, I'll just know not to try to make batches that big :-)
>
> My own use case is for the transaction safety rather than trying to create
> thousands of entities at once, so it doesn't affect me that much. I just
> wanted to have something more concrete to tell other users who might try.
>
> Thanks to all for helping me investigate!

Update on this: I did a bit of hacking on this just now, and was able to improve the memory usage quite a bit. The major culprits were Jackson's deserializer (or rather, the fact that we use their mapping serializer), and using a list of strings to store and eventually aggregate results rather than a StringBuilder (which is my fault, since it's my code, and I have no clue why I did it like that).

I just pushed this to master. My benchmarks look like this, showing the number of inserts, each insert being a node and a property:

Old:
---
  1000 : 1.340s
  5000 : 3.013s
 10000 : 4.304s
 50000 : 18.120s
100000 : OutOfMemory
500000 : OutOfMemory

New:
---
  1000 : 1.546s
  5000 : 3.116s
 10000 : 3.702s
 50000 : 17.183s
100000 : 35.427s
500000 : OutOfMemory

This is on a JVM with 1GB heap. The OutOfMemory that we see in the new setup is because I didn't go all the way and implement streaming output, so that would be cool to try as well.

/Jake

> -- Josh
>
> View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3286721.html
> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.

--
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
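To make the "streaming output" idea from the message above a bit more concrete, here is a minimal sketch in Python. It is not the Neo4j server code (which is Java); the function name stream_json_array and the shape of the result objects are made up for illustration. The point is only that the response is emitted piece by piece as results become available, so the full response text never has to sit in memory at once.

```python
import json


def stream_json_array(results):
    """Yield a JSON array piece by piece instead of building one big string.

    `results` can be any iterable (for example a generator), so neither the
    full result list nor the full response text has to exist in memory.
    Hypothetical helper for illustration, not part of any Neo4j API.
    """
    yield "["
    for index, result in enumerate(results):
        if index:
            # Separator goes before every element except the first.
            yield ","
        yield json.dumps(result)
    yield "]"
```

A caller would write each piece to the output stream as it arrives, for example `for piece in stream_json_array(results): out.write(piece)`, where `out` is any writable text stream. As noted later in the thread, the open question with this approach is how to abort cleanly if a job fails after part of the response has already been sent.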
Re: [Neo4j] Neo4jPHP batch insert benchmarks
Josh,

it might be that parsing the JSON payload takes up an increasing share of the time as the batches get bigger. At least that is my suspicion. That might also be the reason for the heap problems: basically, the String parsing is taking over :/

Do you have any means of verifying that?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Thu, Aug 25, 2011 at 6:15 AM, jadell josh.ad...@gmail.com wrote:

> Hey all,
>
> I've been working on adding batch support to Neo4jPHP
> (http://github.com/jadell/Neo4jPHP). Here are the results of my latest
> benchmarks. The first column is the number of nodes being inserted, the
> second column is the average in seconds over 5 runs to insert that many
> nodes in a single batch, and the third column is the average in seconds
> over 5 runs to insert that many nodes one at a time:
>
> #nodes   batch   single
> 10       0       0
> 100      0.2     0.4
> 250      0       1
> 500      0.8     2
> 1000     1.4     4
> 2500     6       10.6
> 5000     23.2    21.2
> 10000    91.6    40.4
>
> It seems like batches win out until right around 5000 nodes at a time.
> I've profiled my code, and it seems like the time spent in PHP is roughly
> equivalent for batch vs. single. All the time difference is spent in a
> curl_exec call, talking to or waiting to hear back from the server.
>
> I tried going up to 100000 nodes. Single insert handled this just fine,
> but the server kept returning a 500 Java Heap space error on the batch,
> even with 512M max heap.
>
> Benchmark script can be found here: http://gist.github.com/1169394
>
> Benchmarks were run on a 4 x 2.3GHz core Intel i7, 4G RAM, running Ubuntu
> 10.10. Neo4j server was run with out-of-the-box settings in a VM running
> Ubuntu 10.10 with 1 dedicated core and 1G RAM.
>
> I hope this is of interest to anyone. I'd love to get some feedback from
> anyone using Neo4j from PHP, with Neo4jPHP or any other library.
>
> -- Josh Adell
>
> View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3282984.html
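Since the numbers above suggest batches stop paying off somewhere in the low thousands, one client-side workaround is to split a large insert into several smaller batch requests. The following is a rough sketch in Python rather than PHP, and it does not use Neo4jPHP. The function names and the default batch_size of 1000 are made up for illustration, and the job format (a JSON list of objects with "method", "to", "body" and "id" keys, to be POSTed to /db/data/batch) is my understanding of the REST batch endpoint, so treat it as an assumption and check it against the server documentation.

```python
import json


def build_batch_jobs(nodes, first_id=0):
    """Turn a list of property dicts into batch job descriptions.

    The job layout is an assumption about the REST batch format,
    not taken from this thread.
    """
    return [
        {"method": "POST", "to": "/node", "body": props, "id": first_id + offset}
        for offset, props in enumerate(nodes)
    ]


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_payloads(nodes, batch_size=1000):
    """Yield one JSON request body per chunk (batch_size is a guess)."""
    for chunk in chunked(nodes, batch_size):
        yield json.dumps(build_batch_jobs(chunk))
```

Each payload would then be sent as its own POST request. The trade-off is that every request is handled separately, so splitting gives up the all-or-nothing behaviour of one big batch. If transaction safety is the reason for batching in the first place, this only helps when the work can be divided into independent pieces.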
Re: [Neo4j] Neo4jPHP batch insert benchmarks
Hey Peter,

I don't have any way of verifying on the server side, other than measuring the time it takes for curl_exec to return a response. On the client side I can see that PHP's json_encode/json_decode functions are taking less than 0.5% of the total run time, even with a batch size of 10000.

During one of my 100000-node attempts, I printed out the server response of the 500 Heap space error. It seemed like the last method in the stack trace was dealing with a Deserializer class or method. I will try again and capture the stack trace output to post here.

Thanks,

-- Josh Adell

View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3283926.html
Re: [Neo4j] Neo4jPHP batch insert benchmarks
The heap space stuff would make sense I think, because we currently deserialize and serialize in-place, keeping the whole thing in memory. It would be interesting to see if we could implement a setup that can stream the deserialization/serialization, getting rid of the memory overhead.

You said you are using out-of-the-box settings for the server. I don't remember off the top of my head what the default heap size is, but you might want to try giving it more RAM. I'm gonna guess that's where performance dies.

I'll have to look at what is proper HTTP behavior, but there should be a way we could start streaming back the response as it is being calculated, as long as we can come up with a good way of aborting if something fails. Doing that would mean we don't have to keep a hundred thousand requests and responses in memory, which would completely change the performance situation.

Big thanks for taking the time to put this together!

/jake

--
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins
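As a rough illustration of what streaming deserialization could look like, here is a sketch in Python using only the standard library. It is not how the server does or would do it (the server uses Jackson in Java); iter_json_array and chunk_size are hypothetical names. It reads the request body in small chunks and yields one array element at a time, so only the current chunk and the current element are held in memory rather than the whole list of jobs.

```python
import io
import json


def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array one at a time.

    `stream` is a text stream with a read(n) method. This is a lenient
    sketch, not a validating parser: it does not reject stray commas,
    and on malformed input it keeps reading until the stream ends.
    """
    decoder = json.JSONDecoder()
    buf = stream.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a JSON array")
    buf = buf[1:]
    while True:
        buf = buf.lstrip()
        if buf.startswith(","):
            # Skip the separator between two elements.
            buf = buf[1:].lstrip()
        if buf.startswith("]"):
            return
        try:
            value, end = decoder.raw_decode(buf)
            # Only trust a value if something follows it; a number cut off
            # at a chunk boundary ("12" of "123") would otherwise slip by.
            complete = end < len(buf)
        except json.JSONDecodeError:
            complete = False
        if not complete:
            more = stream.read(chunk_size)
            if not more:
                raise ValueError("truncated or malformed JSON array")
            buf += more
            continue
        yield value
        buf = buf[end:]
```

For example, `for job in iter_json_array(io.StringIO(body)): handle(job)` would process jobs one by one, where `handle` stands in for whatever executes a single job. Combined with writing each result out as soon as it is ready, this would avoid keeping all requests and responses in memory, at the cost of having to decide what happens when a job fails halfway through.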
Re: [Neo4j] Neo4jPHP batch insert benchmarks
Hey Josh,

You can validate what Peter's suggesting by setting a small heap when you run the server. If you edit conf/neo4j-wrapper.conf you can override the property for heap size with something like this:

wrapper.java.maxmemory=1

Then you should (in theory) be able to see the batch operation fail much earlier if it is the JSON components barfing.

Jim
Re: [Neo4j] Neo4jPHP batch insert benchmarks
Jim,

When I was running into the issue, I set the maxmemory=256 and can confirm that it took much longer to fail, but it did fail in the same way. I didn't think of setting it smaller than the default, but I suspect you are correct. I'll try it that way when I attempt to generate the stack trace later so that I don't have to wait several minutes for it to fail.

-- Josh

View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3284382.html
Re: [Neo4j] Neo4jPHP batch insert benchmarks
I bumped the maxmemory up to 512 and ran a batch to create 100000 nodes (repeated 10 times). After an average of 20 seconds, I always received the following response:

HTTP/1.1 100 Continue

HTTP/1.1 500 Java heap space
Content-Type: text/html; charset=iso-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 4389
Server: Jetty(6.1.25)

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 Java heap space</title>
</head>
<body>
HTTP ERROR 500
<p>Problem accessing /db/data/batch. Reason:
<pre>Java heap space</pre></p>
Caused by:
<pre>java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.&lt;init&gt;(HashMap.java:209)
    at java.util.LinkedHashMap.&lt;init&gt;(LinkedHashMap.java:181)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:199)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:77)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:155)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:73)
    at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:1980)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1271)
    at org.neo4j.server.rest.domain.JsonHelper.readJson(JsonHelper.java:54)
    at org.neo4j.server.rest.repr.formats.JsonFormat.readList(JsonFormat.java:101)
    at org.neo4j.server.rest.web.BatchOperationService.performBatchOperations(BatchOperationService.java:78)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:184)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:67)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:276)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:83)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:133)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:71)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1171)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1103)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1053)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1043)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:406)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:477)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:662)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
</pre>
<hr /><i><small>Powered by Jetty://</small></i><br/>
<br/>
<br/>
<br/>
</body>
</html>

So it seems to be an issue with deserializing the JSON. Unfortunately, I am not familiar enough with Java's dev environment or tools to diagnose any further. Any ideas?

-- Josh

View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3285635.html