Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-09-09 Thread Jacob Hansson
On Fri, Aug 26, 2011 at 2:46 PM, jadell josh.ad...@gmail.com wrote:

 Jim,

 Fair enough. For now, I'll just know not to try to make batches that big
 :-) My own use case is transaction safety rather than creating thousands
 of entities at once, so it doesn't affect me that much. I just wanted to
 have something more concrete to tell other users who might try.

 Thanks to all for helping me investigate!


Update on this: I did a bit of hacking on this just now and was able to
improve the memory usage quite a bit. The major culprits were Jackson's
deserializer (or rather, the fact that we use its mapping serializer) and
the use of a list of strings, rather than a StringBuilder, to store and
eventually aggregate results (which is my fault, since it's my code, and I
have no clue why I did it like that).
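For readers less familiar with the Java side, here is a minimal sketch of the StringBuilder approach described above. It is not the actual Neo4j server code: the class name BatchResultBuffer is hypothetical, and it assumes each operation's result is already serialized to a JSON fragment. Results are appended to one growable buffer as they are produced, instead of being kept as a list of separate strings until the end.

```java
// Hypothetical sketch, not the Neo4j server's actual batch code.
// Appends each operation's already-serialized JSON result to one growable
// buffer as soon as it is produced, instead of keeping a List<String> of
// fragments around and joining them at the end.
final class BatchResultBuffer {

    private final StringBuilder out = new StringBuilder("[");
    private int count = 0;

    /** Adds one operation's JSON result, inserting a separator when needed. */
    void add(String jsonResult) {
        if (count > 0) {
            out.append(',');
        }
        out.append(jsonResult);
        count++;
    }

    /** Returns the aggregated results as a JSON array; the buffer is left untouched. */
    String toJson() {
        return out.toString() + "]";
    }
}
```

Calling add() once per executed operation and toJson() once at the end keeps a single buffer alive. The whole response still lives in memory, though, which is why streaming output remains the next step.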

I just pushed this to master. My benchmarks look like this, showing the
number of inserts (each insert being a node and a property) and the time taken:

Old:
---
1000   : 1.340s
5000   : 3.013s
10000  : 4.304s
50000  : 18.120s
100000 : OutOfMemory
500000 : OutOfMemory

New:
---
1000   : 1.546s
5000   : 3.116s
10000  : 3.702s
50000  : 17.183s
100000 : 35.427s
500000 : OutOfMemory

This is on a JVM with a 1GB heap. The OutOfMemory error that remains in the
new setup is there because I didn't go all the way and implement streaming
output, so that would be cool to try as well.

/Jake


 -- Josh

 --
 View this message in context:
 http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3286721.html
 Sent from the Neo4j Community Discussions mailing list archive at
 Nabble.com.
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Jacob Hansson
Phone: +46 (0) 763503395
Twitter: @jakewins


Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-08-25 Thread Peter Neubauer
Josh,
it might be that parsing the JSON payload takes up more and more time as
batches get bigger. At least, that is my suspicion. That might also be the
reason for the heap problems: basically, the String parsing is taking over :/

Do you have any means of verifying that?

Cheers,

/peter neubauer

GTalk:  neubauer.peter
Skype   peter.neubauer
Phone   +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter  http://twitter.com/peterneubauer

http://www.neo4j.org   - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.




Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-08-25 Thread jadell
Hey Peter,

I don't have any way of verifying on the server side, other than measuring
the time it takes for curl_exec to return a response. On the client side, I
can see that PHP's json_encode/json_decode functions take less than 0.5% of
the total run time, even with a batch size of 10000. During one of my
100000-node attempts, I printed out the server's 500 "Java heap space" error
response. The last method in the stack trace seemed to be dealing with a
Deserializer class or method. I will try again and capture the stack trace
output to post here.

Thanks,

-- Josh Adell



--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3283926.html


Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-08-25 Thread Jacob Hansson
The heap space errors would make sense, I think, because we currently
deserialize and serialize in place, keeping the whole thing in memory. It
would be interesting to see if we could implement a setup that streams the
deserialization/serialization, getting rid of that memory overhead.
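A stdlib-only sketch of what streaming deserialization could mean, under loudly stated assumptions: the class JsonArrayStreamer is hypothetical, it is not Jackson's API or anything in the Neo4j server, and it does not validate JSON. It reads the top-level batch array from a Reader and hands each element's raw text to a callback as soon as that element is complete, so only one operation is buffered at a time instead of the whole request being mapped into one in-memory structure. A real implementation would more likely build on Jackson's own streaming JsonParser.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical sketch, not Jackson's API or the Neo4j server's code.
// Reads a top-level JSON array from a Reader and passes each element's raw
// JSON text to a callback as soon as that element is complete, so only one
// element is buffered at a time. It tracks nesting depth and string/escape
// state only; it does not validate the JSON.
final class JsonArrayStreamer {

    private JsonArrayStreamer() {}

    /** Returns the number of elements handed to the callback. */
    static int forEachElement(Reader in, Consumer<String> handler) throws IOException {
        StringBuilder current = new StringBuilder();
        int depth = 0;            // 0 = outside the array, 1 = directly inside it
        boolean inString = false;
        boolean escaped = false;
        int count = 0;
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                current.append(ch);
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            if (ch == '[' || ch == '{') {
                depth++;
                if (depth == 1) {
                    continue;     // opening bracket of the outer array
                }
            } else if (ch == ']' || ch == '}') {
                depth--;
                if (depth == 0) { // closing bracket of the outer array
                    count += flush(current, handler);
                    break;
                }
            } else if (ch == ',' && depth == 1) {
                count += flush(current, handler); // end of one top-level element
                continue;
            } else if (Character.isWhitespace(ch) && depth <= 1) {
                continue;         // whitespace around top-level elements
            } else if (ch == '"') {
                inString = true;
            }
            current.append(ch);
        }
        return count;
    }

    private static int flush(StringBuilder current, Consumer<String> handler) {
        if (current.length() == 0) {
            return 0;
        }
        handler.accept(current.toString());
        current.setLength(0);
        return 1;
    }
}
```

Each element handed to the callback could then be parsed and executed on its own and discarded before the next one is read.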

You said you are using out-of-the-box settings for the server. I don't
remember off the top of my head what the default heap size is, but you might
want to try giving it more RAM. I'm gonna guess that's where performance
dies.

I'll have to look at what proper HTTP behavior is, but there should be a way
to start streaming back the response as it is being calculated, as long as we
can come up with a good way of aborting if something fails. Doing that would
mean we don't have to keep a hundred thousand requests and responses in
memory, which would completely change the performance situation.
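One way to picture the streamed response, as a hypothetical sketch rather than what the server does: each operation's JSON result is written to the response Writer as soon as it is computed. Because the HTTP status line has typically already been sent by then, the sketch reports a failure in-band as a final error element and stops. StreamingBatchWriter and the error object's shape are made up for illustration, and operations are modeled as Supplier<String> for simplicity.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch, not Neo4j server code. Writes each operation's JSON
// result to the response as soon as it is computed, rather than collecting
// every result first. Once output has started, the HTTP status can no longer
// be changed, so a failure is reported in-band as a final error element.
final class StreamingBatchWriter {

    private StreamingBatchWriter() {}

    /** Returns how many operations completed before the stream ended. */
    static int run(Iterator<Supplier<String>> operations, Writer out) throws IOException {
        int done = 0;
        out.write('[');
        try {
            while (operations.hasNext()) {
                String result = operations.next().get();
                if (done > 0) {
                    out.write(',');
                }
                out.write(result);
                out.flush();   // hand this result to the client now
                done++;
            }
        } catch (RuntimeException e) {
            // Abort on the first failing operation. A real server would also
            // have to roll back whatever the earlier operations changed.
            if (done > 0) {
                out.write(',');
            }
            out.write("{\"error\":\"operation " + done + " failed\"}");
        }
        out.write(']');
        out.flush();
        return done;
    }
}
```

The open question raised above remains: the client has to detect the in-band error, since the status code will already say 200.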


Big thanks for taking the time to put this together!

/jake



Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-08-25 Thread Jim Webber
Hey Josh,

You can validate what Peter's suggesting by setting a small heap when you run
the server.

If you edit conf/neo4j-wrapper.conf, you can override the heap size property
with something like this:

wrapper.java.maxmemory=1

Then you should (in theory) be able to see the batch operation fail much 
earlier if it is the JSON components barfing.
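To confirm that a heap change actually reached a JVM, a process can report its own limit. This is a generic sketch (HeapCheck is a hypothetical name) that assumes wrapper.java.maxmemory ends up as the JVM's maximum heap setting. It only reflects the JVM it runs in, so it says something about the server only if run inside that process or with the same JVM options. Runtime.maxMemory() reports bytes, and the figure may come out slightly below the configured maximum.

```java
// Generic sketch: reports the maximum heap the current JVM will attempt to use.
// It only reflects the JVM it runs in, not some other process.
final class HeapCheck {

    private HeapCheck() {}

    /** Max heap in megabytes; Runtime.maxMemory() returns Long.MAX_VALUE when there is no limit. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // e.g. compare the output of: java -Xmx16m HeapCheck
        System.out.println("max heap: " + maxHeapMegabytes() + " MB");
    }
}
```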

Jim


Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-08-25 Thread jadell
Jim,

When I was running into the issue, I set maxmemory=256 and can confirm that
it took much longer to fail, but it did fail in the same way. I didn't think
of setting it smaller than the default, but I suspect you are correct. I'll
try it that way when I attempt to generate the stack trace later, so that I
don't have to wait several minutes for it to fail.

-- Josh

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3284382.html


Re: [Neo4j] Neo4jPHP batch insert benchmarks

2011-08-25 Thread jadell
I bumped maxmemory up to 512 and ran a batch to create 100000 nodes
(repeated 10 times). After an average of 20 seconds, I always received the
following response:


HTTP/1.1 100 Continue

HTTP/1.1 500 Java heap space
Content-Type: text/html; charset=iso-8859-1
Cache-Control: must-revalidate,no-cache,no-store
Content-Length: 4389
Server: Jetty(6.1.25)

HTTP ERROR 500

Problem accessing /db/data/batch. Reason:
    Java heap space

Caused by:
java.lang.OutOfMemoryError: Java heap space
    at java.util.HashMap.<init>(HashMap.java:209)
    at java.util.LinkedHashMap.<init>(LinkedHashMap.java:181)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapObject(UntypedObjectDeserializer.java:199)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:77)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.mapArray(UntypedObjectDeserializer.java:155)
    at org.codehaus.jackson.map.deser.UntypedObjectDeserializer.deserialize(UntypedObjectDeserializer.java:73)
    at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:1980)
    at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1271)
    at org.neo4j.server.rest.domain.JsonHelper.readJson(JsonHelper.java:54)
    at org.neo4j.server.rest.repr.formats.JsonFormat.readList(JsonFormat.java:101)
    at org.neo4j.server.rest.web.BatchOperationService.performBatchOperations(BatchOperationService.java:78)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:184)
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:67)
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:276)
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:83)
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:133)
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:71)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1171)
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1103)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1053)
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1043)
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:406)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:477)
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:662)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)


So it seems to be an issue with deserializing the JSON.  Unfortunately, I am
not familiar enough with Java's dev environment or tools to diagnose any
further.

Any ideas?

-- Josh


--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3285635.html


[Neo4j] Neo4jPHP batch insert benchmarks

2011-08-24 Thread jadell
Hey all,

I've been working on adding batch support to Neo4jPHP
(http://github.com/jadell/Neo4jPHP). Here are the results of my latest
benchmarks. The first column is the number of nodes being inserted, the
second column is the average time in seconds over 5 runs to insert that
many nodes in a single batch, and the third column is the average time in
seconds over 5 runs to insert that many nodes one at a time:

#nodes   batch   single
10       0       0
100      0.2     0.4
250      0       1
500      0.8     2
1000     1.4     4
2500     6       10.6
5000     23.2    21.2
10000    91.6    40.4

It seems like batches win out until right around 5000 nodes at a time. I've
profiled my code, and the time spent in PHP is roughly equivalent for batch
vs. single. All of the time difference is spent in a curl_exec call, talking
to or waiting to hear back from the server.
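The crossover can be checked with a little arithmetic on the table above. This sketch just copies the table's figures from 100 nodes up (BatchVsSingle is a hypothetical name), prints the cost per node for each approach, and returns the first size at which the single batch was slower.

```java
// Figures copied from the benchmark table (average seconds over 5 runs),
// from 100 nodes up.
final class BatchVsSingle {

    static final int[] NODES = {100, 250, 500, 1000, 2500, 5000, 10000};
    static final double[] BATCH_SECONDS = {0.2, 0, 0.8, 1.4, 6, 23.2, 91.6};
    static final double[] SINGLE_SECONDS = {0.4, 1, 2, 4, 10.6, 21.2, 40.4};

    private BatchVsSingle() {}

    /** First node count at which one batch was slower than one-at-a-time inserts, or -1. */
    static int firstSizeWhereBatchLoses() {
        for (int i = 0; i < NODES.length; i++) {
            if (BATCH_SECONDS[i] > SINGLE_SECONDS[i]) {
                return NODES[i];
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int i = 0; i < NODES.length; i++) {
            System.out.printf("%6d nodes: batch %.2f ms/node, single %.2f ms/node%n",
                    NODES[i], 1000 * BATCH_SECONDS[i] / NODES[i], 1000 * SINGLE_SECONDS[i] / NODES[i]);
        }
        System.out.println("batch first loses at " + firstSizeWhereBatchLoses() + " nodes");
    }
}
```

On these figures, one-at-a-time inserts stay at roughly 4 ms per node, while the per-node cost of a batch roughly doubles each time the batch size doubles from 2500 upward, which is consistent with total batch time growing faster than linearly with batch size.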

I tried going up to 100000 nodes. Single insert handled this just fine, but
the server kept returning a 500 Java heap space error on the batch, even
with a 512M max heap.
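If batches have to stay below some size, a client can split one large job into several requests. Neo4jPHP is PHP; this Java sketch is only an illustration, and it assumes the Neo4j 1.x /db/data/batch job shape with "method", "to", "body" and "id" fields. BatchChunker, the "name" property, and the maxPerBatch limit are all hypothetical; a sensible limit would have to come from measurements like the ones above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side sketch (Neo4jPHP itself is PHP). Splits a large
// node-creation job into several smaller /db/data/batch request bodies,
// assuming the Neo4j 1.x batch job shape {"method", "to", "body", "id"}.
final class BatchChunker {

    private BatchChunker() {}

    static List<String> chunkCreateNodeBodies(List<String> names, int maxPerBatch) {
        if (maxPerBatch < 1) {
            throw new IllegalArgumentException("maxPerBatch must be >= 1");
        }
        List<String> bodies = new ArrayList<>();
        for (int start = 0; start < names.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, names.size());
            StringBuilder json = new StringBuilder("[");
            for (int i = start; i < end; i++) {
                if (i > start) {
                    json.append(',');
                }
                // Job ids restart at 0 in every request, since each batch is separate.
                json.append("{\"method\":\"POST\",\"to\":\"/node\",\"body\":{\"name\":\"")
                    .append(escape(names.get(i)))
                    .append("\"},\"id\":").append(i - start).append('}');
            }
            bodies.add(json.append(']').toString());
        }
        return bodies;
    }

    // Minimal escaping for quotes and backslashes only; assumes names
    // contain no control characters.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The trade-off is that each request then commits on its own, so splitting gives up the all-or-nothing transaction safety that a single batch otherwise provides.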

The benchmark script can be found here: http://gist.github.com/1169394
Benchmarks were run on a 4-core 2.3GHz Intel i7 with 4G RAM, running Ubuntu
10.10. The Neo4j server was run with out-of-the-box settings in a VM running
Ubuntu 10.10 with 1 dedicated core and 1G RAM.

I hope this is of interest to someone. I'd love to get some feedback from
anyone using Neo4j from PHP, with Neo4jPHP or any other library.

-- Josh Adell


--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4jPHP-batch-insert-benchmarks-tp3282984p3282984.html