Re: [Neo4j] Batch Insert : poooor performance
Olivier, please let us know your progress, and feel free to issue a pull request when you get things working!

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - NOSQL for the Enterprise.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.

On Fri, Nov 18, 2011 at 2:16 PM, ov wrote:
> Thanks a lot,
> I'll try with my own cache mechanism

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Batch Insert : poooor performance
Thanks for your answer, Michael.

Indeed, when creating a relationship between two nodes, I need to retrieve the Neo4j node ID (from the customID) for both nodes. I expected the cache to have a really big effect on this lookup, but alas...

For this "small" graph, I suppose I can work entirely in RAM, but this surely won't do for a much bigger graph.

Thanks a lot,
I'll try with my own cache mechanism.

Regards

On 18 Nov 2011 at 13:14, Michael Hunger [via Neo4j Community Discussions] wrote:
> Please try not to use lucene for lookups during batch-inserts just index your
> nodes (for later use) but use a custom, in memory cache for the insertion
> process.
>
> customID -> nodeId, like Map.

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Batch-Insert-pr-performance-tp3513211p3518559.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Batch Insert : poooor performance
Please try not to use Lucene for lookups during batch inserts. Just index your nodes (for later use), but use a custom in-memory cache for the insertion process:

customID -> nodeId, like a Map.

Using Lucene for lookups takes up to 1000 times longer during batch inserts (probably because the merge threads in the background have to finish before their results can be included in the query).

luceneBatchInserterIndex.setCacheCapacity() seems not to work as expected; we will investigate that.

Cheers

Michael

Here is the original post:

Hi,
I am in almost the same situation as a previous post concerning poor Batch Insert performance, but I still can't figure out how to do it correctly with good performance.

Nodes: 30 million
Relationships: 250 million

I am on Mac OS X 10.7.1, 4 CPUs, 8 GB RAM.

1) Insert Nodes:
JVM -server -d64 -Xmx4G -XX:+UseParNewGC -XX:+UseNUMA -XX:+UseConcMarkSweepGC
from 80 000 down to 50 000 inserts/second with properties (customID, url)
with Lucene indexing on "customID" and "url"
a bit disappointing

2) Insert Relationships:
JVM -server -d64 -Xmx6G -XX:+UseParNewGC -XX:+UseNUMA -XX:+UseConcMarkSweepGC
Index cache capacity 30 000 000 (all nodes) on customID
neostore.nodestore.db.mapped_memory=300M
neostore.relationshipstore.db.mapped_memory=1G
neostore.propertystore.db.mapped_memory=2.2G
neostore.propertystore.db.strings.mapped_memory=100M
neostore.propertystore.db.arrays.mapped_memory=10M

=> insertion rate ~ 50 relationships/second, and going down...

(many, many tests... but always very poor performance)

Do you have any idea how to get this to work correctly?

I am really stuck here.

If you want to have a look at my code: no problem! :)

Many, many thanks for your help

On 18.11.2011 at 12:47, Krzysztof Raczyński wrote:
> Btw, inserting 600k nodes over REST with about 8 properties in batches
> of 100 takes 20-30 minutes for me. It's not awesomely fast, but it's
> not slow either. What settings are affecting insertion speeds, Peter?
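The "customID -> nodeId, like a Map" suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: `NodeStore`, `FakeNodeStore` and `CachedBatchLoader` are hypothetical names, and `NodeStore` merely stands in for whatever batch-insert API is actually used, so that the sketch runs without Neo4j on the classpath. The point it shows is that node IDs returned at node-creation time are remembered in a plain `HashMap`, and relationship insertion reads from that map instead of querying the Lucene index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the batch inserter: only the two calls this sketch needs. */
interface NodeStore {
    long createNode(String customId, String url);
    void createRelationship(long fromNodeId, long toNodeId);
}

/** In-memory fake (an assumption for illustration) so the sketch runs without Neo4j. */
class FakeNodeStore implements NodeStore {
    private long nextNodeId = 0;
    final List<long[]> relationships = new ArrayList<long[]>();

    public long createNode(String customId, String url) {
        return nextNodeId++; // hand out sequential node IDs
    }

    public void createRelationship(long fromNodeId, long toNodeId) {
        relationships.add(new long[] {fromNodeId, toNodeId});
    }
}

/** Keeps customID -> nodeId in memory so relationship inserts need no index lookup. */
class CachedBatchLoader {
    private final NodeStore store;
    private final Map<String, Long> nodeIdByCustomId = new HashMap<String, Long>();

    CachedBatchLoader(NodeStore store) {
        this.store = store;
    }

    long addNode(String customId, String url) {
        long nodeId = store.createNode(customId, url);
        nodeIdByCustomId.put(customId, nodeId); // remember the ID at creation time
        return nodeId;
    }

    void addRelationship(String fromCustomId, String toCustomId) {
        Long from = nodeIdByCustomId.get(fromCustomId); // map lookup, no Lucene query
        Long to = nodeIdByCustomId.get(toCustomId);
        if (from == null || to == null) {
            throw new IllegalArgumentException(
                "unknown customID: " + (from == null ? fromCustomId : toCustomId));
        }
        store.createRelationship(from, to);
    }
}
```

With a real batch inserter, `addNode` would additionally add the node to the Lucene index for later use, as suggested above; only the lookups during relationship insertion go through the map. A map with boxed `Long` values for 30 million entries needs a correspondingly large heap, so a more compact structure may be worth considering for bigger graphs.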
Re: [Neo4j] Batch Insert : poooor performance
That seems about normal. The good news is that it is much faster (usually) than an RDBMS on the same hardware.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Krzysztof Raczynski
Sent: Friday, November 18, 2011 6:47 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Batch Insert : pr performance

Btw, inserting 600k nodes over REST with about 8 properties in batches of 100 takes 20-30 minutes for me. It's not awesomely fast, but it's not slow either. What settings are affecting insertion speeds, Peter?
Re: [Neo4j] Batch Insert : poooor performance
Btw, inserting 600k nodes over REST with about 8 properties in batches of 100 takes 20-30 minutes for me. It's not awesomely fast, but it's not slow either. What settings are affecting insertion speeds, Peter?
Re: [Neo4j] Batch Insert : poooor performance
Yes, I think you should resend your original post that got stuck...

On Nov 18, 2011 12:40 PM, "Krzysztof Raczyński" wrote:
> Of course providing some more context would be poor too? How are
> we supposed to know what's the problem?
Re: [Neo4j] Batch Insert : poooor performance
Of course providing some more context would be poor too? How are we supposed to know what's the problem?
Re: [Neo4j] Batch Insert : poooor performance
Anyone?

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Batch-Insert-pr-performance-tp3513211p3518340.html