[Neo4j] possibility to merge some neo4j databases?
Hi,

I need to batch insert millions of data items into Neo4j. It is quite difficult to keep everything in a Map to get node ids, so the import needs frequent index lookups to find the node ids for relationships, and the resulting performance is quite low.

Is there any way to build several Neo4j databases independently and then merge them? (I could build many small databases in parallel.)

Thanks,
Olivier

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/possibility-to-merge-some-neo4j-databases-tp3544694p3544694.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
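For readers unfamiliar with the pattern described in the question, here is a minimal sketch of a two-pass import that keeps the whole key-to-id map in memory. It is plain Python and does not use the Neo4j batch inserter API; `import_graph`, `node_rows`, `rel_rows` and `id_of` are hypothetical names chosen for illustration, and a list index stands in for the node id.

```python
def import_graph(node_rows, rel_rows):
    """Two-pass import sketch.

    node_rows: iterable of (external_key, properties)
    rel_rows:  iterable of (start_key, end_key, rel_type)
    """
    nodes = []   # stand-in for the node store; list index plays the role of node id
    rels = []    # stand-in for the relationship store
    id_of = {}   # external key -> node id, held entirely in memory

    # Pass 1: create nodes and remember which id each external key received.
    for key, props in node_rows:
        id_of[key] = len(nodes)
        nodes.append(props)

    # Pass 2: resolve relationship endpoints through the in-memory map.
    for start_key, end_key, rel_type in rel_rows:
        rels.append((id_of[start_key], id_of[end_key], rel_type))

    return nodes, rels, id_of
```

The map grows with the number of nodes, which is the memory pressure the question refers to; replacing `id_of` with an index lookup removes that pressure at the cost of one lookup per relationship endpoint.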
Re: [Neo4j] possibility to merge some neo4j databases?
There are two approaches I can think of:

- Use a better index for mapping ids. Lucene is too slow. In-memory hashtables are bound by available memory. Peter has been investigating alternative databases like BDB. I tried, but did not finish, a hashmap of cached arrays, and Chris wrote his big data import project on GitHub, which is a hashmap of cached hashmaps. Many promising solutions, but none yet complete. All target the general case of id mapping.
- For this specific case, merging small databases, I had an idea a couple of years ago which I still think will work: bulk appending entire databases, by offsetting all internal ids by the current max id. I remember the reason Johan did not like this idea was that it suffered from the same flaws as the batch inserter: locking the entire db, no rollback, and risk of corrupting the entire db. For people happy with the batch inserter, perhaps this is still an option, but it is unlikely to get prioritized by the Neo team because of the corruption risks. It would, however, perform spectacularly well, since the id map is a trivial function.

Personally I hope someone completes Chris's persistent hashmap or a similar solution. Id maps are a recurring theme and would be very valuable.

On Nov 29, 2011 12:07 PM, "osallou" wrote:
> Is there any way to build several neo4j databases (independantly) then
> to merge them? (I could build many small db in parallel)
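The bulk-append idea above can be sketched in a few lines. This is a toy model under stated assumptions, not Neo4j's store format or any existing Neo4j tool: `Store` and `append_store` are hypothetical names, and node ids are assumed to be dense (0..n-1), so the offset used is the target's node count, that is, its current max id plus one.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy graph store with dense node ids; a stand-in, not Neo4j's file layout."""
    nodes: list = field(default_factory=list)  # index = node id, value = properties
    rels: list = field(default_factory=list)   # (start_id, end_id, rel_type)

    def add_node(self, **props) -> int:
        self.nodes.append(props)
        return len(self.nodes) - 1

    def add_rel(self, start: int, end: int, rel_type: str) -> None:
        self.rels.append((start, end, rel_type))


def append_store(target: Store, source: Store) -> int:
    """Append source to target, shifting every source id by one constant offset."""
    offset = len(target.nodes)  # first free node id in the target
    target.nodes.extend(dict(props) for props in source.nodes)
    # The id map is the trivial function old_id -> old_id + offset.
    target.rels.extend((s + offset, e + offset, t) for s, e, t in source.rels)
    return offset


# Two databases built independently, then merged.
a = Store()
a.add_rel(a.add_node(name="a1"), a.add_node(name="a2"), "KNOWS")
b = Store()
b.add_rel(b.add_node(name="b1"), b.add_node(name="b2"), "KNOWS")
offset = append_store(a, b)
```

Because no per-node lookup is needed, the cost is a single linear pass over the source. The sketch only modifies the target, so a failed append would leave the source untouched. It also assumes the small databases contain no relationships that point into each other; such relationships would still need some form of key lookup.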
Re: [Neo4j] possibility to merge some neo4j databases?
Sure, the limitations apply, but since only the target database could be corrupted, and none of the databases being used for the import, that should be OK.

That actually sounds like a nice lab-day project. I'll add it to the list.

Michael

On 29.11.2011 at 15:18, Craig Taverner wrote:
> Bulk appending entire
> databases, by offsetting all internal ids by the current max id. I remember
> the reason Johan did not like this idea was that it suffered from the same
> flaws as the batch inserter, locking the entire db, no rollback and risk of
> entire db corruption.