[Neo4j] Loading large dataset
Hi there! Continuing our trials with Neo4j, I need to load a reasonable amount of data (250k nodes + 20M relationships) into a Neo4j server. This data lives in a MySQL database and a MongoDB database. For obvious reasons I'm not going to use the REST API for that, but I'd also like to avoid using a plugin (I need some more control using Spring beans).

So my question is: would it be a bad idea to turn off the Neo4j server, run a Java app with an embedded Neo4j instance pointing to the server's storage, load all the data, and then restart the server? I just want to be clear that I'm not doing something stupid or ugly here :)

Also, our IDs are all varchars (they came from Mongo, so each one is a big hex string). Is it possible to use an ID type other than long in Neo4j? Or will I need to create a property and index it for retrieval?

Many thanks
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Loading large dataset
Vinicius,

doing the import in Java is a VERY sane idea, go for it. As for the custom IDs, we are going to address the issue further down the roadmap. For now I think an index is your best option, since these are non-scalar values, if I understand it right? For my next lab day, I would love to test out in-graph structures for indexing scalar custom IDs, e.g. build up a B-tree or so and see what that means. Welcome to join in :)

/peter

On Mon, Nov 21, 2011 at 11:06 AM, Vinicius Carvalho java.vinic...@gmail.com wrote:
> Hi there! Continuing our trials with Neo4j, I need to load a reasonable amount of data (250k nodes + 20M relationships) into a Neo4j server. This data lives in a MySQL database and a MongoDB database. For obvious reasons I'm not going to use the REST API for that, but I'd also like to avoid using a plugin (I need some more control using Spring beans).
> So my question is: would it be a bad idea to turn off the Neo4j server, run a Java app with an embedded Neo4j instance pointing to the server's storage, load all the data, and then restart the server? I just want to be clear that I'm not doing something stupid or ugly here :)
> Also, our IDs are all varchars (they came from Mongo, so each one is a big hex string). Is it possible to use an ID type other than long in Neo4j? Or will I need to create a property and index it for retrieval?
> Many thanks
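On the custom-ID question, a small sketch of why a Mongo-style hex string cannot simply stand in for a long node ID. It assumes the IDs are 24-digit hex strings in the style of MongoDB ObjectIds (the thread only says "a big HEX String"), and the sample value and the class name `HexIdWidth` are made up for illustration.

```java
import java.math.BigInteger;

public class HexIdWidth {

    // True if the hex string, read as a non-negative number,
    // fits into a signed 64-bit long (at most 63 significant bits).
    static boolean fitsInLong(String hex) {
        BigInteger value = new BigInteger(hex, 16);
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        // Hypothetical 24-digit hex ID (assumption: ObjectId-style, 96 bits wide).
        String sampleId = "4ecbe7f9e8c1c9092c000027";
        System.out.println(sampleId + " fits in long: " + fitsInLong(sampleId));

        // Largest positive long, for comparison.
        String maxLong = "7fffffffffffffff";
        System.out.println(maxLong + " fits in long: " + fitsInLong(maxLong));
    }
}
```

If the IDs really are that wide, storing the string as a node property and looking it up (via an index, as suggested above) is the straightforward route.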
Re: [Neo4j] Loading large dataset
Vinicius,

as Peter said, good idea. Please try to avoid Lucene index lookups during the import (use a HashMap<String, Node> or HashMap<String, Long> cache instead). If you want an ultrafast import, please use the batch-inserter API; for an example look here: https://gist.github.com/1375679

Cheers
Michael
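A minimal sketch of the cache idea, not taken from the linked gist: nodes are created in a first pass while a HashMap remembers which node ID each external string ID received, and relationships are created in a second pass using only map lookups. `GraphWriter`, `InMemoryWriter` and the `CONNECTED_TO` type are hypothetical stand-ins so the example runs without Neo4j on the classpath; in a real import the writer would delegate to the batch inserter or an embedded database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachedImport {

    /** Hypothetical stand-in for whatever writes to the graph. */
    interface GraphWriter {
        long createNode(String externalId);
        void createRelationship(long fromNode, long toNode, String type);
    }

    /** In-memory writer used only to make the sketch runnable. */
    static class InMemoryWriter implements GraphWriter {
        long nextId = 0;
        final List<long[]> relationships = new ArrayList<>();

        public long createNode(String externalId) {
            return nextId++;
        }

        public void createRelationship(long fromNode, long toNode, String type) {
            relationships.add(new long[] {fromNode, toNode});
        }
    }

    /** Pass 1: create each node once and remember external ID -> node ID. */
    static Map<String, Long> importNodes(GraphWriter writer, List<String> externalIds) {
        Map<String, Long> cache = new HashMap<>(externalIds.size() * 2);
        for (String id : externalIds) {
            if (!cache.containsKey(id)) {
                cache.put(id, writer.createNode(id));
            }
        }
        return cache;
    }

    /** Pass 2: resolve both endpoints from the cache; returns how many edges were skipped. */
    static int importRelationships(GraphWriter writer, Map<String, Long> cache,
                                   List<String[]> edges, String type) {
        int skipped = 0;
        for (String[] edge : edges) {
            Long from = cache.get(edge[0]);
            Long to = cache.get(edge[1]);
            if (from == null || to == null) {
                skipped++; // endpoint never imported as a node
                continue;
            }
            writer.createRelationship(from, to, type);
        }
        return skipped;
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        Map<String, Long> cache = importNodes(writer, Arrays.asList("a1", "b2", "c3", "a1"));
        List<String[]> edges = Arrays.asList(
                new String[] {"a1", "b2"},
                new String[] {"b2", "c3"},
                new String[] {"c3", "missing"});
        int skipped = importRelationships(writer, cache, edges, "CONNECTED_TO");
        System.out.println("nodes=" + cache.size()
                + " relationships=" + writer.relationships.size()
                + " skipped=" + skipped);
    }
}
```

With 250k nodes the map stays small enough to keep entirely in memory, which is what makes the second pass cheap compared with an index lookup per endpoint.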
Re: [Neo4j] Loading large dataset
Thank you both for helping out. This list is just the best :D

Michael, I was considering that; now that you've said it, I'm definitely going to do it: use a HashMap to store the nodes as they get inserted, and then look them up there to create the relationships. I'll have a look at the batch inserter, thanks.

I'm doing a POC at LMI Ericsson. I strongly believe that Neo4j, rather than relational storage, is the answer for our network topology storage. I just need to show some numbers to get more people on board. I have *no* doubt that traversing the network will be 1000x faster on Neo4j than doing hundreds of SQL joins :)

Regards

On Mon, Nov 21, 2011 at 10:42 AM, Michael Hunger michael.hun...@neotechnology.com wrote:
> Vinicius, as Peter said, good idea. Please try to avoid Lucene index lookups during the import (use a HashMap<String, Node> or HashMap<String, Long> cache instead). If you want an ultrafast import, please use the batch-inserter API; for an example look here: https://gist.github.com/1375679
> Cheers
> Michael
Re: [Neo4j] Loading large dataset
Sounds great, if you need any help just ping me.

Yes, read performance should soar. Are the numbers you provided (250k nodes + 20M relationships) your real dataset, or what data volume do you expect to see in production? You can also answer that off-list :) (to Peter and me).

Cheers
Michael

On 21.11.2011 at 12:11, Vinicius Carvalho wrote:
> Thank you both for helping out. This list is just the best :D
> Michael, I was considering that; now that you've said it, I'm definitely going to do it: use a HashMap to store the nodes as they get inserted, and then look them up there to create the relationships. I'll have a look at the batch inserter, thanks.
> I'm doing a POC at LMI Ericsson. I strongly believe that Neo4j, rather than relational storage, is the answer for our network topology storage. I just need to show some numbers to get more people on board. I have *no* doubt that traversing the network will be 1000x faster on Neo4j than doing hundreds of SQL joins :)
> Regards