Re: [Neo4j] Loading large dataset
Sounds great, if you need any help just ping me.

Yes, read performance should soar. Are the numbers you provided (250k nodes + 20M relationships) your real dataset, or an estimate of the data volume you expect to see in production? You can also answer that off-list :) (to Peter and me).

Cheers
Michael

On 21.11.2011 at 12:11, Vinicius Carvalho wrote:
[quoted text trimmed; see the messages below]

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Loading large dataset
Thank you both for helping out. This list is just the best :D

Michael, I was considering that; now that you've said it, I'm definitely going to do it: use a HashMap to store the nodes as they get inserted, and then look them up there to create the relationships.

I'll have a look at the batch-inserter, thanks.

I'm doing a POC at LMI Ericsson. I strongly believe that Neo4j is the answer for our network topology storage, not relational data. I just need to show some numbers to get more people on board. I have *no* doubt that traversing the network will be 1000x faster on Neo4j than doing hundreds of SQL joins :)

Regards

On Mon, Nov 21, 2011 at 10:42 AM, Michael Hunger <michael.hun...@neotechnology.com> wrote:
[quoted text trimmed; see Michael's message below]
Re: [Neo4j] Loading large dataset
Vinicius,

as Peter said, good idea.

Please try to avoid Lucene index lookups during the import (use a HashMap cache instead).

If you want ultra-fast import, please use the batch-inserter API; for an example look here: https://gist.github.com/1375679

Cheers

Michael

On 21.11.2011 at 11:06, Vinicius Carvalho wrote:
[quoted text trimmed; see the original message below]
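The HashMap-cache idea suggested above can be sketched roughly like this. To keep the example self-contained, the Neo4j batch-inserter call is replaced by a stub counter, and the class name, method names, and example IDs are all made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the HashMap cache: map external (Mongo hex) IDs to node IDs
// so that creating the 20M relationships never needs a Lucene index
// lookup. The running counter below is a stand-in for the real
// batch-inserter's createNode(...) call, which returns the new node's id.
public class NodeIdCache {
    private final Map<String, Long> externalToNodeId = new HashMap<String, Long>();
    private long nextNodeId = 0; // stub for inserter.createNode(properties)

    // Return the node id for an external id, creating the node on first sight.
    public long getOrCreate(String externalId) {
        Long cached = externalToNodeId.get(externalId);
        if (cached != null) {
            return cached;
        }
        long nodeId = nextNodeId++; // real code would call the batch-inserter here
        externalToNodeId.put(externalId, nodeId);
        return nodeId;
    }

    public int size() {
        return externalToNodeId.size();
    }

    public static void main(String[] args) {
        NodeIdCache cache = new NodeIdCache();
        long a = cache.getOrCreate("4ec9fe0f5f1c2a0001000001");
        long b = cache.getOrCreate("4ec9fe0f5f1c2a0001000002");
        // A repeated id hits the map, not the index:
        assert cache.getOrCreate("4ec9fe0f5f1c2a0001000001") == a;
        assert a != b;
        System.out.println("nodes created: " + cache.size()); // prints "nodes created: 2"
    }
}
```

In a real import you would first stream all nodes from MySQL/Mongo through `getOrCreate`, then stream the relationships, resolving both endpoints via the map; the map stays a few hundred MB at 250k entries, well within reach of a single JVM heap.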
Re: [Neo4j] Loading large dataset
Vinicius,

doing the import in Java is a VERY sane idea, go for it.

As for the custom IDs, we are going to address the issue further down the roadmap. For now I think an index is your best option, since these are non-scalar values, if I understand it right?

For my next lab day, I would love to test out in-graph structures for indexing scalar custom IDs, e.g. build up a B-tree or so and see what that means. Welcome to join in :)

/peter

On Mon, Nov 21, 2011 at 11:06 AM, Vinicius Carvalho wrote:
[quoted text trimmed; see the original message below]
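Peter's point that these hex IDs are non-scalar can be made concrete: a Mongo-style ObjectId is 24 hex characters, i.e. up to 96 bits, so it cannot be squeezed into a 64-bit long node ID and has to live as an indexed string property (or a cache key) instead. A quick check, using a made-up example ID:

```java
import java.math.BigInteger;

// A 24-character hex ObjectId encodes up to 96 bits, which overflows a
// 64-bit long; hence the hex string must be stored as a property and
// indexed (or cached in a map) rather than replace the native node id.
public class HexIdCheck {
    public static void main(String[] args) {
        String mongoId = "4ec9fe0f5f1c2a0001000001"; // hypothetical ObjectId
        BigInteger value = new BigInteger(mongoId, 16);
        System.out.println("bits needed:  " + value.bitLength());
        System.out.println("fits in long: " + (value.bitLength() <= 63));
    }
}
```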
[Neo4j] Loading large dataset
Hi there! Continuing our trials with Neo4j, I need to load a reasonable amount of data (250k nodes + 20M relationships) into a Neo4j server.

This data lives in a MySQL db and a MongoDB.

For obvious reasons I'm not going to use the REST API for that, but I'd also like to avoid using a plugin (I need some more control using Spring beans).

So my question is:

Would it be a bad idea to turn off the Neo4j server, run a Java app with an embedded Neo4j instance pointing to the server's storage, load it up with all the data, and then restart the server? I just want to be sure I'm not doing something stupid or ugly here :)

Also, our IDs are all varchars (they come from Mongo, so each is a big hex string). Is it possible to use a different ID besides long in Neo4j, or will I need to create a property and index it for retrieval?

Many thanks