[Neo4j] Loading large dataset

2011-11-21 Thread Vinicius Carvalho
Hi there! Continuing our trials with Neo4j, I need to load a reasonable amount of data (250k nodes + 20M relationships) into a Neo4j server. The data lives in a MySQL db and a MongoDB. For obvious reasons I'm not going to use the REST API for that, but I'd also like to avoid using a plugin

Re: [Neo4j] Loading large dataset

2011-11-21 Thread Peter Neubauer
Vinicius, doing the import in Java is a VERY sane idea, go for it. As for the custom IDs, we are going to address the issue further down the roadmap; for now I think an index is your best option, since these are non-scalar values, if I understand it right? For my next lab day, I would love to test
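(For readers following the archive: a minimal sketch of what Peter's index suggestion could look like with the embedded Neo4j 1.x API of the time. The store path, index name, and ID value are made up for illustration.)

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class CustomIdIndexSketch {
    public static void main(String[] args) {
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("target/graph.db");
        // An index keyed by the external (MySQL/MongoDB) ID.
        Index<Node> byExternalId = graphDb.index().forNodes("externalIds");

        Transaction tx = graphDb.beginTx();
        try {
            Node node = graphDb.createNode();
            node.setProperty("externalId", "mongo-4ec3b2a1"); // hypothetical ID
            byExternalId.add(node, "externalId", node.getProperty("externalId"));
            tx.success();
        } finally {
            tx.finish(); // Neo4j 1.x transaction API
        }

        // Later, resolve the external ID back to the node.
        Node found = byExternalId.get("externalId", "mongo-4ec3b2a1").getSingle();
        System.out.println(found.getProperty("externalId"));

        graphDb.shutdown();
    }
}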

Re: [Neo4j] Loading large dataset

2011-11-21 Thread Michael Hunger
Vinicius, as Peter said, good idea. Please try to avoid Lucene index lookups during the import (use a HashMap<String, Node> or HashMap<String, Long> cache instead). If you want ultra-fast import, please use the batch-inserter API; for an example look here: https://gist.github.com/1375679 Cheers
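(A minimal sketch of Michael's two-phase suggestion, assuming the Neo4j 1.x batch-inserter API; exact package names shifted between releases, and the store path, relationship type, and source rows below are made up.)

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

public class BatchImportSketch {
    private static final RelationshipType KNOWS =
            DynamicRelationshipType.withName("KNOWS"); // hypothetical type

    public static void main(String[] args) {
        BatchInserter inserter = new BatchInserterImpl("target/batch.db");
        // External ID -> Neo4j node ID, so phase 2 never touches Lucene.
        Map<String, Long> nodeCache = new HashMap<String, Long>(300000);

        // Phase 1: create all nodes, caching their IDs.
        for (String externalId : new String[] { "a", "b" }) { // stand-in for the source rows
            Map<String, Object> props = new HashMap<String, Object>();
            props.put("externalId", externalId);
            nodeCache.put(externalId, inserter.createNode(props));
        }

        // Phase 2: create relationships purely from the in-memory cache.
        inserter.createRelationship(nodeCache.get("a"), nodeCache.get("b"), KNOWS, null);

        inserter.shutdown(); // flushes the store files; mandatory
    }
}

At 250k nodes, a map of external ID to node ID fits comfortably in memory, which is what makes skipping index lookups for all 20M relationship creations viable.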

Re: [Neo4j] Loading large dataset

2011-11-21 Thread Vinicius Carvalho
Thank you both for helping out. This list is just the best :D Michael, I was considering that; now that you've said it, I'm definitely going to do it: use a HashMap to store the nodes as they get inserted, and then look them up there to create the relationships. I'll have a look at the batch-inserter, thanks.

Re: [Neo4j] Loading large dataset

2011-11-21 Thread Michael Hunger
Sounds great; if you need any help, just ping me. Yes, read performance should soar. Are the numbers you provided (250k nodes + 20M relationships) your real dataset, or what data volume do you expect to see in production? You can also answer that off-list :) (to Peter and me). Cheers