[Neo4j] Loading large dataset

2011-11-21 Thread Vinicius Carvalho
Hi there! Continuing our trials with neo4j, I need to load a reasonable
amount of data (250k nodes + 20M relationships) into a neo server.

This data lives in a MySQL database and a MongoDB instance.

For obvious reasons I'm not going to use the REST API for that, but I'd
also like to avoid using a plugin (I need some more control via Spring
beans).

So my question is:

Would it be a bad idea to shut down the Neo4j server, run a Java app with
an embedded Neo4j instance pointing at the server's store directory, load
all the data, and then restart the server? I just want to be sure I'm not
doing something stupid or ugly here :)
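
Concretely, that would look something like this (a sketch against the
Neo4j 1.x embedded API; the store path, property name, and relationship
type below are all made up for illustration):

```java
import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// With the Neo4j server stopped, point an embedded instance at the
// server's store directory, write the data, shut down cleanly, then
// restart the server against the same store.
class EmbeddedImport {
    public static void main(String[] args) {
        GraphDatabaseService db =
                new EmbeddedGraphDatabase("/opt/neo4j/data/graph.db");
        Transaction tx = db.beginTx();
        try {
            Node a = db.createNode();
            a.setProperty("mongoId", "4ec8a1b2c3d4e5f601234567");
            Node b = db.createNode();
            b.setProperty("mongoId", "4ec8a1b2c3d4e5f689abcdef");
            a.createRelationshipTo(b,
                    DynamicRelationshipType.withName("CONNECTED_TO"));
            tx.success();
        } finally {
            tx.finish(); // Neo4j 1.x transaction idiom
        }
        db.shutdown(); // clean shutdown before restarting the server
    }
}
```

For 20M relationships you'd commit in batches of, say, tens of thousands
rather than in one huge transaction.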

Also, our IDs are all varchars (they come from Mongo, so each one is a big
hex string). Is it possible to use an ID type other than long in Neo4j, or
will I need to create a property and index it for retrieval?

Many thanks
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Loading large dataset

2011-11-21 Thread Peter Neubauer
Vinicius,
doing the import in Java is a VERY sane idea, go for it. As for the custom
IDs, we are going to address that further down the roadmap; for now I
think an index is your best option, since these are non-scalar values, if
I understand it right?

For my next lab day, I would love to test out in-graph structures for
indexing scalar custom IDs, e.g. building up a B-tree or similar and
seeing what that buys us. You're welcome to join in :)

/peter
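
The property-plus-index pattern Peter suggests would look roughly like
this with the Neo4j 1.x embedded API (the index name "mongoIds" and the
property key "mongoId" are hypothetical; `db` is an open
`GraphDatabaseService`):

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;

// Sketch: store the Mongo hex id as a node property and add it to a
// Lucene node index so the node can be retrieved by that id later.
class MongoIdIndex {
    static Node createIndexed(GraphDatabaseService db, String hexId) {
        Index<Node> index = db.index().forNodes("mongoIds");
        Transaction tx = db.beginTx();
        try {
            Node node = db.createNode();
            node.setProperty("mongoId", hexId);
            index.add(node, "mongoId", hexId);
            tx.success();
            return node;
        } finally {
            tx.finish();
        }
    }

    static Node findByMongoId(GraphDatabaseService db, String hexId) {
        // getSingle() returns null when no node carries that id
        return db.index().forNodes("mongoIds")
                 .get("mongoId", hexId).getSingle();
    }
}
```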




Re: [Neo4j] Loading large dataset

2011-11-21 Thread Michael Hunger
Vinicius,

as Peter said, good idea.

Please try to avoid Lucene index lookups during the import (use a
HashMap<String, Node> or HashMap<String, Long> cache instead).

If you want ultra-fast import, please use the batch-inserter API;
for an example, look here: https://gist.github.com/1375679

Cheers

Michael
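
The cache Michael describes can be sketched in plain Java (the node-id
counter below is a stand-in for what the real batch inserter would
return; see the gist above for the actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the import cache: remember the node id created for each
// Mongo hex id in a HashMap, so that relationship creation does an O(1)
// in-memory lookup instead of a Lucene index query per endpoint.
class NodeIdCache {
    private final Map<String, Long> idsByMongoId =
            new HashMap<String, Long>();
    private long nextNodeId = 0; // stand-in for the batch inserter's id

    // Called once per imported node; real code would call the batch
    // inserter here and cache the node id it returns.
    long createNode(String mongoHexId) {
        long nodeId = nextNodeId++;
        idsByMongoId.put(mongoHexId, nodeId);
        return nodeId;
    }

    // Called while importing the 20M relationships.
    long nodeIdFor(String mongoHexId) {
        return idsByMongoId.get(mongoHexId);
    }
}
```

At 250k nodes the map stays small; only the nodes are cached, never the
relationships.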
 


Re: [Neo4j] Loading large dataset

2011-11-21 Thread Vinicius Carvalho
Thank you both for helping out. This list is just the best :D

Michael, I was considering that; now that you've said it, I'm definitely
going to do it: use a HashMap to store the nodes as they get inserted, and
then look them up there to create the relationships.

I'll have a look at the batch inserter, thanks.

I'm doing a POC at LMI Ericsson, and I strongly believe that Neo4j, not
relational storage, is the answer for our network topology data. I just
need to show some numbers to get more people on board; I have *no* doubt
that traversing the network will be 1000x faster on Neo4j than doing
hundreds of SQL joins :)

Regards



Re: [Neo4j] Loading large dataset

2011-11-21 Thread Michael Hunger
Sounds great; if you need any help, just ping me.

Yes, read performance should soar.

Are the numbers you provided (250k nodes + 20M relationships) your real
dataset, or what data volume do you expect to see in production?

You can also answer that off-list :) (to Peter and me).

Cheers

Michael
