Re: [Neo4j] Loading large dataset

2011-11-21 Thread Michael Hunger
Sounds great; if you need any help, just ping me.

Yes, read performance should soar.

Are the numbers you provided (250k nodes + 20M relationships) your real dataset,
or is that the data volume you expect to see in production?

You can also answer that off-list :) (to Peter and me).

Cheers

Michael

On 21.11.2011, at 12:11, Vinicius Carvalho wrote:

> Thank you both for helping out. This list is just the best :D
> 
> Michael, I was considering that; now that you've said it, I'm definitely
> going to do it: use a hashmap to store the nodes as they get inserted, and
> then look them up there to create the relationships.
> 
> I'll have a look at the batch-inserter, thanks.
> 
> I'm doing a POC at LMI Ericsson, and I strongly believe that neo4j, not a
> relational database, is the answer for our network topology storage. I just
> need to show some numbers to get more people on board; I have *no* doubt that
> traversing the network will be 1000x faster on neo than doing hundreds of SQL joins :)
> 
> Regards
> 
> On Mon, Nov 21, 2011 at 10:42 AM, Michael Hunger <
> michael.hun...@neotechnology.com> wrote:
> 
>> Vinicius,
>> 
>> as Peter said, good idea.
>> 
>> Please try to avoid lucene index lookups during the import (use a hashmap
>> cache instead).
>> 
>> If you want ultrafast import, use the batch-inserter API;
>> 
>> for an example, look here: https://gist.github.com/1375679
>> 
>> Cheers
>> 
>> Michael
>> 
>> On 21.11.2011, at 11:06, Vinicius Carvalho wrote:
>> 
>>> Hi there! Continuing our trials with neo4j, I need to load a reasonable
>>> amount of data (250k nodes + 20M relationships) into a neo server.
>>> 
>>> This data lives in a MySQL db and a MongoDB.
>>> 
>>> For obvious reasons I'm not going to use the REST API for that, but I'd
>>> also like to avoid using a plugin (I need some more control using
>>> Spring beans).
>>> 
>>> So my question is:
>>> 
>>> Would it be a bad idea to turn off the neo4j server, run a Java app with
>>> an embedded neo4j instance pointing to the server's storage, load it up
>>> with all the data, and then restart the server? I just want to be clear
>>> that I'm not doing something stupid or ugly here :)
>>> 
>>> Also, our IDs are all varchars (they came from mongo, so each is a big hex
>>> string). Is it possible to use a different ID type besides long in neo, or
>>> will I need to create a property and index it for retrieval?
>>> 
>>> Many thanks

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Loading large dataset

2011-11-21 Thread Vinicius Carvalho
Thank you both for helping out. This list is just the best :D

Michael, I was considering that; now that you've said it, I'm definitely
going to do it: use a hashmap to store the nodes as they get inserted, and
then look them up there to create the relationships.

I'll have a look at the batch-inserter, thanks.

I'm doing a POC at LMI Ericsson, and I strongly believe that neo4j, not a
relational database, is the answer for our network topology storage. I just
need to show some numbers to get more people on board; I have *no* doubt that
traversing the network will be 1000x faster on neo than doing hundreds of SQL joins :)

Regards
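The hashmap idea above can be sketched in plain Java. This is only an illustration of the caching pattern: the class and method names are mine, and the real batch-inserter call is stubbed with a counter.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ID cache described above: map each external (mongo hex) ID
// to the long node ID the inserter hands back, so relationship creation
// never needs an index lookup. The insert call is stubbed with a counter.
public class IdCacheSketch {
    private final Map<String, Long> nodeIds = new HashMap<String, Long>();
    private long nextNodeId = 0; // stand-in for inserter.createNode(props)

    // Create the node on first sight of an ID; return the cached ID after that.
    public long getOrCreateNode(String mongoId) {
        Long cached = nodeIds.get(mongoId);
        if (cached != null) {
            return cached.longValue();
        }
        long nodeId = nextNodeId++;
        nodeIds.put(mongoId, nodeId);
        return nodeId;
    }

    public static void main(String[] args) {
        IdCacheSketch cache = new IdCacheSketch();
        long a = cache.getOrCreateNode("4ec9f1a2b3");
        long b = cache.getOrCreateNode("4ec9f1a2b4");
        System.out.println(a == cache.getOrCreateNode("4ec9f1a2b3")); // prints "true"
        System.out.println(a != b); // prints "true"
    }
}
```

With 250k nodes a `HashMap<String, Long>` fits comfortably in heap, which is why it beats a Lucene lookup per relationship endpoint.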

On Mon, Nov 21, 2011 at 10:42 AM, Michael Hunger <
michael.hun...@neotechnology.com> wrote:

> Vinicius,
>
> as Peter said, good idea.
>
> Please try to avoid lucene index lookups during the import (use a hashmap
> cache instead).
>
> If you want ultrafast import, use the batch-inserter API;
>
> for an example, look here: https://gist.github.com/1375679
>
> Cheers
>
> Michael
>
> On 21.11.2011, at 11:06, Vinicius Carvalho wrote:
>
> > Hi there! Continuing our trials with neo4j, I need to load a reasonable
> > amount of data (250k nodes + 20M relationships) into a neo server.
> >
> > This data lives in a MySQL db and a MongoDB.
> >
> > For obvious reasons I'm not going to use the REST API for that, but I'd
> > also like to avoid using a plugin (I need some more control using
> > Spring beans).
> >
> > So my question is:
> >
> > Would it be a bad idea to turn off the neo4j server, run a Java app with
> > an embedded neo4j instance pointing to the server's storage, load it up
> > with all the data, and then restart the server? I just want to be clear
> > that I'm not doing something stupid or ugly here :)
> >
> > Also, our IDs are all varchars (they came from mongo, so each is a big hex
> > string). Is it possible to use a different ID type besides long in neo, or
> > will I need to create a property and index it for retrieval?
> >
> > Many thanks


Re: [Neo4j] Loading large dataset

2011-11-21 Thread Michael Hunger
Vinicius,

as Peter said, good idea.

Please try to avoid lucene index lookups during the import (use a hashmap
cache instead).

If you want ultrafast import, use the batch-inserter API;

for an example, look here: https://gist.github.com/1375679
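A rough sketch of the two-pass batch-insert approach follows. It assumes the Neo4j 1.x batch-inserter API (`BatchInserterImpl`, `createNode`, `createRelationship`), and the data-reading methods are hypothetical stand-ins for the MySQL/MongoDB reads; see the gist above for a real example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.DynamicRelationshipType;
import org.neo4j.kernel.impl.batchinsert.BatchInserter;
import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

// Two-pass import: create all nodes first, caching mongoId -> nodeId,
// then create relationships from the cache, with no index lookups at all.
public class TopologyImport {
    public static void main(String[] args) {
        Map<String, Long> nodeIds = new HashMap<String, Long>();
        // Point this at the (stopped!) server's store directory.
        BatchInserter inserter = new BatchInserterImpl("data/graph.db");
        try {
            // Pass 1: nodes, keeping the external ID as a property
            for (String mongoId : readAllMongoIds()) {
                Map<String, Object> props = new HashMap<String, Object>();
                props.put("mongoId", mongoId);
                nodeIds.put(mongoId, inserter.createNode(props));
            }
            // Pass 2: relationships, endpoints resolved from the in-memory cache
            for (String[] edge : readAllEdges()) {
                inserter.createRelationship(nodeIds.get(edge[0]), nodeIds.get(edge[1]),
                        DynamicRelationshipType.withName("CONNECTED"), null);
            }
        } finally {
            inserter.shutdown(); // required, or the store is left inconsistent
        }
    }

    // Hypothetical stubs standing in for the MySQL/MongoDB reads.
    static Iterable<String> readAllMongoIds() { return Collections.emptyList(); }
    static Iterable<String[]> readAllEdges() { return Collections.<String[]>emptyList(); }
}
```

The batch inserter skips transactions and most consistency machinery, which is what makes it fast; it must run single-threaded while the server is down.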

Cheers

Michael
 
On 21.11.2011, at 11:06, Vinicius Carvalho wrote:

> Hi there! Continuing our trials with neo4j, I need to load a reasonable
> amount of data (250k nodes + 20M relationships) into a neo server.
> 
> This data lives in a MySQL db and a MongoDB.
> 
> For obvious reasons I'm not going to use the REST API for that, but I'd
> also like to avoid using a plugin (I need some more control using
> Spring beans).
> 
> So my question is:
> 
> Would it be a bad idea to turn off the neo4j server, run a Java app with an
> embedded neo4j instance pointing to the server's storage, load it up with
> all the data, and then restart the server? I just want to be clear that I'm
> not doing something stupid or ugly here :)
> 
> Also, our IDs are all varchars (they came from mongo, so each is a big hex
> string). Is it possible to use a different ID type besides long in neo, or
> will I need to create a property and index it for retrieval?
> 
> Many thanks


Re: [Neo4j] Loading large dataset

2011-11-21 Thread Peter Neubauer
Vinicius,
doing the import in Java is a VERY sane idea, go for it. As for the custom
IDs, we are going to address the issue further down the roadmap; for now I
think an index is your best option, since these are non-scalar values, if I
understand it right?

For my next lab day, I would love to test out in-graph structures for
indexing scalar custom IDs, e.g. building up a B-Tree and seeing what that
yields. You're welcome to join in :)
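The property-plus-index option can be sketched with the Neo4j 1.x embedded API (index name, property key, and the sample hex ID are mine, not from the thread):

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Store the mongo hex ID as a node property, index it, and resolve
// the external ID back to a node through the index later.
public class MongoIdIndexSketch {
    public static void main(String[] args) {
        GraphDatabaseService graphDb = new EmbeddedGraphDatabase("data/graph.db");
        Index<Node> byMongoId = graphDb.index().forNodes("byMongoId");

        Transaction tx = graphDb.beginTx();
        try {
            Node node = graphDb.createNode();
            node.setProperty("mongoId", "4ec9f1a2b3");    // the external hex ID
            byMongoId.add(node, "mongoId", "4ec9f1a2b3"); // make it findable
            tx.success();
        } finally {
            tx.finish();
        }

        // Later: look the node up by its external ID.
        Node found = byMongoId.get("mongoId", "4ec9f1a2b3").getSingle();

        graphDb.shutdown();
    }
}
```

Node IDs in neo4j itself stay longs; the index is the translation layer from the hex strings to nodes.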

/peter


On Mon, Nov 21, 2011 at 11:06 AM, Vinicius Carvalho wrote:

> Hi there! Continuing our trials with neo4j, I need to load a reasonable
> amount of data (250k nodes + 20M relationships) into a neo server.
>
> This data lives in a MySQL db and a MongoDB.
>
> For obvious reasons I'm not going to use the REST API for that, but I'd
> also like to avoid using a plugin (I need some more control using
> Spring beans).
>
> So my question is:
>
> Would it be a bad idea to turn off the neo4j server, run a Java app with an
> embedded neo4j instance pointing to the server's storage, load it up with
> all the data, and then restart the server? I just want to be clear that I'm
> not doing something stupid or ugly here :)
>
> Also, our IDs are all varchars (they came from mongo, so each is a big hex
> string). Is it possible to use a different ID type besides long in neo, or
> will I need to create a property and index it for retrieval?
>
> Many thanks


[Neo4j] Loading large dataset

2011-11-21 Thread Vinicius Carvalho
Hi there! Continuing our trials with neo4j, I need to load a reasonable
amount of data (250k nodes + 20M relationships) into a neo server.

This data lives in a MySQL db and a MongoDB.

For obvious reasons I'm not going to use the REST API for that, but I'd
also like to avoid using a plugin (I need some more control using
Spring beans).

So my question is:

Would it be a bad idea to turn off the neo4j server, run a Java app with an
embedded neo4j instance pointing to the server's storage, load it up with
all the data, and then restart the server? I just want to be clear that I'm
not doing something stupid or ugly here :)

Also, our IDs are all varchars (they came from mongo, so each is a big hex
string). Is it possible to use a different ID type besides long in neo, or
will I need to create a property and index it for retrieval?

Many thanks