Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread Peter Neubauer
Olivier,
please let us know your progress, and feel free to issue a pull
request when you get things working!

Cheers,

/peter neubauer



On Fri, Nov 18, 2011 at 2:16 PM, ov wrote:
> Thanks for your answer, Michael.
>
> Indeed, when creating a relationship between two nodes, I need to retrieve
> the Neo4j node ID (from the customID) for both nodes.
> I expected the cache to have a really big effect on this lookup, but alas ...
>
> For this "small" graph, I suppose I can work fully in RAM, but this surely
> won't do for a much bigger graph.
>
> Thanks a lot,
> I'll try with my own cache mechanism
>
> Regards
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread ov
Thanks for your answer, Michael.

Indeed, when creating a relationship between two nodes, I need to retrieve
the Neo4j node ID (from the customID) for both nodes.
I expected the cache to have a really big effect on this lookup, but alas ...

For this "small" graph, I suppose I can work fully in RAM, but this surely
won't do for a much bigger graph.
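
For graphs where a plain HashMap no longer fits in the heap, one possible direction is to keep the mapping in primitive arrays. The sketch below is only an illustration under stated assumptions: it assumes numeric customIDs that arrive in strictly ascending order during node insertion (the thread does not say what a customID looks like), and `CompactIdMap` is a hypothetical name. Two `long[]` arrays cost 16 bytes per node, so 30 million nodes need roughly 480 MB, and lookups use binary search.

```java
import java.util.Arrays;

/** Sketch: customID -> nodeId mapping stored in two parallel primitive arrays. */
final class CompactIdMap {
    private final long[] customIds; // kept in ascending order
    private final long[] nodeIds;   // nodeIds[i] belongs to customIds[i]
    private int size = 0;

    CompactIdMap(int capacity) {
        customIds = new long[capacity];
        nodeIds = new long[capacity];
    }

    /** Records one mapping; customIDs must arrive in strictly ascending order (assumption). */
    void append(long customId, long nodeId) {
        if (size == customIds.length) {
            throw new IllegalStateException("capacity exceeded");
        }
        if (size > 0 && customId <= customIds[size - 1]) {
            throw new IllegalArgumentException("customIDs must be strictly ascending");
        }
        customIds[size] = customId;
        nodeIds[size] = nodeId;
        size++;
    }

    /** Returns the node id for a customID, or -1 if the customID is unknown. */
    long nodeIdOf(long customId) {
        int index = Arrays.binarySearch(customIds, 0, size, customId);
        return index >= 0 ? nodeIds[index] : -1L;
    }

    public static void main(String[] args) {
        CompactIdMap map = new CompactIdMap(3);
        map.append(1001L, 0L);
        map.append(1005L, 1L);
        map.append(1042L, 2L);
        System.out.println(map.nodeIdOf(1005L));
    }
}
```

If the input is not already sorted by customID, it would have to be sorted before node insertion for this layout to work; a hash-based primitive map would be the alternative.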

Thanks a lot,
I'll try with my own cache mechanism

Regards

On 18 Nov 2011, at 13:14, Michael Hunger [via Neo4j Community Discussions]
wrote:

> Please try not to use Lucene for lookups during batch inserts. Just index your
> nodes (for later use), but use a custom in-memory cache for the insertion
> process.
>
> customID -> nodeId, e.g. in a Map.
>
> Using Lucene for lookups takes up to 1000 times longer during batch inserts
> (probably because the merge threads in the background have to finish before
> their results can be included in the query).
>
> luceneBatchInserterIndex.setCacheCapacity() does not seem to work as
> expected; we will investigate that.
> 
> Cheers 
> 
> Michael 





Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread Michael Hunger
Please try not to use Lucene for lookups during batch inserts. Just index your
nodes (for later use), but use a custom in-memory cache for the insertion
process.

customID -> nodeId, e.g. in a Map.

Using Lucene for lookups takes up to 1000 times longer during batch inserts
(probably because the merge threads in the background have to finish before
their results can be included in the query).

luceneBatchInserterIndex.setCacheCapacity() does not seem to work as expected;
we will investigate that.
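
The in-memory lookup described above might be sketched as follows. This is a minimal sketch, not Neo4j code: `GraphWriter` and `CountingWriter` are hypothetical stand-ins for the batch inserter so that the example runs without Neo4j on the classpath, and the `String` customID type and the `LINKS_TO` relationship name are assumptions. The only point it illustrates is that relationship creation resolves both endpoints from a plain `HashMap` instead of querying Lucene.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: resolve customID -> nodeId from an in-memory map during import. */
final class IdCacheSketch {

    /** Hypothetical stand-in for the batch inserter; NOT a Neo4j type. */
    interface GraphWriter {
        long createNode(Map<String, Object> properties);
        long createRelationship(long fromNodeId, long toNodeId, String type);
    }

    /** Dummy writer that only hands out sequential ids, so the sketch runs on its own. */
    static final class CountingWriter implements GraphWriter {
        long nodes = 0;
        long relationships = 0;

        @Override
        public long createNode(Map<String, Object> properties) {
            return nodes++;
        }

        @Override
        public long createRelationship(long fromNodeId, long toNodeId, String type) {
            return relationships++;
        }
    }

    private final GraphWriter writer;
    // The in-memory cache: customID -> node id returned by the writer.
    private final Map<String, Long> nodeIdByCustomId = new HashMap<>();

    IdCacheSketch(GraphWriter writer) {
        this.writer = writer;
    }

    /** Creates a node and remembers its id under the given customID. */
    long addNode(String customId, String url) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("customID", customId);
        properties.put("url", url);
        long nodeId = writer.createNode(properties);
        nodeIdByCustomId.put(customId, nodeId);
        return nodeId;
    }

    /** Creates a relationship, resolving both endpoints from the map only. */
    long link(String fromCustomId, String toCustomId, String type) {
        Long from = nodeIdByCustomId.get(fromCustomId);
        Long to = nodeIdByCustomId.get(toCustomId);
        if (from == null || to == null) {
            throw new IllegalArgumentException("unknown customID: "
                    + (from == null ? fromCustomId : toCustomId));
        }
        return writer.createRelationship(from, to, type);
    }

    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter();
        IdCacheSketch importer = new IdCacheSketch(writer);
        importer.addNode("a", "http://example.org/a");
        importer.addNode("b", "http://example.org/b");
        importer.link("a", "b", "LINKS_TO");
        System.out.println("nodes=" + writer.nodes + ", relationships=" + writer.relationships);
    }
}
```

With a real batch inserter, the two `GraphWriter` methods would delegate to it, and the Lucene index would still be filled during node insertion for later queries; it would just never be read during the import.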

Cheers

Michael

Here is the original post:

Hi,
I am in almost the same situation as a previous post concerning poor Batch
Insert performance, but I still can't figure out how to do it correctly with
good performance.

Nodes: 30 million
Relationships: 250 million

I am on Mac OS X 10.7.1, 4 CPUs, 8 GB RAM
1) Insert Nodes:
JVM -server -d64 -Xmx4G -XX:+UseParNewGC -XX:+UseNUMA -XX:+UseConcMarkSweepGC
from 80 000 down to 50 000 inserts/second with properties (customID, url)
with Lucene indexing on "customID" and "url"
a bit disappointing

2) Insert Relationships:
JVM -server -d64 -Xmx6G -XX:+UseParNewGC -XX:+UseNUMA -XX:+UseConcMarkSweepGC
Index cache capacity: 30 000 000 (all nodes) on customID
neostore.nodestore.db.mapped_memory=300M 
neostore.relationshipstore.db.mapped_memory=1G 
neostore.propertystore.db.mapped_memory=2.2G 
neostore.propertystore.db.strings.mapped_memory=100M 
neostore.propertystore.db.arrays.mapped_memory=10M 

=> insertion rate ~ 50 relationships/second,
and going down ...

(many, many tests ... but always very poor performance)

Do you have any idea how to make this work correctly?

I am really stuck here.

If you want to have a look at my code, no problem! :)

Many, many thanks for your help

On 18.11.2011 at 12:47, Krzysztof Raczyński wrote:

> Btw, inserting 600k nodes over REST with about 8 properties in batches
> of 100 takes 20-30 minutes for me. It's not awesomely fast, but it's
> not slow either. What settings are affecting insertion speeds, Peter?


Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread Rick Bullotta
That seems about normal.  The good news is that it is much faster (usually) 
than an RDBMS on the same hardware.

-----Original Message-----
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On 
Behalf Of Krzysztof Raczynski
Sent: Friday, November 18, 2011 6:47 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Batch Insert : pr performance

Btw, inserting 600k nodes over REST with about 8 properties in batches
of 100 takes 20-30 minutes for me. It's not awesomely fast, but it's
not slow either. What settings are affecting insertion speeds, Peter?


Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread Krzysztof Raczyński
Btw, inserting 600k nodes over REST with about 8 properties in batches
of 100 takes 20-30 minutes for me. It's not awesomely fast, but it's
not slow either. What settings are affecting insertion speeds, Peter?


Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread Peter Neubauer
Yes,
I think you should resend your original post that got stuck...
On Nov 18, 2011 12:40 PM, "Krzysztof Raczyński" wrote:

> Of course providing some more context would be poor too? How are
> we supposed to know what's the problem?


Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread Krzysztof Raczyński
Of course providing some more context would be poor too? How are
we supposed to know what's the problem?


Re: [Neo4j] Batch Insert : poooor performance

2011-11-18 Thread ov

Anyone?
