Re: [Neo4j] Database performance difference between batch inserter and normal insert

Tobias Ivarsson Wed, 30 Jun 2010 10:37:33 -0700

Hi Suruchi,

I'll answer each question inline below.

On Wed, Jun 30, 2010 at 6:27 PM, Suruchi Deodhar
<[email protected]> wrote:

> Hello!
>
> I had a few questions regarding Batch insert and normal insert in neo4j:
>
> - All the properties of nodes need to be set initially while creating the
> graph db in batch insert mode. Can the values of a subset of the nodes be
> updated/changed later on? Does this lead to any performance issues?
>

Yes, you can update the properties when running in normal operations mode
(using the GraphDatabaseService API). You could also use the
setNodeProperties(...) and setRelationshipProperties(...) methods in the
batch inserter, although updating existing properties is not what the batch
inserter is optimized for.

>
> - While creating the graph using EmbeddedGraphDatabase, I am reading from
> Oracle and creating in increments of around 20000. Nodes get created
> pretty fast (2 million nodes -> 10 minutes).
> But while creating relationships in increments of 20000, I get the
> following error:
>
> *org.neo4j.kernel.impl.transaction.xaframework.XaLogicalLog close*
>

This is not something I recognize. Could you provide a stack trace?

My guess would be that you shut down the GraphDatabase after importing the
nodes, so that when you start importing relationships, it is already closed.

>
> I am not shutting down the database intermediately in the code.
> Is there any other reason because of which this error may occur?
>
> - Is there a difference in performance while running queries on a database
> created using batch insert as opposed to one created using
> EmbeddedGraphDatabase? I somehow am seeing significant performance
> difference while running queries on my db created using batch insert in
> comparison to inserts using GraphDatabaseService. This may be because I am
> updating values of some nodes intermediately. Is anyone else facing similar
> issues?
> Do you have any suggestions?
>

If you create the graph using the EmbeddedGraphDatabase API and then start
doing queries right after, you are going to get better performance than if
you create the graph using the BatchInserter API, then start up an
EmbeddedGraphDatabase for queries. This is because the EmbeddedGraphDatabase
will put your nodes and relationships in the cache as you create them,
meaning that large portions of your graph will already be cached when you
start doing queries, whereas an EmbeddedGraphDatabase started up cold on a
batch-inserted store will have to fill its cache as you perform your queries.

The second time you run the queries (on the same GraphDatabase instance)
both cases should report about the same execution times.
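
To illustrate the caching effect, here is a toy sketch in plain Java. It does
not use any Neo4j classes: ToyStore, createCached, createUncached and queryAll
are made-up names, and the two maps only stand in for the store files and the
in-memory cache. It simply counts how many reads miss the cache on each query
pass, assuming the whole graph fits in the cache.

```java
import java.util.HashMap;
import java.util.Map;

class CacheWarmupSketch {

    /** Toy stand-in for a store with a read-through cache (hypothetical, not a Neo4j class). */
    static class ToyStore {
        private final Map<Long, String> disk = new HashMap<>();
        private final Map<Long, String> cache = new HashMap<>();
        private int diskReads = 0;

        /** Normal-mode style write: the record goes to the store and stays in the cache. */
        void createCached(long id, String value) {
            disk.put(id, value);
            cache.put(id, value);
        }

        /** Batch-insert style write: the record goes to the store only, the cache stays cold. */
        void createUncached(long id, String value) {
            disk.put(id, value);
        }

        /** Read through the cache, counting every miss as one "disk" read. */
        String read(long id) {
            String hit = cache.get(id);
            if (hit != null) {
                return hit;
            }
            diskReads++;
            String value = disk.get(id);
            cache.put(id, value);
            return value;
        }

        int diskReads() {
            return diskReads;
        }
    }

    /** Reads records 0..n-1 and returns how many of those reads missed the cache. */
    static int queryAll(ToyStore store, int n) {
        int before = store.diskReads();
        for (long id = 0; id < n; id++) {
            store.read(id);
        }
        return store.diskReads() - before;
    }

    public static void main(String[] args) {
        int n = 1000;
        ToyStore warm = new ToyStore(); // built and queried through the same instance
        ToyStore cold = new ToyStore(); // built batch-style, then queried from a cold start
        for (long id = 0; id < n; id++) {
            warm.createCached(id, "node-" + id);
            cold.createUncached(id, "node-" + id);
        }
        System.out.println("warm, first pass:  " + queryAll(warm, n) + " misses"); // 0 misses
        System.out.println("cold, first pass:  " + queryAll(cold, n) + " misses"); // 1000 misses
        System.out.println("cold, second pass: " + queryAll(cold, n) + " misses"); // 0 misses
    }
}
```

The first pass over the cold store pays for every read, and the second pass
behaves like the warm one. With a real database the cache is bounded, so the
numbers will be less clear-cut than in this sketch.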

Cheers,
-- 
Tobias Ivarsson <[email protected]>
Hacker, Neo Technology
www.neotechnology.com
Cellphone: +46 706 534857
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
