Re: [Neo4j] Neo4j Tuning for specific application

Mattias Persson Fri, 16 Jul 2010 08:51:54 -0700

2010/7/16, Amir Hossein Jadidinejad <amir.jad...@yahoo.com>:
> OK.
> I think it would be better to have an "InMemoryEmbeddedGraphDatabase", derived
> from "EmbeddedGraphDatabase", that loads the whole graph into memory. It seems
> that the current interface is not appropriate for all applications.
>
>
yep, an in-memory graph db would be handy in some cases (testing and such).
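As a rough illustration of what "loading the whole graph into memory" could mean for neighbor lookups, here is a minimal sketch of a read-only adjacency structure in compressed sparse row (CSR) layout. It is not a Neo4j API: the class `InMemoryAdjacency`, the dense int node indices in [0, nodeCount) and the parallel edge arrays are all assumptions of this sketch.

```java
import java.util.Arrays;

/** Hypothetical read-only, in-memory adjacency store (not part of Neo4j). */
public final class InMemoryAdjacency {
    private final int[] offsets; // neighbors of node n are targets[offsets[n] .. offsets[n + 1])
    private final int[] targets; // all neighbor indices, grouped by source node

    /** Builds CSR arrays from edges from[k] -> to[k]; node indices must lie in [0, nodeCount). */
    public InMemoryAdjacency(int nodeCount, int[] from, int[] to) {
        offsets = new int[nodeCount + 1];
        for (int f : from) offsets[f + 1]++;                                // out-degree per node
        for (int n = 0; n < nodeCount; n++) offsets[n + 1] += offsets[n];   // prefix sums
        targets = new int[from.length];
        int[] next = Arrays.copyOf(offsets, nodeCount);                     // write cursor per node
        for (int k = 0; k < from.length; k++) targets[next[from[k]]++] = to[k];
    }

    /** Returns a copy of the neighbor indices of one node; no disk access involved. */
    public int[] neighbors(int node) {
        return Arrays.copyOfRange(targets, offsets[node], offsets[node + 1]);
    }

    public static void main(String[] args) {
        // Tiny example graph with edges 0->1, 0->2 and 2->0.
        InMemoryAdjacency g = new InMemoryAdjacency(3, new int[] {0, 0, 2}, new int[] {1, 2, 0});
        System.out.println(Arrays.toString(g.neighbors(0))); // [1, 2]
        System.out.println(Arrays.toString(g.neighbors(1))); // []
    }
}
```

For undirected neighbor lists, each relationship would be passed twice, once in each direction.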


How exactly is the interface not appropriate for all applications?
It'd be great to hear more specific details about your view on that.
And also, are you thinking of the GraphDatabaseService interface or
the EmbeddedGraphDatabase implementation in particular?
>
>
> ________________________________
> From: Mattias Persson <matt...@neotechnology.com>
> To: Neo4j user discussions <user@lists.neo4j.org>
> Sent: Fri, July 16, 2010 5:53:08 PM
> Subject: Re: [Neo4j] Neo4j Tuning for specific application
>
> 2010/7/15, Amir Hossein Jadidinejad <amir.jad...@yahoo.com>:
>> Hi,
>> I have checked all the mentioned issues, but it's currently too slow!
>> It takes 1 sec per node to get a list of its neighbors. The disk is
>> overloaded while the memory is free!
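For scale, a back-of-the-envelope estimate, assuming the 1 sec per node applies to each of the 3.6M nodes mentioned in the original post and that 8 worker threads could at best divide the wall-clock time by 8:

```latex
\begin{aligned}
T_{\text{1 thread}} &= 3.6\times10^{6}\ \text{nodes} \times 1\ \text{s/node} = 3.6\times10^{6}\ \text{s} \approx 41.7\ \text{days}\\
T_{\text{8 threads}} &\geq \frac{3.6\times10^{6}\ \text{s}}{8} = 4.5\times10^{5}\ \text{s} = 125\ \text{h} \approx 5.2\ \text{days per pass}
\end{aligned}
```

Since the scenario runs roughly 10 passes, even that best case adds up to about 52 days.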
>> The following is my running command:
>> java -d64 -server -XX:+UseNUMA -XX:+UseConcMarkSweepGC -Xmx4096m \
>>   -classpath "$CLASSPATH:../lib/geronimo-jta_1.1_spec-1.1.1.jar:../lib/jline-0.9.94.jar:../lib/lucene-core-2.9.2.jar:../lib/mysql-connector-java-5.1.7-bin.jar:../lib/neo4j-commons-1.0.jar:../lib/neo4j-index-1.1-20100714.135430-157.jar:../lib/neo4j-kernel-1.1-20100714.134745-137.jar:../lib/neo4j-remote-graphdb-0.7-20100714.140411-116.jar:../lib/neo4j-shell-1.1-20100714.140808-144.jar:../lib/neo4j-utils-1.0.jar:../lib/servlet-api.jar:../lib/trove.jar:../lib/weka.jar:." \
>>   org.graph.InferenceEngine
>
> Although this has nothing to do with performance, please remove
> neo4j-commons (a deprecated component) and use neo4j-utils-1.1-SNAPSHOT
> instead of version 1.0.
>
>>
>>
>> and the following is the configuration parameters:
>> neostore.nodestore.db.mapped_memory=120M
>> neostore.relationshipstore.db.mapped_memory=5G
>> neostore.propertystore.db.mapped_memory=100M
>> neostore.propertystore.db.strings.mapped_memory=200M
>> neostore.propertystore.db.arrays.mapped_memory=0M
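A quick sanity check on these settings is to add them up. The sketch below uses a hypothetical helper, `MappedMemoryTotal`, built only on `java.util.Properties`; it sums every `*.mapped_memory` value and assumes the values only use the `M` and `G` suffixes seen here, with 1G = 1024M.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical helper: totals the *.mapped_memory settings of a Neo4j-style config. */
public final class MappedMemoryTotal {
    /** Converts "120M" / "5G" / "0M" to megabytes; only M and G suffixes are assumed. */
    static long toMegabytes(String value) {
        String v = value.trim().toUpperCase();
        long amount = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1)) {
            case 'M': return amount;
            case 'G': return amount * 1024;
            default: throw new IllegalArgumentException("unsupported unit: " + value);
        }
    }

    /** Sums all keys ending in ".mapped_memory". */
    static long totalMappedMegabytes(Properties config) {
        long total = 0;
        for (String key : config.stringPropertyNames()) {
            if (key.endsWith(".mapped_memory")) total += toMegabytes(config.getProperty(key));
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
            "neostore.nodestore.db.mapped_memory=120M",
            "neostore.relationshipstore.db.mapped_memory=5G",
            "neostore.propertystore.db.mapped_memory=100M",
            "neostore.propertystore.db.strings.mapped_memory=200M",
            "neostore.propertystore.db.arrays.mapped_memory=0M");
        Properties config = new Properties();
        config.load(new StringReader(text));
        long mapped = totalMappedMegabytes(config);
        long heap = 4096; // from -Xmx4096m in the command above
        System.out.println("mapped = " + mapped + " MB, mapped + heap = " + (mapped + heap) + " MB");
    }
}
```

With the values above it prints `mapped = 5540 MB, mapped + heap = 9636 MB` on a 12GB machine, treating heap and mapped memory as separate budgets (an assumption of this sketch). Compared with the store sizes in the original post, the 5G relationship store mapping exceeds its 1.5GB file, while the property store (2GB) and string store (4GB) get only 100M and 200M.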
>>
>> What's the problem?!
>>
>>
>>
>> ________________________________
>> From: Mattias Persson <matt...@neotechnology.com>
>> To: Neo4j user discussions <user@lists.neo4j.org>
>> Sent: Sat, July 10, 2010 11:46:47 PM
>> Subject: Re: [Neo4j] Neo4j Tuning for specific application
>>
>> Are you using kernel/index version 1.0? Regarding the index lookups (each
>> lookup in its own separate transaction): I think there's a bug in
>> neo4j-index 1.0 which causes such a transaction (one containing a call to
>> index.getNodes) to write to and flush the logical log, which of course is
>> completely unnecessary. That may very well be why the disk is so heavily
>> used.
>>
>> What you could try is to update to the latest kernel/index version,
>> 1.1-SNAPSHOT, where this problem has been fixed. Also, in that version you
>> aren't forced to wrap reads in transactions. If you cannot update to the
>> latest 1.1-SNAPSHOT, then try to process more "cui"s in each transaction.
>>
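The "more cuis per transaction" suggestion can be sketched with plain JDK collections: split the ID list into fixed-size batches and use one transaction per batch instead of one per ID. `Batches.partition` and the batch size are hypothetical; the transaction itself is only indicated in a comment, following the Neo4j 1.x `beginTx()` / `success()` / `finish()` pattern.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups IDs into fixed-size batches, one transaction per batch. */
public final class Batches {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(items.subList(start, end)); // a view into items, not a copy
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Long> cuis = new ArrayList<>();
        for (long id = 0; id < 10; id++) cuis.add(id);
        for (List<Long> batch : partition(cuis, 4)) {
            // Here one transaction would wrap the whole batch, e.g.
            // Transaction tx = graphDb.beginTx(); try { ...; tx.success(); } finally { tx.finish(); }
            System.out.println(batch);
        }
    }
}
```

Running `main` prints `[0, 1, 2, 3]`, `[4, 5, 6, 7]` and `[8, 9]`. Because `subList` returns views, the batches should not be used after the underlying list is structurally modified.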
>> 2010/7/10 Arjen van der Meijden <acmmail...@tweakers.net>
>>
>>> Hi Amir,
>>>
>>> I'm just starting with neo4j, but saw some issues with your code from a
>>> normal java-standpoint. Please note, some of them are just
>>> micro-optimizations that may not matter much. But a lot of them are in
>>> your critical path, so perhaps they're worth a look.
>>>
>>> On 10-7-2010 17:59 Amir Hossein Jadidinejad wrote:
>>> > Hi,
>>> > I have a GraphDB with the following attributes:
>>> > Number of nodes: 3.6M
>>> > Number of relation types: 2
>>> > Total size of DB: 9GB
>>> >      lucene : 160MB
>>> >      neostore.nodestore.db : 31MB
>>> >      neostore.propertystore.db : 2GB
>>> >      neostore.propertystore.db.strings : 4GB
>>> >      neostore.relationshipstore.db : 1.5GB
>>> >
>>> > Machine characteristics:
>>> >      vm.dirty_background_ratio = 50
>>> >      vm.dirty_ratio = 80
>>> >      OS: Ubuntu x64
>>> >      CPU: Corei7
>>> >      MEM: 12GB
>>> >
>>> > The following is our running scenario (the source code is attached):
>>> > 1. Iterate over all nodes and extract a list of node IDs ("fillNodes"
>>> > function).
>>> > 2. For each node ID, start a worker thread that processes the following
>>> > items (8 threads are executed in parallel using a pool - "walk" function):
>>> >      -extract relationships of this node.
>>> >      -perform some light processing.
>>> >      -update results (in a ConcurrentHashMap).
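Step 2 corresponds roughly to the following sketch: a fixed pool of 8 threads, one task per node ID, and results collected in a `ConcurrentHashMap`. The attached source is not part of this thread, so `WalkPool` and `processNode` are hypothetical stand-ins, and the per-node work is only a placeholder for extracting relationships and the light processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public final class WalkPool {
    /** Placeholder for "extract relationships of this node + light processing". */
    static double processNode(long nodeId) {
        return nodeId * 0.5;
    }

    /** Runs one task per node ID on a fixed-size pool; results go into a thread-safe map. */
    static Map<Long, Double> run(List<Long> nodeIds, int threads) {
        Map<Long, Double> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (Long id : nodeIds) {
                pool.execute(() -> results.put(id, processNode(id)));
            }
        } finally {
            pool.shutdown(); // accept no new tasks; already queued ones still run
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("walk did not finish within the timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the walk", e);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 0; id < 1000; id++) ids.add(id);
        System.out.println(run(ids, 8).size()); // 1000
    }
}
```

Whether 8 threads actually help depends on whether the work is CPU-bound or disk-bound, which is the question raised further down in this thread.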
>>> >
>>> > Note that:
>>> >      -The above scenario is iterative. Roughly it runs 10 times.
>>> >      -No update is applied to the DB during running (read only).
>>> >
>>> > After running the application:
>>> >      -Less than 4GB/12GB of memory is occupied. It seems that Neo4j uses
>>> >       only 2GB of memory.
>>>
>>> What JVM flags did you specify? I take it you didn't forget to include
>>> a high -Xmx to allow more memory, and perhaps the parallel 'old
>>> generation' garbage collector to allow more throughput. Otherwise, most
>>> 64-bit JVMs start with system-dependent maximums (afaik at most 2GB).
>>>
>>> >      -The hard disk is overloaded.
>>> >      -On average, less than 20% of the 8 cores is utilized.
>>>
>>> What is your disk doing? Reading, writing, seeking? (See iostat, iotop
>>> or similar tools; if it's seeking, you'll see just a few MB/sec of reads
>>> and no writes.) You actually have 4 real cores; the other 4 are just
>>> hyperthreaded cores, which may alleviate some of the workload if you're
>>> CPU-bound. If you're disk-bound, you may actually overwhelm your disk
>>> even further with all the additional threads.
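How much heap the JVM actually got, and how many processors it sees, can both be checked from inside the application with standard JDK calls. A minimal sketch, where the class name `JvmCheck` is hypothetical: `Runtime.maxMemory()` reports roughly the effective -Xmx (often slightly below it), `availableProcessors()` counts logical processors (so a 4-core hyperthreaded Core i7 typically reports 8), and the `RuntimeMXBean` lists the flags the JVM was started with. Disk activity still has to be checked outside the JVM, e.g. with iostat or iotop.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public final class JvmCheck {
    /** Upper bound of the heap in MB, roughly what -Xmx resolved to. */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /** Processors visible to the JVM; on a hyperthreaded CPU this typically includes the hyperthreads. */
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + flags);
        System.out.println("max heap: " + maxHeapMegabytes() + " MB");
        System.out.println("logical processors: " + logicalProcessors());
    }
}
```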
>>>
>>> >
>>> > Some documents regarding performance are available in the wiki
>>> > (Performance Guide, Configuration Settings, Linux Performance Guide), but
>>> > they are too general.
>>> > Could you please advise me on a better memory map to speed up my
>>> > application?
>>> > I can benchmark different configurations and report the results in the
>>> > wiki for future users.
>>> > Kind regards,
>>> > Amir
>>>
>>> Have you checked which parts of your application take a long time? Is it
>>> the fillNodes as well as the walk-methods? Or only the walk-variant. A
>>> decent profile may be useful.
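Before reaching for a full profiler, coarse wall-clock timing of the two phases already shows whether fillNodes or walk dominates. A minimal sketch: `PhaseTimer` is hypothetical, and the lambdas in `main` are placeholders for the real `fillNodes` and `walk` calls.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public final class PhaseTimer {
    /** Runs one phase, prints its wall-clock duration and returns the phase's result. */
    static <T> T time(String phase, Supplier<T> body) {
        long start = System.nanoTime();
        T result = body.get();
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(phase + " took " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // Placeholders: substitute the real fillNodes() and walk() calls here.
        int[] nodeIds = time("fillNodes", () -> new int[1_000_000]);
        Long checksum = time("walk", () -> {
            long sum = 0;
            for (int id : nodeIds) sum += id;
            return sum;
        });
        System.out.println("checksum " + checksum);
    }
}
```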
>>>
>>> Are you sure you need to first retrieve all nodes, then store the
>>> cui-property in a hashset and then re-retrieve that same node via the
>>> index? It seems to me it should be possible to start working on the node
>>> right away. Or are your multiple threads (and thus the separate
>>> transactions) working against you here?
>>>
>>> Apart from that, I see a few things that may actually cost a bit of
>>> performance:
>>> - You're storing unique (?) values in a hashset, only to iterate over them
>>> later. An arraylist is faster for this scenario and uses less memory.
>>> - You're boxing and unboxing Doubles continuously to and from doubles
>>> (for instance your 'temp' and 'result'). I don't know how many of these
>>> the JVM is able to optimize away, but avoiding them in the first place
>>> may save a few CPU cycles per iteration.
>>> - You're recalculating 1 - alpha needlessly.
>>> - You're using Math.pow rather than diff = v1 - v2; diff_value += diff *
>>> diff, the latter has no serious mathematical side-effects (afaik) and
>>> should be a bit faster.
>>> - You're starting many transactions (one for each cui you process), without
>>> modifying your graph. I've no idea how heavy these are (relative to the
>>> rest of your application), so you may or may not need to reduce the
>>> number of transactions. With the list mentioned above, it should be
>>> relatively easy to adjust your walker thread to process several cuis
>>> rather than just one by using sublists.
>>> - You're retrieving the reference node for each iteration, rather than
>>> just once outside the loop in fillNodes.
>>> - Why are you converting the weight-property to a string, only to then
>>> convert it to a Double? If it's stored as a string, perhaps it'd be a
>>> good idea to store it as a Double instead.
>>> - Perhaps the cui-value can also be stored in a more efficient storage
>>> format (long?), thus saving space and memory.
>>> - Why are you filling v_star if you're not using the result?
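Several of the suggestions above fit in one small sketch: primitive `double` accumulation instead of boxed `Double`s, `diff * diff` instead of `Math.pow`, `1 - alpha` computed once outside the loop, and reading a numeric weight without a String round trip. The original source is not included in this thread, so `squaredDistance`, `blend` and `weightOf` are hypothetical reconstructions of the kind of code described, not Amir's actual methods.

```java
import java.util.Arrays;

public final class MicroOpts {
    /** Sum of squared differences using primitives and diff * diff instead of Math.pow(diff, 2). */
    static double squaredDistance(double[] v1, double[] v2) {
        double diffValue = 0.0;
        for (int i = 0; i < v1.length; i++) {
            double diff = v1[i] - v2[i];
            diffValue += diff * diff;
        }
        return diffValue;
    }

    /** Hypothetical update rule: alpha * score + (1 - alpha) * neighborSum, with (1 - alpha) hoisted. */
    static double[] blend(double[] scores, double[] neighborSums, double alpha) {
        double oneMinusAlpha = 1.0 - alpha; // computed once, not per iteration
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = alpha * scores[i] + oneMinusAlpha * neighborSums[i];
        }
        return out;
    }

    /** Reads a weight property value; numeric values skip the String round trip entirely. */
    static double weightOf(Object property) {
        if (property instanceof Number) return ((Number) property).doubleValue();
        return Double.parseDouble(property.toString()); // fallback for weights stored as strings
    }

    public static void main(String[] args) {
        System.out.println(squaredDistance(new double[] {1, 2, 3}, new double[] {1, 0, 1})); // 8.0
        System.out.println(weightOf(0.25) + " " + weightOf("0.5"));                        // 0.25 0.5
        System.out.println(Arrays.toString(blend(new double[] {1, 1}, new double[] {0, 2}, 0.5))); // [0.5, 1.5]
    }
}
```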
>>>
>>> Best regards and good luck,
>>>
>>> Arjen
>>>
>>> PS, shouldn't a random walk do some random stuff?
>>> _______________________________________________
>>> Neo4j mailing list
>>> User@lists.neo4j.org
>>> https://lists.neo4j.org/mailman/listinfo/user
>>>
>>
>>
>>
>> --
>> Mattias Persson, [matt...@neotechnology.com]
>> Hacker, Neo Technology
>> www.neotechnology.com
>
>
> --
> Mattias Persson, [matt...@neotechnology.com]
> Hacker, Neo Technology
> www.neotechnology.com


-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com