Re: [Neo4j] Neo4j Tuning for specific application

Amir Hossein Jadidinejad Fri, 16 Jul 2010 06:55:02 -0700

OK.
I think it would be better to have an "InMemoryEmbeddedGraphDatabase", derived from 
"EmbeddedGraphDatabase", that loads the whole graph into memory. It seems that the 
current interface is not appropriate for all applications.
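
To make the idea concrete, here is a rough sketch of what a fully in-memory, read-only view of the neighbor lists could look like. It is not part of Neo4j: the class InMemoryAdjacency, its constructor and neighborsOf are hypothetical names, and the edge pairs are assumed to have been read from the store once at startup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical read-only snapshot: node id -> neighbor ids, held entirely on the heap. */
final class InMemoryAdjacency {
    private final Map<Long, long[]> neighbors = new HashMap<>();

    /** Builds the snapshot from {source, target} id pairs, treating each pair as undirected. */
    InMemoryAdjacency(long[][] edges) {
        Map<Long, List<Long>> tmp = new HashMap<>();
        for (long[] e : edges) {
            tmp.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            tmp.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Copy into primitive arrays so lookups do not unbox on every step.
        for (Map.Entry<Long, List<Long>> entry : tmp.entrySet()) {
            List<Long> ids = entry.getValue();
            long[] arr = new long[ids.size()];
            for (int i = 0; i < arr.length; i++) {
                arr[i] = ids.get(i);
            }
            neighbors.put(entry.getKey(), arr);
        }
    }

    /** Returns the neighbor ids of a node, or an empty array for an unknown id. */
    long[] neighborsOf(long nodeId) {
        long[] n = neighbors.get(nodeId);
        return n == null ? new long[0] : n;
    }
}
```

Once built, the map is never modified, so the 8 worker threads could read it concurrently without transactions or disk access.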





________________________________
From: Mattias Persson <[email protected]>
To: Neo4j user discussions <[email protected]>
Sent: Fri, July 16, 2010 5:53:08 PM
Subject: Re: [Neo4j] Neo4j Tuning for specific application

2010/7/15, Amir Hossein Jadidinejad <[email protected]>:
> Hi,
> I have checked all the mentioned issues. But currently it's too slow!
> It takes 1 sec for each node in order to get a list of its neighbors. The disk
> is overloaded while the memory is free!
> The following is my running command:
> java -d64 -server -XX:+UseNUMA -XX:+UseConcMarkSweepGC -Xmx4096m -classpath
> "$CLASSPATH:../lib/geronimo-jta_1.1_spec-1.1.1.jar:../lib/jline-0.9.94.jar:../lib/lucene-core-2.9.2.jar:../lib/mysql-connector-java-5.1.7-bin.jar:../lib/neo4j-commons-1.0.jar:../lib/neo4j-index-1.1-20100714.135430-157.jar:../lib/neo4j-kernel-1.1-20100714.134745-137.jar:../lib/neo4j-remote-graphdb-0.7-20100714.140411-116.jar:../lib/neo4j-shell-1.1-20100714.140808-144.jar:../lib/neo4j-utils-1.0.jar:../lib/servlet-api.jar:../lib/trove.jar:../lib/weka.jar:."
>  org.graph.InferenceEngine

Although this has nothing to do with performance, please remove
neo4j-commons (a deprecated component) and use neo4j-utils-1.1-SNAPSHOT
instead of version 1.0.

>
>
> and the following is the configuration parameters:
> neostore.nodestore.db.mapped_memory=120M
> neostore.relationshipstore.db.mapped_memory=5G
> neostore.propertystore.db.mapped_memory=100M
> neostore.propertystore.db.strings.mapped_memory=200M
> neostore.propertystore.db.arrays.mapped_memory=0M
>
> What's the problem?!
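
One way to sanity-check these settings is to compare each mapped_memory value with the store file sizes listed in the first post, and to add the total to the JVM heap. The sketch below is only arithmetic on the numbers quoted in this thread. The class and method names are hypothetical, the arrays store is left out because its file size was not listed, and it assumes the mapped regions live outside the Java heap, so that heap plus mapped memory has to fit in physical RAM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: compares the mapped_memory settings with the store file sizes (MB). */
final class MappedMemoryBudget {

    /** Each entry is {mapped MB from the config, store file size in MB from the first post}. */
    static Map<String, long[]> threadSettings() {
        Map<String, long[]> stores = new LinkedHashMap<>();
        stores.put("nodestore", new long[] {120, 31});
        stores.put("relationshipstore", new long[] {5 * 1024, 1536});
        stores.put("propertystore", new long[] {100, 2 * 1024});
        stores.put("propertystore.strings", new long[] {200, 4 * 1024});
        return stores;
    }

    /** Percentage of a store file that its mapped region can cover, capped at 100. */
    static long coveragePercent(long mappedMb, long storeMb) {
        return Math.min(100, mappedMb * 100 / storeMb);
    }

    /** Total mapped memory in MB across all stores. */
    static long totalMappedMb(Map<String, long[]> stores) {
        long total = 0;
        for (long[] s : stores.values()) {
            total += s[0];
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, long[]> stores = threadSettings();
        for (Map.Entry<String, long[]> e : stores.entrySet()) {
            long[] s = e.getValue();
            System.out.println(e.getKey() + ": mapped " + s[0] + " MB of " + s[1]
                    + " MB (" + coveragePercent(s[0], s[1]) + "% coverage)");
        }
        long heapMb = 4096;      // -Xmx4096m from the command line above
        long ramMb = 12 * 1024;  // MEM: 12GB from the first post
        System.out.println("heap + mapped = " + (heapMb + totalMappedMb(stores))
                + " MB of " + ramMb + " MB RAM");
    }
}
```

With the numbers from this thread, the relationship store is mapped at more than three times its file size, while the two property stores are covered at roughly 5% each. If the walk reads properties, shifting some of that memory to the property stores may be worth a benchmark.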
>
>
>
> ________________________________
> From: Mattias Persson <[email protected]>
> To: Neo4j user discussions <[email protected]>
> Sent: Sat, July 10, 2010 11:46:47 PM
> Subject: Re: [Neo4j] Neo4j Tuning for specific application
>
> Are you using kernel/index version 1.0? Regarding the index lookups (each
> lookup in its own separate transaction): I think there's a bug in
> neo4j-index 1.0 which causes such a transaction (which contains a call to
> index.getNodes) to write stuff to and flush the logical log, which of course
> is completely unnecessary. That may very well be the cause of the disk being
> so heavily used.
>
> What you could try is to update to the latest kernel/index version, 1.1-SNAPSHOT,
> where this problem has been fixed; in that version you also aren't forced
> to wrap reads in transactions. If you cannot update to the latest 1.1-SNAPSHOT,
> then try to process more "cui"s in each transaction.
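
If upgrading is not an option, the batching could look roughly like the sketch below: the list of cui values is cut into fixed-size slices with List.subList, and each slice would be handled inside a single transaction instead of one transaction per cui. The class name, the batch size and the placeholder comment are assumptions; the real lookup code was in the attachment and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that groups work items so one transaction can cover many lookups. */
final class CuiBatcher {

    /** Splits items into consecutive views of at most batchSize elements (no copying). */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(items.subList(from, to));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> cuis = Arrays.asList("C1", "C2", "C3", "C4", "C5");
        for (List<String> batch : partition(cuis, 2)) {
            // Placeholder: open one transaction here, look up every cui in the
            // batch through the index, then close the transaction.
            System.out.println("one transaction for " + batch);
        }
    }
}
```

Each worker thread would then receive one batch rather than one cui, which also reduces the number of tasks queued in the pool.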
>
> 2010/7/10 Arjen van der Meijden <[email protected]>
>
>> Hi Amir,
>>
>> I'm just starting with neo4j, but saw some issues with your code from a
>> normal java-standpoint. Please note, some of them are just
>> micro-optimizations that may not matter much. But a lot of them are in
>> your critical path, so perhaps they're worth a look.
>>
>> On 10-7-2010 17:59 Amir Hossein Jadidinejad wrote:
>> > Hi,
>> > I have a GraphDB with the following attributes:
>> > Number of nodes: 3.6M
>> > Number of relation types: 2
>> > Total size of DB: 9GB
>> >      lucene : 160MB
>> >      neostore.nodestore.db : 31MB
>> >      neostore.propertystore.db : 2GB
>> >      neostore.propertystore.db.strings : 4GB
>> >      neostore.relationshipstore.db : 1.5GB
>> >
>> > Machine characteristics:
>> >      vm.dirty_background_ratio = 50
>> >      vm.dirty_ratio = 80
>> >      OS: Ubuntu x64
>> >      CPU: Corei7
>> >      MEM: 12GB
>> >
>> > The following is our running scenario (The source code is attached):
>> > 1. Iterate over all nodes and extract a list of node IDs ("fillNodes"
>> > function).
>> > 2. For each node ID, initiate a worker thread that processes the following
>> > items (8 threads are executed in parallel using a pool - "walk" function):
>> >      -extract relationships of this node.
>> >      -perform a light processing.
>> >      -update results (in a ConcurrentHashMap).
>> >
>> > Note that:
>> >      -The above scenario is iterative. Roughly it runs 10 times.
>> >      -No update is applied to the DB during running (read only).
>> >
>> > After running the application:
>> >      -Less than 4GB/12GB of memory is occupied. It seems that Neo4j is
>> > using only 2GB of memory.
>>
>> What jvm-flags did you specify? I take it, you didn't forget to include
>> a high -Xmx, to allow more memory and perhaps the parallel 'old
>> generation' garbage collector to allow more throughput. Otherwise, most
>> 64-bit jvm's start with system-dependent maximums (afaik at most 2GB).
>>
>> >      -The hard disk is overloaded.
>> >      -Less than 20% of the 8 cores is utilized on average.
>>
>> What is your disk doing? Reading, writing, seeking? (See iostat, iotop
>> or similar tools; if it's seeking, you will see just a few MB/sec of reads and no
>> writes.) You actually have 4 real cores; the other 4 are just
>> hyperthread-cores which may alleviate some of the work-load if you're
>> cpu-bound. If you're disk-bound, you may actually overwhelm your disk
>> even further with all the additional threads.
>>
>> >
>> > Some documents are available in the wiki regarding performance
>> > (Performance Guide, Configuration Settings, Linux Performance Guide). They
>> > are quite general.
>> > Would you please advise me on a better memory mapping to speed up my
>> > application?
>> > I can benchmark different configurations and reflect the results in the
>> > wiki for future users.
>> > Kind regards,
>> > Amir
>>
>> Have you checked which parts of your application take a long time? Is it
>> the fillNodes as well as the walk-methods? Or only the walk-variant. A
>> decent profile may be useful.
>>
>> Are you sure you need to first retrieve all nodes, then store the
>> cui-property in a hashset and then re-retrieve that same node via the
>> index? It sounds to me like it should be possible to start working
>> on the node right away. Or are your multiple threads (and thus the
>> separate transactions) working against you here?
>>
>> Apart from that, I see a few things that may actually cost a bit of
>> performance:
>> - You're storing unique (?) values in a hashset, to iterate them later.
>> An arraylist is faster for this scenario and uses less memory.
>> - You're boxing and unboxing Double's continuously to and from double's
>> (for instance your 'temp' and 'result'). I don't know how many of these
>> the jvm is able to optimize away, but preventing them to begin with may
>> save a few cpu-cycles per iteration.
>> - You're recalculating 1 - alpha needlessly.
>> - You're using Math.pow rather than diff = v1 - v2; diff_value += diff *
>> diff, the latter has no serious mathematical side-effects (afaik) and
>> should be a bit faster.
>> - You're starting many transactions (for each cui you process), without
>> modifying your graph. I've no idea how heavy these are (relative to the
>> rest of your application), so you may or may not have a need to reduce
>> the amount of transactions. With the above mention of a list it should
>> be relatively easy to adjust your walkerthread to process several cui's
>> rather than just one by using sublist's.
>> - You're retrieving the reference node for each iteration, rather than
>> just once outside the loop in fillNodes.
>> - Why are you converting the weight-property to a string, to then
>> convert it to a Double? If it's stored as a string, perhaps it'd be a
>> good idea to store it as a Double instead?
>> - Perhaps the cui-value can also be stored in a more efficient storage
>> format (long?), thus saving space and memory.
>> - Why are you filling v_star if you're not using the result?
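
Two of the points above (squaring by multiplication instead of Math.pow, and computing 1 - alpha once) can be sketched with primitive double arrays, which also avoids the Double boxing. This is only an illustration: the attached source is not reproduced in this thread, so the class name, the method names and the blend formula are assumptions.

```java
/** Hypothetical arithmetic helpers working on primitive arrays only. */
final class WalkMath {

    /** Sum of squared differences: no boxing and no Math.pow call. */
    static double squaredDistance(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; i++) {
            double diff = v1[i] - v2[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Computes alpha * a[i] + (1 - alpha) * b[i], evaluating (1 - alpha) once. */
    static double[] blend(double alpha, double[] a, double[] b) {
        double oneMinusAlpha = 1.0 - alpha; // hoisted out of the loop
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = alpha * a[i] + oneMinusAlpha * b[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] previous = {1.0, 2.0, 3.0};
        double[] current = {1.0, 0.0, 5.0};
        System.out.println(squaredDistance(previous, current));
    }
}
```

Both methods assume the two arrays have the same length; a profile would show whether these savings matter next to the disk access.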
>>
>> Best regards and good luck,
>>
>> Arjen
>>
>> PS, shouldn't a random walk do some random stuff?
>> _______________________________________________
>> Neo4j mailing list
>> [email protected]
>> https://lists.neo4j.org/mailman/listinfo/user
>>
>
>
>
> --
> Mattias Persson, [[email protected]]
> Hacker, Neo Technology
> www.neotechnology.com


-- 
Mattias Persson, [[email protected]]
Hacker, Neo Technology
www.neotechnology.com