OK. I think it would be better to have an "InMemoryEmbeddedGraphDatabase", derived from "EmbeddedGraphDatabase", that loads the whole graph into memory. It seems that the current interface is not appropriate for all applications.
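Here is a minimal sketch of what such an in-memory, read-only snapshot could look like. It assumes the graph is exported once into dense integer node IDs plus parallel edge arrays. The class name `InMemoryAdjacency` and its constructor are hypothetical and are not part of any Neo4j release. Neighbors are stored in compressed sparse row (CSR) form, so a neighbor lookup is an array read rather than a store access:

```java
import java.util.Arrays;

// Hypothetical read-only adjacency snapshot in compressed sparse row (CSR) form.
// Assumes node IDs are dense integers 0..nodeCount-1 exported once from the store.
public final class InMemoryAdjacency {
    private final int[] offsets;    // neighbors of n live at [offsets[n], offsets[n+1])
    private final int[] targets;    // neighbor IDs, grouped by source node
    private final double[] weights; // edge weights, parallel to targets

    public InMemoryAdjacency(int nodeCount, int[] src, int[] dst, double[] w) {
        if (dst.length != src.length || w.length != src.length) {
            throw new IllegalArgumentException("edge arrays must have equal length");
        }
        offsets = new int[nodeCount + 1];
        for (int s : src) offsets[s + 1]++;                  // count out-degree per node
        for (int i = 0; i < nodeCount; i++) offsets[i + 1] += offsets[i]; // prefix sum -> start offsets
        targets = new int[src.length];
        weights = new double[src.length];
        int[] cursor = Arrays.copyOf(offsets, nodeCount);   // next free slot per node
        for (int e = 0; e < src.length; e++) {
            int pos = cursor[src[e]]++;
            targets[pos] = dst[e];
            weights[pos] = w[e];
        }
    }

    public int degree(int node) { return offsets[node + 1] - offsets[node]; }

    public int neighbor(int node, int i) { return targets[offsets[node] + i]; }

    public double weight(int node, int i) { return weights[offsets[node] + i]; }

    public static void main(String[] args) {
        // Tiny example graph: 0->1, 0->2, 2->1
        InMemoryAdjacency g = new InMemoryAdjacency(3,
            new int[] {0, 0, 2}, new int[] {1, 2, 1}, new double[] {0.5, 0.5, 1.0});
        for (int i = 0; i < g.degree(0); i++) {
            System.out.println("0 -> " + g.neighbor(0, i) + " (w=" + g.weight(0, i) + ")");
        }
    }
}
```

For scale: with the 3.6M nodes mentioned later in this thread, the offsets array alone takes about 14MB (3.6M × 4 bytes). The edge arrays grow by 12 bytes per relationship (4 for the target ID, 8 for the weight).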
________________________________
From: Mattias Persson
To: Neo4j user discussions
Sent: Fri, July 16, 2010 5:53:08 PM
Subject: Re: [Neo4j] Neo4j Tuning for specific application

2010/7/15, Amir Hossein Jadidinejad:
> Hi,
> I have checked all the mentioned issues, but it's still too slow!
> It takes 1 sec per node to get a list of its neighbors. The disk is
> overloaded while the memory is free!
> The following is my running command:
> java -d64 -server -XX:+UseNUMA -XX:+UseConcMarkSweepGC -Xmx4096m -classpath "$CLASSPATH:../lib/geronimo-jta_1.1_spec-1.1.1.jar:../lib/jline-0.9.94.jar:../lib/lucene-core-2.9.2.jar:../lib/mysql-connector-java-5.1.7-bin.jar:../lib/neo4j-commons-1.0.jar:../lib/neo4j-index-1.1-20100714.135430-157.jar:../lib/neo4j-kernel-1.1-20100714.134745-137.jar:../lib/neo4j-remote-graphdb-0.7-20100714.140411-116.jar:../lib/neo4j-shell-1.1-20100714.140808-144.jar:../lib/neo4j-utils-1.0.jar:../lib/servlet-api.jar:../lib/trove.jar:../lib/weka.jar:." org.graph.InferenceEngine

Although this has nothing to do with performance, please remove neo4j-commons (a deprecated component) and use neo4j-utils-1.1-SNAPSHOT instead of version 1.0.

> and the following are the configuration parameters:
> neostore.nodestore.db.mapped_memory=120M
> neostore.relationshipstore.db.mapped_memory=5G
> neostore.propertystore.db.mapped_memory=100M
> neostore.propertystore.db.strings.mapped_memory=200M
> neostore.propertystore.db.arrays.mapped_memory=0M
>
> What's the problem?!
>
> ________________________________
> From: Mattias Persson
> To: Neo4j user discussions
> Sent: Sat, July 10, 2010 11:46:47 PM
> Subject: Re: [Neo4j] Neo4j Tuning for specific application
>
> Are you using kernel/index version 1.0?
> Regarding the index lookups (each lookup in its own separate transaction): I think there's a bug in
> neo4j-index 1.0 that causes such a transaction (one containing a call to
> index.getNodes) to write to and flush the logical log, which of course
> is completely unnecessary. That may very well be why the disk is
> so heavily used.
>
> What you could try is updating to the latest kernel/index version, 1.1-SNAPSHOT,
> where this problem has been fixed; in that version you also aren't forced
> to wrap reads in transactions. If you cannot update to the latest 1.1-SNAPSHOT,
> then try to process more "cui"s in each transaction.
>
> 2010/7/10 Arjen van der Meijden
>
>> Hi Amir,
>>
>> I'm just starting with neo4j, but saw some issues with your code from a
>> normal Java standpoint. Please note, some of them are just
>> micro-optimizations that may not matter much. But a lot of them are in
>> your critical path, so perhaps they're worth a look.
>>
>> On 10-7-2010 17:59 Amir Hossein Jadidinejad wrote:
>> > Hi,
>> > I have a GraphDB with the following attributes:
>> > Number of nodes: 3.6M
>> > Number of relation types: 2
>> > Total size of DB: 9GB
>> > lucene : 160MB
>> > neostore.nodestore.db : 31MB
>> > neostore.propertystore.db : 2GB
>> > neostore.propertystore.db.strings : 4GB
>> > neostore.relationshipstore.db : 1.5GB
>> >
>> > Machine characteristics:
>> > vm.dirty_background_ratio = 50
>> > vm.dirty_ratio = 80
>> > OS: Ubuntu x64
>> > CPU: Core i7
>> > MEM: 12GB
>> >
>> > The following is our running scenario (the source code is attached):
>> > 1. Iterate over all nodes and extract a list of node IDs ("fillNodes" function).
>> > 2. For each node ID, start a worker thread that processes the following items
>> > (8 threads are executed in parallel using a pool; "walk" function):
>> > -extract the relationships of this node.
>> > -perform some light processing.
>> > -update the results (in a ConcurrentHashMap).
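Comparing the mapped_memory settings from the July 15 message with the store sizes listed here shows where the mapped windows fall short. This sketch assumes the common rule of thumb that a store file served entirely from its mapped window avoids disk reads. The sizes are the approximate figures quoted in the thread, with 1.5GB taken as 1536MB. The class name `MappedMemoryCoverage` is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: how much of each store file its mapped_memory window covers.
// Figures are copied from this thread; the "fully mapped is best" rule is an assumption.
public final class MappedMemoryCoverage {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    // Fraction of the store file covered by its mapped window, capped at 1.0.
    static double coverage(long mappedBytes, long storeBytes) {
        return Math.min(1.0, (double) mappedBytes / storeBytes);
    }

    public static void main(String[] args) {
        // store name -> {configured mapped_memory, approximate size on disk}
        Map<String, long[]> stores = new LinkedHashMap<>();
        stores.put("nodestore", new long[] {120 * MB, 31 * MB});
        stores.put("relationshipstore", new long[] {5 * GB, 1536 * MB});
        stores.put("propertystore", new long[] {100 * MB, 2 * GB});
        stores.put("propertystore.strings", new long[] {200 * MB, 4 * GB});

        for (Map.Entry<String, long[]> e : stores.entrySet()) {
            long mapped = e.getValue()[0];
            long size = e.getValue()[1];
            System.out.printf("%-22s mapped=%5d MB  size=%5d MB  coverage=%5.1f%%%n",
                e.getKey(), mapped / MB, size / MB, 100 * coverage(mapped, size));
        }
    }
}
```

With these numbers, the node and relationship stores are fully covered (the relationship window is more than three times the file size), while the property and string stores each get under 5% coverage. The walk reads properties such as weight and cui, so this imbalance is one plausible contributor to the disk load, though the thread does not confirm it.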
>> >
>> > Note that:
>> > -The above scenario is iterative. Roughly, it runs 10 times.
>> > -No update is applied to the DB during the run (read only).
>> >
>> > After running the application:
>> > -Less than 4GB/12GB of memory is occupied. It seems that Neo4j is
>> > using only 2GB of memory.
>>
>> What JVM flags did you specify? I take it you didn't forget to include
>> a high -Xmx, to allow more memory, and perhaps the parallel 'old
>> generation' garbage collector to allow more throughput. Otherwise, most
>> 64-bit JVMs start with system-dependent maximums (afaik at most 2GB).
>>
>> > -The hard disk is overloaded.
>> > -Less than 20% of the 8 cores is utilized on average.
>>
>> What is your disk doing? Reading, writing, seeking? (See iostat, iotop
>> or similar tools; if it's seeking, you see just a few MB/s of reads and no
>> writes.) You actually have 4 real cores; the other 4 are just
>> hyperthread cores, which may alleviate some of the workload if you're
>> CPU-bound. If you're disk-bound, you may actually overwhelm your disk
>> even further with all the additional threads.
>>
>> >
>> > Some documents are available in the wiki regarding performance (Performance
>> > Guide, Configuration Settings, Linux Performance Guide). They are very general.
>> > Would you please advise me on a better memory map and how to speed up my
>> > application?
>> > I can benchmark different configurations and report the results in the wiki for
>> > future users.
>> > Kind regards,
>> > Amir
>>
>> Have you checked which parts of your application take a long time? Is it
>> fillNodes as well as the walk methods, or only walk? A
>> decent profile may be useful.
>>
>> Are you sure you need to first retrieve all nodes, then store the
>> cui property in a hashset, and then re-retrieve that same node via the
>> index? It seems to me it should be possible to start working
>> on the node right away.
>> Or are your multiple threads (and thus the
>> separate transactions) working against you here?
>>
>> Apart from that, I see a few things that may actually cost a bit of
>> performance:
>> - You're storing unique (?) values in a hashset, to iterate over them later.
>> An ArrayList is faster for this scenario and uses less memory.
>> - You're boxing and unboxing Doubles continuously to and from doubles
>> (for instance your 'temp' and 'result'). I don't know how many of these
>> the JVM is able to optimize away, but avoiding them to begin with may
>> save a few CPU cycles per iteration.
>> - You're recalculating 1 - alpha needlessly.
>> - You're using Math.pow rather than diff = v1 - v2; diff_value += diff *
>> diff; the latter has no serious mathematical side-effects (afaik) and
>> should be a bit faster.
>> - You're starting many transactions (one for each cui you process) without
>> modifying your graph. I've no idea how heavy these are (relative to the
>> rest of your application), so you may or may not need to reduce
>> the number of transactions. With the list mentioned above, it should
>> be relatively easy to adjust your walker thread to process several cuis
>> rather than just one by using sublists.
>> - You're retrieving the reference node on each iteration, rather than
>> just once outside the loop in fillNodes.
>> - Why are you converting the weight property to a string, only to
>> convert it to a Double? If it's stored as a string, perhaps it'd be a
>> good idea to store it as a Double instead?
>> - Perhaps the cui value can also be stored in a more efficient
>> format (long?), thus saving space and memory.
>> - Why are you filling v_star if you're not using the result?
>>
>> Best regards and good luck,
>>
>> Arjen
>>
>> PS, shouldn't a random walk do some random stuff?
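Several of the suggestions above combine naturally: an ArrayList instead of a HashSet, sublists so each task handles several cuis, 1 - alpha computed once, diff * diff instead of Math.pow, and primitive doubles in the inner loop. This is a hedged sketch, not the attached source. `BatchedWalk`, `run`, and the `neighborWeights` function are hypothetical stand-ins for the walk code and its relationship reads. The final scoring line is a placeholder because the thread does not show the real formula:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

public final class BatchedWalk {

    // Hypothetical: neighborWeights stands in for reading a node's relationship weights.
    static Map<Integer, Double> run(List<Integer> nodeIds, IntFunction<double[]> neighborWeights,
                                    double[] reference, double alpha, int threads, int batchSize)
            throws Exception {
        Map<Integer, Double> results = new ConcurrentHashMap<>();
        final double oneMinusAlpha = 1.0 - alpha;   // hoisted: computed once, not per node
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int from = 0; from < nodeIds.size(); from += batchSize) {
                // One task per batch of IDs (a sublist view), not one task per ID.
                List<Integer> batch = nodeIds.subList(from, Math.min(from + batchSize, nodeIds.size()));
                futures.add(pool.submit(() -> {
                    for (int id : batch) {
                        double[] w = neighborWeights.apply(id);
                        double distance = 0.0;   // primitive accumulator: no boxing in the loop
                        int n = Math.min(w.length, reference.length);
                        for (int i = 0; i < n; i++) {
                            double diff = w[i] - reference[i];
                            distance += diff * diff;   // instead of Math.pow(diff, 2)
                        }
                        // Placeholder score; the real update rule is not shown in the thread.
                        results.put(id, alpha * distance + oneMinusAlpha * n);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();   // wait for every batch and rethrow any worker exception
            }
        } finally {
            pool.shutdown();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            ids.add(i);
        }
        // Hypothetical data: node i has i neighbors, each with weight 1.0.
        IntFunction<double[]> weights = id -> {
            double[] w = new double[id];
            Arrays.fill(w, 1.0);
            return w;
        };
        Map<Integer, Double> scores = run(ids, weights, new double[10], 0.85, 8, 4);
        System.out.println(scores.size() + " nodes scored");
    }
}
```

In a Neo4j-backed version, each batch would be the natural place for one read transaction (beginTx() ... finish() in the 1.x API) instead of one per cui. That matches Mattias's suggestion to handle more cuis per transaction.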
>> _______________________________________________
>> Neo4j mailing list
>> https://lists.neo4j.org/mailman/listinfo/user
>
> --
> Mattias Persson
> Hacker, Neo Technology
> www.neotechnology.com

