Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-06 Thread Anders Nawroth
Hi alican,

I just want to report back that I was able to reproduce the problem and 
narrow down the cause a bit. It seems the UI and DB threads are waiting for 
each other ... I haven't got around to fixing it yet, though.

/anders

On 2011-11-02 07:08, algecya wrote:
 Hi anders,
 I appreciate your offer very much! It is good to know that the Neo4j community
 is very active and involved.

 http://neo4j-community-discussions.438527.n3.nabble.com/file/n3472966/BatchImportData.groovy
 BatchImportData.groovy

 Here is the import script. It is a stripped-down version of the graph I used for
 testing. If you need more data, just increase the variable 'amountTypeA' at
 line 26.

 --
 alican


___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-06 Thread algecya
anders, thank you very much for reporting back and looking at it!
Good luck fixing the bug, then!
--
alican

--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-performance-with-400million-nodes-tp3467806p3486237.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-02 Thread algecya
Hi anders,
I appreciate your offer very much! It is good to know that the Neo4j community
is very active and involved.

http://neo4j-community-discussions.438527.n3.nabble.com/file/n3472966/BatchImportData.groovy
BatchImportData.groovy 

Here is the import script. It is a stripped-down version of the graph I used for
testing. If you need more data, just increase the variable 'amountTypeA' at
line 26.

--
alican


--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-performance-with-400million-nodes-tp3467806p3472966.html


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Alican Gecyasar
hello david,
thank you for the quick reply! appreciate it very much.



On 01.11.2011 01:01, David Montag wrote:
 Hi Alican,

 On Mon, Oct 31, 2011 at 6:26 AM, algecya <alican.gecya...@openconcept.ch> wrote:

 Hello everyone,

 We are relatively new to Neo4j and are evaluating some test scenarios in
 order to decide whether to use Neo4j in production systems. We used the latest
 stable release, 1.4.2.

 I wrote an import script and generated some random data with the given tree
 structure:

 http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_nodes.png

 Nodes Summary:
 Nodes with Type A: 1
 Nodes with Type B: 100
 Nodes with Type C: 50'000 (100x500)
 Nodes with Type D: 500'000 (50'000x10)
 Nodes with Type E: 25'000'000 (500'000x50)
 Nodes with Type F: 375'000'000 (25'000'000x15)

 This all worked quite OK; the import took approx. 30 hours using the
 batch importer.
 We have multiple indexes, but we also have one index where all nodes are
 indexed.

 My first question would be: does it make sense to index all nodes with the
 same index?

 It depends on how you intend to access the data. If you always know the
 type, then it would be beneficial to use different indices. Otherwise you
 might want to put it all in a single index. Do remember that the index will
 consume some disk space as well.
OK, we decided to create a type node for each type and relate the nodes 
to it (instead of having the type as an attribute on each node). 
I guess I was thinking too much in relational database schemas; 
therefore we will have an index per type.



 If I would like to list all nodes with the property type = 'type E', it is quite
 slow the first time (~270 s); the second time it is fast (~0.5 s). I know this is
 normal and most likely fixed in the current milestone version. But I am not sure
 how long the query will be cached in memory. Are there any configurations I
 should be concerned about?

 The difference there is all about disk access time. Will "give me all 25
 million E's" be a common operation?
We will need to find nodes with common attributes of type E, which may 
return approx. 1 million results. But there will always be a search for 
different values.
E.g., nodes of type E have an attribute 'created' and an attribute 
'name'. I will need to find all nodes created at a given date (say 
year 2011) with a given name ('abc').
The second search will be date (2011) and name ('def'). If a certain amount 
of time passes and memory is used for other searches, I am afraid my first 
search (2011, 'abc') will be evicted from memory and the search will take 
long again the next time I run it.
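That eviction worry comes down to cache configuration. As a minimal sketch, these are the 1.x-era `neo4j.properties` knobs that govern it; the sizes below are illustrative assumptions, not recommendations, and should be tuned against the actual store file sizes:

```properties
# How much of each store file to memory-map (sizes are illustrative).
neostore.nodestore.db.mapped_memory=2G
neostore.relationshipstore.db.mapped_memory=8G
neostore.propertystore.db.mapped_memory=4G
neostore.propertystore.db.strings.mapped_memory=2G

# Object cache: 'soft' (the default) lets the JVM evict entries under
# memory pressure, which is exactly why a once-warm query can become
# slow again later. 'strong' keeps everything cached, at the cost of heap.
cache_type=soft
```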



 We also took the hardware sizing calculator. See the result here:

 http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_hardware.png

 Are these realistic result values? I guess 128GB RAM and 12TB of SSD
 storage might be a bit cost-intensive.

 The reason the disk usage is 12TB is that you specified that each
 node on average has 10kB of data, and each relationship on average has 1kB
 of data. What kind of data are you storing on the nodes and relationships?
 These are pretty rough estimates, taking into account neither the number of
 properties nor their types. Also, if you decrease the property data by a
 factor of 100 (100B/node, 10B/rel), then your database will only consume
 ~150-200GB.
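As a sanity check on such estimates, the raw store size can be approximated from the fixed record sizes of the pre-2.0 store format (9 bytes per node record, 33 bytes per relationship record) plus the assumed property payload. This is only a sketch using the node counts from this thread; indexes, the string store, and transaction logs come on top:

```java
// Back-of-envelope Neo4j store sizing from the numbers in this thread.
public class StoreSizeEstimate {

    /** Raw store bytes: fixed records plus assumed property payload. */
    static long estimateBytes(long nodes, long rels,
                              long propBytesPerNode, long propBytesPerRel) {
        long nodeRecords = nodes * 9L;   // 9 bytes per node record (pre-2.0 format)
        long relRecords  = rels * 33L;   // 33 bytes per relationship record
        long properties  = nodes * propBytesPerNode + rels * propBytesPerRel;
        return nodeRecords + relRecords + properties;
    }

    public static void main(String[] args) {
        long nodes = 1L + 100 + 50_000 + 500_000 + 25_000_000 + 375_000_000;
        long rels  = nodes - 1;          // a tree: one parent relationship per non-root node
        double gb  = estimateBytes(nodes, rels, 100, 10) / 1e9;
        System.out.printf("~%.0f GB raw store (excluding indexes and logs)%n", gb);
    }
}
```

With 100B/node and 10B/rel this lands around 61GB for the stores themselves; the gap to the calculator's ~150-200GB would presumably be Lucene indexes, string/array property stores, and logs.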
OK, I see your point. I think I am getting the hang of graph 
databases now. I.e., I might not want to put all my data into attributes, 
but create nodes instead...
My rough guess was to increase the amount of nodes to 1'000'000'000 
and decrease the bytes consumed to 100B/node and 10B/rel. The result is 
approx. 400GB (no problem at all).
But I am still a bit concerned about the 128GB RAM.


 Are there any reference applications with this amount of nodes and
 relationships?

 We are in the process of adding case studies. Please get in touch with
 sales for more info at this time.
Thank you, will do so.


 Also, Neoclipse won't start/connect to the database anymore with this
 amount of data.
 Am I missing some configuration for Neoclipse?

 Are you getting an error message?
No error messages. Is there an option to enable logging?
I let Neoclipse run for almost an hour and suddenly the graph appeared. 
But I cannot navigate (it's as if it's frozen, but there are calculations 
going on...).
Not so sure why it takes so long, though; the initial traversal depth is 
1, there are 16 nodes and 15 relationships. I also decreased the number of 
nodes to be displayed to 50.
I thought it would load data lazily?


Best regards
alican





 Best,
 David


 Best regards
 --
 alican



Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Anders Nawroth
Hi!

 Also, Neoclipse won't start/connect to the database anymore with this
 amount of data.
 Am I missing some configuration for Neoclipse?

 Are you getting an error message?
 No error messages. Is there an option to enable logging?
 I let Neoclipse run for almost an hour and suddenly the graph appeared.
 But I cannot navigate (it's as if it's frozen, but there are calculations
 going on...).
 Not so sure why it takes so long, though; the initial traversal depth is
 1, there are 16 nodes and 15 relationships. I also decreased the number of
 nodes to be displayed to 50.
 I thought it would load data lazily?

If you start Neoclipse from the command line you may see some extra 
output there. Also, inside the Neoclipse directory there's a workspace 
directory, and inside that you'll find .metadata/.log

Just a thought: how many relationship types are there?

/anders


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Jim Webber
Hi Alican,

 But I am still a bit concerned about the 128GB RAM..

You can run it on less of course. You could run it on your laptop and it would 
still work.

However, Neo4j is clever in its use of RAM. The more RAM you can allocate to 
Neo4j, the more chance that database reads can come straight from memory rather 
than spending potentially milliseconds going to mechanical disk; being 
disk-bound limits you to thousands of traversals per second instead of millions.

So more RAM = fewer disk hits (statistically), which is where you'll get huge 
read performance benefits. Less RAM means more likelihood of going to disk.

All things being equal, with 128GB RAM you can cache a lot of your dataset in 
main memory. Perhaps even all your *active* dataset in fact (since it's about a 
quarter the size of your full dataset). That's going to give you blistering 
performance.

Jim


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Alican Gecyasar
hey anders!
Thanks for the pointers.

On 01.11.2011 09:49, Anders Nawroth wrote:
 Hi!

 Also, Neoclipse won't start/connect to the database anymore with this
 amount of data.
 Am I missing some configuration for Neoclipse?

 Are you getting an error message?
 No error messages. Is there an option to enable logging?
 I let Neoclipse run for almost an hour and suddenly the graph appeared.
 But I cannot navigate (it's as if it's frozen, but there are calculations
 going on...).
 Not so sure why it takes so long, though; the initial traversal depth is
 1, there are 16 nodes and 15 relationships. I also decreased the number of
 nodes to be displayed to 50.
 I thought it would load data lazily?
 If you start Neoclipse from the command line you may see some extra
 output there. Also, inside the Neoclipse directory there's a workspace
 directory, and inside that you'll find .metadata/.log
There is no such directory (I am using Neo4j 1.4.2). (I am aware that it 
is supposed to be a hidden directory.)

There was no extra output in the shell, but after another hour I got a 
Java out-of-memory (heap space) exception.
Which was my first guess anyway.

But I just don't see why, since it is supposed to load only the first 16 
nodes. All relationship types are in the graph already: 6 relationship 
types in all.




 Just a thought: how many relationship types are there?

 /anders
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user

-- 
alican


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Alican Gecyasar
hey jim! Thanks for the thoughts.

I know I could run it on less RAM; it's not a matter of can or cannot. I 
am also aware that the more RAM, the better the performance.
But my question is more: how will it perform with less RAM, say 32GB? 
Almost every system is quite fast with that amount of RAM.

I am not sure if we can convince our customer to invest in 3x128GB of 
RAM (production system, staging system, test system).
Especially since there is not yet any reference application which would 
guarantee acceptable performance with this kind of system.



On 01.11.2011 10:27, Jim Webber wrote:
 Hi Alican,

 But I am still a bit concerned about the 128GB RAM..
 You can run it on less of course. You could run it on your laptop and it 
 would still work.

 However Neo4j is clever in its use of RAM. The more RAM you can allocate to 
 Neo4j, the more chance that database reads can come straight from memory 
 rather than spending potentially milliseconds going to mechanical disk, 
 yielding thousands of traversals per second rather than millions.

 So more RAM = less disk hits (statistically) which is where you'll get huge 
 read performance benefits. Less RAM means more likelihood of going to disk.

 All things being equal, with 128GB RAM you can cache a lot of your dataset in 
 main memory. Perhaps even all your *active* dataset in fact (since it's about 
 a quarter the size of your full dataset). That's going to give you blistering 
 performance.

 Jim

-- 
alican


Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Anders Nawroth
Hi!

 If you start Neoclipse from the command line you may see some extra
 output there. Also, inside the Neoclipse directory there's a workspace
 directory, and inside that you'll find .metadata/.log
 There is no such directory (I am using neo4j 1.4.2)  (I am aware that it
 is supposed to be a hidden directory)

OK, then the directory was created in the current directory when you 
started Neoclipse the first time. It's probably named just 'workspace'. 
Later versions will always put it in the Neoclipse dir (fixed after the 
1.4.x cycle, apparently).


 There was no extra output on the shell, but after another hour I got a
 java out of mem heap size exception.
 Which was my first guess anyways.

 But I just dont see why, since it is supposed to load only the first 16
 nodes. All Relation types are in the graph already. -  6 Relation Types
 all in all.

It seems like you hit a bug, then. If you send me code to generate a graph 
like yours, I'll try it out.


/anders





 Just a thought: how many relationship types are there?

 /anders



Re: [Neo4j] Neo4j performance with 400million nodes

2011-11-01 Thread Michael Hunger
Alican,

we have other customers with RAM sizes like that.

It is always about the size of your hot (i.e., cached) data-set. The better 
you understand your use-cases, the better you can estimate the number of nodes 
and relationships (and their properties) that have to be cached.
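To make that concrete, a hot-set estimate can be sketched the same way as the disk estimate: take the nodes and relationships a typical query touches and multiply by an assumed per-object cache footprint. The hot-set size is taken from the ~1 million type-E results mentioned earlier in the thread; the per-object byte counts are purely illustrative assumptions, not measured Neo4j figures:

```java
// Rough heap budget for keeping a hot data-set in Neo4j's object cache.
public class HotSetEstimate {

    /** Heap bytes needed, given assumed per-object cache footprints. */
    static long heapBytes(long hotNodes, long hotRels,
                          long bytesPerCachedNode, long bytesPerCachedRel) {
        return hotNodes * bytesPerCachedNode + hotRels * bytesPerCachedRel;
    }

    public static void main(String[] args) {
        // Assumption from the thread: a typical query touches ~1 million
        // type-E nodes plus their incoming relationships.
        long hotNodes = 1_000_000;
        long hotRels  = 1_000_000;
        // Illustrative per-object footprints (cached object + properties).
        double gb = heapBytes(hotNodes, hotRels, 1_000, 500) / 1e9;
        System.out.printf("~%.1f GB of object cache for this hot set%n", gb);
    }
}
```

Under these assumptions a single hot query pattern fits comfortably in a few GB of heap, well below 128GB; the open question is how many distinct hot sets must stay warm at once.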

Michael

On 01.11.2011 10:53, Alican Gecyasar wrote:

 hey jim! Thanks for the thoughts.
 
 I know I could run it on less RAM, it's not a matter of can or cannot. I 
 am also aware that the more RAM the better the performance.
 But my question is more: how will it perform with less RAM, say 32GB. 
 Every system is quite fast with this amount of RAM.
 
 I am not sure if we can convince our customer to invest into 3x128GB 
 RAM. (productive system, staging system, test system)
 Especially since there is not yet any reference application, which would 
 guarantee an acceptable performance with this kind of system.
 
 
 
 Am 01.11.2011 10:27, schrieb Jim Webber:
 Hi Alican,
 
 But I am still a bit concerned about the 128GB RAM..
 You can run it on less of course. You could run it on your laptop and it 
 would still work.
 
 However Neo4j is clever in its use of RAM. The more RAM you can allocate to 
 Neo4j, the more chance that database reads can come straight from memory 
 rather than spending potentially milliseconds going to mechanical disk, 
 yielding thousands of traversals per second rather than millions.
 
 So more RAM = less disk hits (statistically) which is where you'll get huge 
 read performance benefits. Less RAM means more likelihood of going to disk.
 
 All things being equal, with 128GB RAM you can cache a lot of your dataset 
 in main memory. Perhaps even all your *active* dataset in fact (since it's 
 about a quarter the size of your full dataset). That's going to give you 
 blistering performance.
 
 Jim
 
 -- 
 alican



[Neo4j] Neo4j performance with 400million nodes

2011-10-31 Thread algecya
Hello everyone,

We are relatively new to Neo4j and are evaluating some test scenarios in
order to decide whether to use Neo4j in production systems. We used the latest
stable release, 1.4.2.

I wrote an import script and generated some random data with the given tree
structure:
http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_nodes.png
 

Nodes Summary:
Nodes with Type A: 1
Nodes with Type B: 100
Nodes with Type C: 50'000 (100x500)
Nodes with Type D: 500'000 (50'000x10)
Nodes with Type E: 25'000'000 (500'000x50)
Nodes with Type F: 375'000'000 (25'000'000x15)

This all worked quite OK; the import took approx. 30 hours using the
batch importer.
We have multiple indexes, but we also have one index where all nodes are
indexed.

My first question would be: does it make sense to index all nodes with the
same index?

If I would like to list all nodes with the property type = 'type E', it is quite
slow the first time (~270 s); the second time it is fast (~0.5 s). I know this is
normal and most likely fixed in the current milestone version. But I am not sure
how long the query will be cached in memory. Are there any configurations I
should be concerned about?

We also took the hardware sizing calculator. See the result here:
http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_hardware.png
 

Are these realistic result values? I guess 128GB RAM and 12TB of SSD storage
might be a bit cost-intensive.

Are there any reference applications with this amount of nodes and
relationships?

Also, Neoclipse won't start/connect to the database anymore with this amount
of data.
Am I missing some configuration for Neoclipse?

Best regards
-- 
alican


--
View this message in context: 
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-performance-with-400million-nodes-tp3467806p3467806.html