Hi Alican,

On Mon, Oct 31, 2011 at 6:26 AM, algecya <[email protected]> wrote:

> Hello everyone,
>
> We are relatively new to neo4j and are evaluating some test scenarios in
> order to decide whether to use neo4j in production systems. We used the
> latest stable release 1.4.2.
>
> I wrote an import script and generated some random data with the following
> tree structure:
>
> http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_nodes.png
>
> Nodes Summary:
> Nodes with Type A: 1
> Nodes with Type B: 100
> Nodes with Type C: 50'000 (100x500)
> Nodes with Type D: 500'000 (50'000x10)
> Nodes with Type E: 25'000'000 (500'000x50)
> Nodes with Type F: 375'000'000 (25'000'000x15)
>
> This all worked quite OK; the import took approx. 30 hours using the batch
> importer.
> We have multiple indexes, but we also have one index in which all nodes are
> indexed.
>
> My first question would be, does it make sense to index all nodes with the
> same index?
>

It depends on how you intend to access the data. If you always know the
type, then it would be beneficial to use separate indices, one per type.
Otherwise you might want to put everything in a single index. Do remember
that the index will consume some disk space as well.
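
To make the trade-off concrete, here is a rough sketch against the 1.4
embedded Java API; the store path, index names and property values are just
placeholders:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class IndexingSketch
{
    public static void main( String[] args )
    {
        GraphDatabaseService db = new EmbeddedGraphDatabase( "data/graph.db" );
        // Option 1: one index per type -- lookups only touch nodes of that type.
        Index<Node> typeEIndex = db.index().forNodes( "nodes-type-e" );
        // Option 2: a single index for everything, keyed by the "type" property.
        Index<Node> allNodes = db.index().forNodes( "all-nodes" );

        Transaction tx = db.beginTx();
        try
        {
            Node node = db.createNode();
            node.setProperty( "type", "type E" );
            typeEIndex.add( node, "type", "type E" );
            allNodes.add( node, "type", "type E" );
            tx.success();
        }
        finally
        {
            tx.finish();
        }

        // Reading: with the single index, the hit set for "type E" is all 25M nodes.
        IndexHits<Node> hits = allNodes.get( "type", "type E" );
        try
        {
            for ( Node hit : hits )
            {
                // process hit
            }
        }
        finally
        {
            hits.close();
        }
        db.shutdown();
    }
}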


>
> If I want to list all nodes with property "type":"type E", it is quite slow
> the first time (~270s).
> The second time it is fast (~0.5s). I know this is normal and most likely
> fixed in the current milestone version. But I am not sure how long the
> query result will stay cached in memory. Are there any configurations I
> should be concerned about?
>

The difference there is all about disk access time. Will "give me all 25
million E's" be a common operation?
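
If it is, it's worth making sure the store files fit in the memory-mapped
buffers, so that the ~0.5s case becomes the norm rather than the exception.
A minimal sketch using the embedded API; the sizes below are placeholders
you would tune to the actual sizes of your store files on disk:

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedGraphDatabase;

public class MappedMemorySketch
{
    public static void main( String[] args )
    {
        // Placeholder values -- size these against the actual *.db files.
        Map<String, String> config = new HashMap<String, String>();
        config.put( "neostore.nodestore.db.mapped_memory", "4G" );
        config.put( "neostore.relationshipstore.db.mapped_memory", "8G" );
        config.put( "neostore.propertystore.db.mapped_memory", "4G" );
        config.put( "neostore.propertystore.db.strings.mapped_memory", "2G" );

        GraphDatabaseService db = new EmbeddedGraphDatabase( "data/graph.db", config );
        // ... run the "all type E" lookup here ...
        db.shutdown();
    }
}

If you run the server instead of embedding, the same keys go into
conf/neo4j.properties.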


>
> We also tried the hardware sizing calculator. See the result here:
>
> http://neo4j-community-discussions.438527.n3.nabble.com/file/n3467806/neo4j_hardware.png
>
> Are these realistic values? I guess 128GB RAM and 12TB of SSD hard drives
> might be a bit cost intensive.
>

The disk usage comes out at 12TB because you specified that each node on
average has 10kB of data, and each relationship on average has 1kB of data.
What kind of data are you storing on the nodes and relationships? These are
pretty rough estimates that take into account neither the number of
properties nor their types. Also, if you decrease the property data by a
factor of 100 (100B/node, 10B/rel), then your database will only consume
~150-200GB.
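
As a rough sanity check of that smaller estimate (assuming ~400 million
nodes and, since it's a tree, roughly the same number of relationships):
400M x 100B is ~40GB of node property data and 400M x 10B is another ~4GB
of relationship property data; the rest of the ~150-200GB would roughly be
the fixed-size node/relationship/property records, the Lucene indexes and
other overhead.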


>
> Are there any reference applications with this amount of nodes and
> relationships?
>

We are in the process of adding case studies. Please get in touch with
sales for more info at this time.


>
> Also, Neoclipse won't start/connect to the database anymore with this
> amount of data.
> Am I missing some configuration for Neoclipse?
>

Are you getting an error message?

Best,
David


>
> Best regards
> --
> alican
>
>



-- 
David Montag <[email protected]>
Neo Technology, www.neotechnology.com
Cell: 650.556.4411
Skype: ddmontag
_______________________________________________
Neo4j mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user
