[Neo] Evaluating Neo4J as an enterprise class application

2010-05-02 Thread suryadev vasudev
We are evaluating Neo4J for a business-critical application. There will be a
User Interface (UI) component to browse the graph, create nodes and properties,
and create/modify relationships. The data set spans 7 domains and is
expected to be around 40 GB.
Users will manipulate data in 3 domains. A back-end integration is expected
to manage data in the remaining 4 domains. I use the word domain to mean
nodes/relationships/attributes that are grouped to perform one activity, like
Sale Order, Shipping, Distribution, etc. The domains are related to each
other, and queries traverse across different domains.
We are expecting 500 users per hour to use the system. Each user may
initiate a query once every 2 minutes. Each query is expected to traverse
20,000 nodes and collect 10 properties for filtering/display.
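To put those figures in one place, here is a rough back-of-envelope sketch in Java. It assumes the worst case that all 500 users are active at the same time, which is not stated above; the class and method names are made up for illustration.

```java
// Back-of-envelope load estimate from the figures quoted above.
// Assumption (hypothetical worst case): all 500 users are active at once.
final class LoadEstimate {

    /** Queries per second if every active user queries once per interval. */
    static double queriesPerSecond(int activeUsers, int secondsBetweenQueries) {
        return (double) activeUsers / secondsBetweenQueries;
    }

    /** Node visits per second for a given query rate and traversal size. */
    static double nodeVisitsPerSecond(double queriesPerSecond, int nodesPerQuery) {
        return queriesPerSecond * nodesPerQuery;
    }

    public static void main(String[] args) {
        double qps = queriesPerSecond(500, 120);          // one query every 2 minutes
        double visits = nodeVisitsPerSecond(qps, 20_000); // 20,000 nodes per query
        double propertyReads = visits * 10;               // 10 properties per node

        System.out.println("queries per second:        " + qps);
        System.out.println("node visits per second:    " + visits);
        System.out.println("property reads per second: " + propertyReads);
    }
}
```

Under that assumption the load works out to roughly 4 queries per second and about 83,000 node visits per second, which is the number any stress test would need to sustain within the 1-2 second response target.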
I am accountable for implementing this system. You probably know what
accountable means :) Let's say it involves a guillotine.
What should I do to convince myself to move forward? Things that come to my
mind are stability, scalability, auditing and monitoring. Stability means
the JVM/application won't crash. Scalability means each user will get a
response in 1-2 seconds for up to 500 users. Auditing means the system
reports its performance for all interactions. Monitoring means health and
performance of the system are made visible.
Comments and pointers to related articles are appreciated.
TIA
SDev
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Evaluating Neo4J as an enterprise class application

2010-05-02 Thread Raul Raja Martinez
We have stress-tested Neo4j with over 500 concurrent users in a webapp with
a smaller dataset, and we found no performance issues.
We even wrap their API in a domain layer that adds some extra overhead.
One thing to keep in mind is that if your data ever grows to the point where
it needs to be distributed across machines, you can't do that with
the free version of Neo, but I think they support it with one of the
commercial licenses.

In my experience with Neo so far (5+ months), because it is embedded, if you
use Java you get a much better experience than with any relational DB behind
an ORM layer such as Hibernate. The data is not transported from your DB to a
result set, then to a POJO, then to your view objects.
With Neo the data may already be in memory when you request it, and there is
no JDBC layer between your code and the graph.

We have also purposely crashed the JVM and the app, hoping that at some point
it would corrupt the graph, and we have been doing this repeatedly, at least 10
times a day, for the last 5 months. It has always recovered and completed
queued transactions. So far we have not been able to corrupt the graph or
bring it down. Backing up the data is also easy, as a copy of the graph
folder is all you need.
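
As an illustration of the folder-copy backup, here is a minimal sketch using only the standard java.nio.file API from current Java. It assumes the database has been shut down cleanly before the copy starts (copying a store that is being written to is not covered here), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper: copies a graph folder to a backup location.
// Assumption: nothing is writing to sourceDir while the copy runs.
final class GraphFolderBackup {

    /** Recursively copies sourceDir into targetDir, creating targetDir if needed. */
    static void copyDirectory(Path sourceDir, Path targetDir) throws IOException {
        // Files.walk visits a directory before the entries inside it.
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = targetDir.resolve(sourceDir.relativize(source).toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

A call such as GraphFolderBackup.copyDirectory(graphDir, backupDir) is all it takes, where both paths are whatever your deployment uses.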

PROS

- Fast
- Easy API
- Reliable
- High performance in our use cases
- The Neo team is quick to answer doubts and questions
- No SQL
- Fast relationship traversals; in the relational world this usually means
JOINs, which are not very scalable.
- Ideal for scenarios where there are multiple relationships and
interconnected objects

CONS

- Free version cannot be distributed across multiple machines
- Only one process or JVM can access the graph at a time
- No SQL (if you like SQL)
- Filtered traversals where results should be ordered usually require a full
scan / traversal and then a reordering of the results. This is not scalable
when pagination is required and there are millions of results. We have fixed
this issue, though, by keeping a separate index for single ordered
relationships.
In a nutshell, this is what your typical relational DB provides as a B-tree
index on properties, which lets you run ORDER BY queries fast. Neo does not
have that at this time, so you have to keep your own indexes if you want
ordered traversals. (Not a trivial task to implement.)
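
To make the idea of a self-maintained ordered index concrete, here is a minimal sketch in plain Java. It is not the Neo API and not necessarily how our own index is built; it only assumes that each node has a numeric sort key (for example a timestamp property) and a node id, and all names in it are hypothetical. Entries are kept sorted in a TreeSet, so a page is read by seeking past the last entry of the previous page instead of traversing and sorting everything.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical side index kept next to the graph: (sortKey, nodeId) pairs in sorted order.
// Not thread-safe; callers must update it in step with the graph itself.
final class OrderedNodeIndex {

    /** One index entry: the value to order by plus the id of the node it belongs to. */
    static final class Entry {
        final long sortKey;
        final long nodeId;

        Entry(long sortKey, long nodeId) {
            this.sortKey = sortKey;
            this.nodeId = nodeId;
        }
    }

    // Order by sort key, then by node id so that equal keys stay distinct.
    private static final Comparator<Entry> ORDER =
            Comparator.comparingLong((Entry e) -> e.sortKey).thenComparingLong(e -> e.nodeId);

    private final NavigableSet<Entry> entries = new TreeSet<>(ORDER);

    void add(long sortKey, long nodeId) {
        entries.add(new Entry(sortKey, nodeId));
    }

    void remove(long sortKey, long nodeId) {
        entries.remove(new Entry(sortKey, nodeId));
    }

    /** Returns up to pageSize entries that sort strictly after 'after' (null means first page). */
    List<Entry> page(Entry after, int pageSize) {
        Iterable<Entry> tail = (after == null) ? entries : entries.tailSet(after, false);
        List<Entry> result = new ArrayList<>();
        for (Entry e : tail) {
            if (result.size() == pageSize) {
                break;
            }
            result.add(e);
        }
        return result;
    }
}
```

The caller passes the last entry of one page as the 'after' argument for the next, so each page costs a tree lookup plus pageSize steps. The hard part, which this sketch leaves out, is keeping the index consistent with the graph on every create, update, and delete.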


-- 
Raul Raja