Re: [Neo4j] How to boost performance?

Michael Hunger Wed, 23 Nov 2011 06:15:47 -0800

Just make sure that it is just a snapshot of the data and doesn't update its 
caches.


Otherwise you will run into synchronization issues.

See also this thread and Tobias' explanations around it:
http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-Synchronization-of-EmbeddedReadOnlyGraphDatabase-Bug-td3174626.html#a3213450

Michael

On 23.11.2011 at 15:05, Vinicius Carvalho wrote:

> But wouldn't it mean that I need to have exclusive lock on the db? I would
> like to keep the server running pointing at the same data directory.
> 
> Regards
> 
> On Wed, Nov 23, 2011 at 1:50 PM, Michael Hunger <
> michael.hun...@neotechnology.com> wrote:
> 
>> Please use EmbeddedGraphDatabase,
>> 
>> EmbeddedReadOnlyGraphDatabase keeps a snapshot of the data in its caches
>> and doesn't see later updates.
>> 
>> Michael
>> 
>> On 23.11.2011 at 14:39, Vinicius Carvalho wrote:
>> 
>>> Hi Michael, thanks. The data load was fine; I used your script with the
>>> BatchInserter. The memory footprint was really low, I think the peak was
>>> 200 MB of heap usage. I made the silly mistake of leaving a logger.info
>>> call in, which slowed things a bit, but the process was really smooth.
>>> 
>>> Many thanks for the help with the query. I'll try this; I'm putting the
>>> read-only embedded Neo4j inside our app right now. I expect to see a good
>>> performance boost :)
>>> 
>>> Best Regards
>>> 
>>> On Wed, Nov 23, 2011 at 12:12 PM, Michael Hunger <
>>> michael.hun...@neotechnology.com> wrote:
>>> 
>>>> Vinicius,
>>>> 
>>>> first: did you have any issues importing the data into Neo4j?
>>>> second: your example used Cypher, which is not optimized for performance
>>>> (yet!). This is in our plans for the next two releases of Neo4j.
>>>> 
>>>> So if you want to see the real performance of Neo4j, please use the
>>>> traversal framework or the core API:
>>>> 
>>>> Cypher & Traversals:
>>>> 
>>>> // define
>>>> cypherQuery = cypherParser.parse(
>>>>     "start n=node({start_node}) match n-->()-->x return x")
>>>> traversalQuery = Traversal.description()
>>>>     .evaluator(Evaluators.atDepth(2))
>>>>     .expand(Traversal.expanderForAllTypes(Direction.OUTGOING))
>>>> 
>>>> // execute
>>>> for (Node n : cypherQuery.execute({"start_node":startNode})) { ... }
>>>> for (Node n : traversalQuery.traverse(startNode).nodes()) { ... }
>>>> 
>>>> If you're interested in the paths, remove the ".nodes()" call on the
>>>> traverser.
>>>> 
>>>> In Java core-API code:
>>>> 
>>>> Node start = db.getNodeById(3);
>>>> 
>>>> // getRelationships() follows both directions; pass Direction.OUTGOING
>>>> // to match the outgoing-only query above
>>>> for (Relationship first : start.getRelationships()) {
>>>>     Node second = first.getOtherNode(start);
>>>>     for (Relationship next : second.getRelationships()) {
>>>>         Node third = next.getOtherNode(second);
>>>>         // do something with the 3 nodes and 2 relationships that form your path
>>>>     }
>>>> }
>>>> 
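For readers without a Neo4j instance at hand, the shape of that nested loop can be sketched on a plain in-memory adjacency map. This is only an illustration and not Neo4j API: the class name TwoHopSketch, the method twoHop and the tiny sample graph are made up here, and only outgoing edges are followed (as in the Cypher pattern n-->()-->x).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the graph: node id -> ids of its outgoing neighbours.
final class TwoHopSketch {

    /** Returns the end node of every outgoing two-step path from start, one entry per path. */
    static List<Integer> twoHop(Map<Integer, List<Integer>> outgoing, int start) {
        List<Integer> result = new ArrayList<>();
        for (int second : outgoing.getOrDefault(start, List.of())) {
            for (int third : outgoing.getOrDefault(second, List.of())) {
                // start -> second -> third is one path of length 2
                result.add(third);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the thread.
        Map<Integer, List<Integer>> graph = Map.of(
                3, List.of(4, 5),
                4, List.of(6),
                5, List.of(6, 7));
        System.out.println(twoHop(graph, 3)); // prints [6, 6, 7]
    }
}
```

Node 6 shows up twice because it is reachable over two different paths; collect into a Set instead if only distinct end nodes matter.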
>>>> In the REST API the traversal would look like this (see
>>>> http://docs.neo4j.org/chunked/snapshot/rest-api-traverse.html#rest-api-traversal-using-a-return-filter):
>>>>  * POST http://localhost:7474/db/data/node/3/traverse/node
>>>>  * Accept: application/json
>>>>  * Content-Type: application/json
>>>> 
>>>> {
>>>> "relationships" : [ {"direction" : "out" } ],
>>>> "max_depth" : 3
>>>> }
>>>> 
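As a rough sketch of how a Java client might assemble that request: the example below uses the java.net.http client from Java 11, which is an assumption of this sketch and not something used in the thread. The class name TraverseRequestSketch and its methods are hypothetical; the URL, headers and JSON body are copied from the request above. It only builds the request, so it runs without a server.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) the traverse request shown above.
final class TraverseRequestSketch {

    /** JSON body with outgoing relationships and the given max_depth. */
    static String body(int maxDepth) {
        return "{\"relationships\":[{\"direction\":\"out\"}],\"max_depth\":" + maxDepth + "}";
    }

    static HttpRequest build(String baseUrl, long nodeId, int maxDepth) {
        URI uri = URI.create(baseUrl + "/db/data/node/" + nodeId + "/traverse/node");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(maxDepth)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:7474", 3, 3);
        System.out.println(request.method() + " " + request.uri());
        // Sending needs a running server, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```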
>>>> 
>>>> On 23.11.2011 at 11:54, Vinicius Carvalho wrote:
>>>> 
>>>>> Hi there, I posted a few days ago about the POC I'm doing here at my
>>>>> company. I have some initial numbers and I'd like to ask for some help
>>>>> in order to promote Neo4j here at LMI Ericsson.
>>>>> 
>>>>> I've loaded a MySQL db with a really simple entity that pretty much only
>>>>> represents a node and its relations (the only properties it has are a UID
>>>>> and an x/y space coordinate for each node).
>>>>> 
>>>>> The DB contains 250.000 cells and 19. relations stored in a MyISAM table,
>>>>> indexed only by its primary key. Please find the DDL for the two tables
>>>>> below.
>>>>> 
>>>>> CREATE TABLE  `pci`.`cells` (
>>>>> `id` varchar(32) collate utf8_bin NOT NULL,
>>>>> `x_pos` double default NULL,
>>>>> `y_pos` double default NULL,
>>>>> `pci` smallint(6) default '0',
>>>>> PRIMARY KEY  (`id`)
>>>>> )
>>>>> 
>>>>> CREATE TABLE  `pci`.`relations` (
>>>>> `id` int(11) NOT NULL auto_increment,
>>>>> `source` varchar(32) collate utf8_bin default NULL,
>>>>> `target` varchar(32) collate utf8_bin default NULL,
>>>>> PRIMARY KEY  (`id`),
>>>>> KEY `src_idx` (`source`),
>>>>> KEY `src_target` (`target`)
>>>>> )
>>>>> 
>>>>> So as you can see, a simple secondary table contains the relationships,
>>>>> with source and target pointing to the cells table.
>>>>> 
>>>>> I've loaded this exact same DB into a Neo4j server running on the same
>>>>> machine: a blade with 26 cpus (6 cores each) and 16gb RAM.
>>>>> 
>>>>> One of the requirements we have is to find all associations of my
>>>>> associations. In Neo4j I did it like this:
>>>>> 
>>>>> START n = node(3)
>>>>> MATCH n-->()-->(x)
>>>>> return x
>>>>> 
>>>>> For this specific node it returns 6475 nodes.
>>>>> 
>>>>> I have tested this before using Hibernate in two modes: without an L2
>>>>> cache, and with an L2 cache (Ehcache standalone, no replication).
>>>>> Here's a snippet of the code that loads it, so you can understand what's
>>>>> going on under the hood:
>>>>> 
>>>>> 
>>>>> @Override
>>>>> public List<Cell> loadCellWithRealtions(String... ids) {
>>>>>     Session session = (Session) em.getDelegate();
>>>>>     Criteria c = session.createCriteria(Cell.class)
>>>>>             .setFetchMode("incomingRelations", FetchMode.SELECT)
>>>>>             .setFetchMode("outgoingRelations", FetchMode.SELECT)
>>>>>             .add(Restrictions.in("id", Arrays.asList(ids)));
>>>>>     List<Cell> results = c.list();
>>>>>     for (Cell cell : results) {
>>>>>         Hibernate.initialize(cell.getIncomingRelations());
>>>>>         Hibernate.initialize(cell.getOutgoingRelations());
>>>>>     }
>>>>>     return results;
>>>>> }
>>>>> 
>>>>> @Override
>>>>> public List<Cell> loadCellWithNeighbourRelations(String... ids) {
>>>>>     List<Cell> cells = loadCellWithRealtions(ids);
>>>>>     for (Cell c : cells) {
>>>>>         for (Relation r : c.getIncomingRelations()) {
>>>>>             Hibernate.initialize(r.getSource().getIncomingRelations());
>>>>>             Hibernate.initialize(r.getSource().getOutgoingRelations());
>>>>>         }
>>>>>         for (Relation r : c.getOutgoingRelations()) {
>>>>>             Hibernate.initialize(r.getTarget().getIncomingRelations());
>>>>>             Hibernate.initialize(r.getTarget().getOutgoingRelations());
>>>>>         }
>>>>>     }
>>>>>     return cells;
>>>>> }
>>>>> 
>>>>> 
>>>>> 
>>>>> So the first method executes one query and 2 subselects to find a cell
>>>>> and all its relations; the second method iterates over each relation and
>>>>> does the same. So I will pretty much have something like 3+r*3 selects on
>>>>> the db, where r is the number of relations, right?
>>>>> 
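Taking that 3+r*3 estimate at face value, a few lines of Java show how quickly the number of selects grows with r. The class name SelectCountSketch and the sample values of r are made up for illustration; the thread does not give the actual relation count of the test node.

```java
// Hypothetical helper evaluating the 3 + r*3 estimate from the message above.
final class SelectCountSketch {

    /** Estimated selects: 3 for the cell itself plus 3 per relation. */
    static long selects(long relations) {
        return 3 + 3 * relations;
    }

    public static void main(String[] args) {
        // Illustrative values of r only, not measurements.
        for (long r : new long[] {10, 100, 1000}) {
            System.out.println("r=" + r + " -> " + selects(r) + " selects");
        }
    }
}
```

The count is linear in r, so every extra relation on the start cell costs three more round trips when nothing is cached.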
>>>>> OK, to be a bit fair with the tests, I ran this for the same node 10
>>>>> times (to give the caches a chance to warm up), excluded the longest and
>>>>> shortest results, and then took the mean. Here are the results:
>>>>> 
>>>>> EhCache: 70ms
>>>>> Plain Hibernate: 550ms
>>>>> 
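The averaging described above (run 10 times, drop the longest and the shortest run, take the mean of the rest) could be sketched as follows. The class name TrimmedMeanSketch and the sample timings are invented for illustration and are not the measurements from this thread.

```java
import java.util.Arrays;

// Hypothetical helper: mean of the samples after dropping one minimum and one maximum.
final class TrimmedMeanSketch {

    static double trimmedMean(long[] samplesMs) {
        if (samplesMs.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        long sum = 0;
        // skip index 0 (shortest run) and the last index (longest run)
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds; the first run is a cold-cache outlier.
        long[] runs = {900, 70, 72, 68, 71, 69, 73, 70, 10, 74};
        System.out.println(trimmedMean(runs)); // prints 70.875
    }
}
```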
>>>>> I still don't have a version of the Neo4j code running integrated in the
>>>>> app server, but the idea is to use the REST API. Running the query over
>>>>> the REST API took over 2 seconds on average, but given the large size of
>>>>> the response, network lag was the issue. So I ran the same query 10 times
>>>>> using the web console, and the average time for Neo4j was 300ms.
>>>>> 
>>>>> Before asking anything: I do know that we will have more complex queries
>>>>> where Neo4j will shine, but I need to improve those results in order to
>>>>> sell it here :). With those numbers, people will just say that having a
>>>>> cache and using the relational model would suffice.
>>>>> 
>>>>> Anything I could do to improve this?
>>>>> 
>>>>> Regards
>>>>> _______________________________________________
>>>>> Neo4j mailing list
>>>>> User@lists.neo4j.org
>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>>> 
