Re: [Neo4j] How to boost performance?

Vinicius Carvalho Wed, 23 Nov 2011 06:05:34 -0800

But wouldn't that mean I need an exclusive lock on the db? I would
like to keep the server running, pointing at the same data directory.


Regards

On Wed, Nov 23, 2011 at 1:50 PM, Michael Hunger <
michael.hun...@neotechnology.com> wrote:

> Please use EmbeddedGraphDatabase,
>
> EmbeddedReadOnlyGraphDatabase caches a snapshot of the data in its caches
> and doesn't get update-changes.
>
> Michael
>
> On 23.11.2011 at 14:39, Vinicius Carvalho wrote:
>
> > Hi Michael, thanks. The data load was fine, I've used your script with
> > the BatchInserter. Memory footprint was really low, I think the peak was
> > 200mb of heap usage. I did something really silly and left a logger.info
> > in, which slowed things a bit, but the process was really smooth.
> >
> > Many thanks on the help with the query. I'll try this, I'm putting the
> > readonlyembedded neo inside our app right now. I expect to see some good
> > performance boost :)
> >
> > Best Regards
> >
> > On Wed, Nov 23, 2011 at 12:12 PM, Michael Hunger <
> > michael.hun...@neotechnology.com> wrote:
> >
> >> Vinicius,
> >>
> >> first: did you have any issues importing the data into Neo4j?
> >> second: your example used cypher which is not optimized for performance
> >> (yet!). This is in our plans for the next two releases of neo4j.
> >>
> >> So if you want to see the real performance of neo4j, please use the
> >> traversal framework or the core-API:
> >>
> >> Cypher & Traversals:
> >>
> >> // define
> >> cypherQuery = cypherParser.parse(
> >>     "start n=node({start_node}) match n-->()-->x return x")
> >> traversalQuery = Traversal.description()
> >>     .evaluator(Evaluators.atDepth(2))
> >>     .expand(Traversal.expanderForAllTypes(Direction.OUTGOING))
> >>
> >> // execute
> >> for (Node n : cypherQuery.execute({"start_node":startNode})) { ... }
> >> for (Node n : traversalQuery.traverse(startNode).nodes()) { ... }
> >>
> >> If you're interested in the paths, remove the ".nodes()" call on the
> >> traverser.
> >>
> >> In java core-api code:
> >>
> >> Node start = db.getNodeById(3);
> >>
> >> // OUTGOING matches the direction of the n-->()-->x query above
> >> for (Relationship first : start.getRelationships(Direction.OUTGOING)) {
> >>   Node second = first.getOtherNode(start);
> >>   for (Relationship next : second.getRelationships(Direction.OUTGOING)) {
> >>       Node third = next.getOtherNode(second);
> >>       // do something with the 3 nodes and 2 relationships that form your path
> >>   }
> >> }
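
As a rough illustration of what the nested core-API loops above compute, here is a minimal sketch in plain Java that walks two outgoing hops over an in-memory adjacency map. It does not use the Neo4j API: the `TwoHopSketch` class, the `twoHop` method, and the sample graph are all hypothetical stand-ins chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TwoHopSketch {

    // Returns the end node of every two-hop outgoing path from `start`.
    // A node that is reachable via several paths appears once per path.
    static List<Integer> twoHop(Map<Integer, List<Integer>> outgoing, int start) {
        List<Integer> ends = new ArrayList<>();
        for (int second : outgoing.getOrDefault(start, List.of())) {
            for (int third : outgoing.getOrDefault(second, List.of())) {
                ends.add(third);
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        // Hypothetical sample graph: 3 -> 4, 3 -> 5, 4 -> 6, 5 -> 6, 5 -> 7
        Map<Integer, List<Integer>> outgoing = Map.of(
                3, List.of(4, 5),
                4, List.of(6),
                5, List.of(6, 7));
        System.out.println(twoHop(outgoing, 3)); // prints [6, 6, 7]
    }
}
```

Node 6 shows up twice because it ends two distinct paths. Collect into a `Set` instead if only distinct end nodes are wanted.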
> >>
> >> In the REST API the traversal would look like this (see
> >> http://docs.neo4j.org/chunked/snapshot/rest-api-traverse.html#rest-api-traversal-using-a-return-filter):
> >>   * POST http://localhost:7474/db/data/node/3/traverse/node
> >>   * Accept: application/json
> >>   * Content-Type: application/json
> >>
> >> {
> >> "relationships" : [ {"direction" : "out" } ],
> >> "max_depth" : 3
> >> }
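
For readers who want to issue the request above from Java, the following is a minimal sketch that only builds the POST request and does not send it. It assumes Java 11 or later for `java.net.http`. The `TraverseRequestSketch` class and its `build` method are hypothetical names; the URL path, headers, and JSON body are copied from the message above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class TraverseRequestSketch {

    // JSON body copied from the message above.
    static final String BODY =
            "{ \"relationships\" : [ {\"direction\" : \"out\" } ], \"max_depth\" : 3 }";

    // Builds (but does not send) the traversal request for the given node id.
    static HttpRequest build(String baseUrl, long nodeId) {
        URI uri = URI.create(baseUrl + "/db/data/node/" + nodeId + "/traverse/node");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://localhost:7474", 3);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running server.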
> >>
> >>
> >> On 23.11.2011 at 11:54, Vinicius Carvalho wrote:
> >>
> >>> Hi there, I posted a few days ago about the POC I'm doing here at my
> >>> company. I have some initial numbers and I'd like to ask for some help
> >>> in order to promote neo4j here at LMI Ericsson.
> >>>
> >>> I've loaded a MySQL db with a really simple entity that pretty much only
> >>> represents a node and its relations (the only properties it has are a UID
> >>> and an x/y space coordinate for each node).
> >>>
> >>> The DB contains 250.000 cells and 19. relations stored in a MyISAM table,
> >>> indexed only by its primary key. Please find the DDL for the two tables.
> >>>
> >>> CREATE TABLE  `pci`.`cells` (
> >>> `id` varchar(32) collate utf8_bin NOT NULL,
> >>> `x_pos` double default NULL,
> >>> `y_pos` double default NULL,
> >>> `pci` smallint(6) default '0',
> >>> PRIMARY KEY  (`id`)
> >>> )
> >>>
> >>> CREATE TABLE  `pci`.`relations` (
> >>> `id` int(11) NOT NULL auto_increment,
> >>> `source` varchar(32) collate utf8_bin default NULL,
> >>> `target` varchar(32) collate utf8_bin default NULL,
> >>> PRIMARY KEY  (`id`),
> >>> KEY `src_idx` (`source`),
> >>> KEY `src_target` (`target`)
> >>> )
> >>>
> >>> So as you can see, a simple secondary table contains the relationships,
> >>> with source and target pointing to the cells table.
> >>>
> >>> I've loaded this exact same DB into a neoserver running on the same
> >>> machine: A Blade with 26 cpus (6 cores each) and 16gb RAM.
> >>>
> >>> One of the requirements we have is to find all associations of my
> >>> associations. Something that in neo I did like this:
> >>>
> >>> START n = node(3)
> >>> MATCH n-->()-->(x)
> >>> return x
> >>>
> >>> For this specific node it returns 6475 nodes.
> >>>
> >>> I have tested this before using Hibernate in two modes: without an L2
> >>> cache, and with an L2 cache (Ehcache standalone, no replication).
> >>> Here's a snippet of the code that loads it, so you can understand what's
> >>> going on under the hood:
> >>>
> >>>
> >>> @Override
> >>> public List<Cell> loadCellWithRealtions(String... ids) {
> >>>     Session session = (Session) em.getDelegate();
> >>>     Criteria c = session.createCriteria(Cell.class)
> >>>         .setFetchMode("incomingRelations", FetchMode.SELECT)
> >>>         .setFetchMode("outgoingRelations", FetchMode.SELECT)
> >>>         .add(Restrictions.in("id", Arrays.asList(ids)));
> >>>     List<Cell> results = c.list();
> >>>     for (Cell cell : results) {
> >>>         Hibernate.initialize(cell.getIncomingRelations());
> >>>         Hibernate.initialize(cell.getOutgoingRelations());
> >>>     }
> >>>     return results;
> >>> }
> >>>
> >>> @Override
> >>> public List<Cell> loadCellWithNeighbourRelations(String... ids) {
> >>>     List<Cell> cells = loadCellWithRealtions(ids);
> >>>     for (Cell c : cells) {
> >>>         for (Relation r : c.getIncomingRelations()) {
> >>>             Hibernate.initialize(r.getSource().getIncomingRelations());
> >>>             Hibernate.initialize(r.getSource().getOutgoingRelations());
> >>>         }
> >>>         for (Relation r : c.getOutgoingRelations()) {
> >>>             Hibernate.initialize(r.getTarget().getIncomingRelations());
> >>>             Hibernate.initialize(r.getTarget().getOutgoingRelations());
> >>>         }
> >>>     }
> >>>     return cells;
> >>> }
> >>>
> >>>
> >>>
> >>> So the first method executes one query and 2 subselects to find a cell and
> >>> all its relations; the second method iterates over each relation and does
> >>> the same. So I will pretty much have something like 3+r*3 selects on the
> >>> db, where r is the number of relations, right?
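
The 3+r*3 estimate above can be written out as a small worked calculation. This is only a sketch of the arithmetic as stated in the message: the `SelectCountSketch` class and `estimatedSelects` method are hypothetical names, the value 100 is a made-up input, and the real number of selects may be lower if Hibernate's session cache already holds some of the neighbouring cells.

```java
public class SelectCountSketch {

    // 3 selects for the cell and its two relation collections,
    // plus 3 more for each relation whose neighbouring cell is loaded the same way.
    static long estimatedSelects(long relations) {
        return 3 + 3 * relations;
    }

    public static void main(String[] args) {
        // Hypothetical input: a cell with 100 relations.
        System.out.println(estimatedSelects(100)); // prints 303
    }
}
```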
> >>>
> >>> Ok, to be a bit fair with the tests, I ran this for the same node 10
> >>> times (to give the caches a chance to warm up), excluded the longest and
> >>> shortest results, and then took the mean. Here are the results:
> >>>
> >>> EhCache: 70ms
> >>> Plain Hibernate: 550ms
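
The measurement procedure described above (run several times, drop the longest and shortest run, average the rest) could be sketched in Java as follows. The `TrimmedMeanSketch` class and `trimmedMean` method are hypothetical names, and the timings in `main` are made-up values, not the measurements reported in this thread.

```java
import java.util.Arrays;

public class TrimmedMeanSketch {

    // Drops the single smallest and single largest sample, then averages the rest.
    static double trimmedMean(double[] samplesMs) {
        if (samplesMs.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Made-up timings in milliseconds; the first run is slow because caches are cold.
        double[] timingsMs = {1000, 40, 30, 20, 10};
        System.out.println(trimmedMean(timingsMs)); // prints 30.0
    }
}
```

Dropping the extremes keeps a single cold-cache run or one lucky fast run from skewing the average.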
> >>>
> >>> I still don't have a version of the neo4j code running integrated in the
> >>> app server, but the idea is to use the REST API. Running the query on the
> >>> REST API took over 2 seconds on average, but due to the large size of the
> >>> response, network lag was the issue. So I ran the same query 10 times
> >>> using the web console, and the average time for neo was 300ms.
> >>>
> >>> Before asking anything: I do know that we will have more complex queries
> >>> where neo will shine, but I need to improve those results in order to sell
> >>> it here :). With those numbers, people will just say that having a cache
> >>> and using the relational model would suffice.
> >>>
> >>> Anything I could do to improve this?
> >>>
> >>> Regards
> >>> _______________________________________________
> >>> Neo4j mailing list
> >>> User@lists.neo4j.org
> >>> https://lists.neo4j.org/mailman/listinfo/user
