Re: [Neo4j] How to boost performance?

Vinicius Carvalho Wed, 23 Nov 2011 04:08:54 -0800

Peter, I agree with you that this would not be the ideal scenario. I would
never access a REST API that way; at a minimum, pagination should be there.
But we have this really specific case where we need not the whole domain, but
at least part of it (power, bearing, gain, frequency), to perform some
calculations, and we need the whole dataset.
As I said, this is a very specific case, and I'll try the embedded version to
see the benefits.


Regards

On Wed, Nov 23, 2011 at 12:02 PM, Peter Neubauer <
peter.neuba...@neotechnology.com> wrote:

> Vinicius,
> in real-world usage, you probably want to build a REST API that
> operates at the domain and use-case level. Shuffling 6K nodes back and
> forth and resolving their properties does not sound good to me, given the
> REST discovery overhead of the JSON representation. At the very least, you
> could do
>
> START n = node(3)
> MATCH n-->()-->(x)
> return ID(x), x.name
>
>
> Which will give you only the minimum data you need to perform the
> operations initially and please management :)
>
> Just my 2c
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org              - NOSQL for the Enterprise.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
>
>
>
> On Wed, Nov 23, 2011 at 12:54 PM, Vinicius Carvalho
> <java.vinic...@gmail.com> wrote:
> > Hi Peter, thanks, that indeed boosts to lower 10 ms, but we really need to
> > access the nodes to perform the operations.
> >
> > I know that to be fair on a test, we should be running an embedded version
> > of neo4j inside the appserver, after all the cache resides in the same JVM.
> >
> > But I'm not worried about performance against the cache, I would just like
> > to have faster access versus the RDBMS.
> >
> > I'm working on a version using an embedded read only db pointing to the
> > server data files, hope it boosts performance a lot :). I'll be fair and
> > give it the same amount of memory as I would give to the cache, so I can
> > benefit from Object caching on neo as well.
> >
> > Regards
> >
> > On Wed, Nov 23, 2011 at 11:44 AM, Peter Neubauer <
> > peter.neuba...@neotechnology.com> wrote:
> >
> >> Vinicius,
> >> in order to cut down on the REST JSON overhead (which you don't have
> >> in the RDBMS case), maybe you could look at just counting the results,
> >> something like
> >>
> >> START n = node(3)
> >> MATCH n-->()-->(x)
> >> return count(x)
> >>
> >> And see what happens?
> >>
> >> Cheers,
> >>
> >> On Wed, Nov 23, 2011 at 11:54 AM, Vinicius Carvalho
> >> <java.vinic...@gmail.com> wrote:
> >> > Hi there, I posted a few days ago about the POC I'm doing here at my
> >> > company. I have some initial numbers and I'd like to ask for some help
> >> > here in order to promote neo4j here at LMI Ericsson.
> >> >
> >> > I've loaded a MySQL db with a really simple entity that pretty much only
> >> > represents a node and relations (the only properties it has are a UID and
> >> > an x/y space coordinate for each node).
> >> >
> >> > The DB contains 250.000 cells and 19. relations stored in a MyISAM table,
> >> > indexed only by its primary key. Please find the DDL for the two tables.
> >> >
> >> > CREATE TABLE  `pci`.`cells` (
> >> >  `id` varchar(32) collate utf8_bin NOT NULL,
> >> >  `x_pos` double default NULL,
> >> >  `y_pos` double default NULL,
> >> >  `pci` smallint(6) default '0',
> >> >  PRIMARY KEY  (`id`)
> >> > )
> >> >
> >> > CREATE TABLE  `pci`.`relations` (
> >> >  `id` int(11) NOT NULL auto_increment,
> >> >  `source` varchar(32) collate utf8_bin default NULL,
> >> >  `target` varchar(32) collate utf8_bin default NULL,
> >> >  PRIMARY KEY  (`id`),
> >> >  KEY `src_idx` (`source`),
> >> >  KEY `src_target` (`target`)
> >> > )
> >> >
> >> > So as you can see, a simple secondary table contains the relationship,
> >> > with source and target pointing to the cells table.
> >> >
> >> > I've loaded this exact same DB into a neoserver running on the same
> >> > machine: a blade with 26 cpus (6 cores each) and 16 GB RAM.
> >> >
> >> > One of the requirements we have is to find all associations of my
> >> > associations. Something that in neo I did like this:
> >> >
> >> > START n = node(3)
> >> > MATCH n-->()-->(x)
> >> > return x
> >> >
> >> > For this specific node it returns 6475 nodes.
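To make the "associations of my associations" query concrete, here is a minimal plain-Java sketch of what the two-hop pattern computes. It does not use the Neo4j API at all: the adjacency map, the class name and the node ids are hypothetical stand-ins for the real graph. As far as I understand, the pattern yields one row per matching path, so the sketch keeps one entry per path rather than deduplicating.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph: node id -> ids of the nodes it points to.
final class TwoHopSketch {

    // Returns the end node of every outgoing two-hop path start -> mid -> x.
    // A node reachable over several paths appears once per path.
    static List<Long> twoHop(Map<Long, List<Long>> outgoing, long start) {
        List<Long> result = new ArrayList<>();
        for (long mid : outgoing.getOrDefault(start, List.of())) {
            for (long x : outgoing.getOrDefault(mid, List.of())) {
                result.add(x);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample graph, only for illustration.
        Map<Long, List<Long>> outgoing = Map.of(
                3L, List.of(10L, 11L),
                10L, List.of(20L, 21L),
                11L, List.of(21L));
        System.out.println(twoHop(outgoing, 3L)); // prints [20, 21, 21]
    }
}
```

If only distinct end nodes matter for the calculations, collecting into a Set instead of a List would remove the repeats.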
> >> >
> >> > I have tested this before using Hibernate in two modes: without an L2
> >> > cache, and with an L2 cache (Ehcache standalone, no replication).
> >> > Here's a snippet of the code that loads it, so you can understand what's
> >> > going on under the hood:
> >> >
> >> >
> >> > @Override
> >> > public List<Cell> loadCellWithRealtions(String... ids) {
> >> >     Session session = (Session) em.getDelegate();
> >> >     Criteria c = session.createCriteria(Cell.class)
> >> >             .setFetchMode("incomingRelations", FetchMode.SELECT)
> >> >             .setFetchMode("outgoingRelations", FetchMode.SELECT)
> >> >             .add(Restrictions.in("id", Arrays.asList(ids)));
> >> >     List<Cell> results = c.list();
> >> >     for (Cell cell : results) {
> >> >         Hibernate.initialize(cell.getIncomingRelations());
> >> >         Hibernate.initialize(cell.getOutgoingRelations());
> >> >     }
> >> >     return results;
> >> > }
> >> >
> >> > @Override
> >> > public List<Cell> loadCellWithNeighbourRelations(String... ids) {
> >> >     List<Cell> cells = loadCellWithRealtions(ids);
> >> >     for (Cell c : cells) {
> >> >         for (Relation r : c.getIncomingRelations()) {
> >> >             Hibernate.initialize(r.getSource().getIncomingRelations());
> >> >             Hibernate.initialize(r.getSource().getOutgoingRelations());
> >> >         }
> >> >         for (Relation r : c.getOutgoingRelations()) {
> >> >             Hibernate.initialize(r.getTarget().getIncomingRelations());
> >> >             Hibernate.initialize(r.getTarget().getOutgoingRelations());
> >> >         }
> >> >     }
> >> >     return cells;
> >> > }
> >> >
> >> >
> >> >
> >> > So the first method executes one query and 2 subselects to find a cell
> >> > and all its relations; the second method iterates over each relation and
> >> > does the same. So I will pretty much have something like 3+r*3 selects on
> >> > the db, where r is the number of relations, right?
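As a quick worked example of the 3+r*3 estimate, the sketch below just evaluates the formula. The class name and the relation count of 50 are made up for illustration; the real count depends on the cell being loaded.

```java
// Evaluates the rough select-count estimate from the message: 3 + r * 3.
final class SelectCountSketch {

    // 3 selects for the cell and its two relation collections,
    // plus 3 more for every relation followed to a neighbouring cell.
    static long estimatedSelects(long relations) {
        return 3 + 3 * relations;
    }

    public static void main(String[] args) {
        // Hypothetical cell with 50 relations.
        System.out.println(estimatedSelects(50)); // prints 153
    }
}
```

The estimate grows linearly with the number of relations of the starting cell, which is why the uncached Hibernate run is sensitive to how connected the chosen node is.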
> >> >
> >> > Ok, to be a bit fair with the tests, I ran this for the same node 10
> >> > times (to get a chance to warm the caches), excluded the longest and
> >> > shortest results, and then took the mean of the rest. Here are the results:
> >> >
> >> > EhCache: 70ms
> >> > Plain Hibernate: 550ms
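The timing procedure described above (run several times, drop the slowest and the fastest run, average what is left) could be sketched roughly like this. It is only an assumption about how the averaging was done; the class name and the sample timings in main are invented and are not measurements from this test.

```java
import java.util.Arrays;

// Averages run times after discarding the single fastest and single slowest run.
final class TrimmedMeanSketch {

    static double trimmedMean(long[] millis) {
        if (millis.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        long[] sorted = millis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        long sum = 0;
        // Skip index 0 (fastest) and the last index (slowest).
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // Made-up timings for ten runs, in milliseconds.
        long[] runs = {900, 310, 295, 300, 305, 290, 300, 310, 295, 120};
        System.out.println(trimmedMean(runs));
    }
}
```

Dropping the extremes mostly removes the cold first run and any one-off hiccup, so the remaining average reflects warm-cache behaviour.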
> >> >
> >> > I still don't have a version of the neo4j code running integrated in the
> >> > app server, but the idea is to use the REST API. Running the query on the
> >> > REST API took over 2 seconds on average, but due to the large size of the
> >> > response, network lag was the issue. So I ran the same query 10 times
> >> > using the web console, and the average time for neo was 300ms.
> >> >
> >> > Before asking anything: I do know that we will have more complex queries
> >> > where neo will shine, but I need to improve those results in order to
> >> > sell it here :). With those numbers, people will just say that having a
> >> > cache and using the relational model would suffice.
> >> >
> >> > Anything I could do to improve this?
> >> >
> >> > Regards
> >> > _______________________________________________
> >> > Neo4j mailing list
> >> > User@lists.neo4j.org
> >> > https://lists.neo4j.org/mailman/listinfo/user
> >> >
