Re: [Neo4j] How to boost performance?

Peter Neubauer Wed, 23 Nov 2011 04:03:04 -0800

Vinicius,
in real-world usage you probably want to build a REST API that
operates at the domain and use-case level. Shuffling 6K nodes back and
forth and resolving their properties does not sound good to me, given
the discovery overhead in the REST JSON representation. At the very
least, you could do


START n = node(3)
MATCH n-->()-->(x)
return ID(x), x.name


This gives you only the minimum data you need to perform the
operations initially, and should please management :)
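
To illustrate the shape of such a narrow, use-case-level result, here is a
minimal plain-Java sketch. It is only an in-memory stand-in, not the Neo4j
API: the class TwoHopProjection, its Row type and all method names are made
up for illustration. It walks two outgoing hops from a start node and returns
just the id and name of each end node, roughly what the query above asks for
(it does not reproduce Cypher's matching rules exactly).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the graph; this is NOT the Neo4j API.
final class TwoHopProjection {

    // Minimal projection: only id and name, mirroring "return ID(x), x.name".
    static final class Row {
        final long id;
        final String name;

        Row(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    private final Map<Long, String> names = new LinkedHashMap<>();
    private final Map<Long, List<Long>> outgoing = new LinkedHashMap<>();

    void addNode(long id, String name) {
        names.put(id, name);
        outgoing.put(id, new ArrayList<Long>());
    }

    // Both nodes must have been added with addNode first.
    void addRelation(long source, long target) {
        outgoing.get(source).add(target);
    }

    // One row per two-step outgoing path start -> mid -> x.
    // Unknown start nodes simply yield an empty result.
    List<Row> twoHops(long start) {
        List<Row> rows = new ArrayList<>();
        List<Long> none = Collections.emptyList();
        for (long mid : outgoing.getOrDefault(start, none)) {
            for (long x : outgoing.getOrDefault(mid, none)) {
                rows.add(new Row(x, names.get(x)));
            }
        }
        return rows;
    }
}
```

A domain-level REST resource could then serialize these rows as a small JSON
array instead of shipping the full representation of every node.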

Just my 2c

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org              - NOSQL for the Enterprise.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.



On Wed, Nov 23, 2011 at 12:54 PM, Vinicius Carvalho
<java.vinic...@gmail.com> wrote:
> Hi Peter, thanks, that indeed boosts to lower 10 ms, but we really need to
> access the nodes to perform the operations.
>
> I know that to be fair in the test, we should be running an embedded version
> of neo4j inside the appserver, after all the cache resides in the same JVM.
>
> But I'm not worried about performance against the cache, I would just like
> to have faster access than the RDBMS.
>
> I'm working on a version using an embedded read-only db pointing to the
> server data files, hope it boosts performance a lot :). I'll be fair and
> give it the same amount of memory as I would give to the cache, so I can
> benefit from object caching on neo as well.
>
> Regards
>
> On Wed, Nov 23, 2011 at 11:44 AM, Peter Neubauer <
> peter.neuba...@neotechnology.com> wrote:
>
>> Vinicius,
>> in order to cut down on the REST JSON overhead (which you don't have
>> in the RDBMS case), maybe you could look at just counting the results,
>> something like
>>
>> START n = node(3)
>> MATCH n-->()-->(x)
>> return count(x)
>>
>> And see what happens?
>>
>> Cheers,
>>
>> /peter neubauer
>>
>>
>> On Wed, Nov 23, 2011 at 11:54 AM, Vinicius Carvalho
>> <java.vinic...@gmail.com> wrote:
>> > Hi there, I posted a few days ago about the POC I'm doing here at my
>> > company. I have some initial numbers and I'd like to ask for some help
>> > here in order to promote neo4j at LMI Ericsson.
>> >
>> > I've loaded a MySQL db with a really simple entity that pretty much only
>> > represents a node and its relations (the only properties it has are a UID
>> > and x/y space coordinates for each node).
>> >
>> > The DB contains 250.000 cells and 19. relations stored in a MyISAM table,
>> > indexed only by its primary key. Please find the DDL for the two tables.
>> >
>> > CREATE TABLE  `pci`.`cells` (
>> >  `id` varchar(32) collate utf8_bin NOT NULL,
>> >  `x_pos` double default NULL,
>> >  `y_pos` double default NULL,
>> >  `pci` smallint(6) default '0',
>> >  PRIMARY KEY  (`id`)
>> > )
>> >
>> > CREATE TABLE  `pci`.`relations` (
>> >  `id` int(11) NOT NULL auto_increment,
>> >  `source` varchar(32) collate utf8_bin default NULL,
>> >  `target` varchar(32) collate utf8_bin default NULL,
>> >  PRIMARY KEY  (`id`),
>> >  KEY `src_idx` (`source`),
>> >  KEY `src_target` (`target`)
>> > )
>> >
>> > So as you can see, a simple secondary table contains the relationships,
>> > with source and target pointing to the cells table.
>> >
>> > I've loaded this exact same DB into a neoserver running on the same
>> > machine: A Blade with 26 cpus (6 cores each) and 16gb RAM.
>> >
>> > One of the requirements we have is to find all associations of my
>> > associations. Something that in neo I did like this:
>> >
>> > START n = node(3)
>> > MATCH n-->()-->(x)
>> > return x
>> >
>> > For this specific node it returns 6475 nodes.
>> >
>> > I have tested this before using Hibernate in two modes: without an L2
>> > cache, and with an L2 cache (Ehcache standalone, no replication).
>> > Here's a snippet of the code that loads it, so you can understand what's
>> > going on under the hood:
>> >
>> >
>> > @Override
>> > public List<Cell> loadCellWithRealtions(String... ids) {
>> >     Session session = (Session) em.getDelegate();
>> >     Criteria c = session.createCriteria(Cell.class)
>> >             .setFetchMode("incomingRelations", FetchMode.SELECT)
>> >             .setFetchMode("outgoingRelations", FetchMode.SELECT)
>> >             .add(Restrictions.in("id", Arrays.asList(ids)));
>> >     List<Cell> results = c.list();
>> >     for (Cell cell : results) {
>> >         Hibernate.initialize(cell.getIncomingRelations());
>> >         Hibernate.initialize(cell.getOutgoingRelations());
>> >     }
>> >     return results;
>> > }
>> >
>> > @Override
>> > public List<Cell> loadCellWithNeighbourRelations(String... ids) {
>> >     List<Cell> cells = loadCellWithRealtions(ids);
>> >     for (Cell c : cells) {
>> >         for (Relation r : c.getIncomingRelations()) {
>> >             Hibernate.initialize(r.getSource().getIncomingRelations());
>> >             Hibernate.initialize(r.getSource().getOutgoingRelations());
>> >         }
>> >         for (Relation r : c.getOutgoingRelations()) {
>> >             Hibernate.initialize(r.getTarget().getIncomingRelations());
>> >             Hibernate.initialize(r.getTarget().getOutgoingRelations());
>> >         }
>> >     }
>> >     return cells;
>> > }
>> >
>> >
>> >
>> > So the first method executes one query and 2 subselects to find a cell
>> > and all its relations; the second method iterates over each relation and
>> > does the same. So I will pretty much have something like 3+r*3 selects on
>> > the db, where r is the number of relations, right?
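
As a rough worked example of the 3+r*3 estimate above, here is a tiny Java
sketch. The class and method names are hypothetical, and the relation counts
in the usage note are made up purely for illustration; the formula itself is
the poster's estimate, not a measured value.

```java
// Hypothetical helper that evaluates the "3 + r*3" estimate from the thread.
final class SelectEstimate {

    // 3 selects for the cell and its two relation collections,
    // plus 3 selects for every relation that is followed.
    static long estimatedSelects(long relations) {
        if (relations < 0) {
            throw new IllegalArgumentException("relations must not be negative");
        }
        return 3 + 3 * relations;
    }
}
```

For instance, a cell with 10 relations would cost about 33 selects and one
with 100 relations about 303, so the number of round trips to the database
grows linearly with the neighbourhood size.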
>> >
>> > Ok, to be a bit fair with the tests, I ran this for the same node 10
>> > times (to get a chance to warm the caches), excluded the longest and
>> > shortest results, and then took the mean. Here are the results:
>> >
>> > EhCache: 70ms
>> > Plain Hibernate: 550ms
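
The measurement procedure described above (run the query several times, drop
the fastest and the slowest run, average the rest) can be sketched in plain
Java as follows. This is only an illustration under assumptions: the class
TrimmedMean and its methods are hypothetical names, and the Runnable stands
in for whatever query is being timed.

```java
import java.util.Arrays;

// Hypothetical timing helper; names are illustrative only.
final class TrimmedMean {

    // Mean of the samples after dropping the single smallest and largest value.
    static double trimmedMean(double[] samplesMs) {
        if (samplesMs.length < 3) {
            throw new IllegalArgumentException("need at least 3 samples");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    // Runs the task `runs` times, timing each run in milliseconds,
    // and returns the trimmed mean of those timings.
    static double measure(Runnable task, int runs) {
        double[] samples = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return trimmedMean(samples);
    }
}
```

Calling TrimmedMean.measure(task, 10) with the same task for each backend
keeps the comparison consistent, since the first (cold) run usually ends up
being the one that is discarded as the slowest.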
>> >
>> > I still don't have a version of the neo4j code running integrated in the
>> > app server, but the idea is to use the REST API. Running the query on the
>> > REST API took over 2 seconds on average, but due to the large size of the
>> > response, network lag was the issue. So I ran the same query 10 times
>> > using the web console, and the average time for neo was 300ms.
>> >
>> > Before asking anything: I do know that we will have more complex queries
>> > where neo will shine, but I need to improve those results in order to
>> > sell it here :). With those numbers, people will just say that having a
>> > cache and using the relational model would suffice.
>> >
>> > Anything I could do to improve this?
>> >
>> > Regards
>> > _______________________________________________
>> > Neo4j mailing list
>> > User@lists.neo4j.org
>> > https://lists.neo4j.org/mailman/listinfo/user
>> >
