Re: Performance of a large number of small nodes

Nigel Sim Wed, 26 Aug 2009 20:41:39 -0700

Hi Bertrand,

Some numbers:


Jackrabbit:
Ingest = 43189ms. Retrieve = 580ms

JPA/Database:
Ingest = 86ms. Retrieve = 33ms.

The data structure looks like this:

Instrument:
  String name;
  String model;
  ...

Dataset:
  Instrument instrument;
  String type;
  String units;
  List<Value> values;

Value:
  Date time;
  Double value;

In Jackrabbit the path looks like /<instrument>/<dataset>/YYYY/MM/DD/<value>
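
As a rough illustration of that layout, here is a minimal sketch of how the path for a single value could be assembled from its timestamp. The class name, method name, and the sample instrument, dataset, and value names are all hypothetical; only the /<instrument>/<dataset>/YYYY/MM/DD/<value> shape comes from the description above.

```java
import java.time.LocalDate;
import java.util.Locale;

public class ValuePaths {
    // Builds /<instrument>/<dataset>/YYYY/MM/DD/<value> for one reading.
    // All argument values are supplied by the caller; nothing here talks to a repository.
    static String valuePath(String instrument, String dataset, LocalDate day, String valueName) {
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "/%s/%s/%04d/%02d/%02d/%s",
                instrument, dataset,
                day.getYear(), day.getMonthValue(), day.getDayOfMonth(),
                valueName);
    }

    public static void main(String[] args) {
        // Hypothetical names, chosen only for the example.
        System.out.println(valuePath("weatherstation1", "airTemperature",
                LocalDate.of(2009, 8, 26), "v0001"));
    }
}
```

Zero-padding the month and day keeps sibling nodes in chronological order when they are sorted by name.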

I can probably improve the ingest time by an order of magnitude with more
intelligent session handling, but retrieval also needs to improve and I
don't know how. In the production system, using PostgreSQL as the back end,
with 100,000 points across 50 instruments, the query that retrieves a
dataset takes about 3 seconds. It needs to take less than 1 second at worst,
as it feeds other systems.
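
One common reading of "more intelligent session handling" is to save once per batch of nodes rather than once per node. That reading is an assumption, not something the measurements above confirm. The sketch below shows only the batching logic: the Store interface is a hypothetical stand-in for the repository session (in JCR the save step would correspond to Session.save()), and the batch size of 100 is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedIngest {
    // Hypothetical stand-in for a repository session.
    interface Store<T> {
        void add(T item);   // stage one item as a pending change
        void save();        // persist everything staged so far
    }

    // Adds every item, saving once per batchSize items instead of once per item.
    // Returns the number of save() calls that were made.
    static <T> int ingest(Store<T> store, List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int saves = 0;
        int pending = 0;
        for (T item : items) {
            store.add(item);
            pending++;
            if (pending == batchSize) {
                store.save();
                saves++;
                pending = 0;
            }
        }
        // Flush whatever is left over from the last partial batch.
        if (pending > 0) {
            store.save();
            saves++;
        }
        return saves;
    }

    public static void main(String[] args) {
        List<Double> values = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            values.add(i * 0.5);
        }
        // A do-nothing store, used here only to count the saves.
        Store<Double> noop = new Store<Double>() {
            public void add(Double item) { }
            public void save() { }
        };
        System.out.println("saves: " + ingest(noop, values, 100));
    }
}
```

Larger batches mean fewer saves but more pending changes held in memory, so the batch size would need tuning against the real repository.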

Advice would be gratefully received.

Cheers
Nigel

2009/8/15 Bertrand Delacretaz <[email protected]>

> Hi Nigel,
>
> On Sat, Aug 15, 2009 at 6:32 AM, Nigel Sim<[email protected]> wrote:
> > ...Thanks for your suggestion. Unfortunately, even in the simplest case
> > of 100 nodes in the root node, the time taken to retrieve is too long.
> > If I could resolve this fundamental speed issue then I could apply your
> > solution to help me scale my system....
>
> How much is too long, and how do you retrieve the nodes?
> I'm curious, as retrieving 100 nodes by navigating the JCR
> parent/child relationships should not be that slow.
>
> > ...I think I just need to bite the bullet and admit my use case doesn't
> > really map on Jackrabbit :)...
>
> If you tell us a bit more about your data structure, someone might be
> able to help.
> Did you have a look at http://wiki.apache.org/jackrabbit/DavidsModel ?
> That can help structure things in a JCR-friendly way.
>
> -Bertrand
>



-- 
JCU eResearch Centre
School Of Business (IT)
James Cook University
