Hi list: We are using JackRabbit 1.2.1 at present.
Our nodes look like this:

/Product/Customer1/Settings
/Product/Customer1/Configuration
/Product/Customer1/DataA
/Product/Customer1/DataB
/Product/Customer2/Settings
/Product/Customer2/Configuration
/Product/Customer2/DataA
/Product/Customer2/DataB

This pattern continues all the way up to 250 customers. When we load data for all of these customers, we see that the Derby database size is 2.5 GB and the Lucene index is 470 MB.

We have to provide for the following:

a. Access data for around 20 customers simultaneously.
b. The queries are of the type "all attributes of a given node for a given customer".
c. Data about one customer should not be accessible by another customer.

At present we access JackRabbit using 20 threads and 20 different sessions; this is to achieve separation of data, etc. (see the rough sketch at the end of this mail). We are seeing performance figures such as the following:

Network Derby: 80 seconds for all the threads to receive results
Oracle: 35 seconds for all the threads to receive results

Some questions:

1. What lessons have community members learned from using Derby?
2. Would you recommend Oracle over Derby for such large amounts of data?
3. Are there ways to speed up Lucene searches?
4. Are Lucene searches affected by such large indexes?
5. Would it be better for us to split the repository into smaller ones, and thereby have smaller Lucene indexes?
6. For such large data, would embedded Derby or network Derby be better suited to the task?

-- Sriram
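P.S. In case it helps, below is roughly what each of our 20 threads does, boiled down to a minimal sketch. It is not our actual code: the credentials, the dumpCustomerNode name, and the customerPath parameter are just placeholders, and the Repository handle comes from our own setup. It just shows the pattern: one session per thread, an XPath query into the customer's subtree, then reading all properties of the matched node.

import javax.jcr.*;
import javax.jcr.query.*;

public class CustomerQuery {

    // One session per thread; customerPath is e.g. "Customer1/Settings".
    static void dumpCustomerNode(Repository repository, String customerPath)
            throws RepositoryException {
        // Placeholder credentials; we use our own in practice.
        Session session = repository.login(
                new SimpleCredentials("user", "pass".toCharArray()));
        try {
            QueryManager qm = session.getWorkspace().getQueryManager();
            // XPath query scoped to the given customer's subtree.
            Query query = qm.createQuery(
                    "/jcr:root/Product/" + customerPath, Query.XPATH);
            NodeIterator nodes = query.execute().getNodes();
            while (nodes.hasNext()) {
                Node node = nodes.nextNode();
                // "All attributes of a given node" = iterate its properties.
                for (PropertyIterator props = node.getProperties(); props.hasNext();) {
                    Property p = props.nextProperty();
                    // getString() fails on multi-valued properties; skip those here.
                    if (!p.getDefinition().isMultiple()) {
                        System.out.println(p.getName() + " = " + p.getString());
                    }
                }
            }
        } finally {
            session.logout();
        }
    }
}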
