Hi list: We are using JackRabbit 1.2.1 at present.
Our nodes look like this:

/Product/Customer1/Settings
/Product/Customer1/Configuration
/Product/Customer1/DataA
/Product/Customer1/DataB
/Product/Customer2/Settings
/Product/Customer2/Configuration
/Product/Customer2/DataA
/Product/Customer2/DataB

This pattern continues all the way up to 250 customers. When we load data for all of these customers, we see that the Derby database size is 2.5 GB and the Lucene index is 470 MB.

We have to provide for the following:

a. Access data for around 20 customers simultaneously.
b. The queries are of the type "all attributes of a given node for a given customer".
c. Data about one customer should not be accessible by another customer.

At present we access JackRabbit using 20 threads and 20 different sessions; this is to achieve separation of data, etc. (see the rough sketch at the end of this mail). We are seeing performance figures such as the following:

Network Derby: 80 seconds for all the threads to receive results
Oracle: 35 seconds for all the threads to receive results

Some questions:

1. What lessons have community members learned from using Derby?
2. Would you recommend Oracle over Derby for such large amounts of data?
3. Are there ways to speed up Lucene searches?
4. Are Lucene searches affected by such large indexes?
5. Would it be better for us to split the repository into smaller ones, and thereby have smaller Lucene indexes?
6. For such large data, would embedded Derby or network Derby be better suited to the task?

-- Sriram
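P.S. In case it helps, below is roughly what each of our 20 threads does, boiled down to a minimal sketch. It is not our actual code: the credentials, the dumpCustomerNode name, and the customerPath parameter are just placeholders, and the Repository handle comes from our own setup. It just shows the pattern: one session per thread, an XPath query into the customer's subtree, then reading all properties of the matched node.

import javax.jcr.*;
import javax.jcr.query.*;

public class CustomerQuery {

    // One session per thread; customerPath is e.g. "Customer1/Settings".
    static void dumpCustomerNode(Repository repository, String customerPath)
            throws RepositoryException {
        // Placeholder credentials; we use our own in practice.
        Session session = repository.login(
                new SimpleCredentials("user", "pass".toCharArray()));
        try {
            QueryManager qm = session.getWorkspace().getQueryManager();
            // XPath query scoped to the given customer's subtree.
            Query query = qm.createQuery(
                    "/jcr:root/Product/" + customerPath, Query.XPATH);
            NodeIterator nodes = query.execute().getNodes();
            while (nodes.hasNext()) {
                Node node = nodes.nextNode();
                // "All attributes of a given node" = iterate its properties.
                for (PropertyIterator props = node.getProperties(); props.hasNext();) {
                    Property p = props.nextProperty();
                    // getString() fails on multi-valued properties; skip those here.
                    if (!p.getDefinition().isMultiple()) {
                        System.out.println(p.getName() + " = " + p.getString());
                    }
                }
            }
        } finally {
            session.logout();
        }
    }
}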
