Sriram Narayanan wrote:
1. What are the lessons learned by various community members on using
Derby ?
what I heard from others who tried different setups is that derby over a
network is quite slow. I didn't run any tests myself, but derby seems to be
the best choice if you use it in embedded mode. if you need a standalone db
server, you should consider another db.
2. Would you recommend using Oracle to using Derby for such large
amounts of data ?
from what I've seen so far, both scale well with large amounts of data.
3. Are there ways to speed up lucene searches ?
1) there are configuration parameters that affect the query performance:
a) respectDocumentOrder
b) resultFetchSize
see [1] for some details on those parameters.
2) some query features are more expensive than others, which means you may be
able to speed up searches by rephrasing your query statements.
4. Are lucene searches affected by such large indexes ?
access rights are checked at the very end of the query, and that will probably
affect your queries negatively. because your access rights are limited to a
single customer, most query results are rejected by access control in the last
stage of query execution. if we assume 250 customers, each with access only to
its own tree, and hits spread evenly across customers, then on average 99.6%
(1 - 1/250) of the query result nodes are rejected by access control.
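
to make the 99.6% figure concrete, here is a minimal sketch of the arithmetic
in java. it assumes that query hits are spread evenly over all customer trees
and that each user can read exactly one tree. the class and method names are
made up for illustration and are not part of jackrabbit.

```java
import java.util.Locale;

// Illustrative only: not a Jackrabbit API.
final class RejectionRate {

    // Fraction of query hits rejected by access control, assuming hits are
    // spread evenly over `customers` trees and the user can read only one.
    public static double rejectedFraction(int customers) {
        if (customers < 1) {
            throw new IllegalArgumentException("customers must be >= 1");
        }
        return 1.0 - 1.0 / customers;
    }

    public static void main(String[] args) {
        // 250 customers, as in the estimate above
        System.out.printf(Locale.ROOT, "%.1f%%%n", 100 * rejectedFraction(250));
    }
}
```

with a single customer the fraction is 0, and it approaches 1 as the number of
customers sharing one workspace grows.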
5. Would it be better for us to split the repository into smaller ones
and to then have smaller lucene indexes ?
if each customer has access only to its own tree, I would definitely create one
workspace per customer. this will result in:
- smaller indexes
- faster queries, because only a small number of intermediate result nodes are
rejected by access control
- you can configure an idle time after which workspaces that are not in use
are shut down (-> saves resources)
- allows better concurrency because an update in one workspace does not affect
other workspaces
- allows you to create db backups per customer
6. For such large data, would Embedded Derby or Network derby be
suitable to the task ?
as mentioned before, I think derby does its job best if it runs embedded.
regards
marcel
[1] http://svn.apache.org/repos/asf/jackrabbit/tags/1.2.2/jackrabbit-core/src/main/config/repository.xml