Vadim, Thank you for your reply. I'll make the necessary adjustments
to accommodate your feedback. The examples in the online developer
notes strongly recommend closing the collection after every query
using finally { col.close(); }. Are you contradicting this by saying the
collection can be closed on application shutdown? If there's no risk
or penalty I'd prefer to leave the collection open.

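To check I've understood, here is a minimal sketch of the pattern I'd move to: one handle opened at startup, shared by the worker threads, and closed once at shutdown. Col and FakeCol are hypothetical stand-ins I made up so the sketch compiles without the Xindice jars; only close() is meant to mirror the XML:DB Collection interface, and in the real application I assume the handle would come from DatabaseManager.getCollection(...).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the XML:DB Collection handle (NOT the real API).
interface Col extends AutoCloseable {
    String get(String id);
    void put(String id, String xml);
    @Override void close();
}

// In-memory fake that records how many times close() was called.
final class FakeCol implements Col {
    private final Map<String, String> docs = new ConcurrentHashMap<>();
    final AtomicInteger closeCalls = new AtomicInteger();

    public String get(String id) { return docs.get(id); }
    public void put(String id, String xml) { docs.put(id, xml); }
    public void close() { closeCalls.incrementAndGet(); }
}

final class SharedCollectionSketch {
    // Runs `threads` writers against one shared handle, then closes it once.
    // Returns the number of close() calls observed on the handle.
    static int run(FakeCol col, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < threads; i++) {
                final String id = "doc" + i;
                // No per-operation open/close: every task reuses `col`.
                pool.submit(() -> col.put(id, "<doc id='" + id + "'/>"));
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow(); // harmless if the pool has already terminated
            col.close();        // the single close, at application shutdown
        }
        return col.closeCalls.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FakeCol col = new FakeCol();
        int closes = run(col, 4);
        System.out.println("close() calls: " + closes + ", doc0 = " + col.get("doc0"));
    }
}
```

The only difference from the developer-notes pattern is where the finally block sits: around the whole application lifetime rather than around each query.
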
Could I suggest that the analogy between a collection and an RDBMS table be
added to the Wiki?

On 28/11/06, Vadim Gritsenko <[EMAIL PROTECTED]> wrote:
Brendan Laing wrote:
> Hi,
>
> I've been using xindice for a few weeks now and have started to puzzle
> over the following question. If I have many xml documents and store each
> in its own collection I'll have a disk space problem due to the 4MB tbl
> files (like a similar user who posted a mail on 2006-11-08 15:24:51
> titled 'pagesize/pagecount change in beta4').

Storing one document per collection is the wrong approach. In an
XML:DB database, a collection is intended to store many documents, much as
a single RDBMS table stores multiple records.


> However if I aggregate xml documents into a collection I'm concerned
> about issues such as record locking and performance. To discuss the
> issue let's suppose the following:
>
> 1) I have 10k xml documents, 5k on domain xyz.com and 5k on abc.com.
> Obviously each domain will have a xindice server running and therefore
> a collection per domain would be required at least.

If you have two xindice servers, they should be two different server
installations, each with its own config.xml file and its own directories
for the database files.

Multiple xindice servers must never share the same database files.

You should either have one xindice server with multiple collections, or multiple
servers (with one or many collections - whatever suits your needs).


> 2) An application sits on xyz and accesses documents via the embedded
> interface. Each read or update opens the collection and closes it.

You don't have to open/close the collection for each operation. A collection can
be opened once, used by multiple threads, and closed on application shutdown.
Opening/closing a collection in the client API does not cause the collection to
be opened/closed in the database itself.


> If
> thread a opens the collection and thread b tries to access it and
> close it before a has finished will we experience locking
> synchronisation issues?

No.


> Or is locking at node level in the BTree?

There is no locking implemented in xindice (one client cannot prevent
another from modifying a document), but there is synchronization (which prevents
data corruption when multiple threads are writing to the database). It is done at
levels deeper than the CollectionImpl classes.


> 3) The application on xyz accesses documents on abc.com over http (via
> the xml_rpc interface). We naturally try to reduce network traffic and
> bundle updates to improve response times (the cost of the xml_rpc
> exceeds the java applications at each end). However by using a single
> collection (that is continually opened and closed?) versus many
> smaller collections do we incur a penalty for reading and writing a
> larger document, parsing it in and out of xindice?

Lots of smaller collections will require more operating system resources (such
as file descriptors). Smaller collections are also harder to query: xindice does
not implement cross-collection querying. Parsing a document from a small or a
large collection takes exactly the same amount of time.

Vadim