Re: What is the implementation architecture regarding collections...?

Vadim Gritsenko Mon, 27 Nov 2006 16:29:06 -0800

Brendan Laing wrote:

Hi,


I've been using xindice for a few weeks now and have started to puzzle
over the following question. If I have many xml documents and store each
in a collection, I'll have a disk space problem due to the 4MB tbl
files (like a similar user who posted a mail on 2006-11-08 15:24:51
titled 'pagesize/pagecount change in beta4').

If you store one document per collection, that is the wrong approach. In the XML:DB database, a collection is intended to store lots of documents. It is similar to how a single RDBMS table stores multiple records.
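To put a rough number on the disk cost of one-document-per-collection, here is a back-of-envelope sketch. It assumes, as the question reports, that every collection starts with one preallocated 4 MB .tbl file (treated here as 4 MiB), and it ignores index files and later growth. Under that assumption 10,000 collections start at 41,943,040,000 bytes (about 39 GiB), while two collections (one per domain) start at 8 MiB.

```java
/** Back-of-envelope disk estimate. The 4 MB per-collection figure is taken from the question (assumption). */
class TblFootprint {
    // Assumed initial size of one collection's .tbl file: 4 MiB.
    static final long TBL_BYTES = 4L * 1024 * 1024;

    /** Initial bytes on disk if every collection starts with one preallocated .tbl file. */
    static long initialBytes(long collections) {
        return collections * TBL_BYTES;
    }
}
```

For example, `TblFootprint.initialBytes(10_000)` versus `TblFootprint.initialBytes(2)` compares one collection per document with one collection per domain.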

However if I aggregate xml documents into a collection I'm concerned
about issues such as record locking and performance. To discuss the
issue let's suppose the following:

1) I have 10k xml documents, 5k on domain xyz.com and 5k on abc.com.
Obviously each domain will have its own xindice server running, and
therefore at least one collection per domain would be required.

If you have two xindice servers, they should be two different server installations, with separate config.xml files, and they must have separate directories for the database files.


Multiple xindice servers must never share the same database files.

You should either have one xindice server with multiple collections, or multiple servers (with one or many collections, whatever suits your needs).

2) An application sits on xyz and accesses documents via the embedded
interface. Each read or update opens the collection and closes it.

You don't have to open/close the collection for each operation. A collection can be opened once, used by multiple threads, and closed on application shutdown. Collection opening/closing in the client API does not cause collection opening/closing in the database itself.
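The "open once, share, close on shutdown" pattern might look like the sketch below. `SharedCollection` is a hypothetical in-memory stand-in, not the XML:DB API; in real client code the shared handle would be the `org.xmldb.api.base.Collection` obtained once (for example via `DatabaseManager.getCollection(...)`) and closed once at shutdown. The thread count and document ids are illustrative only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a client-side collection handle (not the XML:DB API). */
class SharedCollection implements AutoCloseable {
    private final Map<String, String> docs = new ConcurrentHashMap<>();
    private volatile boolean open = true;

    void store(String id, String xml) {
        if (!open) throw new IllegalStateException("collection already closed");
        docs.put(id, xml);
    }

    int count() { return docs.size(); }

    boolean isOpen() { return open; }

    @Override
    public void close() { open = false; }
}

class SharedHandleDemo {
    /** Opens one handle, lets several worker threads use it, then closes it once. */
    static SharedCollection run(int threads, int docsPerThread) {
        SharedCollection col = new SharedCollection(); // opened once, at startup
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int worker = t;
            pool.submit(() -> {
                // Every worker reuses the same handle; none opens or closes it.
                for (int i = 0; i < docsPerThread; i++) {
                    col.store("doc-" + worker + "-" + i, "<doc/>");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS); // wait for all workers to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        col.close(); // closed once, on application shutdown
        return col;
    }
}
```

Calling `SharedHandleDemo.run(4, 250)` stores 1,000 documents through a single handle and closes it only after every worker is done.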

If
thread A opens the collection and thread B tries to access it and
close it before A has finished, will we experience locking or
synchronisation issues?

No.

Or is locking at node level in the BTree?

There is no locking implemented in xindice (one client cannot prevent another from modifying a document), but there is synchronization (which prevents data corruption when multiple threads are writing to the database). It is done at levels deeper than the CollectionImpl classes.
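The difference between synchronization and locking can be illustrated with a small hypothetical model (`SyncedStore` is not Xindice's internals). Each individual read and write is synchronized, so the stored data never gets corrupted. Because no client can hold a lock between its read and its later write, however, two clients doing read-modify-write on the same document can still overwrite each other.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical store: every single operation is synchronized (no internal corruption),
 *  but there is no lock a client can hold across a read and a later write. */
class SyncedStore {
    private final Map<String, String> docs = new HashMap<>();

    synchronized String read(String id) { return docs.get(id); }

    synchronized void write(String id, String xml) { docs.put(id, xml); }
}

class LostUpdateDemo {
    /** Extracts N from "<counter>N</counter>" by dropping every non-digit character. */
    static int counterOf(String xml) {
        return Integer.parseInt(xml.replaceAll("\\D", ""));
    }

    static String counterXml(int n) { return "<counter>" + n + "</counter>"; }

    /** Two clients interleave read-modify-write on one document; returns the final value. */
    static int interleavedIncrements() {
        SyncedStore store = new SyncedStore();
        store.write("c", counterXml(1));
        String seenByA = store.read("c");                     // client A reads 1
        String seenByB = store.read("c");                     // client B also reads 1
        store.write("c", counterXml(counterOf(seenByA) + 1)); // A writes 2
        store.write("c", counterXml(counterOf(seenByB) + 1)); // B overwrites with 2
        return counterOf(store.read("c"));                    // 2, not 3: A's update is lost
    }
}
```

Every write completes intact, yet the final value is 2 rather than 3, which is the sense in which one client cannot prevent another from modifying a document.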

3) The application on xyz accesses documents on abc.com over http (via
the xml_rpc interface). We naturally try to reduce network traffic and
bundle updates to improve response times (the cost of the xml_rpc calls
exceeds the cost of the java applications at each end). However, by
using a single collection (that is continually opened and closed?)
rather than many smaller collections, do we incur a penalty for reading
and writing a larger document and parsing it in and out of xindice?

Lots of smaller collections will require more operating system resources (such as file descriptors). Smaller collections are also harder to query: xindice does not implement cross-collection querying. Parsing a document takes exactly the same amount of time whether it comes from a small or a large collection.
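Since there is no cross-collection querying, an application that splits documents across many collections has to query each one separately and merge the results itself. The sketch below is a hypothetical client-side fan-out: collections are modeled as in-memory maps of id to XML text, and a Java `Predicate` stands in for a real XPath query. It counts how many separate queries the client had to issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical client-side fan-out: query each collection separately, merge the results. */
class FanOutQuery {
    static final class Result {
        final List<String> ids;     // ids of matching documents, in collection order
        final int queriesIssued;    // one per collection

        Result(List<String> ids, int queriesIssued) {
            this.ids = ids;
            this.queriesIssued = queriesIssued;
        }
    }

    static Result queryEach(List<Map<String, String>> collections, Predicate<String> matches) {
        List<String> ids = new ArrayList<>();
        int queries = 0;
        for (Map<String, String> collection : collections) {
            queries++; // a separate query per collection (a separate round trip if the server is remote)
            for (Map.Entry<String, String> doc : collection.entrySet()) {
                if (matches.test(doc.getValue())) {
                    ids.add(doc.getKey());
                }
            }
        }
        return new Result(ids, queries);
    }
}
```

With three one-document collections the client issues three queries; the same documents kept in one collection would need only one.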


Vadim
