What is the implementation architecture regarding collections...?

Brendan Laing Mon, 27 Nov 2006 04:35:10 -0800

Hi,

I've been using Xindice for a few weeks now and have started to puzzle
over the following question. If I have many XML documents and store each
in its own collection, I'll have a disk space problem due to the 4MB tbl
files (like a similar user who posted a mail on 2006-11-08 15:24:51
titled 'pagesize/pagecount change in beta4').


However, if I aggregate XML documents into a single collection, I'm
concerned about issues such as record locking and performance. To discuss
the issue let's suppose the following:

1) I have 10k XML documents, 5k on domain xyz.com and 5k on abc.com.
Each domain will have a Xindice server running, so at least one
collection per domain is required.
2) An application sits on xyz and accesses documents via the embedded
interface. Each read or update opens the collection and closes it. If
thread A opens the collection and thread B tries to access it and
close it before A has finished, will we experience locking or
synchronisation issues? Or is locking at node level in the BTree?
3) The application on xyz accesses documents on abc.com over HTTP (via
the XML-RPC interface). We naturally try to reduce network traffic and
bundle updates to improve response times (the cost of an XML-RPC call
exceeds the cost of the Java processing at each end). However, by using
a single collection (that is continually opened and closed?) versus many
smaller collections, do we incur a penalty for reading and writing a
larger document, parsing it in and out of Xindice?
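
To put rough numbers on the disk space side of this trade-off, here is a
back-of-envelope sketch. It assumes every collection allocates exactly one
4MB tbl file and ignores index files and any growth beyond that initial
size; the class and method names are made up for illustration and are not
part of Xindice.

```java
public class TblDiskEstimate {
    // Assumption: each collection allocates one tbl file of this size (in MB).
    public static final long TBL_FILE_MB = 4;

    /** Minimum disk space in MB taken by tbl files for the given number of collections. */
    public static long tblFootprintMb(long collections) {
        return collections * TBL_FILE_MB;
    }

    public static void main(String[] args) {
        long docsPerDomain = 5_000; // from the scenario: 5k documents on each domain

        // One collection per document: 5,000 tbl files on each server.
        System.out.println("collection per document: " + tblFootprintMb(docsPerDomain) + " MB");

        // One aggregated collection per domain: a single tbl file on each server.
        System.out.println("collection per domain:   " + tblFootprintMb(1) + " MB");
    }
}
```

Under those assumptions, a collection per document costs about 20,000 MB
(roughly 20 GB) on each server, against 4MB for a single aggregated
collection, which is why the locking and performance questions above matter.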

Any views, opinions?

Brendan
