Hi,

I've been using Xindice for a few weeks now and have started to puzzle over the following question. If I have many XML documents and store each in its own collection, I'll have a disk space problem due to the 4MB tbl files (like a similar user who posted a mail on 2006-11-08 15:24:51 titled 'pagesize/pagecount change in beta4').
However, if I aggregate XML documents into a single collection, I'm concerned about issues such as record locking and performance. To discuss the issue, let's suppose the following:

1) I have 10k XML documents, 5k on domain xyz.com and 5k on abc.com. Obviously each domain will have a Xindice server running, so at least one collection per domain would be required.

2) An application sits on xyz and accesses documents via the embedded interface. Each read or update opens the collection and closes it. If thread A opens the collection, and thread B tries to access it and close it before A has finished, will we experience locking or synchronisation issues? Or is locking at node level in the BTree?

3) The application on xyz accesses documents on abc.com over HTTP (via the xml_rpc interface). We naturally try to reduce network traffic and bundle updates to improve response times (the cost of the xml_rpc call exceeds that of the Java applications at each end). However, by using a single collection (that is continually opened and closed?) versus many smaller collections, do we incur a penalty for reading and writing a larger document, parsing it in and out of Xindice?

Any views, opinions?

Brendan