Vadim Gritsenko wrote:
Honglin Ye wrote:
Vadim Gritsenko wrote:
Honglin Ye wrote:
Partitioning the database into smaller collections may improve
performance.
I'm curious, why do you think so? Is there any reason or observation
for this?
I am curious too. Suppose we have 4 groups, and each group adds 1000
files every month. Sometimes we only need to search the files of one
group, and sometimes we only need to search a particular month. Will it
be easier if we separate the files by group and month, so that we can
work with smaller collections?
Hm, I guess so. Certainly, a search over 1000 files will be faster than
a search over 1000*4*months files. During the search, Xindice will not
go through each file if you are using indexes, but still, I would expect
a search over a smaller collection / smaller indexes to be faster.
Or would it make little difference when searching?
Suppose you have everything in one collection. Then you'll need at
least three indexes: one for the group, one for the month, and another
on the element/attribute you are searching for. The month index will
return 4000 documents, the group index will return 1000*months
documents, and then the third index will be used. The intersection of
the results returned by the three indexes gives you the resulting set
of documents. So Xindice will spend some time reading / searching the
indexes and building the intersection.
OTOH, if you have a collection per group and per month, you will not
need the first two indexes, so this will make the search faster.
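The comparison above can be sketched as a toy model in Python. This is
not Xindice code: indexes are modelled as plain sets of document IDs,
12 months of data are assumed, and the "tag" field stands in for
whatever element/attribute you are searching on (all names here are
hypothetical).

```python
from collections import defaultdict

GROUPS, PER_MONTH, MONTHS = 4, 1000, 12  # MONTHS is an assumed value


def make_docs():
    """Build (doc_id, group, month, tag) tuples; 1 in 10 docs has tag 'x'."""
    docs = []
    for month in range(MONTHS):
        for group in range(GROUPS):
            for i in range(PER_MONTH):
                tag = "x" if i % 10 == 0 else "y"
                docs.append((len(docs), group, month, tag))
    return docs


def search_single_collection(docs, group, month, tag):
    """One big collection: three indexes, then intersect their results."""
    by_group, by_month, by_tag = defaultdict(set), defaultdict(set), defaultdict(set)
    for doc_id, g, m, t in docs:
        by_group[g].add(doc_id)
        by_month[m].add(doc_id)
        by_tag[t].add(doc_id)
    month_hits = by_month[month]
    group_hits = by_group[group]
    result = month_hits & group_hits & by_tag[tag]
    return len(month_hits), len(group_hits), result


def search_partitioned(docs, group, month, tag):
    """One collection per (group, month): only the tag index is needed."""
    collections = defaultdict(lambda: defaultdict(set))
    for doc_id, g, m, t in docs:
        collections[(g, m)][t].add(doc_id)
    return collections[(group, month)][tag]


docs = make_docs()
month_count, group_count, single = search_single_collection(docs, 2, 5, "x")
partitioned = search_partitioned(docs, 2, 5, "x")
print(month_count, group_count, len(single), single == partitioned)
```

Both approaches return the same documents; the partitioned layout just
never has to read the month and group candidate sets or intersect them.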
How about updates? When we update a resource, will Xindice directly
modify that resource in the .tbl file and leave the other parts
untouched?
Get document by ID / set document by ID / perform an XUpdate on the
document are all fast operations. Xindice uses a BTree to store the
key -> document association, where each node in the BTree stores
(4096 / KeySize) keys. So, for keys 128 bytes in size (32 keys per
4096-byte page), a lookup takes about log32(N) steps, where N is the
number of documents in the collection.
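To put numbers on the log32(N) estimate, here is a small Python sketch.
It is only the arithmetic from the paragraph above, under the
simplifying assumption that a whole 4096-byte page holds keys (real
pages also carry pointers and headers, so the actual fan-out is
somewhat lower). The function names are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes per BTree page, as stated above


def keys_per_node(key_size):
    """Approximate number of keys that fit in one page."""
    return PAGE_SIZE // key_size


def levels_needed(doc_count, key_size):
    """Smallest number of BTree levels whose capacity covers doc_count keys."""
    fanout = keys_per_node(key_size)
    levels, capacity = 1, fanout
    while capacity < doc_count:
        capacity *= fanout
        levels += 1
    return levels


for n in (1000, 48000, 1000000):
    print(n, "documents ->", levels_needed(n, 128), "levels")
```

With 128-byte keys this gives 2 levels for 1000 documents and 4 levels
for either 48000 or a million documents, so under these assumptions
lookups by ID slow down only slightly as the collection grows.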
Xindice stores documents in a paged file, so when you update a document,
only the pages containing that document will be updated. The rest of the
.tbl file will not be touched.
Partitioning is a useful strategy in an RDBMS; is there anything
similar in Xindice?
If you are planning to have a large database (>2GB), then you'll have to
partition due to limits on file size (and these limits vary between
operating systems / file systems).
Either way, let us know how many documents you were able to store and
how fast it worked ;-)
Vadim
Vadim,
Whole-document retrieve - modify - store is mostly what I need.
I also have a more demanding query requirement. I am building a proposal
submission and handling system for radio telescope systems, and I am
required to keep the documents as XML.
It should be searchable by proposer names, proposal titles, observation
types, telescopes, configurations, target sources, frequencies,
abstract text, etc. From my understanding, there is no performance
penalty for having more indexes. It also looks like the indexes are
updated automatically as new proposals are added; is that true?
Do you have pointers to systems like the one I am building that use
XML as storage? Do you have suggestions on how I should proceed?
Honglin