Dan,
From what I have done using XPath to search collections, you can't search
more than one collection at a time, which would mean that if you needed to
search all of them, you would have to run the search against each individual
collection. People have built collections with hundreds of thousands of
documents, and the search time is fast if you index them.
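
To make "run the search against each individual collection" concrete, here is
a rough sketch. It is plain Python with in-memory XML standing in for the
database, not the Xindice API. The collection paths, the sample citations, and
the search_all helper are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: collection path -> XML documents in that collection.
SAMPLE_DOCS = {
    "/db/citations/JAMA/132": [
        "<citation><journal>JAMA</journal><volume>132</volume>"
        "<title>Aspirin trial</title></citation>",
    ],
    "/db/citations/JAMA/133": [
        "<citation><journal>JAMA</journal><volume>133</volume>"
        "<title>Statin review</title></citation>",
    ],
    "/db/citations/Annals/88": [
        "<citation><journal>Annals</journal><volume>88</volume>"
        "<title>Aspirin dosing</title></citation>",
    ],
}


def make_collection(docs):
    """Wrap parsed documents in one container element so they can be queried together."""
    root = ET.Element("collection")
    for doc in docs:
        root.append(ET.fromstring(doc))
    return root


COLLECTIONS = {path: make_collection(docs) for path, docs in SAMPLE_DOCS.items()}


def search_all(xpath):
    """Run the same query against every collection, one at a time, and merge the hits."""
    results = []
    for path in sorted(COLLECTIONS):
        # ElementTree only supports a subset of XPath, which is enough for this sketch.
        for hit in COLLECTIONS[path].findall(xpath):
            results.append((path, hit.findtext("title")))
    return results
```

Calling search_all("citation[journal='JAMA']") visits all three collections in
turn, so the cost grows with the number of collections even when each one is
small.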

So, what kind of search do you want to do? XPath provides a "contains"
function, but as I understand it, that will search every tag and you can't
index that kind of search. If you can pick out tags and index those, then your
search will take under a second. Tom Bradford is working on a full-text
search, but that is not ready yet.

So it appears that you have several choices.

1) Sub-collections - searches are fast as long as the number of documents and
the amount of content in each collection is small, but there is no way to
search all collections at once.

2) One collection, use "contains" - fast as long as your data doesn't get too
big too quickly. Wait for Tom Bradford to come up with full-text search
capability.

3) One collection, pick out specific tags and index those - fast for all
indexed searches, slower for "contains".

4) Use a third-party tool (Apache has one) to index your documents as you
add/edit/delete them. Use it to find your document and then do a direct
access. This is fast and a little more complicated, but you can implement it
now. Also, when Tom is done, you can discard it and transfer the maintenance
to Xindice.
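
As a rough idea of what option 4 involves, here is a toy sketch of an external
index that is kept in sync on every add/edit/delete and then used to find
document keys for direct access. It is plain Python, not the Apache tool or
any Xindice API. The CitationIndex class, the save and delete helpers, and the
in-memory store are all hypothetical.

```python
from collections import defaultdict


class CitationIndex:
    """Toy external index: (journal, volume) -> set of document keys."""

    def __init__(self):
        self._by_journal_volume = defaultdict(set)
        self._entries = {}  # key -> (journal, volume), needed to clean up on edit/delete

    def add(self, key, journal, volume):
        self.remove(key)  # treat an edit as remove followed by add
        self._entries[key] = (journal, volume)
        self._by_journal_volume[(journal, volume)].add(key)

    def remove(self, key):
        old = self._entries.pop(key, None)
        if old is not None:
            self._by_journal_volume[old].discard(key)

    def find(self, journal, volume):
        # .get avoids creating empty entries for lookups that miss
        return sorted(self._by_journal_volume.get((journal, volume), ()))


store = {}  # hypothetical document store: key -> XML text
index = CitationIndex()


def save(key, journal, volume, xml):
    """Add or edit a document and update the index in the same step."""
    store[key] = xml
    index.add(key, journal, volume)


def delete(key):
    """Delete a document and drop it from the index."""
    store.pop(key, None)
    index.remove(key)
```

A lookup is then index.find("JAMA", "132") followed by store[key] for each key
returned, so no collection scan is needed. The extra complication is that
every write path has to go through save and delete, or the index drifts out of
date.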

HTH,

Mark

Dan Barron wrote:

> I have an idea and I was just wondering what you all thought about it. Here's 
> the deal:
>
> We are going to use Xindice to store XML data for scientific journal 
> citations. The simplest idea is to just dump them all in one collection and 
> use XPath to find what we need. But most times, they would be searched by 
> journal name and volume.
>
> So what I'm thinking is if I create a subcollection for each journal, and 
> then collections for each volume say under that, there would only be a few 
> dozen articles in each collection. And since you search first by getting a 
> collection and then searching, I'm guessing this would be much faster and 
> could effectively eliminate the need for indexers on journal name and volume. 
> And presumably I could still search the entire collection when necessary 
> using the base collection.
>
> So I'm thinking searching the /db/citations/JAMA/132 collection of a few
> dozen documents would be way faster than searching /db/citations, where
> altogether there would be hundreds of thousands of documents.
>
> Does this make any sense? Will it be faster? Am I missing any obvious 
> problems with this approach? Any ideas would be appreciated.
>
> dan
>
> ____________________________________________________________________
> Daniel W. Barron
> Senior Systems Analyst/Application Developer
> American College of Physicians-American Society of Internal Medicine
> Tel: (215) 351-2617     Tel: (800) 523-1546 x2617
> Fax: (215) 351-2644    E-mail: [EMAIL PROTECTED]

--
Mark J Stang
Architect
Cybershop Systems
