Re: Performance with index?

Terry Rosenbaum 21 Feb 2004 19:43:18 -0000

Using contains or ends-with in your query
ensures that a scan over the whole collection will
occur. Indexing is never used to evaluate
the contains or ends-with functions in Xindice
as currently implemented.

A value index, if present on the proper attribute or element,
will be used to evaluate equality and starts-with.
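
To illustrate why equality and starts-with can use a value index while contains cannot, here is a minimal Python sketch. It assumes a hypothetical index kept as a sorted list of (value, document id) pairs; this is not Xindice's actual on-disk structure, and the names INDEX, lookup_equal, lookup_prefix and lookup_contains are made up for the example.

```python
from bisect import bisect_left

# Hypothetical value index: sorted (value, document id) pairs.
# This is an illustration only, not Xindice's real index layout.
INDEX = sorted([
    ("algo", "doc3"),
    ("alpha", "doc1"),
    ("beta", "doc2"),
    ("test", "doc1"),
    ("testing", "doc4"),
])


def lookup_equal(index, value):
    """Binary-search to the first entry for value, then read its run."""
    i = bisect_left(index, (value,))
    docs = []
    while i < len(index) and index[i][0] == value:
        docs.append(index[i][1])
        i += 1
    return docs


def lookup_prefix(index, prefix):
    """Values sharing a prefix are adjacent in sorted order."""
    i = bisect_left(index, (prefix,))
    docs = []
    while i < len(index) and index[i][0].startswith(prefix):
        docs.append(index[i][1])
        i += 1
    return docs


def lookup_contains(index, fragment):
    """Sorted order does not help here: every entry must be inspected."""
    return [doc for value, doc in index if fragment in value]
```

In this sketch the first two lookups touch only the matching run of entries, whereas lookup_contains has to visit every entry no matter how the index is sorted.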

To evaluate contains and ends-with efficiently using indexing would
require a different kind of index that Xindice does not currently
support (e.g. substring for either contains or ends-with, or
reversed value index for ends-with). (Maybe there would be
some gain by performing contains or ends-with by scanning the entire
index during the index evaluation phase, but that is not implemented).
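
As a rough sketch of the reversed value index idea (something Xindice does not provide), the Python below stores each value reversed, so that an ends-with test becomes a prefix search on the reversed key. The names build_reversed_index and lookup_suffix and the sample data are hypothetical.

```python
from bisect import bisect_left


def build_reversed_index(pairs):
    """Store each value reversed so that suffixes become prefixes."""
    return sorted((value[::-1], doc) for value, doc in pairs)


def lookup_suffix(rev_index, suffix):
    """ends-with(value, suffix) becomes a prefix search on the reversed key."""
    key = suffix[::-1]
    i = bisect_left(rev_index, (key,))
    docs = []
    while i < len(rev_index) and rev_index[i][0].startswith(key):
        docs.append(rev_index[i][1])
        i += 1
    return docs


# Made-up sample data: (indexed value, document id).
PAIRS = [
    ("test", "doc1"),
    ("contest", "doc2"),
    ("algo", "doc3"),
    ("testing", "doc4"),
]
REV = build_reversed_index(PAIRS)
```

Here lookup_suffix(REV, "test") finds the documents holding "test" and "contest" without visiting the other entries. A contains query would still need a different structure, such as a substring index.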

Indexes are used to try to narrow the set of resources
that could possibly match the query to less than the entire collection.

Once the indexed partial evaluation has been completed,
a standard XPath evaluation is run on each resource in
the (hopefully) narrowed set to produce the final answer.
Thus, you can see that if the use of indexes fails to
significantly narrow the query, it would have been
faster to just run an XPath evaluation against each
of the resources in the collection and skip the index
evaluation entirely.
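
The two phases described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the collection is an in-memory dict, the index is a plain dict from div text to document ids, and Python's xml.etree.ElementTree stands in for the XPath engine. None of these names come from Xindice.

```python
import xml.etree.ElementTree as ET

# Made-up collection: document id -> XML text.
COLLECTION = {
    "doc1": "<root><div>test</div></root>",
    "doc2": "<root><div name='algo'/></root>",
    "doc3": "<root><p>other</p></root>",
}

# Hypothetical value index on the text of div elements.
DIV_INDEX = {"test": {"doc1"}}


def candidates(value, index, collection):
    """Phase 1: use the index, if there is one, to narrow the documents."""
    if index is None:
        return set(collection)  # no index: every document is a candidate
    return index.get(value, set())


def query_div_equals(value, collection, index=None):
    """Phase 2: fully evaluate //div[.=value] on each candidate document."""
    hits = []
    for doc_id in sorted(candidates(value, index, collection)):
        root = ET.fromstring(collection[doc_id])
        for div in root.iter("div"):  # walks the whole document, like //div
            if (div.text or "") == value:
                hits.append((doc_id, value))
    return hits
```

With the index, only doc1 is parsed and scanned; without it, all three documents are. If the collection held a single large document, phase 1 could at best return that one document, and phase 2 would still scan all of it.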

-Terry

Jeff Greif wrote:

Indexes in Xindice worked as follows (at the time I last used it, over a
year ago):
1.  The index maps from element or attribute values to documents.  It tells
you which documents in a collection contain the value in question.
2.  As such, if properly implemented, it could also be used to partially
optimize queries testing "contains", "starts-with" and "ends-with" for the
indexed elements or attributes.
3.  Once the index determines which documents to examine, each document must
be subject to extraction via the xpath of the query.  This means the entire
document must be scanned if the xpath starts with //, and large subtrees
must be scanned if the path contains // elsewhere than at the beginning.
4.  Clearly the index will work best when the documents are small and not be
useful when there is just one document.

I believe this information should help answer your questions.

Jeff

----- Original Message -----
From: "Eric Zhang" <[EMAIL PROTECTED]>
To: <xindice-users@xml.apache.org>
Sent: Saturday, February 21, 2004 10:01 AM
Subject: Performance with index?

Hi all: I have some questions about using indexes in Xindice. I have a fairly
large XML file (2.3M) with lots of nonsense elements; however, there are only
two elements in the file that I am interested in:
  <div>test</div> <div name="algo"/>
I want to query this XML file using the XPaths //div[.="test"],
//div[@name="algo"], and //div to get the elements I want.
The processing time is pretty long.
However, after I add indexes
xindice ai -c /db/test -p div
xindice ai -c /db/test -p div@name
the performance doesn't get any better.

My questions are:
- Does an Xindice index only help queries over lots of files,
 not a query on a single file?
- If the index can work on a single file, is the way I am creating
 the index right?
- In short, how can I create an index on a document to help me
 quickly find the element I want, using its value or its attribute's value
 as the search keyword?

Thanks a lot

Yue (Eric) Zhang
Database Analyst/DBA, TAPoR Project
Arts Department, University of Alberta
Edmonton, AB, Canada
