Natalia Shilenkova wrote:
I've been looking at the full text indexing patch that was submitted
by Andy Armstrong a couple years ago. It uses plain Lucene query
syntax to search the indexes.
A full text index (like any other index) has a pattern parameter that
determines which elements/attributes are indexed, and it is
possible to create several indexes with different patterns.
Hmmm... Ok...
If there are several indexes, which one should be used to execute a
query? The existing patch always uses the index with the shortest pattern,
but that does not necessarily mean a better match, and the overall effect is the
same as if there were only one index, since the only way to use another
one is to drop the index with the shortest pattern.
This does not sound good to me... If the index with the shortest pattern does not
index the XML element(s) requested in the query, the query would not find anything.
So the question is, does it make sense to have more than one full text
index per collection?
I'm not quite sure there is any real need for more than one Lucene index. For most
cases, IMHO, a single Lucene index per collection is sufficient. And
each such Lucene index can be associated with multiple Xindice Indexer objects,
each contributing a pattern that should be indexed by this collection's
Lucene index.
To illustrate this thought, say you create several Xindice full text Indexers
with patterns:
name
phone
[EMAIL PROTECTED]
All three of these Indexers could be backed by a single Lucene index, which would
contain multiple fields (org.apache.lucene.document.Field) for each document
stored in Xindice (each such document corresponds to an org.apache.lucene.document.Document):
Document:
Field id=abcdeff -- Stored field with Xindice document ID
Field name=John -- Indexed field created from <name>John</name> element
Field phone=123-456-5555
Field [EMAIL PROTECTED]
So when querying, it is possible to assemble a complex query, such as "give me
all Johns who have a work phone", or some such. The Lucene index should store the id
field, so that we can retrieve the ids of matching documents from the search result.
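A minimal Java sketch of the single-index-per-collection idea. It uses a hypothetical in-memory class `CollectionIndex` in place of real Lucene classes (it does not touch the Lucene API). Each Indexer only contributes its pattern as a field name, every document is indexed once per collection under its id, and an AND query over several fields returns the matching ids. Patterns are assumed to be plain element names, and terms are simply lowercased and split on whitespace, which is only a crude stand-in for a Lucene analyzer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical in-memory stand-in for ONE full text index per collection.
 * Not the Lucene API: it only models "one index, many fields".
 */
class CollectionIndex {
    // Patterns contributed by the individual Indexers, used as field names.
    private final Set<String> patterns = new HashSet<>();
    // field -> term -> ids of the documents containing that term in that field
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    void addIndexerPattern(String pattern) {
        patterns.add(pattern);
    }

    /** Indexes one Xindice document; values of fields no Indexer asked for are skipped. */
    void addDocument(String docId, Map<String, String> values) {
        for (Map.Entry<String, String> e : values.entrySet()) {
            if (!patterns.contains(e.getKey())) {
                continue;
            }
            // Crude tokenizer: lowercase and split on whitespace.
            for (String term : e.getValue().toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(term, k -> new HashSet<>())
                        .add(docId);
            }
        }
    }

    /** AND query: ids of documents matching every field -> term clause. */
    Set<String> searchAll(Map<String, String> clauses) {
        Set<String> result = null;
        for (Map.Entry<String, String> c : clauses.entrySet()) {
            Set<String> hits = postings
                    .getOrDefault(c.getKey(), Map.of())
                    .getOrDefault(c.getValue().toLowerCase(), Set.of());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        CollectionIndex index = new CollectionIndex();
        index.addIndexerPattern("name");   // from the "name" Indexer
        index.addIndexerPattern("phone");  // from the "phone" Indexer
        index.addDocument("abcdeff", Map.of("name", "John", "phone", "123-456-5555"));
        index.addDocument("xyz", Map.of("name", "John"));
        // One query combining two fields of the same index.
        System.out.println(index.searchAll(Map.of("name", "john", "phone", "123-456-5555"))); // [abcdeff]
    }
}
```

In real Lucene the equivalent would be a boolean query with one required clause per field; the point of the sketch is only that, because all Indexers feed the same index, a single query can combine their fields.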
PS. As an aside, there are at least two options for how a Xindice document can
correspond to a Lucene Document:
* 1:1 mapping. This would only allow searching for whole documents, since all we would
know from the search result is the document id.
* Create a Document for each matching element, which would include its own
data and the data for all nested matches as well. This would make it possible to
search for the particular elements matching a query, if we can figure out
how to express this in a Lucene query.
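A rough sketch of the second option, assuming each element becomes its own search entry carrying the Xindice document id, a simple element path, and its own text plus the text of all nested elements. It uses only the JDK's DOM parser; the names `ElementDocs`, `ElementDoc` and `perElement` are hypothetical, and attributes, namespaces and CDATA sections are ignored for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ElementDocs {
    // Hypothetical per-element entry: owning document id, element path, and
    // the element's own text plus all nested text, joined with spaces.
    record ElementDoc(String docId, String path, String text) {}

    static List<ElementDoc> perElement(String docId, String xml) {
        try {
            org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<ElementDoc> out = new ArrayList<>();
            collect(docId, dom.getDocumentElement(), "", out);
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse XML for " + docId, ex);
        }
    }

    // Pre-order walk: one entry per element, parents before their children.
    private static void collect(String docId, Element e, String parentPath, List<ElementDoc> out) {
        String path = parentPath + "/" + e.getTagName();
        StringBuilder text = new StringBuilder();
        appendText(e, text);
        out.add(new ElementDoc(docId, path, text.toString()));
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                collect(docId, (Element) c, path, out);
            }
        }
    }

    // Gathers descendant text nodes, separated by single spaces.
    private static void appendText(Node n, StringBuilder sb) {
        if (n.getNodeType() == Node.TEXT_NODE) {
            String t = n.getNodeValue().trim();
            if (!t.isEmpty()) {
                sb.append(sb.length() == 0 ? "" : " ").append(t);
            }
        }
        for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, sb);
        }
    }

    public static void main(String[] args) {
        String xml = "<person><name>John</name><phone>123-456-5555</phone></person>";
        for (ElementDoc d : perElement("abcdeff", xml)) {
            System.out.println(d);
        }
    }
}
```

With the 1:1 mapping the same XML would give a single entry for `abcdeff`; here it gives one entry per element, so a hit can point at `/person/phone` rather than only at the document. The cost is a larger index, since nested text is repeated in every ancestor's entry.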
Either way, we can start off with the simpler option and think about how to do
more complex searches later.
Vadim
If so, how do we find out which index is a better
match for a particular query (modify the query language to include
hints? use field names to find the right pattern?), and can a query be run
against multiple indexes?
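One possible reading of the "use field names to find the right pattern" idea, sketched in Java: describe each index by the set of patterns it covers, and route a query to an index that covers every field the query mentions. The `IndexRouter` and `IndexInfo` names are hypothetical, and the tie-break (prefer the index with the fewest patterns) is an arbitrary assumption. When no single index covers the query, the caller would have to fall back to querying several indexes or report no match.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class IndexRouter {
    // Hypothetical description of an index: its name and the patterns it covers.
    record IndexInfo(String name, Set<String> patterns) {}

    /**
     * Returns an index whose patterns include every field named in the query,
     * preferring the one with the fewest patterns; empty if none covers them all.
     */
    static Optional<IndexInfo> choose(List<IndexInfo> indexes, Set<String> queryFields) {
        return indexes.stream()
                .filter(ix -> ix.patterns().containsAll(queryFields))
                .min(Comparator.comparingInt(ix -> ix.patterns().size()));
    }

    public static void main(String[] args) {
        List<IndexInfo> indexes = List.of(
                new IndexInfo("names", Set.of("name")),
                new IndexInfo("contacts", Set.of("name", "phone")));
        System.out.println(choose(indexes, Set.of("name")).map(IndexInfo::name).orElse("none"));          // names
        System.out.println(choose(indexes, Set.of("name", "phone")).map(IndexInfo::name).orElse("none")); // contacts
        System.out.println(choose(indexes, Set.of("email")).map(IndexInfo::name).orElse("none"));         // none
    }
}
```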
Any ideas?
Regards,
Natalia