Re: Multiple full text indexes

Vadim Gritsenko Mon, 16 Jul 2007 20:36:12 -0700

Natalia Shilenkova wrote:

I've been looking at the full text indexing patch that was submitted
by Andy Armstrong a couple years ago. It uses plain Lucene query
syntax to search the indexes.


Full text index (like any other index) has a pattern parameter that
determines what elements/attributes are going to be indexed. And it is
possible to create several indexes with different patterns.


Hmmm... Ok...

If there are several indexes, which one should be used to execute a
query? The existing patch always uses the index with the shortest
pattern, but that does not really mean a better match, and the overall
effect is the same as if there were only one index, since the only way
to use another one is to drop the index with the shortest pattern.

This does not sound good to me... If such an 'index' with the shortest pattern
did not index the XML element(s) requested in the query... It would not find
anything.
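
To make the concern concrete, here is a minimal plain-Java sketch of the
"shortest pattern wins" rule. It does not use the patch's code or Lucene at
all: FullTextIndex, pickShortest and sample are hypothetical names, and each
"index" is just a map from a term to document ids.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ShortestPatternSelection {
    // Hypothetical stand-in for one full text index: a pattern plus term -> document ids.
    static final class FullTextIndex {
        final String pattern;
        final Map<String, Set<String>> postings = new HashMap<>();

        FullTextIndex(String pattern) {
            this.pattern = pattern;
        }

        void add(String term, String docId) {
            postings.computeIfAbsent(term.toLowerCase(), k -> new TreeSet<>()).add(docId);
        }

        Set<String> search(String term) {
            return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
        }
    }

    // The selection rule discussed above: the index with the shortest pattern is used.
    static FullTextIndex pickShortest(List<FullTextIndex> indexes) {
        return Collections.min(indexes,
                Comparator.comparingInt((FullTextIndex i) -> i.pattern.length()));
    }

    // Two indexes over the same collection, each covering a different element.
    static List<FullTextIndex> sample() {
        FullTextIndex name = new FullTextIndex("name");
        name.add("John", "doc1");
        FullTextIndex phone = new FullTextIndex("phone");
        phone.add("123-456-5555", "doc1");
        return Arrays.asList(name, phone);
    }

    public static void main(String[] args) {
        List<FullTextIndex> indexes = sample();
        // "name" is shorter than "phone", so the phone number is looked up in the wrong index.
        System.out.println(pickShortest(indexes).search("123-456-5555")); // prints []
        // The index that actually covers <phone> does find the document.
        System.out.println(indexes.get(1).search("123-456-5555"));        // prints [doc1]
    }
}
```

The document is indexed, but the query never reaches the index that holds it.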

So the question is, does it make sense to have more than one full text
index per collection?

I'm not quite sure there is any real need for more than one Lucene index. For
most cases, IMHO it is sufficient to have a single Lucene index per collection.
And each such Lucene index can be associated with multiple Xindice Indexer
objects, which would contribute the patterns that should be indexed by this
collection's Lucene index.

To illustrate this thought, say you create several Xindice full text Indexers
with patterns:


  name
  phone
  [EMAIL PROTECTED]

All three of these Indexers could be backed by a single Lucene index which
would contain multiple fields (org.apache.lucene.document.Field) for each
document stored in Xindice (and which corresponds to
org.apache.lucene.document.Document):


 Document:
  Field id=abcdeff  -- Stored field with Xindice document ID
  Field name=John   -- Indexed field created from <name>John</name> element
  Field phone=123-456-5555
  Field [EMAIL PROTECTED]

So now when querying it is possible to assemble a complex query - such as, give
me all Johns who have a work phone, or some such. Lucene's index should store
the id field, so that we can retrieve ids of matching documents from the search
result.
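
A rough plain-Java sketch of that idea follows. It is not Lucene code: the
class MultiFieldIndexSketch and all of its methods are hypothetical, values
are matched as whole strings rather than tokenized, and the ids other than
abcdeff are made up for the example. It only shows one index holding several
fields per document and a query that combines fields and returns ids.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java stand-in for one index shared by several Indexers (hypothetical names).
class MultiFieldIndexSketch {
    // field name -> exact value (lower-cased) -> ids of documents carrying that value
    private final Map<String, Map<String, Set<String>>> postings = new HashMap<>();

    // Each map entry plays the role of one indexed field of the document.
    void addDocument(String id, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            postings.computeIfAbsent(f.getKey(), k -> new HashMap<>())
                    .computeIfAbsent(f.getValue().toLowerCase(), k -> new TreeSet<>())
                    .add(id);
        }
    }

    // Ids of documents whose field has exactly this value.
    Set<String> term(String field, String value) {
        Map<String, Set<String>> byValue = postings.getOrDefault(field, Collections.emptyMap());
        return byValue.getOrDefault(value.toLowerCase(), Collections.emptySet());
    }

    // Ids of documents that have any value at all in this field.
    Set<String> hasField(String field) {
        Set<String> ids = new TreeSet<>();
        for (Set<String> s : postings.getOrDefault(field, Collections.emptyMap()).values()) {
            ids.addAll(s);
        }
        return ids;
    }

    // Boolean AND of two result sets.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    // Small helper: fields("name", "John", "phone", "...") -> ordered map.
    static Map<String, String> fields(String... keyValues) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            m.put(keyValues[i], keyValues[i + 1]);
        }
        return m;
    }

    static MultiFieldIndexSketch sample() {
        MultiFieldIndexSketch index = new MultiFieldIndexSketch();
        index.addDocument("abcdeff", fields("name", "John", "phone", "123-456-5555"));
        index.addDocument("abcdefg", fields("name", "John"));
        index.addDocument("abcdefh", fields("name", "Mary", "phone", "123-456-7777"));
        return index;
    }

    public static void main(String[] args) {
        MultiFieldIndexSketch index = sample();
        // "All Johns who have a phone": combine two fields of the same index.
        System.out.println(and(index.term("name", "John"), index.hasField("phone"))); // prints [abcdeff]
    }
}
```

With separate indexes per pattern, the same question would need two searches
and an intersection done outside the index.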

PS As an aside... There are at least two options for how a Xindice document
can correspond to a Lucene Document:

* 1:1 mapping. It would allow searching only for documents, since all we would
know from the search result is the document id.

* Create a Document for each matching element, which would include its own
data and data for all nested matches as well. It would allow searching for
particular elements matching a query - if we can figure out how to do this in
a Lucene query?

Either way we can start off with the simpler option first and think about how
to do more complex searches later.
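
The two options can be compared with a small plain-Java sketch. Nothing here
is Lucene or Xindice API: each "Document" is just a list of "field=value"
strings, the class and method names are hypothetical, and the "locator" field
(tag name plus position among element children) is an invented way to point
at an element, not something the thread proposes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MappingOptionsSketch {
    static final String XML = "<people><person><name>John</name><phone>123-456-5555</phone></person>"
            + "<person><name>Mary</name></person></people>";
    static final Set<String> PATTERNS = new HashSet<>(Arrays.asList("person", "name", "phone"));

    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Text directly inside the element, ignoring text of nested elements.
    static String ownText(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.TEXT_NODE) sb.append(c.getNodeValue());
        }
        return sb.toString().trim();
    }

    // Add a field for e (if it matches and has own text) and for every nested match.
    static void collect(Element e, Set<String> patterns, List<String> entry) {
        if (patterns.contains(e.getTagName()) && !ownText(e).isEmpty()) {
            entry.add(e.getTagName() + "=" + ownText(e));
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) collect((Element) c, patterns, entry);
        }
    }

    // Option 1: a single entry per Xindice document; a hit only tells us the document id.
    static List<List<String>> oneToOne(String docId, Element root, Set<String> patterns) {
        List<String> entry = new ArrayList<>();
        entry.add("id=" + docId);
        collect(root, patterns, entry);
        List<List<String>> out = new ArrayList<>();
        out.add(entry);
        return out;
    }

    // Option 2: one entry per matching element, holding its own data and nested matches.
    static List<List<String>> perElement(String docId, Element root, Set<String> patterns) {
        List<List<String>> out = new ArrayList<>();
        walk(docId, root, "/" + root.getTagName(), patterns, out);
        return out;
    }

    private static void walk(String docId, Element e, String locator,
                             Set<String> patterns, List<List<String>> out) {
        if (patterns.contains(e.getTagName())) {
            List<String> entry = new ArrayList<>();
            entry.add("id=" + docId);
            entry.add("locator=" + locator);
            collect(e, patterns, entry);
            out.add(entry);
        }
        int position = 0;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                position++;
                Element child = (Element) c;
                walk(docId, child, locator + "/" + child.getTagName() + "@" + position, patterns, out);
            }
        }
    }

    public static void main(String[] args) {
        Element root = parse(XML);
        System.out.println(oneToOne("abcdeff", root, PATTERNS));
        for (List<String> entry : perElement("abcdeff", root, PATTERNS)) {
            System.out.println(entry);
        }
    }
}
```

In this sample the 1:1 entry mixes Mary's name with John's phone, so a query
for "Mary with a phone" would match the document as a whole. The per-element
entries keep each person separate, at the cost of five entries for one small
document.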



Vadim

If so, how to find out which index is a better
match for a particular query (modifying query language to include
hints? using field names to find right pattern?), can query be run
against multiple indexes?

Any ideas?

Regards,
Natalia
