Andy Armstrong wrote:
Here's a patch to be applied against the current CVS (as of 2004/03/11) which adds Lucene-based full text indexing. In addition to applying the patch, you need to add lucene-1.4-rc1-dev.jar (or similar, I guess) to /java/lib.
To add a full text index to a collection use an index config like this:
<index class="org.apache.xindice.core.indexer.LuceneIndexer" name="text-index" pattern="[EMAIL PROTECTED]" analyzer="org.apache.lucene.analysis.SimpleAnalyzer" />
If omitted, the analyzer attribute defaults to the value shown above. To find out about other analyzers you'll need to check the Lucene documentation.
To query the full text index do something like this:
    String query = "some lucene query";
    TextQueryService tqs = (TextQueryService) col.getService("TextQueryService", "1.0");
    ResourceSet resultSet = tqs.query(query);
At the moment the implementation is pretty much devoid of any kind of XML:DB loveliness - it just lets you fire regular Lucene queries at the index and returns whole matching documents. Comments and criticism welcome.
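To make "returns whole matching documents" concrete, here is a minimal plain-Java sketch. It is not the patch's LuceneIndexer and does not use Lucene at all: ToyTextIndex, tokenize, add and search are hypothetical names invented for illustration. The tokenizer splits at non-letters and lower-cases, which is roughly what SimpleAnalyzer does, and the search requires every query term to be present, which is a simplification and not Lucene's query syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a full text index; NOT the patch's LuceneIndexer.
class ToyTextIndex {
    // term -> ids of the documents containing that term, in insertion order
    private final Map<String, Set<String>> postings = new LinkedHashMap<>();

    // Split at non-letters and lower-case (an approximation of SimpleAnalyzer).
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase());
            }
        }
        return terms;
    }

    // Index the text of one document under the given id.
    void add(String docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(docId);
        }
    }

    // Return the ids of whole documents that contain every query term
    // (assumption: AND semantics, chosen only to keep the sketch short).
    List<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = postings.getOrDefault(term, new LinkedHashSet<>());
            if (result == null) {
                result = new LinkedHashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        ToyTextIndex index = new ToyTextIndex();
        index.add("doc1", "Full text indexing with Lucene");
        index.add("doc2", "XPath query results");
        System.out.println(index.search("lucene"));
    }
}
```

The result is a list of document ids only, with no indication of where in each document the match occurred, which mirrors the whole-document behaviour described above.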
Interesting. A couple of patch formatting suggestions... (a) please use ALv2, Apache License version 2, like the rest of the Xindice sources, (b) please use 4 spaces instead of tabs. :-)
Now, about the indexer itself... First, we need to get 1.1b4 out. For this, I'm planning to write a short doc about the current XPath features, because the XPath query result format has changed; after that I think 1.1b4 can be considered ready. Once 1.1b4 is out, I can add your patch.
Second, what are your plans for evolving this indexer? I think the next logical step would be to discuss what the text indexer pattern should look like (element/attribute, as for other indexers? or maybe it should include the path to the element?), what the query language should be (the plain Lucene query language, or with additions to it?), and what the results should be (whole documents, or matching parts only? or configurable behavior?). What's your opinion?
Vadim
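One of the open questions in the reply above, whole documents versus matching parts versus configurable behavior, can be sketched in plain Java. Everything here is hypothetical and exists only to illustrate the choice: ResultGranularityDemo, ResultMode, the "docId#path" hit format, and the idea of storing each document as a map from element path to text are assumptions, not part of Xindice or of the patch. Matching is a simple case-insensitive substring test standing in for a real Lucene query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "whole documents or matching parts" choice.
class ResultGranularityDemo {
    enum ResultMode { WHOLE_DOCUMENTS, MATCHING_PARTS }

    // docId -> (element path -> text content); stands in for parsed XML.
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    // Register the text found at one element path of one document.
    void add(String docId, String path, String text) {
        docs.computeIfAbsent(docId, k -> new LinkedHashMap<>()).put(path, text);
    }

    // Case-insensitive substring match; a real indexer would query Lucene.
    List<String> query(String word, ResultMode mode) {
        List<String> hits = new ArrayList<>();
        String needle = word.toLowerCase();
        for (Map.Entry<String, Map<String, String>> doc : docs.entrySet()) {
            for (Map.Entry<String, String> part : doc.getValue().entrySet()) {
                if (!part.getValue().toLowerCase().contains(needle)) {
                    continue;
                }
                if (mode == ResultMode.WHOLE_DOCUMENTS) {
                    hits.add(doc.getKey());
                    break; // report each document at most once
                }
                // MATCHING_PARTS: report every matching element path
                hits.add(doc.getKey() + "#" + part.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ResultGranularityDemo demo = new ResultGranularityDemo();
        demo.add("doc1", "/article/title", "Lucene indexing");
        demo.add("doc1", "/article/para", "Notes on Lucene queries");
        demo.add("doc2", "/article/para", "XPath only");
        System.out.println(demo.query("lucene", ResultMode.WHOLE_DOCUMENTS));
        System.out.println(demo.query("lucene", ResultMode.MATCHING_PARTS));
    }
}
```

Passing the mode per query is only one way to make the behavior configurable; it could equally be an attribute on the index configuration.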