I have an off-the-cuff idea I wonder if anybody else
has considered: "does it make any sense to think
about using apache::lucene as an alternate, fuzzy-search
mechanism over collections of XML files, rather than, or
in addition to, xpath?"


Lucene appears to provide a way of indexing words and word proximities in otherwise free-form text documents. You could, for instance, use a proximity term modifier like ["jakarta apache"~10] to find all the documents that contain the words jakarta and apache no more than ten words apart from each other.
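To make the proximity idea concrete, here is a toy sketch in Python. It is not Lucene's API, and every name in it is hypothetical: a positional inverted index records where each word occurs in each document, and a proximity check keeps only documents where two words fall within N positions of each other. It only approximates Lucene's slop semantics (Lucene, for example, treats a reversed word order as costing extra distance).

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real analyzer."""
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """Toy positional inverted index: term -> {doc_id: [token positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index


def proximity_search(index, term_a, term_b, max_distance):
    """Sorted ids of docs where term_a and term_b occur at most
    max_distance positions apart, in either order (simplified slop)."""
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    hits = []
    # Only documents containing both terms can possibly match.
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        if any(abs(pa - pb) <= max_distance
               for pa in postings_a[doc_id]
               for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return hits


docs = {
    "d1": "Jakarta is an umbrella project hosted by Apache.",
    "d2": "Apache " + "filler " * 20 + "Jakarta",
    "d3": "Only apache here.",
}
index = build_positional_index(docs)
print(proximity_search(index, "jakarta", "apache", 10))  # ['d1']
```

In d1 the two words are 7 positions apart, so it matches; in d2 they are 21 apart, so it only matches if the allowed distance is raised past 20.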

To the extent this query language is useful over
completely unstructured, free-form text, it seems likely
that it (the Lucene query language) would be even more
powerful over more regularly structured text, like XML files.
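A rough sketch of how XML structure might sharpen such queries, again in hypothetical Python rather than anything Lucene or Xindice actually provides: treat each element name as a separate field, so a proximity query can be restricted to, say, the text of abstract elements. Repeated elements with the same name are separated by an assumed gap of 100 placeholder positions so words in different elements never look adjacent; this is similar in spirit to Lucene's position increment gap. Only each element's leading text is indexed; tail text after child elements is ignored in this toy.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

POSITION_GAP = 100  # assumed gap between repeated elements of the same name


def tokenize(text):
    """Lowercased word tokens; a crude stand-in for a real analyzer."""
    return re.findall(r"\w+", text.lower())


def index_xml_fields(xml_text, gap=POSITION_GAP):
    """Map each element name to one token list (a 'field').

    Repeated elements with the same name are separated by `gap` None
    placeholders so their words never look adjacent. Only each element's
    leading text is indexed; tail text is ignored in this toy version.
    """
    fields = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        tokens = tokenize(elem.text or "")
        if not tokens:
            continue  # skip whitespace-only or empty elements
        if fields[elem.tag]:
            fields[elem.tag].extend([None] * gap)
        fields[elem.tag].extend(tokens)
    return fields


def field_proximity(fields, field, term_a, term_b, max_distance):
    """True if term_a and term_b occur within max_distance positions
    of each other inside the named field."""
    tokens = fields.get(field, [])
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


doc = """<article>
  <title>Jakarta</title>
  <abstract>the apache jakarta projects</abstract>
  <body>apache</body>
</article>"""
fields = index_xml_fields(doc)
print(field_proximity(fields, "abstract", "jakarta", "apache", 1))  # True
print(field_proximity(fields, "title", "jakarta", "apache", 10))    # False
```

The point of the design is that the same fuzzy proximity test becomes more selective once it can be scoped to one kind of element, which a flat free-text index cannot do.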

Lucene is more of a search-engine technology than a database
technology: its answer sets are expected to have an attractive ratio
of relevant to irrelevant data, rather than being
the rigid, 100% matches on metadata criteria that are possible with
xpath queries over XML data.

Does it make sense for projects like Xindice to have alternate,
plug-in-like ways to search and query the same datasets? Or should alternate
query technologies exist as separate, standalone software entities?

-- /* Sandy Pittendrigh >--oO0> * [EMAIL PROTECTED] * http://cns.montana.edu/~sandy */