A little late to the party, but thought I would add my two cents...
On Aug 19, 2009, at 4:31 AM, Dave Pawson wrote:
> It's the search capabilities I'm most interested in, hence the
> Lucene kick.
Note also that Tika is fully integrated into Solr and will be part of
the upcoming Solr 1.4 release (but you can try it now by getting the
nightly build). Also, I believe Solr's Data Import Handler has
mechanisms for importing XML. I'd suggest looking at the Solr Wiki
(http://wiki.apache.org/solr), in particular:
http://wiki.apache.org/solr/ExtractingRequestHandler
http://wiki.apache.org/solr/DataImportHandler
As both a Lucene and Solr committer, I think I can safely say that for
most people, Solr is the place to start with Lucene. It will save you
from writing a whole lot of code and get you searching much faster,
and it is still completely pluggable, giving you nearly full access to
Lucene. People often worry about the HTTP layer up front, but in
practice it is a non-issue in more than 99% of cases.
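
To give a feel for how little the HTTP side involves, here is a rough
sketch that just builds the request URLs for sending a document to the
ExtractingRequestHandler and for running a search. It is illustrative
only and makes several assumptions: the base URL
http://localhost:8983/solr is the one used by the stock example setup,
the handler is mapped at /update/extract as shown on the wiki page
above, and the schema's unique key field is called "id". The function
names are made up for this example, and nothing is actually sent over
the wire.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, doc_id, commit=True):
    """Build a URL for posting a file to the ExtractingRequestHandler.

    Assumes the handler is mapped at /update/extract and that the
    unique key field in the schema is named "id" (both are assumptions).
    """
    # literal.<field> sets a field value directly on the indexed document.
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit right away so the document becomes searchable.
        params["commit"] = "true"
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def build_select_url(base_url, query, rows=10):
    """Build a URL for a search against the standard /select handler."""
    params = {"q": query, "rows": rows}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed example base URL
    print(build_extract_url(base, "doc1"))
    print(build_select_url(base, "title:lucene"))
```

You would then POST the file to the first URL and GET the second with
whatever HTTP client you like, which is about all the HTTP plumbing
most applications ever need.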