Hi Dave,

On Mon, Aug 17, 2009 at 12:48 PM, Dave Pawson<dave.paw...@gmail.com> wrote:
> 2009/8/17 Jukka Zitting <jukka.zitt...@gmail.com>:
>>... If you want more control in indexing your XML documents, you should
>> consider parsing them directly without Tika in between.
>
> Generate some text, then feed that directly into Lucene indexes?
> Is that the general idea?...

You might want to have a look at Apache Solr
(http://lucene.apache.org/solr/) if you haven't seen it yet.

I'd recommend starting with the tutorial [1], and the schema.xml docs
[2] show how to boost specific fields.

Converting your XML to "Solr XML" might be sufficient to index your
XML with Solr, which uses Lucene underneath.

-Bertrand

[1] http://lucene.apache.org/solr/tutorial.html
[2] http://wiki.apache.org/solr/SchemaXml
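
For illustration, here is a rough Python sketch of the "convert your XML to Solr XML" step. It is only a sketch under assumptions: the source element names (article, title, para), the Solr field names (id, title, text) and the helper to_solr_xml are all hypothetical, and the field names would have to match whatever your schema.xml declares. It builds an <add><doc>...</doc></add> message, which is the shape Solr's XML update handler expects; posting it to Solr is not shown.

```python
import xml.etree.ElementTree as ET


def to_solr_xml(source_xml, field_map, doc_id):
    """Convert one source XML document into a Solr <add> message.

    field_map maps an ElementTree path in the source document to the
    (hypothetical) Solr field name it should be indexed under.
    """
    source = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # Every Solr document needs its unique key; "id" is assumed here.
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = doc_id

    for path, solr_field in field_map.items():
        for node in source.findall(path):
            # Flatten any nested markup into plain text for indexing.
            text = "".join(node.itertext()).strip()
            if text:
                field = ET.SubElement(doc, "field", name=solr_field)
                field.text = text

    return ET.tostring(add, encoding="unicode")


# Hypothetical input document and mapping, for illustration only.
SAMPLE = (
    "<article><title>Indexing XML</title>"
    "<para>First.</para><para>Second.</para></article>"
)
FIELD_MAP = {"title": "title", "para": "text"}

if __name__ == "__main__":
    print(to_solr_xml(SAMPLE, FIELD_MAP, "doc-1"))
```

Repeated source elements (the two para elements here) become repeated fields, so the target field would need to be declared multiValued in schema.xml.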