Hi Dave,
On Mon, Aug 17, 2009 at 12:48 PM, Dave Pawson<dave.paw...@gmail.com> wrote:
> 2009/8/17 Jukka Zitting <jukka.zitt...@gmail.com>:
>>... If you want more control in indexing your XML documents, you should
>> consider parsing them directly without Tika in between.
>
> Generate some text, then feed that directly into Lucene indexes?
> Is that the general idea?...

You might want to have a look at Apache Solr
(http://lucene.apache.org/solr/) if you haven't seen it yet. I'd
recommend starting with the tutorial [1]; the schema.xml docs [2]
show how to boost specific fields.

Converting your XML to "Solr XML" might be sufficient to index your
XML with Solr, which uses Lucene underneath.
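
As a rough illustration of that conversion, here is a minimal Python sketch.
The source document, its element names (article, title, body), and the
to_solr_xml helper are all made up for the example; the field names you
emit would have to match whatever your schema.xml declares. The output
uses the <add><doc><field name="..."> shape of Solr's XML update messages.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; element names are assumptions for illustration.
SOURCE = """<article id="a1">
  <title>Indexing XML</title>
  <body>Some text to search.</body>
</article>"""


def to_solr_xml(source_xml):
    """Convert one source element into a Solr <add><doc> update message."""
    article = ET.fromstring(source_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    # Map the id attribute to a field named "id" (assumed to be the unique key).
    id_field = ET.SubElement(doc, "field", name="id")
    id_field.text = article.get("id")

    # Map each child element to a <field> named after the element's tag.
    for child in article:
        field = ET.SubElement(doc, "field", name=child.tag)
        field.text = (child.text or "").strip()

    return ET.tostring(add, encoding="unicode")


solr_xml = to_solr_xml(SOURCE)
print(solr_xml)
```

The resulting XML could then be posted to Solr the same way the tutorial [1]
posts its example documents.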

-Bertrand

[1] http://lucene.apache.org/solr/tutorial.html
[2] http://wiki.apache.org/solr/SchemaXml
