Andrzej,

Great!
I did not realize you could put your own content in ParseData.metadata and
read it back in the IndexingFilter. That was the missing piece of the
puzzle; I knew what to do for the rest.
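
For the archive, here is a minimal sketch of the flow as I understand it.
It uses plain Java maps as stand-ins for ParseData.metadata and the index
document, so it is NOT the real Nutch plugin API. The class name, the
method names, the "important_part" key and the choice of <h1> as the
"important" part are all hypothetical, picked only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the parse -> index flow. These are NOT the Nutch
// HtmlParseFilter / IndexingFilter interfaces; all names are hypothetical.
final class MetadataFlowSketch {

    // Hypothetical key, used both as metadata key and as index field name.
    static final String FIELD = "important_part";

    // Parse stage: extract the part we care about (here, naively, the text
    // of the first <h1>) and store it in the parse metadata. The original
    // text is left untouched, so standard indexing still sees everything.
    static Map<String, String> parseStage(String html) {
        Map<String, String> parseMeta = new HashMap<>();
        int start = html.indexOf("<h1>");
        if (start >= 0) {
            int end = html.indexOf("</h1>", start + 4);
            if (end >= 0) {
                parseMeta.put(FIELD, html.substring(start + 4, end).trim());
            }
        }
        return parseMeta;
    }

    // Index stage: keep the full text as the normal content field and copy
    // the extracted part from the metadata into a dedicated field.
    static Map<String, String> indexStage(String fullText,
                                          Map<String, String> parseMeta) {
        Map<String, String> doc = new HashMap<>();
        doc.put("content", fullText);
        String special = parseMeta.get(FIELD);
        if (special != null) {
            doc.put(FIELD, special);
        }
        return doc;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Nutch tips</h1><p>The rest.</p></body></html>";
        Map<String, String> meta = parseStage(html);
        Map<String, String> doc = indexStage("Nutch tips The rest.", meta);
        System.out.println(doc);
    }
}
```

In a real plugin the two stages would live in separate filter classes and
the dedicated field could then be boosted at search time; the sketch only
shows how the metadata carries the extracted part from one stage to the
other.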

Thanks,



2009/10/10 Andrzej Bialecki <a...@getopt.org>

> MilleBii wrote:
>
>> Andrzej,
>>
>> The use case you have in mind is: at the parsing stage, filter out garbage
>> content and index only the rest.
>>
>> I have a different use case: I want to keep everything as in standard
>> indexing _AND_ also extract a part to be indexed in a dedicated field
>> (which will be boosted at search time). In my case, certain parts of a
>> document are more important than others.
>>
>> So I would like either
>> 1. to access the HTML representation at indexing time (not possible, or I
>> did not find how), or
>> 2. to create a dual representation of the document: the plain, standard
>> document and a filtered document.
>>
>> I think option 2 is much better because it fits the model better and
>> allows for many other use cases.
>>
>
> Actually, creativecommons provides hints on how to do this, but to be more
> explicit:
>
> * in your HtmlParseFilter you need to extract from the DOM tree the parts
> that you want, and put them inside ParseData.metadata. This way you will
> preserve both the original text and the special parts that you extracted.
>
> * in your IndexingFilter you will retrieve the parts from
> ParseData.metadata and add them as additional index fields (don't forget to
> specify indexing backend options).
>
> * in your QueryFilter plugin.xml you declare that QueryParser should pass
> your special fields without treating them as terms, and in the
> implementation you create a BooleanClause to be added to the translated
> query.
>
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>


-- 
-MilleBii-
