Andrzej,

Great!!! I did not realize you could put your own content in ParseData.metadata and read it back in the IndexingFilter... this was the missing piece of the puzzle; for the rest I knew what to do.
Thanks,

2009/10/10 Andrzej Bialecki <a...@getopt.org>

> MilleBii wrote:
>
>> Andrzej,
>>
>> The use case you are thinking of is: at the parsing stage, filter out
>> garbage content and index only the rest.
>>
>> I have a different use case: I want to keep everything as in standard
>> indexing _AND_ also extract parts to be indexed in a dedicated field
>> (which will be boosted at search time). In my case, certain parts of a
>> document have more importance than others.
>>
>> So I would like either
>> 1. to access the html representation at indexing time... not possible,
>> or I did not find how
>> 2. to create a dual representation of the document: plain & standard,
>> and filtered
>>
>> I think option 2. is much better because it better fits the model and
>> allows for a lot of other use cases.
>>
>
> Actually, creativecommons provides hints how to do this .. but to be more
> explicit:
>
> * in your HtmlParseFilter you need to extract from the DOM tree the parts
> that you want, and put them inside ParseData.metadata. This way you will
> preserve both the original text, and your special parts that you extracted.
>
> * in your IndexingFilter you will retrieve the parts from
> ParseData.metadata and add them as additional index fields (don't forget to
> specify indexing backend options).
>
> * in your QueryFilter plugin.xml you declare that QueryParser should pass
> your special fields without treating them as terms, and in the
> implementation you create a BooleanClause to be added to the translated
> query.
>
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com

--
-MilleBii-
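
For the archive, here is a minimal sketch of the data flow described in the first two steps of the quoted reply: the parse stage stashes the extracted parts in a metadata map, and the indexing stage reads them back into a dedicated field next to the full text. This is plain Java and does not use the Nutch API. The class name, the `x-important` key, the `important` field, and the choice of `<h1>` as the "important part" are all hypothetical, and a regex stands in for the DOM walk a real HtmlParseFilter would do. The QueryFilter step is not covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DualRepresentationSketch {

    // Hypothetical names, chosen only for this illustration.
    public static final String META_KEY = "x-important";
    public static final String INDEX_FIELD = "important";

    private static final Pattern H1 =
        Pattern.compile("<h1>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Stand-in for the parse filter: extract the special parts into metadata. */
    public static Map<String, String> parseStage(String html) {
        Map<String, String> metadata = new LinkedHashMap<>();
        List<String> parts = new ArrayList<>();
        Matcher m = H1.matcher(html);
        while (m.find()) {
            parts.add(m.group(1).trim());
        }
        if (!parts.isEmpty()) {
            metadata.put(META_KEY, String.join(" ", parts));
        }
        return metadata;
    }

    /** Stand-in for the standard text extraction: strip tags, keep everything. */
    public static String plainText(String html) {
        return html.replaceAll("<[^>]+>", " ").replaceAll("\\s+", " ").trim();
    }

    /** Stand-in for the indexing filter: full text plus the dedicated field. */
    public static Map<String, String> indexStage(String text, Map<String, String> metadata) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("content", text);
        String special = metadata.get(META_KEY);
        if (special != null) {
            doc.put(INDEX_FIELD, special); // the field to boost at search time
        }
        return doc;
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Nutch tips</h1><p>Some body text.</p></body></html>";
        Map<String, String> doc = indexStage(plainText(html), parseStage(html));
        System.out.println(doc);
    }
}
```

The point of the design is that the full text is never filtered: the extracted parts travel alongside it in the metadata, so the index ends up with both the standard content and the dedicated field.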
