Re: Accessing crawled data

Andrzej Bialecki Tue, 22 Dec 2009 06:02:32 -0800

On 2009-12-22 13:16, Claudio Martella wrote:

Yes, I'm aware of that. The problem is that I have some fields of the
SolrDocument that I want to compute by text analysis (basically, I want
to do some smart keyword extraction), so I have to get in the middle
between crawling and indexing! My current solution is to dump the content
to a file through the segreader, parse it, and then use SolrJ to send the
documents. Probably the best solution is to set my own analyzer for the
field on the Solr side and do the keyword extraction there.


Thanks for the script, I'll use it!

Likely the solution that you are looking for is an IndexingFilter - this receives a copy of the document with all fields collected just before it's sent to the indexing backend - and you can freely modify the content of NutchDocument, e.g. do additional analysis, add/remove/modify fields, etc.
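As an illustration of the kind of analysis such a filter could run, here is a minimal, self-contained Java sketch of naive keyword extraction by term frequency. The class `KeywordExtractor`, the method `topKeywords`, and the stop-word list are hypothetical and not part of Nutch; this is a simple stand-in for the "smart" extraction Claudio mentions, not an implementation of it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Nutch: ranks terms by raw frequency.
// A stand-in for "smart" keyword extraction; needs Java 9+ for Set.of.
public class KeywordExtractor {
    // Illustrative stop-word list; a real one would be much larger.
    private static final Set<String> STOP_WORDS = Set.of(
            "and", "are", "for", "the", "with", "this", "that", "from");

    // Returns up to k terms ordered by descending frequency,
    // with ties broken alphabetically so the output is deterministic.
    public static List<String> topKeywords(String text, int k) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on any run of characters that are not letters or digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // Skip empty/short tokens (under 3 chars) and stop words.
            if (token.length() < 3 || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Nutch crawls pages; Nutch parses pages; Solr indexes pages.";
        System.out.println(topKeywords(text, 2)); // prints [pages, nutch]
    }
}
```

Inside an IndexingFilter, one would presumably call something like `topKeywords(parse.getText(), 10)` from the `filter(...)` method and add each result to the NutchDocument (e.g. `doc.add("keywords", kw)`) before returning it. The field name `keywords` is an assumption, the exact `filter(...)` signature differs between Nutch versions, and the field would also need to be defined on the Solr side.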


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
