On 09/04/2011 10:22 AM, Dinçer Kavraal wrote:
Hi,
Is it possible to reject a page from being indexed during the parse
operation? I don't even want it indexed as a no-content page with no text
information inside.
There are cases where I cannot tell from the URL itself whether I should
inject it or not. I need to check the content. When I match, say, a keyword
in the page, I want to exclude the page in the render phase.
Do you have any ideas?
Thanks
Dincer
Have you read this:
http://wiki.apache.org/nutch/WritingPluginExample-0.9 ?
I guess you need a parse filter and/or an indexing filter...
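
To illustrate the indexing filter idea, here is a minimal, self-contained sketch of the "return null to drop the page" pattern. It does not use the Nutch API: ParsedPage and KeywordRejectSketch are hypothetical stand-ins, and the keyword "forbidden" is just an example. As far as I recall, a Nutch 1.x IndexingFilter can return null from filter() to have the document discarded, but please check the Javadoc of your version before relying on that.

```java
import java.util.Locale;

// Hypothetical stand-in for a parsed page; this is NOT a Nutch class.
final class ParsedPage {
    final String url;
    final String text;

    ParsedPage(String url, String text) {
        this.url = url;
        this.text = text;
    }
}

// Sketch of a content-based filter: decide from the page text, not the URL.
final class KeywordRejectSketch {

    // Returns the page unchanged if it may be indexed,
    // or null to signal that the page should be discarded.
    static ParsedPage filter(ParsedPage page, String bannedKeyword) {
        // Case-insensitive substring match on the extracted text.
        String haystack = page.text.toLowerCase(Locale.ROOT);
        String needle = bannedKeyword.toLowerCase(Locale.ROOT);
        return haystack.contains(needle) ? null : page;
    }

    public static void main(String[] args) {
        ParsedPage[] pages = {
            new ParsedPage("http://example.com/a", "plain article text"),
            new ParsedPage("http://example.com/b", "This page is FORBIDDEN.")
        };
        for (ParsedPage page : pages) {
            ParsedPage kept = filter(page, "forbidden");
            System.out.println((kept == null ? "skip:  " : "index: ") + page.url);
        }
    }
}
```

In a real plugin the same check would presumably live inside the filter method of your own indexing filter class, using the text taken from the parse result instead of the ParsedPage stand-in.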