Maybe I should think about adding an "avoid-this-page" meta tag, and then
excluding pages tagged like that from search.

That might be the trick for an easy solution. But then again, the crawl
data would still be polluted after all.
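
Something like this is what I have in mind for the indexing side: a rough,
untested sketch against the Nutch 1.x IndexingFilter extension point (the
class name and the "avoid-this-page" metadata key are just placeholders,
and the exact interface varies a bit between Nutch versions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

public class AvoidPageIndexingFilter implements IndexingFilter {

  private Configuration conf;

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
                              CrawlDatum datum, Inlinks inlinks) {
    // If the parse phase flagged this page, drop it: returning null
    // removes the document from the indexing pipeline, so it never
    // reaches the index.
    if (parse.getData().getParseMeta().get("avoid-this-page") != null) {
      return null;
    }
    return doc;
  }

  public void setConf(Configuration conf) { this.conf = conf; }

  public Configuration getConf() { return conf; }
}

The plugin would still need the usual plugin.xml registration and an entry
in plugin.includes in nutch-site.xml. And it only keeps the page out of
the index; the fetched content still sits in the segments, which is the
pollution I mean.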


Dinçer


2011/9/7 Dinçer Kavraal <[email protected]>

> Hi Alex,
> Yes, I have read that one. It led me to return zero content for the page
> (so the URL appears as an empty HTML page that way), but I couldn't mark
> that URL as "never-downloaded".
>
>  Dinçer
>
>
> 2011/9/5 alex <[email protected]>
>
>> On 09/04/2011 10:22 AM, Dinçer Kavraal wrote:
>>
>>> Hi,
>>>
>>> Is it possible to reject a page from being indexed during the parse
>>> operation? I don't even want it to be indexed as a no-content page
>>> without any text inside.
>>>
>>> There is a situation where I cannot tell from the URL itself whether I
>>> should inject it or not; I need to check the content. When I match,
>>> say, a keyword in the page, I want to avoid that page in the render
>>> phase.
>>>
>>> Do you have any ideas?
>>>
>>> Thanks
>>> Dincer
>>>
>>>
>>>
>> have you read this:
>> http://wiki.apache.org/nutch/WritingPluginExample-0.9 ?
>>
>> I guess you need a parsefilter and/or an indexingfilter...
>>
>>
>
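
For the record, here is a similarly rough sketch of the parse side alex
suggests, assuming the Nutch 1.x HtmlParseFilter interface ("some-keyword"
is a placeholder for the keyword check): it looks for the keyword in the
extracted text and leaves a flag in the parse metadata, which an indexing
filter like the one above can then act on.

import org.w3c.dom.DocumentFragment;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.HtmlParseFilter;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.ParseResult;
import org.apache.nutch.protocol.Content;

public class KeywordFlagParseFilter implements HtmlParseFilter {

  private Configuration conf;

  public ParseResult filter(Content content, ParseResult parseResult,
                            HTMLMetaTags metaTags, DocumentFragment doc) {
    Parse parse = parseResult.get(content.getUrl());
    // If the extracted text contains the unwanted keyword, leave a flag
    // in the parse metadata so a later indexing filter can drop the page.
    if (parse != null && parse.getText().contains("some-keyword")) {
      parse.getData().getParseMeta().set("avoid-this-page", "true");
    }
    return parseResult;
  }

  public void setConf(Configuration conf) { this.conf = conf; }

  public Configuration getConf() { return conf; }
}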
