Maybe I should think of adding an "avoid-this-page" meta tag and removing pages tagged like that at search time.
That might be the trick for an easy solution. But again, there will still be data pollution after all.

Dinçer

2011/9/7 Dinçer Kavraal <[email protected]>

> Hi Alex,
> Yes, I have read that one. It led me to return zero content for the page (so
> that the URL looks like an empty HTML page this way), but I couldn't mark
> that URL as "never-downloaded".
>
> Dinçer
>
>
> 2011/9/5 alex <[email protected]>
>
>> On 09/04/2011 10:22 AM, Dinçer Kavraal wrote:
>>
>>> Hi,
>>>
>>> Is it possible to reject a page from being indexed during the parse
>>> operation? I don't even want it to be indexed as a no-content page
>>> without any text information inside.
>>>
>>> There is a situation where I cannot tell from the URL itself whether I
>>> should inject it or not. I need to check the content. When I match,
>>> say, a keyword in the page, I want to avoid the page in the render
>>> phase.
>>>
>>> Do you have any ideas?
>>>
>>> Thanks
>>> Dincer
>>
>> Have you read this:
>> http://wiki.apache.org/nutch/WritingPluginExample-0.9 ?
>>
>> I guess you need a parsefilter and/or an indexingfilter...
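
[Editor's note] For readers landing on this thread, here is a minimal sketch of what alex's parsefilter/indexingfilter suggestion could look like against the Nutch 1.x plugin API of that era. Exact interface members varied slightly between releases (some versions require extra methods such as addIndexBackendOptions), and the class names AvoidPageParseFilter and AvoidPageIndexingFilter, the metadata key "avoid-this-page", and the keyword check are all made up for illustration. First, a parse filter that tags matching pages in their parse metadata:

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.parse.HTMLMetaTags;
import org.apache.nutch.parse.HtmlParseFilter;
import org.apache.nutch.parse.Parse;
import org.apache.nutch.parse.ParseResult;
import org.apache.nutch.protocol.Content;
import org.w3c.dom.DocumentFragment;

// Hypothetical parse filter: marks a page in its parse metadata when
// its extracted text matches some keyword, so that a later indexing
// filter can reject it.
public class AvoidPageParseFilter implements HtmlParseFilter {

  private Configuration conf;

  public ParseResult filter(Content content, ParseResult parseResult,
      HTMLMetaTags metaTags, DocumentFragment doc) {
    Parse parse = parseResult.get(content.getUrl());
    String text = parse.getText();
    // "forbidden-keyword" stands in for whatever content check is needed.
    if (text != null && text.contains("forbidden-keyword")) {
      parse.getData().getParseMeta().set("avoid-this-page", "true");
    }
    return parseResult;
  }

  public void setConf(Configuration conf) { this.conf = conf; }

  public Configuration getConf() { return conf; }
}

Then an indexing filter that drops tagged documents. By Nutch convention, returning null from filter() discards the document entirely, so it never reaches the index as an empty placeholder page:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

// Hypothetical indexing filter: returning null makes Nutch discard the
// document from indexing. Note this does not mark the URL as
// "never-downloaded"; the fetch still happens, which is the remaining
// "data pollution" mentioned above.
public class AvoidPageIndexingFilter implements IndexingFilter {

  private Configuration conf;

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    if (parse.getData().getParseMeta().get("avoid-this-page") != null) {
      return null; // drop the document from the index
    }
    return doc;
  }

  public void setConf(Configuration conf) { this.conf = conf; }

  public Configuration getConf() { return conf; }
}

To activate plugins like these, each would need a plugin.xml descriptor and an entry in the plugin.includes property in conf/nutch-site.xml; the WritingPluginExample page linked above walks through that wiring.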

