Thanks; in my case I don't want to save the content of such pages in the segments at all, so that I don't waste disk space storing data I don't need!
I guess it's simpler to do it at indexing time, by implementing an index filter that skips any document that includes those words!

Regards;

________________________________
From: Scott Gonyea <[email protected]>
To: [email protected]
Sent: Mon, August 23, 2010 7:04:33 PM
Subject: Re: nutch plugin to filter indexing by content!

Not to my knowledge. You may want to look for where "regex-normalize.xml" is being used; you could write a plugin there. It would certainly be useful. I'm looking to eventually do the same, but at index time.

Scott

On Mon, Aug 23, 2010 at 8:11 AM, Ahmad Al-Amri <[email protected]> wrote:
>
> hello;
>
> I want to check whether a web page contains certain words and, if it does, NOT index it
> (while crawling), and also prevent its URL from being added to my crawldb.
>
> I just want to ask if there is a plug-in that does this, or something similar
> to start from.
>
> thank you;
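For the index-filter idea in the reply, here is a minimal sketch of the word check such a filter could run, in Java since Nutch plugins are Java. It is deliberately standalone: the class `BlockedWordsFilter`, its methods and the example words are hypothetical, and it does not implement Nutch's `IndexingFilter` interface. As far as I know, in Nutch 1.x an indexing filter drops a document by returning `null` from its `filter(...)` method; the `filter` method below only imitates that convention on a plain `String` of parsed text.

```java
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical helper: decides whether a page's parsed text contains a blocked word.
 * In a real Nutch plugin this check would be called from an IndexingFilter's
 * filter(...) method, which would return null to drop the document (not shown here).
 */
public class BlockedWordsFilter {
    private final Pattern pattern; // null when there are no blocked words

    public BlockedWordsFilter(Set<String> blockedWords) {
        // One whole-word, case-insensitive alternation, e.g. \b(?:\Qfoo\E|\Qbar\E)\b.
        // UNICODE_CHARACTER_CLASS makes \b treat non-ASCII letters (e.g. Arabic) as word characters.
        String alternation = blockedWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        this.pattern = blockedWords.isEmpty() ? null
                : Pattern.compile("\\b(?:" + alternation + ")\\b",
                        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
    }

    /** Returns true if the text contains at least one blocked word. */
    public boolean shouldSkip(String text) {
        return pattern != null && text != null && pattern.matcher(text).find();
    }

    /** Imitates the indexing-filter convention: return the input to keep it, null to drop it. */
    public String filter(String parsedText) {
        return shouldSkip(parsedText) ? null : parsedText;
    }

    public static void main(String[] args) {
        BlockedWordsFilter f = new BlockedWordsFilter(Set.of("casino", "lottery"));
        System.out.println(f.shouldSkip("Win big at the Casino tonight"));      // true
        System.out.println(f.shouldSkip("Apache Nutch crawler documentation")); // false
    }
}
```

Matching whole words avoids dropping pages where a blocked word only appears inside a longer word ("casinos" does not match "casino"); drop the `\b` anchors if substring matches are wanted. Note that a check at index time runs after the page has been fetched and parsed, so on its own it keeps the page out of the index but does not stop its content from being written to the segments.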

