You could try writing a plugin for the query parser that excludes all the patterns you want to avoid. IMHO, a heavy penalty on matching URLs would do the job.
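To make the "exclude every pattern" idea concrete, here is a rough sketch in Python. It is not Nutch code and does not use any Nutch API; the names `RULES` and `accept` are made up for illustration. It assumes an ordered list of +/- rules where the first matching regex decides, which is only loosely modeled on how a regex-based URL filter could behave, and it assumes the site is served over plain http.

```python
import re

# Hypothetical rule list: ("-", pattern) rejects, ("+", pattern) accepts.
# The first rule whose pattern is found in the URL decides.
RULES = [
    ("-", re.compile(r"nocrawl")),                      # drop anything containing 'nocrawl'
    ("+", re.compile(r"^http://www\.somesite\.com/")),  # keep the rest of the site
]


def accept(url, rules=RULES):
    """Return True if the URL should be crawled, False if it is filtered out."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: skip the URL


if __name__ == "__main__":
    for url in (
        "http://www.somesite.com/index.html",
        "http://www.somesite.com/nocrawl.html",
        "http://www.somesite.com/apage.php?nocrawl",
    ):
        print(url, "->", "crawl" if accept(url) else "skip")
```

The order of the rules matters here: the exclude rule comes first, so a URL on the site that contains 'nocrawl' is rejected before the include rule is ever tried.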
On 12/28/05, Kumar Limbu <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
> I am currently indexing a single website, say www.somesite.com. But I do not
> want to crawl URLs with a certain pattern, let's say "nocrawl", i.e.
> www.somesite.com/nocrawl.html or www.somesite.com/apage.php?nocrawl. I want
> to discard any URLs that contain the pattern 'nocrawl'. How do I do it? I
> am using nutch version 7.1. Also I want to use the 'crawl' command for
> crawling these pages.
>
> Thank you for your support.
>
> --
> Keep on smiling
> :) Kumar
