You could try writing a plugin for the query parser that excludes all the
patterns you want to avoid. Heavily penalizing the matching URLs should do
the job, IMHO.
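
To make the exclusion idea concrete, here is a rough sketch in plain Java.
It is not the Nutch plugin API: the class name UrlExcludeFilter and its
methods are made up for illustration, and it only shows the matching logic
(reject any URL in which one of the unwanted patterns occurs). How you wire
this into a plugin is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: rejects URLs that contain
// any of the configured regular-expression patterns.
public class UrlExcludeFilter {
    private final List<Pattern> excluded = new ArrayList<>();

    public UrlExcludeFilter(List<String> regexes) {
        for (String regex : regexes) {
            excluded.add(Pattern.compile(regex));
        }
    }

    // Returns false as soon as one excluded pattern occurs anywhere in the URL.
    public boolean accept(String url) {
        for (Pattern p : excluded) {
            if (p.matcher(url).find()) {
                return false;
            }
        }
        return true;
    }

    // Keeps only the URLs that pass accept(), preserving their order.
    public List<String> filter(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (accept(url)) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

With the single pattern "nocrawl", both example URLs from your mail
(nocrawl.html and apage.php?nocrawl) would be rejected, while any other
page on the site would pass.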


On 12/28/05, Kumar Limbu <[EMAIL PROTECTED]> wrote:
>
> Hi everyone,
>
> I am currently indexing a single website, say www.somesite.com, but I do
> not want to crawl URLs that match a certain pattern, let's say "nocrawl",
> i.e. www.somesite.com/nocrawl.html or www.somesite.com/apage.php?nocrawl.
> I want to discard any URL that contains the pattern 'nocrawl'. How do I
> do it? I am using Nutch version 7.1. I also want to use the 'crawl'
> command for crawling these pages.
>
> Thank you for your support.
>
> --
> Keep on smiling
> :) Kumar
>
>
