Hi everyone, I am currently indexing a single website, say www.somesite.com, but I do not want to crawl URLs that match a certain pattern, let's say "nocrawl", e.g. www.somesite.com/nocrawl.html or www.somesite.com/apage.php?nocrawl. I want to discard any URL that contains the pattern 'nocrawl'. How do I do this? I am using Nutch version 7.1, and I want to use the 'crawl' command to crawl these pages.
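To make the behaviour I am after concrete, here is a minimal sketch in plain Python. It is not Nutch configuration, and the names NOCRAWL and should_crawl are hypothetical, made up for illustration only. It just shows which of the example URLs should be kept and which should be dropped:

```python
import re

# Hypothetical pattern and helper, for illustration only (not part of Nutch).
NOCRAWL = re.compile(r"nocrawl")


def should_crawl(url: str) -> bool:
    """Return False for any URL that contains the substring 'nocrawl'."""
    return NOCRAWL.search(url) is None


# Example URLs from the question, plus one page that should still be crawled.
urls = [
    "www.somesite.com/index.html",
    "www.somesite.com/nocrawl.html",
    "www.somesite.com/apage.php?nocrawl",
]

# Keep only the URLs that do not contain the pattern.
kept = [u for u in urls if should_crawl(u)]
print(kept)
```

So the match is a plain substring match anywhere in the URL, whether in the path or in the query string.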
Thank you for your support.

--
Keep on smiling :)
Kumar
