Is it possible to get this done by modifying the regular expressions in the config file?
Bong

On 1/3/06, Nguyen Ngoc Giang <[EMAIL PROTECTED]> wrote:
>
> Maybe you could try writing a plugin for the query parser that excludes all
> the patterns you want to avoid. A heavy penalty on the URL will do the work, IMHO.
>
>
> On 12/28/05, Kumar Limbu <[EMAIL PROTECTED]> wrote:
> >
> > Hi everyone,
> >
> > I am currently indexing a single website, say www.somesite.com. But I do not
> > want to crawl URLs with a certain pattern, let's say "nocrawl", i.e.
> > www.somesite.com/nocrawl.html or www.somesite.com/apage.php?nocrawl. I want
> > to discard any URLs that contain the pattern 'nocrawl'. How do I do it? I
> > am using Nutch version 7.1. Also, I want to use the 'crawl' command for
> > crawling these pages.
> >
> > Thank you for your support.
> >
> > --
> > Keep on smiling
> > :) Kumar
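A rough illustration of the config-file approach, assuming Nutch 0.7.x, where (as far as I know) the 'crawl' command reads its URL regexes from conf/crawl-urlfilter.txt. Each line there starts with '+' (accept) or '-' (reject), and the first pattern that matches a URL decides its fate; a URL matching no pattern is skipped. So a '-nocrawl' line placed above the site's accept line should keep those pages out. The Python below is only a simulation of those semantics, not Nutch code: the rule text, the somesite.com pattern and the helper names are hypothetical, and it assumes a pattern may match anywhere in the URL (a search, not a full match).

```python
import re

# Hypothetical rule file in the "+regex" / "-regex" style of crawl-urlfilter.txt.
# '-nocrawl' is listed BEFORE the accept rule, so it is checked first.
RULES_TEXT = r"""
# skip any URL containing 'nocrawl'
-nocrawl
# accept everything else under somesite.com
+^http://([a-z0-9]*\.)*somesite\.com/
"""


def parse_rules(text):
    """Turn '+pattern' / '-pattern' lines into (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url, rules):
    """The first matching rule decides; a URL that matches no rule is rejected."""
    for is_accept, regex in rules:
        if regex.search(url):  # the pattern may match anywhere in the URL
            return is_accept
    return False


if __name__ == "__main__":
    rules = parse_rules(RULES_TEXT)
    for url in [
        "http://www.somesite.com/index.html",
        "http://www.somesite.com/nocrawl.html",
        "http://www.somesite.com/apage.php?nocrawl",
        "http://www.othersite.com/",
    ]:
        print(url, "->", "fetch" if accept(url, rules) else "skip")
```

The order of the lines is the important part: if the '+' line for the site came first, it would match the nocrawl pages too and they would be fetched before the '-nocrawl' rule was ever consulted. Unlike the query-parser idea, this keeps the pages out of the crawl entirely instead of just pushing them down in the results.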
