You can use a suffix filter if there are no query strings.

Dennis

Jens Martin Schubert wrote:
> Hello,
>
> is it possible to crawl e.g. http://www.domain.com,
> but to skip crawling all urls matching to
> (http://www.domain.com/subpage/)
>
> I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt.
> but it doesn't work:
>
> -ftp.tu-clausthal.de
> -^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/
> +^http://([a-z0-9]*\.)asta.tu-clausthal.de
> +^http://([a-z0-9]*\.)*tu-clausthal.de/
> # skip everything else
> -.
>
> skipping ftp.tu-clausthal.de works perfect,
> but http://www.asta.tu-clausthal.de/de/mobil/ is still indexed, which
> takes a long time to crawl.
>
> regards,
> Jens Martin Schubert

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
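
The filter rules quoted above can be sanity-checked offline before starting another crawl. The following is only a rough Python sketch, not Nutch code: it assumes the regex URL filter reads the rules top to bottom, lets the first matching rule decide, treats each pattern as a partial match, and rejects a URL that matches nothing. The helper names parse_rules and accepts are made up for this illustration, and Python's re module is standing in for Java's regex engine, which should behave the same for patterns this simple.

```python
import re

# Rules copied from the quoted message, in file order.
RULES = r"""
-ftp.tu-clausthal.de
-^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/
+^http://([a-z0-9]*\.)asta.tu-clausthal.de
+^http://([a-z0-9]*\.)*tu-clausthal.de/
# skip everything else
-.
"""


def parse_rules(text):
    """Turn rule lines into (allow, compiled_pattern) pairs (hypothetical helper)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule whose pattern is found in the URL decides."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumed default when no rule matches


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in (
        "http://www.asta.tu-clausthal.de/de/mobil/",
        "http://asta.tu-clausthal.de/de/mobil/",
        "http://www.asta.tu-clausthal.de/",
        "http://ftp.tu-clausthal.de/pub/",
    ):
        print("accept" if accepts(rules, url) else "reject", url)
```

Under these assumptions the rules reject http://www.asta.tu-clausthal.de/de/mobil/ as intended, so if that URL is still crawled it may be worth checking which filter file the crawl actually reads. The sketch also shows that the exclusion rule requires exactly one host label before "asta": the same path without "www." slips past it and is accepted by the broader tu-clausthal.de rule, because the group in the second and third rules has no trailing "*".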
