Re: crawl-urlfilter subpages of domains

Dennis Kubes Sat, 12 Aug 2006 08:10:28 -0700

You can use a suffix filter if there are no query strings.


Dennis
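
A minimal sketch of why the "no query strings" caveat matters for suffix matching in general. This is not the Nutch suffix filter itself; the function names and the `.pdf` example URL are hypothetical, and only the plain idea of comparing the end of a URL against a list of suffixes is shown.

```python
from urllib.parse import urlsplit


def has_suffix(url, suffixes):
    """Return True if the full URL string ends with one of the suffixes."""
    return url.lower().endswith(tuple(s.lower() for s in suffixes))


def path_has_suffix(url, suffixes):
    """Same check, but applied to the path component only (query ignored)."""
    path = urlsplit(url).path
    return path.lower().endswith(tuple(s.lower() for s in suffixes))


# Hypothetical URLs under the example domain from the question.
plain = "http://www.domain.com/files/report.pdf"
with_query = "http://www.domain.com/files/report.pdf?page=2"

print(has_suffix(plain, [".pdf"]))
print(has_suffix(with_query, [".pdf"]))
print(path_has_suffix(with_query, [".pdf"]))
```

A whole-string suffix check matches the first URL but not the second, because the string then ends with the query rather than the file name. Checking only the path is one way around that.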

Jens Martin Schubert wrote:

Hello,

is it possible to crawl e.g. http://www.domain.com,
but skip all URLs matching http://www.domain.com/subpage/?
I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt, but it doesn't work:
-ftp.tu-clausthal.de
-^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/
+^http://([a-z0-9]*\.)asta.tu-clausthal.de
+^http://([a-z0-9]*\.)*tu-clausthal.de/
# skip everything else
-.

Skipping ftp.tu-clausthal.de works perfectly,
but http://www.asta.tu-clausthal.de/de/mobil/ is still indexed, which takes a long time to crawl.
regards,
Jens Martin Schubert
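
The rule list quoted above can be checked offline with a small sketch. It assumes that rules are tried top to bottom, that the first pattern found anywhere in the URL decides, and that `+` accepts while `-` rejects. That is my understanding of how this kind of filter file is read, but treat it as an assumption rather than a description of the Nutch implementation. The helper names `parse_rules` and `accepts` are hypothetical.

```python
import re

# The rules exactly as quoted in the message (comment line included).
RULES = r"""
-ftp.tu-clausthal.de
-^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/
+^http://([a-z0-9]*\.)asta.tu-clausthal.de
+^http://([a-z0-9]*\.)*tu-clausthal.de/
# skip everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule whose pattern is found decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # no rule matched: treat as rejected


rules = parse_rules(RULES)
for url in (
    "http://www.asta.tu-clausthal.de/de/mobil/",
    "http://asta.tu-clausthal.de/de/mobil/",
    "http://www.asta.tu-clausthal.de/",
    "ftp://ftp.tu-clausthal.de/pub",
):
    print(url, accepts(rules, url))
```

Under these assumptions the `www.asta` URL from the question is rejected by the second rule. The same path on the bare host `asta.tu-clausthal.de` is accepted, because `([a-z0-9]*\.)` without a trailing `*` requires exactly one extra host label, so the exclusion rule does not match and the fourth rule lets the URL through.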
