You can use a suffix filter if there are no query strings.

Dennis

Jens Martin Schubert wrote:
Hello,

is it possible to crawl, e.g., http://www.domain.com,
but skip crawling all URLs matching http://www.domain.com/subpage/?

I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt, but it doesn't work:

-ftp.tu-clausthal.de
-^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/
+^http://([a-z0-9]*\.)asta.tu-clausthal.de
+^http://([a-z0-9]*\.)*tu-clausthal.de/
# skip everything else
-.

Skipping ftp.tu-clausthal.de works perfectly,
but http://www.asta.tu-clausthal.de/de/mobil/ is still indexed, and it takes a long time to crawl.
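To check what the rules above should do on their own, here is a minimal Python sketch that replays them against a few sample URLs. It assumes the filter applies the rules top to bottom, that the first pattern found anywhere in the URL decides (like Java's Matcher.find(), approximated here with re.search), and that a URL matching no rule is rejected. The function name `accepts` and the sample URLs are hypothetical. Note that the unescaped dots in the patterns match any character, which makes them more permissive than intended but does not stop them from matching.

```python
import re

# Rules copied from the filter file above, as (sign, pattern) pairs.
# Assumption: first matching rule wins, matching is a substring search.
RULES = [
    ("-", r"ftp.tu-clausthal.de"),
    ("-", r"^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/mobil/"),
    ("+", r"^http://([a-z0-9]*\.)asta.tu-clausthal.de"),
    ("+", r"^http://([a-z0-9]*\.)*tu-clausthal.de/"),
    ("-", r"."),  # skip everything else
]


def accepts(url: str) -> bool:
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: reject (assumed default)


if __name__ == "__main__":
    for url in [
        "http://www.asta.tu-clausthal.de/de/mobil/index.html",
        "http://www.asta.tu-clausthal.de/de/",
        "ftp://ftp.tu-clausthal.de/pub",
        "http://www.tu-clausthal.de/",
        "http://example.com/",
    ]:
        print(url, "->", "accept" if accepts(url) else "reject")
```

Under these assumptions, the second rule rejects the /de/mobil/ URLs before the broader `+` rules are reached.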

regards,
Jens Martin Schubert
