I think your regex doesn't allow anything beyond the home page.

Try extending the pattern with .* after the domain:
+^http://([a-z0-9]*\.)sina.com.cn/.*
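
A quick way to check this outside of Nutch is to run both patterns against a few URLs. The sketch below is plain Python, not Nutch code. The sample URLs and the accepts() helper are made up for illustration, and since I'm not certain whether the filter requires the whole URL to match or only its beginning, it checks both modes.

```python
import re

# Patterns copied from this thread, with the leading "+" rule marker stripped.
ORIGINAL = r"^http://([a-z0-9]*\.)sina.com.cn/"
EXTENDED = r"^http://([a-z0-9]*\.)sina.com.cn/.*"

# Hypothetical sample URLs, not taken from a real crawl.
URLS = [
    "http://www.sina.com.cn/",
    "http://news.sina.com.cn/world/index.html",
]


def accepts(pattern, url, full):
    """Return True if the pattern accepts the URL.

    full=True requires the whole URL to match the pattern;
    full=False only requires a match somewhere (here anchored by ^).
    """
    regex = re.compile(pattern)
    match = regex.fullmatch(url) if full else regex.search(url)
    return match is not None


if __name__ == "__main__":
    for url in URLS:
        for name, pattern in (("original", ORIGINAL), ("extended", EXTENDED)):
            print(name, url,
                  "full:", accepts(pattern, url, True),
                  "search:", accepts(pattern, url, False))
```

If the two patterns give the same result in the mode your filter actually uses, the trailing .* is not the cause and the problem lies elsewhere in the configuration.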

On 15.05.2011 at 11:05, "Bupo Jung" <[email protected]> wrote:
> Hi,
> I use Nutch to crawl a website: http://www.sina.com.cn
> The crawl process stops at depth 0 and only fetches the homepage of the
> website.
>
> My crawl-urlfilter.txt is:
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)sina.com.cn/
>
> # skip everything else
> -.
>
> Does anybody have an idea?
>
> --
>
> Yizhong Zhuang
> Beijing University of Posts and Telecommunications
> Email:[email protected]
> Myblog:www.mikkoo.info
