I tried your suggestion, but I get the same result as before.
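
One possible reason the change makes no difference: if the URL filter treats each rule as a partial match (the regex only has to be found somewhere in the URL), then a trailing .* cannot change which URLs are accepted. Below is a minimal Python sketch of that idea. The partial-match behaviour is an assumption about the filter, not something confirmed in this thread, and the accepts() helper and the test URLs are made up for illustration.

```python
import re

# Hypothetical first-match-wins filter modelled on a +/- rule file.
# Assumption: a rule applies when its regex is found anywhere in the URL
# (partial match), not only when it matches the whole URL.
def accepts(rules, url):
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: reject

# Rules from the original crawl-urlfilter.txt and from the suggested change.
ORIGINAL = [("+", r"^http://([a-z0-9]*\.)sina.com.cn/"), ("-", r".")]
SUGGESTED = [("+", r"^http://([a-z0-9]*\.)sina.com.cn/.*"), ("-", r".")]

# Illustrative URLs only; they were not taken from a real crawl.
URLS = [
    "http://www.sina.com.cn/",
    "http://news.sina.com.cn/china/index.html",
    "http://sina.com.cn/",
    "http://news.sports.sina.com.cn/a.html",
    "http://www.example.com/",
]

def compare(urls):
    # One (url, accepted_by_original, accepted_by_suggested) tuple per URL.
    return [(u, accepts(ORIGINAL, u), accepts(SUGGESTED, u)) for u in urls]

for url, old, new in compare(URLS):
    print(url, old, new)
```

Under that assumption both rule sets give the same answer for every URL, and pages below the home page are already accepted by the original rule. The sketch also shows that ([a-z0-9]*\.) without a trailing * requires exactly one label before sina.com.cn, so the bare domain and deeper subdomains are rejected by both versions.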

2011/5/15 ts egge <[email protected]>

> I think your regex doesn't allow more than the home page.
>
> Try extending your domain pattern with .*
> +^http://([a-z0-9]*\.)sina.com.cn/.*
>
> On 15.05.2011 11:05, "Bupo Jung" <[email protected]> wrote:
> > Hi,
> > I use Nutch to crawl a website: http://www.sina.com.cn
> > The crawl process stops at depth 0 and only fetches the homepage of the
> > website.
> >
> > My crawl-urlfilter.txt is:
> > # accept hosts in MY.DOMAIN.NAME
> > +^http://([a-z0-9]*\.)sina.com.cn/
> >
> > # skip everything else
> > -.
> >
> > Does anybody have an idea?
> >
> > --
> >
> > Yizhong Zhuang
> > Beijing University of Posts and Telecommunications
> > Email:[email protected]
> > Myblog:www.mikkoo.info
>



-- 

Yizhong Zhuang
Beijing University of Posts and Telecommunications
Email:[email protected]
Myblog:www.mikkoo.info
