I tried your suggestion, but I get the same result as before.

2011/5/15 ts egge <[email protected]>
> I think your regex doesn't allow more than the home page.
>
> Try extending your domain pattern with .*
> +^http://([a-z0-9]*\.)sina.com.cn/.*
>
> On 15.05.2011 11:05, "Bupo Jung" <[email protected]> wrote:
> > Hi,
> > I use Nutch to crawl a website: http://www.sina.com.cn
> > The crawl process stops at depth 0 and only fetches the homepage of the
> > website.
> >
> > My crawl-urlfilter.txt is
> > # accept hosts in MY.DOMAIN.NAME
> > +^http://([a-z0-9]*\.)sina.com.cn/
> >
> > # skip everything else
> > -.
> >
> > Does anybody have an idea?
> >
> > --
> >
> > Yizhong Zhuang
> > Beijing University of Posts and Telecommunications
> > Email: [email protected]
> > Myblog: www.mikkoo.info
>

--
Yizhong Zhuang
Beijing University of Posts and Telecommunications
Email: [email protected]
Myblog: www.mikkoo.info
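
For anyone comparing the two rules from this thread, here is a rough sketch in Python (not Nutch itself) that runs both patterns against a few URLs. The sample URLs are made up for illustration, and the sketch assumes the filter accepts a URL when the pattern is found anywhere from the anchor onward (search semantics rather than a full match). Whether Nutch's filter behaves exactly this way is an assumption here, so treat the output as a hint, not a diagnosis.

```python
import re

# Patterns copied from the thread, without the leading "+" that marks an accept rule.
ORIGINAL = r"^http://([a-z0-9]*\.)sina.com.cn/"
SUGGESTED = r"^http://([a-z0-9]*\.)sina.com.cn/.*"

# Hypothetical sample URLs, chosen only for illustration.
SAMPLE_URLS = [
    "http://www.sina.com.cn/",
    "http://news.sina.com.cn/china/",
    "http://sina.com.cn/",
    "http://news.sports.sina.com.cn/",
    "http://www.example.com/",
]


def accepted(pattern, url):
    # Assumption: a URL is accepted when the pattern is found in it
    # (search semantics), not only when it matches the whole URL.
    return re.search(pattern, url) is not None


# Map each URL to (accepted by original rule, accepted by suggested rule).
results = {
    url: (accepted(ORIGINAL, url), accepted(SUGGESTED, url))
    for url in SAMPLE_URLS
}

for url, (orig, sugg) in results.items():
    print(f"{url:40} original={orig} suggested={sugg}")
```

Under that assumption the trailing .* changes nothing, which would fit the "same result as before" report. The sketch also shows that the group ([a-z0-9]*\.) is not repeated, so only hosts with exactly one label before sina.com.cn are accepted; putting a * after the group would be one way to admit the bare domain and deeper subdomains, if that is what is wanted.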

