For example, when I crawl http://www.cc.gatech.edu/grads/, Nutch adds most
pages, such as http://www.cc.gatech.edu/grads/d/don and
http://www.cc.gatech.edu/grads/k/David.Krum,
but it ignores some that follow the same URL pattern as the ones above. For example,
my Nutch ignores http://www.cc.gatech.edu/grads/h/Yan.Huang

I cannot figure out why. Maybe you can try to reproduce it; here is my setup:
1. in urls: http://www.cc.gatech.edu/grads/
2. in crawl-urlfilter.txt: 
+^http://www.cc.gatech.edu/grads/
+^http://www.cc.gatech.edu/grads/([a-z0-9]*\.//)*

Then I crawl with depth 3. I hope somebody can explain what is happening. Thanks, all.
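
In case it helps anyone reproduce this, here is a rough sketch in Python (not Nutch code) that checks the two filter rules against the three URLs outside of Nutch. It assumes the filter tries the rules in order, lets the first matching rule decide, uses an unanchored "find"-style match, and rejects a URL that no rule matches; those semantics are my assumption here, and the name accepts is only for illustration.

```python
import re

# The two rules from crawl-urlfilter.txt, copied as written.
# Each entry is (sign, pattern); "+" means accept, "-" means reject.
RULES = [
    ("+", r"^http://www.cc.gatech.edu/grads/"),
    ("+", r"^http://www.cc.gatech.edu/grads/([a-z0-9]*\.//)*"),
]

def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule.

    Assumed semantics: rules are tried in order, the first match decides,
    and a URL matched by no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False

URLS = [
    "http://www.cc.gatech.edu/grads/d/don",
    "http://www.cc.gatech.edu/grads/k/David.Krum",
    "http://www.cc.gatech.edu/grads/h/Yan.Huang",
]

for url in URLS:
    print(url, "accepted" if accepts(url) else "rejected")
```

Under those assumptions the first rule already accepts all three URLs, so the filter file on its own would not explain why the Yan.Huang page is skipped. The second rule also never gets a chance to matter, because the first rule matches every URL with the same prefix.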
