Hi! I have a problem using nutch-0.9 to crawl a site. The site has an article list page containing many links that point to the article pages, so I started the crawl from the list page so that those articles would be indexed. However, during the crawl I found that Nutch ignored all of those article links! In the end, none of the articles were indexed, only some other pages. I tried several times and got the same result. I'm sure there is no problem with conf/crawl-urlfilter.txt ( +^http://([a-z0-9]*\.)*site.com/ ). Doesn't Nutch extract all the links from a page and crawl them all? Have I made some stupid mistake? Any help?
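For anyone who wants to double-check the filter rule quoted above, here is a rough sketch in plain Java that tests a few URLs against that include pattern. It does not use Nutch's own filter classes. The class name UrlFilterCheck and the sample URLs are made up for illustration, and it assumes the rule is applied as an ordinary java.util.regex pattern searched from the start of the URL, with the leading "+" only marking the rule as an include rule.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: checks URLs against the include
// pattern quoted in the post (the leading '+' of the rule is dropped).
class UrlFilterCheck {
    // Same expression as in the post; the dots in "site.com" are left unescaped
    // exactly as written there, so they match any character.
    static final Pattern INCLUDE = Pattern.compile("^http://([a-z0-9]*\\.)*site.com/");

    // Assumption: a URL is accepted when the pattern is found in it.
    static boolean accepts(String url) {
        return INCLUDE.matcher(url).find();
    }

    public static void main(String[] args) {
        // Made-up sample URLs; replace them with real links from the list page.
        String[] samples = {
            "http://www.site.com/article/1.html",
            "http://site.com/list.html",
            "https://www.site.com/article/1.html",
            "http://my-blog.site.com/article/1.html",
            "http://www.site.com"
        };
        for (String url : samples) {
            System.out.println((accepts(url) ? "accepted  " : "rejected  ") + url);
        }
    }
}
```

With this pattern, links that use https, have a hyphen in the host name, or lack the slash after the host are rejected, so it may be worth running a few of the real article links through a check like this. Other rules in the same file are not covered by this sketch.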
Any reply would be greatly appreciated!
-- 
View this message in context: http://www.nabble.com/why-did-nutch-miss-so-many-links-when-crawling--tf4322916.html#a12310200
Sent from the Nutch - User mailing list archive at Nabble.com.
