Why did Nutch miss so many links when crawling?

kevin.Y Fri, 24 Aug 2007 03:55:39 -0700

Hi!
I ran into a problem when using nutch-0.9 to crawl a site.
There is an article-list page, and on that page there are many links
pointing to the article pages.
So I started the crawl from the list page so that those articles
would be indexed.
However, during the crawl I found that Nutch ignored all of those article links!
In the end, none of the articles were indexed, only some other pages. I
tried several times and got the same result.
I'm sure there's no problem with conf/crawl-urlfilter.txt
(+^http://([a-z0-9]*\.)*site.com/).
Doesn't Nutch extract all the links from a page and crawl them all? Have I
made some stupid mistake?
Any help?
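One quick way to double-check the filter rule quoted above is to test a few of the article URLs against the same regular expression outside Nutch. The sketch below is a minimal, hypothetical check using java.util.regex; the class name and sample URLs are made up, and it assumes the rule is applied as a find-style match anchored by the leading `^`. It only checks this one rule: crawl-urlfilter.txt is evaluated rule by rule and the first matching rule decides, so another rule in the file could still reject a URL that this pattern accepts.

```java
import java.util.List;
import java.util.regex.Pattern;

public class UrlFilterCheck {
    // The "+" rule from conf/crawl-urlfilter.txt, with the leading "+" removed.
    // Note: the unescaped "." in "site.com" matches any single character.
    static final Pattern ACCEPT = Pattern.compile("^http://([a-z0-9]*\\.)*site.com/");

    // Returns true if the URL matches the accept pattern.
    static boolean accepted(String url) {
        return ACCEPT.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample URLs; substitute real links copied from the list page.
        List<String> urls = List.of(
            "http://www.site.com/articles/list.html",
            "http://site.com/article?id=42",
            "https://www.site.com/article/1.html",
            "http://www.site.com:8080/article/2.html",
            "http://www.other.com/");
        for (String u : urls) {
            System.out.println((accepted(u) ? "ACCEPT " : "REJECT ") + u);
        }
    }
}
```

With this pattern, `https://` links and links that include a port number (such as `:8080`) are rejected, so it is worth checking the exact form of the article URLs on the list page.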


Any reply would be greatly appreciated!
-- 
View this message in context: 
http://www.nabble.com/why-did-nutch-miss-so-many-links-when-crawling--tf4322916.html#a12310200
Sent from the Nutch - User mailing list archive at Nabble.com.
