Hi,

When crawling, Nutch seems to fetch many more pages under the seed URL than under the links it discovers from that seed.
I am crawling apple.com as the seed (English by default). The site contains links to other language versions, such as apple.com/cn for China, and so on for other languages. After 7 cycles, the English pages outnumber the pages of any other language, such as /cn, by a factor of 10. I was expecting roughly the same number of pages for each language.

I then did the reverse: I put apple.com/cn in the seed list and removed apple.com. Now there are more documents from /cn than from any other language.

I am using Nutch 1.10 and crawling with the crawl script:

    crawl -i -D solr.server.url=http://localhost:8983/solr/ urls/ TestCrawl/ 7

From the logs I see that the crawl script uses -topN 50000 by default.

Please suggest how to get a more even crawl across languages.

Thanks,
Manish

