Hi, I am crawling a URL and have also downloaded the page. I counted the URLs in the page simply by doing:

    grep -c href page.html

which gave me 724 links.

I then ran inject/generate/fetch/parse/updatedb once. My understanding is that this first run collects all the links on the page so they can be crawled on the next run. Then I ran the next generate/fetch, and this is what I see in the fetch reducer on the JobTracker:

    20/20 spinwaiting/active, 61 pages, 0 errors, 0.1 0 pages/s, 414 459 kb/s, 1000 URLs in 1 queues > reduce

So why are there 1000 URLs in the queue when the page only has 724 links? The page does not have any AJAX content.
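(A side note on how I counted: grep -c counts matching lines, not individual matches, so a line containing several href attributes is counted only once. Counting each occurrence instead would look roughly like this:)

    # count every occurrence of "href", not just the lines containing it
    grep -o href page.html | wc -l

For completeness, the crawl cycle I ran was roughly the following (Nutch 1.x command line; the crawl/ and urls/ paths are just placeholders for my actual directories):

    # first cycle: seed the crawldb, generate a segment, fetch, parse, update
    bin/nutch inject crawl/crawldb urls
    bin/nutch generate crawl/crawldb crawl/segments
    s1=`ls -d crawl/segments/2* | tail -1`
    bin/nutch fetch $s1
    bin/nutch parse $s1
    bin/nutch updatedb crawl/crawldb $s1

    # second cycle: generate and fetch again (this is where I see the 1000 URLs)
    bin/nutch generate crawl/crawldb crawl/segments
    s2=`ls -d crawl/segments/2* | tail -1`
    bin/nutch fetch $s2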

