Hi
I am crawling a URL. I downloaded the page as well, and counted the links in
the page by simply doing:

grep -c href page.html

I got 724 links.
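(Note that grep -c counts matching lines, not individual matches, so if any
line contains more than one href the real count could be a bit higher. A
quick sketch to count occurrences instead, assuming your grep supports -o:

grep -o 'href' page.html | wc -l
)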

So I ran inject/generate/fetch/parse/updatedb once. I believe this first
run will collect all the links on this page, to be crawled on the next run.
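
For reference, this is the standard Nutch 1.x step-by-step sequence I mean
(the crawldb/segments paths are just my local setup):

bin/nutch inject crawldb urls
bin/nutch generate crawldb segments
s=`ls -d segments/2* | tail -1`    # pick the newest segment
bin/nutch fetch $s
bin/nutch parse $s
bin/nutch updatedb crawldb $s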

So I ran the next generate/fetch round.

This is what I see in the fetch reducer on the JobTracker:

20/20 spinwaiting/active, 61 pages, 0 errors, 0.1 0 pages/s, 414 459 kb/s, 1000 URLs in 1 queues > reduce


So why are there 1000 URLs in the queue when the page only has 724 links?
The page does not have any AJAX content.
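
In case it helps diagnose this, I can dump what actually ended up in the
crawldb after the updatedb step; this is just the standard readdb tool,
with my crawldb path assumed:

bin/nutch readdb crawldb -stats                # URL counts by status
bin/nutch readdb crawldb -dump crawldb_dump    # plain-text dump of all known URLs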
