On 5/31/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Doğacan Güney wrote:
> > I am still not sure about the source of this bug, but I think I found
> > some unnecessary waits in Fetcher2. Even if a URL is blocked by
> > robots.txt (or has a crawl delay larger than max.crawl.delay),
> > Fetcher2 still waits fetcher.server.delay before fetching another URL
> > from the same host, which is unnecessary, since Fetcher2 didn't make
> > a request to the server anyway.
> >
> > So, I have put up a patch for this at (*). What do you think? If you
> > have no objections, I am going to go ahead and open an issue for this.
> >
> > (*) http://www.ceng.metu.edu.tr/~e1345172/fetcher2_robots.patch
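
A minimal sketch of the queue behavior being described, assuming a
per-host queue that records when the last request finished (the names
below are illustrative, not the actual Fetcher2 code):

public class FetchItemQueueSketch {
    private final long serverDelay;  // fetcher.server.delay, in ms
    private long endTime = 0L;       // when the last real request finished

    public FetchItemQueueSketch(long serverDelay) {
        this.serverDelay = serverDelay;
    }

    // asap == true means "no request was actually made, so do not
    // throttle the next fetch from this host"; that is the behavior
    // wanted for robots-denied and over-delayed URLs
    public void finishFetchItem(boolean asap) {
        endTime = asap ? 0L : System.currentTimeMillis();
    }

    // hand out the next item for this host only after serverDelay has
    // elapsed since the last *actual* request finished
    public boolean ready(long now) {
        return now >= endTime + serverDelay;
    }
}

With something like this, a URL denied by robots.txt finishes with
asap == true and the next URL from the same host becomes eligible
immediately, while a successful fetch still records its end time and
enforces fetcher.server.delay.
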
> Good catch! The patch looks good, too - please go ahead. One question:
> why did you remove the call to finishFetchItem() around line 505?

Because it seems we already call finishFetchItem() in that code path,
just before the switch statement. I have opened NUTCH-495 for this. If
I am mistaken, just give me a nudge and I will send an updated patch.
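
To make the redundancy concrete, a simplified sketch of that code path
(illustrative only, not the actual Fetcher2 source): the item is
finished once right after the protocol call returns, before the status
dispatch, so another finishFetchItem() inside any of the cases would be
a second call on an already-finished item.

public class FinishOnceSketch {
    enum Status { SUCCESS, MOVED, EXCEPTION }

    private boolean finished = false;

    private void finishFetchItem() {
        finished = true;  // unblock the per-host queue
    }

    void handleOutput(Status status) {
        finishFetchItem();  // the call just before the switch statement
        switch (status) {
            case SUCCESS:
                // process the fetched content; no further call needed
                break;
            case MOVED:
                // handle the redirect; no further call needed
                break;
            case EXCEPTION:
                // record the failure; no further call needed
                break;
        }
    }

    public static void main(String[] args) {
        FinishOnceSketch sketch = new FinishOnceSketch();
        sketch.handleOutput(Status.MOVED);
        System.out.println("finished = " + sketch.finished);  // prints true
    }
}
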
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
--
Doğacan Güney