The list contains at least several thousand unique hosts. Does FetcherThread pick URLs from the fetchlist randomly or in alphabetical order? Some sites contain about 30 links to their own domain, so if FetcherThread picks alphabetically, it wouldn't be surprising for my threads to get blocked. In that case, is there a way to make it pick randomly?
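The worry here is easy to reproduce with a toy model, whatever order the real fetcher uses. If the fetchlist is sorted, all URLs of one host sit next to each other, so threads taking work from the front of the list keep hitting the same host. The Java sketch below assumes the fetchlist is simply a `List<String>` of URLs. `FetchlistInterleaver` and its methods are hypothetical names, not Nutch APIs. It shows two alternatives: a round-robin interleave by host, and a plain seeded shuffle (the "pick randomly" option).

```java
import java.net.URI;
import java.util.*;

// Hypothetical helper, not part of Nutch: reorders a fetchlist so that
// consecutive URLs come from different hosts whenever possible.
final class FetchlistInterleaver {

    static String hostOf(String url) {
        String host = URI.create(url).getHost();
        return host == null ? "" : host.toLowerCase(Locale.ROOT);
    }

    /** Round-robin over hosts, preserving each host's original URL order. */
    static List<String> interleaveByHost(List<String> urls) {
        Map<String, Deque<String>> byHost = new LinkedHashMap<>();
        for (String url : urls) {
            byHost.computeIfAbsent(hostOf(url), h -> new ArrayDeque<>()).add(url);
        }
        List<String> out = new ArrayList<>(urls.size());
        while (out.size() < urls.size()) {
            // One URL from every host that still has URLs left, per pass.
            for (Deque<String> queue : byHost.values()) {
                String next = queue.poll();
                if (next != null) out.add(next);
            }
        }
        return out;
    }

    /** The "pick randomly" alternative: a seeded shuffle of the whole list. */
    static List<String> shuffled(List<String> urls, long seed) {
        List<String> copy = new ArrayList<>(urls);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        List<String> sorted = List.of(
            "http://a.example/1", "http://a.example/2", "http://a.example/3",
            "http://b.example/1", "http://c.example/1");
        System.out.println(interleaveByHost(sorted));
    }
}
```

With the sample list, the three a.example URLs are no longer adjacent. The interleaved order is a/1, b/1, c/1, a/2, a/3. A shuffle spreads hosts out only statistically, while the interleave guarantees no two adjacent URLs share a host unless only one host is left.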

Thanks again.

Michael

On Feb 8, 2009 7:40pm, Andrzej Bialecki <[email protected]> wrote:
[email protected] wrote:

Hi,

Thanks for the reply.

I've tried Fetcher2 but it seems to be even slower -- most of the threads are put into a spinlock after just 1 or 2 levels. I see that some others find Fetcher2 quite slow too. Is there an alternative?
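Threads that spin after a level or two are what one would expect from a fetcher that keeps one queue per host and enforces a politeness delay. Once the remaining URLs belong to a handful of hosts, most threads find no queue that is ready. The following is a simplified model of that idea, not Fetcher2's actual code. `HostQueues`, `next()`, `busyThreads()` and the single fixed `crawlDelayMs` are assumptions for illustration.

```java
import java.util.*;

// Simplified model (not Fetcher2's real implementation): one FIFO queue per
// host, and a host may be fetched again only crawlDelayMs after its last fetch.
final class HostQueues {
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    private final Map<String, Long> nextAllowedMs = new HashMap<>();
    private final long crawlDelayMs;

    HostQueues(long crawlDelayMs) {
        this.crawlDelayMs = crawlDelayMs;
    }

    void add(String host, String url) {
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /**
     * Returns a URL from the first host allowed to be fetched at nowMs, or null
     * when every non-empty queue is still inside its delay window; a real
     * fetcher thread would then have to sleep and retry.
     */
    String next(long nowMs) {
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            Deque<String> queue = e.getValue();
            if (queue.isEmpty()) continue;
            String host = e.getKey();
            if (nextAllowedMs.getOrDefault(host, 0L) <= nowMs) {
                nextAllowedMs.put(host, nowMs + crawlDelayMs);
                return queue.poll();
            }
        }
        return null;
    }

    /** How many of `threads` would get work at nowMs if each asked once (mutates state). */
    int busyThreads(int threads, long nowMs) {
        int busy = 0;
        for (int i = 0; i < threads; i++) {
            if (next(nowMs) != null) busy++;
        }
        return busy;
    }

    public static void main(String[] args) {
        HostQueues fewHosts = new HostQueues(5000);
        for (int i = 0; i < 30; i++) fewHosts.add("a.example", "http://a.example/" + i);
        System.out.println("1 host, busy threads: " + fewHosts.busyThreads(10, 0));

        HostQueues manyHosts = new HostQueues(5000);
        for (int i = 0; i < 30; i++) manyHosts.add("h" + i + ".example", "http://h" + i + ".example/");
        System.out.println("30 hosts, busy threads: " + manyHosts.busyThreads(10, 0));
    }
}
```

In this model, with 30 URLs from a single host only one of ten threads gets a URL at time 0, while with 30 distinct hosts all ten do. The number of hosts with pending URLs, not the number of URLs, bounds how many threads can work at once.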
Please check the distribution of URLs per unique host in a newly generated segment. It could be that, for some reason, your fetchlists consist of URLs from only a few unique hosts, and that would explain the blocking.
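One way to do this check is to tally URLs by host and look at the largest counts. This assumes the URLs of the segment have already been extracted into a plain list; how to export them is not covered here. `UrlsPerHost` is a hypothetical helper, not a Nutch tool.

```java
import java.net.URI;
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical helper: counts URLs per host and reports the busiest hosts,
// to see whether a fetchlist is dominated by a few hosts.
final class UrlsPerHost {

    static Map<String, Integer> countByHost(List<String> urls) {
        Map<String, Integer> counts = new HashMap<>();
        for (String url : urls) {
            String host;
            try {
                host = URI.create(url.trim()).getHost();
            } catch (IllegalArgumentException e) {
                host = null; // malformed URL
            }
            String key = host == null ? "<no host>" : host.toLowerCase(Locale.ROOT);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    /** The n hosts with the most URLs, by descending count (ties broken by host name). */
    static List<Map.Entry<String, Integer>> topHosts(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "http://a.example/1", "http://a.example/2", "http://b.example/1");
        Map<String, Integer> counts = countByHost(urls);
        System.out.println(counts.size() + " unique hosts, top: " + topHosts(counts, 10));
    }
}
```

If a few hosts account for most of the list, the fetcher threads will mostly wait on those hosts, whatever the total number of unique hosts.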
You can also set your generate.max.urls.per.host property to, e.g., 100 and see whether it makes any difference.
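As its name suggests, the property limits how many URLs a single host may contribute to one fetchlist, so that no host can monopolize a segment. The sketch below only illustrates that capping rule on a list of URLs assumed to be ordered best-first. It is not Nutch's Generator code. `PerHostCap` and `capPerHost` are hypothetical names, and treating a negative limit as "no cap" is this sketch's own convention.

```java
import java.net.URI;
import java.util.*;

// Illustration only (not Nutch's Generator): keep at most maxPerHost URLs per
// host, taking them in the order given (assumed to be best-first).
final class PerHostCap {

    static List<String> capPerHost(List<String> urlsByPriority, int maxPerHost) {
        if (maxPerHost < 0) return new ArrayList<>(urlsByPriority); // negative = no cap in this sketch
        Map<String, Integer> taken = new HashMap<>();
        List<String> selected = new ArrayList<>();
        for (String url : urlsByPriority) {
            String host = URI.create(url).getHost();
            String key = host == null ? "" : host.toLowerCase(Locale.ROOT);
            int n = taken.getOrDefault(key, 0);
            if (n < maxPerHost) { // skip URLs once this host has used its quota
                taken.put(key, n + 1);
                selected.add(url);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 30; i++) urls.add("http://big.example/page" + i);
        urls.add("http://small.example/");
        System.out.println(capPerHost(urls, 5).size());
    }
}
```

Capping per host bounds how much of a fetchlist one host can occupy. That matters because per-host politeness is what leaves threads idle when a single host dominates.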
--
Best regards,
Andrzej Bialecki <>
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com