The list contains at least several thousand unique hosts. Does
FetcherThread pick URLs from the fetchlist randomly, or in alphabetical
order? Some sites contain 30 or so links to their own domain, so if
FetcherThread picks alphabetically, it wouldn't be surprising for my
threads to block on those hosts. If that is the case, is there a way to
make it pick randomly?
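To illustrate why the pick order matters, here is a minimal Python sketch. It is not Nutch code and makes no claim about how FetcherThread actually orders its fetchlist. It compares alphabetical order, a random shuffle, and a hypothetical round-robin interleave by host, and measures the longest run of consecutive URLs from a single host. The assumption is that such runs are what leave threads waiting on a per-host politeness delay. All function names are made up for this example.

```python
import random
from collections import defaultdict, deque
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def shuffled(urls, seed=None):
    """Random pick order: a shuffled copy of the list."""
    out = list(urls)
    random.Random(seed).shuffle(out)
    return out


def interleave_by_host(urls):
    """Hypothetical round-robin order: take one URL per host in turn,
    keeping each host's URLs in their original relative order."""
    queues = defaultdict(deque)
    for url in urls:
        queues[host_of(url)].append(url)
    active = deque(queues)  # hosts in order of first appearance
    order = []
    while active:
        host = active.popleft()
        order.append(queues[host].popleft())
        if queues[host]:  # host still has URLs: send it to the back
            active.append(host)
    return order


def max_same_host_run(urls):
    """Length of the longest run of consecutive URLs sharing one host."""
    best = run = 0
    prev = None
    for url in urls:
        host = host_of(url)
        run = run + 1 if host == prev else 1
        prev = host
        best = max(best, run)
    return best


if __name__ == "__main__":
    urls = [f"http://h{k}.example.com/p{i}" for k in range(3) for i in range(10)]
    print(max_same_host_run(sorted(urls)))              # 10
    print(max_same_host_run(interleave_by_host(urls)))  # 1
    print(max_same_host_run(shuffled(urls, seed=42)))   # depends on the seed
```

With three hosts of 10 URLs each, alphabetical order produces a run of 10 URLs from one host, while the interleave produces runs of 1. If a single host dominates the list, though, no ordering can avoid long runs. In that case, what helps is limiting the number of URLs per host.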
Thanks again.
Michael
On Feb 8, 2009 7:40pm, Andrzej Bialecki wrote:
[email protected] wrote:
Hi,
Thanks for the reply.
I've tried Fetcher2, but it seems to be even slower: most of the threads
end up spin-waiting after just 1 or 2 levels of crawling. I see that others
also find Fetcher2 quite slow. Is there an alternative?
Please check the distribution of URLs per unique host in a newly
generated segment. It could be that, for some reason, your fetchlists
consist of URLs from only a few unique hosts, and that's why the threads
block. You can also set the generate.max.urls.per.host property to e.g. 100
and see if it makes any difference.
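Here is a minimal Python sketch of both checks. It assumes you have already extracted the segment's URLs as a plain list of strings; how you dump them is not shown here. `host_distribution` counts URLs per host. `cap_per_host` only approximates the effect of a per-host limit such as generate.max.urls.per.host and is not Nutch's Generator code. `max_busy_threads` assumes a politeness model in which each host is fetched by at most one thread at a time. All function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercase host name of a URL ('' if it has none)."""
    return (urlsplit(url).hostname or "").lower()


def host_distribution(urls):
    """(host, url_count) pairs, most URLs first."""
    return Counter(host_of(u) for u in urls).most_common()


def cap_per_host(urls, max_per_host):
    """Keep at most max_per_host URLs per host, preserving order.
    A negative limit means 'no limit' in this sketch."""
    if max_per_host < 0:
        return list(urls)
    seen = Counter()
    kept = []
    for url in urls:
        host = host_of(url)
        if seen[host] < max_per_host:
            seen[host] += 1
            kept.append(url)
    return kept


def max_busy_threads(urls, threads):
    """Upper bound on threads fetching at once, assuming each host is
    fetched by at most one thread at a time."""
    return min(threads, len({host_of(u) for u in urls}))


if __name__ == "__main__":
    # A fetchlist dominated by one host (made-up data).
    urls = [f"http://a.example.com/{i}" for i in range(250)]
    urls += ["http://b.example.org/1", "http://c.example.net/1"]
    print(host_distribution(urls)[0])       # ('a.example.com', 250)
    print(len(cap_per_host(urls, 100)))     # 102
    print(max_busy_threads(urls, 10))       # 3
```

In this made-up example, at most 3 of 10 threads can be busy at once because only 3 hosts are present. Since one host holds almost all of the URLs, most threads would spend their time waiting on it.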
--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com