MilleBii wrote:
Oops, continuing my previous mail.

So I wonder if there could be a better 'generate' algorithm that would
maintain a constant ratio of hosts per 100 URLs... Below a certain
threshold it would stop or, better, start including URLs with lower scores.

That's exactly how the max.urls.per.host limit works.
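To illustrate how a per-host cap interacts with score ordering, here is a minimal sketch in plain Java. It is not Nutch's Generator code. It assumes that candidates are processed in descending score order and that a negative limit means "no limit". The `Candidate` record, the `generate(..)` method and its `topN`/`maxPerHost` parameters are hypothetical names used only for illustration. Once a host reaches its quota, its remaining URLs are skipped, so lower-scored URLs from other hosts fill the free slots.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a per-host cap applied during fetch-list generation. */
public class PerHostCapSketch {

    /** A candidate URL with its host and score (hypothetical type). */
    record Candidate(String url, String host, float score) {}

    static List<Candidate> generate(List<Candidate> candidates, int topN, int maxPerHost) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        // Highest score first.
        sorted.sort((a, b) -> Float.compare(b.score(), a.score()));

        Map<String, Integer> perHost = new HashMap<>();
        List<Candidate> selected = new ArrayList<>();
        for (Candidate c : sorted) {
            if (selected.size() >= topN) {
                break; // fetch list is full
            }
            int count = perHost.getOrDefault(c.host(), 0);
            // Assumption: a negative maxPerHost disables the cap.
            if (maxPerHost >= 0 && count >= maxPerHost) {
                continue; // host quota reached; give the slot to another host
            }
            perHost.put(c.host(), count + 1);
            selected.add(c);
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Candidate> cands = List.of(
            new Candidate("http://a.com/1", "a.com", 0.9f),
            new Candidate("http://a.com/2", "a.com", 0.8f),
            new Candidate("http://a.com/3", "a.com", 0.7f),
            new Candidate("http://b.com/1", "b.com", 0.2f),
            new Candidate("http://c.com/1", "c.com", 0.1f));
        for (Candidate c : generate(cands, 4, 2)) {
            System.out.println(c.url());
        }
    }
}
```

With `topN = 4` and `maxPerHost = 2`, the third a.com URL is skipped despite its higher score, and the lower-scored b.com and c.com URLs are selected instead.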


Using scores is de-optimizing the fetching process... Having said that,
I should first read the code and try to understand it.

That wouldn't hurt in any case ;)

ScoringFilters (e.g. the default scoring-opic plugin) also have a method that determines the priority of a URL during generation. See ScoringFilter.generatorSortValue(..); you can modify this method in scoring-opic (or in your own scoring filter) to prioritize certain URLs over others.
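To show the idea behind adjusting the sort value, here is a self-contained sketch. It deliberately does not use the real Nutch interface. The actual generatorSortValue(..) works with Nutch types (the URL and its CrawlDatum), whereas `SortValueFilter`, `ScoreBased`, `HostBoost`, `Entry` and `order(..)` below are hypothetical stand-ins built on plain `String`/`float`. `ScoreBased` assumes an OPIC-like default in which the sort value is the page score multiplied by `initSort`. `HostBoost` is an example custom policy that multiplies the value for a preferred set of hosts.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a scoring filter's sort-value hook (not the Nutch API). */
public class SortValueSketch {

    /** Simplified hook: returns the value the generator sorts candidates by. */
    interface SortValueFilter {
        float generatorSortValue(String url, float score, float initSort);
    }

    /** Assumed OPIC-like behaviour: sort value = score * initSort. */
    static class ScoreBased implements SortValueFilter {
        public float generatorSortValue(String url, float score, float initSort) {
            return score * initSort;
        }
    }

    /** Custom policy (hypothetical): boost URLs whose host is in a preferred set. */
    static class HostBoost implements SortValueFilter {
        private final Set<String> preferredHosts;
        private final float boost;

        HostBoost(Set<String> preferredHosts, float boost) {
            this.preferredHosts = preferredHosts;
            this.boost = boost;
        }

        public float generatorSortValue(String url, float score, float initSort) {
            float base = score * initSort;
            String host = URI.create(url).getHost();
            return preferredHosts.contains(host) ? base * boost : base;
        }
    }

    record Entry(String url, float score) {}

    /** Orders URLs by descending sort value; initSort is fixed at 1.0f here. */
    static List<String> order(List<Entry> entries, SortValueFilter f) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort((a, b) -> Float.compare(
            f.generatorSortValue(b.url(), b.score(), 1.0f),
            f.generatorSortValue(a.url(), a.score(), 1.0f)));
        List<String> out = new ArrayList<>();
        for (Entry e : sorted) {
            out.add(e.url());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("http://big.example.com/a", 0.9f),
            new Entry("http://small.example.org/b", 0.3f));
        System.out.println(order(entries, new ScoreBased()));
        System.out.println(order(entries, new HostBoost(Set.of("small.example.org"), 5.0f)));
    }
}
```

With the plain score-based value, the higher-scored URL comes first. The host boost (0.3 * 5 = 1.5 > 0.9) moves the small.example.org URL ahead. A real filter would apply the same kind of adjustment inside its own generatorSortValue(..) implementation.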

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
