MilleBii wrote:
Oops, continuing my previous mail.
So I wonder if there could be a better 'generate' algorithm which
would maintain a constant rate of hosts per 100 URLs ... Below a certain
threshold it stops or, better, starts including URLs with lower scores.
That's exactly how the max.urls.per.host limit works.
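
To make the idea concrete, here is a minimal sketch of a per-host cap
applied during selection. It is an illustration only, not the actual
Generator code: the class PerHostCapSketch, the Candidate type and the
parameters topN and maxPerHost are hypothetical names. Candidates are
taken in descending score order, and once a host has reached its cap
the remaining slots go to lower-scored URLs from other hosts.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Nutch's Generator code.
final class PerHostCapSketch {

    // A candidate URL with its score (stand-in for a crawl database entry).
    static final class Candidate {
        final String url;
        final float score;

        Candidate(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    // Select up to topN candidates in descending score order, skipping any
    // URL whose host already has maxPerHost entries in the selection.
    static List<Candidate> select(List<Candidate> candidates, int topN, int maxPerHost) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());

        Map<String, Integer> perHost = new HashMap<>();
        List<Candidate> selected = new ArrayList<>();
        for (Candidate c : sorted) {
            if (selected.size() >= topN) {
                break;
            }
            String host = URI.create(c.url).getHost();
            int count = perHost.getOrDefault(host, 0);
            if (count >= maxPerHost) {
                continue; // host is full, so lower-scored URLs get the slot
            }
            perHost.put(host, count + 1);
            selected.add(c);
        }
        return selected;
    }
}
```

For example, with three URLs from one host scored 0.9, 0.8 and 0.7, one
URL from a second host scored 0.5, topN = 3 and maxPerHost = 2, the
sketch selects the two best URLs of the first host plus the 0.5 one.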
Using scores is de-optimizing the fetching process... Having said that,
I should first read the code and try to understand it.
That wouldn't hurt in any case ;)
There is also a method in ScoringFilter-s (e.g. the default
scoring-opic) that determines the priority of a URL during
generation. See ScoringFilter.generatorSortValue(..); you can modify
this method in scoring-opic (or in your own scoring filter) to
prioritize certain URLs over others.
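
As a rough sketch of that kind of change, the standalone code below
boosts the sort value of URLs on a preferred host. It deliberately does
not use Nutch's types: the class SortValueSketch, the plain
(String, float, float) parameters, PREFERRED_SUFFIX and BOOST are all
hypothetical, and the baseline of score times initial sort value is an
assumption about the default behaviour. Check the real method in the
scoring-opic plugin before adapting it.

```java
import java.net.URI;

// Hypothetical stand-in for the idea behind
// ScoringFilter.generatorSortValue(..); it does not use Nutch's classes.
final class SortValueSketch {

    // Hypothetical rule: hosts ending with this suffix are preferred.
    static final String PREFERRED_SUFFIX = "example.org";

    // Hypothetical multiplier applied to preferred hosts.
    static final float BOOST = 2.0f;

    // Assumed baseline: the sort value is the score scaled by initSort.
    static float baseSortValue(float score, float initSort) {
        return score * initSort;
    }

    // Returns a higher sort value for preferred hosts, so those URLs are
    // considered earlier when the fetch list is generated.
    static float generatorSortValue(String url, float score, float initSort) {
        float sort = baseSortValue(score, initSort);
        String host = URI.create(url).getHost();
        if (host != null && host.endsWith(PREFERRED_SUFFIX)) {
            sort *= BOOST;
        }
        return sort;
    }
}
```

A higher returned value means a higher priority, so URLs that should
be fetched later can be given a multiplier below 1 in the same way.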
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com