MilleBii wrote:
Oops, continuing my previous mail.
So I wonder if there would be a better 'generate' algorithm which
would maintain a constant ratio of hosts per 100 URLs ... Below a certain
threshold it stops, or better, starts including URLs with lower scores.
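The selection policy suggested above could be sketched roughly like this (illustrative Python only, not actual Nutch code; the `min_hosts_per_100` threshold, function name, and fallback behaviour are my own assumptions about how such a generator might work):

```python
# Sketch: pick the highest-scored URLs while the number of distinct hosts
# per 100 selected URLs stays above a threshold; once a host would drag
# diversity below it, skip that URL and let lower-scored URLs from fresh
# hosts fill the fetch list instead.
from urllib.parse import urlparse

def generate(candidates, fetch_size, min_hosts_per_100=20):
    """candidates: list of (score, url) pairs; returns a diverse fetch list."""
    candidates = sorted(candidates, key=lambda c: c[0], reverse=True)
    selected, hosts = [], set()
    for score, url in candidates:
        if len(selected) >= fetch_size:
            break
        host = urlparse(url).netloc
        # Diversity if we were to select this URL: distinct hosts per 100 URLs.
        diversity = 100 * len(hosts | {host}) / (len(selected) + 1)
        # Skip a repeat host once diversity drops below the threshold,
        # even if its score is high.
        if host in hosts and diversity < min_hosts_per_100:
            continue
        selected.append(url)
        hosts.add(host)
    return selected
```

With a threshold of 50 hosts per 100 URLs, for example, the third URL from the same host is skipped in favour of the next host's top URL.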
That's exactly how the max.urls.per.host
Hmm... I use the max urls and set it to 600, because in the worst
case there is 6 s (measured in the logs) between URLs of the same host, so 6
x 600 = 3600 s = 1 hour. So even in the worst case the long tail shouldn't last
longer than 1 hour... Unfortunately that is not what I see.
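The sizing reasoning above can be checked (and inverted, to derive the cap from a target tail length) with a couple of lines. The 6 s inter-fetch delay and the 1-hour target come from the mail; the function names are mine:

```python
# Worst-case tail: one host's queue fetched serially, one URL every
# delay_s seconds, at most urls_per_host URLs queued for that host.
def tail_seconds(delay_s, urls_per_host):
    return delay_s * urls_per_host

# Inverse: the largest per-host cap that keeps the tail under a target.
def max_urls_for_tail(delay_s, target_tail_s):
    return target_tail_s // delay_s

assert tail_seconds(6, 600) == 3600        # 6 s x 600 URLs = 1 hour
assert max_urls_for_tail(6, 3600) == 600   # cap that bounds the tail at 1 h
```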
I also tried the by.ip setting.
that's assuming that all
Observing my fetch cycle performance, it looks like there is always a rather
long tail.
I saw it on 10k, 150k, and 450k fetch runs.
Of course you can cut off the tail with the patch 770 made by Julien
(thx); I did some dry tests and it looks like it works, so I'm going to move it
to production.
Yet, what seems to