Re: How does generate work ?

2009-12-03 Thread Andrzej Bialecki
MilleBii wrote: Oops, continuing my previous mail. So I wonder whether there could be a better 'generate' algorithm that maintains a constant ratio of hosts per 100 URLs ... Below a certain threshold it stops or, better, starts including URLs with lower scores. That's exactly how the max.urls.per.host

Re: How does generate work ?

2009-12-03 Thread MilleBii
Hmm... I use the max URLs setting and set it to 600, because in the worst case there are 6 s (measured from the logs) between URLs of the same host: 6 x 600 = 3600 s = 1 hour. So in the worst case the long tail shouldn't last longer than 1 hour... Unfortunately that is not what I see. I also tried the by.ip
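
The worst-case estimate in this message (600 URLs per host, 6 s between fetches of the same host) can be written as a small sketch. The function name and the model behind it are assumptions for illustration only, not anything from Nutch: it assumes a single host is fetched strictly one URL at a time with a fixed delay, and ignores timeouts, retries and redirects.

```python
def worst_case_tail_seconds(max_urls_per_host: int, delay_seconds: float) -> float:
    """Upper bound on the time spent on one host, under the assumed model:
    URLs of the same host are fetched one after another, each separated
    by a fixed delay (hypothetical helper, not part of Nutch)."""
    return max_urls_per_host * delay_seconds


# Figures quoted in the message: 600 URLs per host, 6 s measured between fetches.
tail_seconds = worst_case_tail_seconds(600, 6)
tail_hours = tail_seconds / 3600

print(f"worst-case tail: {tail_seconds:.0f} s = {tail_hours:.1f} h")
```

If the observed tail is longer than this bound, then at least one of the assumptions of the model does not hold for the crawl in question.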

Re: How does generate work ?

2009-12-03 Thread Julien Nioche
Hmm... I use the max URLs setting and set it to 600, because in the worst case there are 6 s (measured from the logs) between URLs of the same host: 6 x 600 = 3600 s = 1 hour. So in the worst case the long tail shouldn't last longer than 1 hour... Unfortunately that is not what I see. That's assuming that all

How does generate work ?

2009-12-02 Thread MilleBii
Observing the performance of my fetch cycles, it looks like there is always a rather long tail. I saw it on 10k, 150k, and 450k fetch runs. Of course you can cut off the tail with patch 770 made by Julien (thx); I did some dry tests and it looks like it works, so I'm going to move it to production. Yet, what seems to