> One user got his IP banned by Slashdot because JPluck (0.8.6.2) was making
> too many simultaneous requests. To counter this, I introduced a 750ms
> delay when creating multiple connections and lowered the maximum number of
> connections to 4.

        Rob warned us about this several months ago. They monitor the logs
at Slashdot VERY closely, and he specifically said to point at the
http://slashd.org/palm version of the site, and NOT to pound it.
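
For illustration only, here is a minimal Python sketch of the kind of throttling described in the quoted message: a cap on simultaneous connections plus a fixed delay between connection starts. It is not JPluck's code. The class name `PoliteFetcher` and the `fetch` callable are hypothetical; the defaults of 4 connections and 0.75 seconds are taken from the quoted figures.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class PoliteFetcher:
    """Caps concurrent fetches and spaces out connection starts (illustrative)."""

    def __init__(self, fetch, max_connections=4, start_delay=0.75):
        self._fetch = fetch                      # hypothetical callable: url -> result
        self._max_connections = max_connections  # upper bound on parallel fetches
        self._start_delay = start_delay          # seconds between connection starts
        self._lock = threading.Lock()
        self._next_start = 0.0                   # earliest monotonic time for the next start

    def _wait_for_slot(self):
        # Reserve a start time while holding the lock, then sleep outside it
        # so other workers can reserve their own (later) start times.
        with self._lock:
            now = time.monotonic()
            start_at = max(now, self._next_start)
            self._next_start = start_at + self._start_delay
        time.sleep(max(0.0, start_at - now))

    def _fetch_one(self, url):
        self._wait_for_slot()
        return self._fetch(url)

    def fetch_all(self, urls):
        # The pool size is what limits the number of simultaneous connections.
        with ThreadPoolExecutor(max_workers=self._max_connections) as pool:
            return list(pool.map(self._fetch_one, urls))
```

Something like `PoliteFetcher(my_fetch).fetch_all(urls)` would then never open more than four connections at once and would start at most one new connection every 750ms, where `my_fetch` stands in for whatever actually retrieves a URL.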

        What facilities in JPluck are you using to adhere to robots.txt?
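
To make the question concrete, this is roughly the kind of check I mean, sketched in Python with the standard `urllib.robotparser` module. It says nothing about how JPluck handles this; the helper names and the `ExampleSpider` user agent are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser, url, user_agent="ExampleSpider"):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


rules = build_rules("User-agent: *\nDisallow: /private/\n")
print(allowed(rules, "http://example.org/index.html"))
print(allowed(rules, "http://example.org/private/page.html"))
```

A spider would run a check like `allowed()` before queueing each URL and skip anything the site has disallowed.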

> Also, 0.8.6 has no cache so resources are always retrieved in their
> entirety. The HTTP cache in 0.9 greatly reduces network traffic as the
> spider can make conditional GET requests using the if-modified-since
> header.
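
As a rough sketch of the conditional GET mentioned in the quoted text (not the 0.9 cache itself), the Python below sends an If-Modified-Since header built from the cached copy's timestamp and treats a 304 response as "cached copy is still good". The function names and the `opener` parameter are assumptions made for the example.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def conditional_request(url, cached_mtime):
    """Build a GET request that asks only for content newer than cached_mtime."""
    req = urllib.request.Request(url)
    if cached_mtime is not None:
        # HTTP dates are always expressed in GMT.
        req.add_header("If-Modified-Since", formatdate(cached_mtime, usegmt=True))
    return req


def fetch_if_modified(url, cached_mtime, opener=urllib.request.urlopen):
    """Return the new body, or None if the server says the cache is current."""
    try:
        with opener(conditional_request(url, cached_mtime)) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: no body is sent
            return None
        raise
```

When the resource has not changed, only headers cross the wire, which is where the traffic saving comes from.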

        Are you also using (as I am in Perl) a simultaneous HEAD request to
see that the page is indeed valid before making a GET request for it?
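
The HEAD check could look something like the following, shown here in Python rather than Perl. This is only a sketch of the idea: `head_ok`, `fetch_if_valid`, and the `opener` parameter are names invented for the example, and "valid" is assumed to mean a 2xx status.

```python
import urllib.error
import urllib.request


def head_ok(url, opener=urllib.request.urlopen):
    """Send a HEAD request and report whether the server answers with 2xx."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener(req) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        # Covers HTTP error statuses (HTTPError) as well as connection failures.
        return False


def fetch_if_valid(url, opener=urllib.request.urlopen):
    """Fetch the body with GET only when the HEAD check succeeded."""
    if not head_ok(url, opener):
        return None
    with opener(urllib.request.Request(url)) as resp:
        return resp.read()
```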


d.


_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
