David A. Desrosiers wrote:
> Rob warned us about this several months ago, they monitor the logs
> at Slashdot VERY closely, and he specifically said to point to the
> version at http://slashd.org/palm version of the site, and NOT to
> pound it.

That is why pushing the 0.9 release is the right thing to do. I did not foresee that JPluck would take off like this, and the report of an IP ban prompted me into action. The delay between connections, the HTTP cache, and the lower limit on the number of simultaneous connections should be adequate to address this concern in the future.

> What facilities in JPluck are you using to adhere to robots.txt?

None yet, but these could be added easily. However, I think it'll piss off some users. For instance, The Onion has a robots.txt that disallows everything but the home page:

http://mobile.theonion.com/robots.txt

Does plucker-build parse robots.txt? Also, should robots.txt parsing be mandatory or voluntary? If you make it configurable, the first thing people will do is shut it off.

> Are you also using (as I am in perl) a simultaneous HEAD request to
> see that the page is indeed valid, before making a GET request for it?

The response to the GET will tell you whether it's valid. Performing an extra HEAD only leads to more traffic (if only marginally). The If-Modified-Since request header exists precisely so you can avoid the HEAD request (browsers do it this way as well). Otherwise you have to retrieve the Last-Modified date with a HEAD, then decide whether to perform a GET based on it. Also, servers often do not return a Last-Modified date in their response but do handle If-Modified-Since. This makes sense: it's often hard to say exactly when a resource was last modified, but it's easier to say that it hasn't been modified since a particular date.

As an aside: do people think the default user-agent should be AvantGo? Personally, I don't like to play hide-and-seek, but some people might be concerned that when too many of these "alien" user agents show up in the logs of AvantGo sites, webmasters will take better measures to stop non-AvantGo clients from retrieving content. The PDA version of space.com already has such a protection.
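For readers wondering how "a delay between connections" and "a limit on simultaneous connections" fit together, here is a minimal sketch in Java. It is not JPluck's code: the class `PoliteThrottle` and its parameters are hypothetical, and the actual delay and connection limit used in 0.9 are not stated here.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not JPluck's implementation): caps the number of
 * simultaneous connections and spaces connection starts by a fixed delay.
 */
class PoliteThrottle {
    private final Semaphore slots;   // limits simultaneous connections
    private final long delayNanos;   // minimum gap between connection starts
    private long nextStartNanos;     // earliest time the next connection may start
    private boolean started = false; // nanoTime has an arbitrary origin, so track first use

    PoliteThrottle(int maxConnections, long delayMillis) {
        this.slots = new Semaphore(maxConnections, true);
        this.delayNanos = TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    /** Call before opening a connection; blocks until a slot is free and the delay has passed. */
    void acquire() throws InterruptedException {
        slots.acquire();
        long waitNanos;
        synchronized (this) {
            long now = System.nanoTime();
            // Reserve a start time under the lock; overflow-safe comparison of nanoTime values.
            long start = (started && nextStartNanos - now > 0) ? nextStartNanos : now;
            started = true;
            nextStartNanos = start + delayNanos;
            waitNanos = start - now;
        }
        try {
            TimeUnit.NANOSECONDS.sleep(waitNanos); // does nothing if waitNanos <= 0
        } catch (InterruptedException e) {
            slots.release(); // give the slot back if interrupted while waiting
            throw e;
        }
    }

    /** Call after the connection has been closed. */
    void release() {
        slots.release();
    }
}
```

Each fetch would be wrapped as `throttle.acquire(); try { /* fetch */ } finally { throttle.release(); }`. The start time is reserved inside the lock but the sleep happens outside it, so other threads can queue up their own later start times meanwhile.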
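If robots.txt support were added, the core check could be as small as the following sketch. `RobotsTxt` is a hypothetical class: it honours only the `User-agent: *` group and plain `Disallow` path prefixes, and it ignores `Allow` lines, wildcards and agent-specific groups. A file that disallows everything and then uses `Allow` to open up the home page would therefore block the home page too with this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal robots.txt check: only "User-agent: *" and Disallow prefixes. */
class RobotsTxt {
    private final List<String> disallowed = new ArrayList<>();

    RobotsTxt(String content) {
        boolean inStarGroup = false;      // are we inside a group that applies to "*"?
        boolean previousWasAgent = false; // consecutive User-agent lines share one group
        for (String rawLine : content.split("\r?\n")) {
            int hash = rawLine.indexOf('#'); // strip comments
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;         // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!previousWasAgent) inStarGroup = false; // a new group starts
                if (value.equals("*")) inStarGroup = true;
                previousWasAgent = true;
            } else {
                previousWasAgent = false;
                // An empty Disallow means "allow everything", so it adds no rule.
                if (inStarGroup && field.equals("disallow") && !value.isEmpty()) {
                    disallowed.add(value);
                }
            }
        }
    }

    /** True if no Disallow prefix of the "*" group matches the given path. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }
}
```

A crawler would fetch `/robots.txt` once per host, keep the parsed object, and call `isAllowed` before each GET.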
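The conditional GET described above can be sketched with the `java.net.http` API (Java 11 or later, an assumption about the runtime). `ConditionalGet` is a hypothetical helper, not part of JPluck. It sends one GET carrying `If-Modified-Since`; a `304 Not Modified` answer means the cached copy is still valid and no body was transferred, so no separate HEAD is needed.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper for a single conditional GET instead of HEAD + GET. */
class ConditionalGet {
    // HTTP date format (IMF-fixdate), e.g. "Wed, 15 Jan 2003 10:00:00 GMT"
    static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                             .withZone(ZoneOffset.UTC);

    /** Builds a GET that asks for the body only if the page changed since lastFetched. */
    static HttpRequest build(URI uri, Instant lastFetched) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri).GET();
        if (lastFetched != null) { // first fetch: plain GET, nothing cached yet
            builder.header("If-Modified-Since", HTTP_DATE.format(lastFetched));
        }
        return builder.build();
    }

    /** 304 Not Modified: keep the cached copy; 200: replace it with the new body. */
    static boolean useCachedCopy(int statusCode) {
        return statusCode == 304;
    }
}
```

The request would be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`, passing `response.statusCode()` to `useCachedCopy`. A server that ignores the header simply answers 200 with the full page, so nothing breaks. Storing the server's own `Date` header as `lastFetched`, rather than the local clock, avoids problems with clock skew.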
The space.com PDA site seems to check the client's IP address, which is a protection that cannot easily be foiled.

Regards
-Laurens

_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list

