> I suppose that one way would be to mirror the website with wget 
> (which can be throttled) then pluckerizing what was downloaded, or 
> making plucker use a local proxy on a non-standard port and 
> throttling inbound on that port on that machine with dummynet at the 
> (openbsd, in my case) router

        That presumes that wget is allowed by robots.txt on the site 
in question. Many (most?) sites that serve lots of static HTML will 
be blocking Wget's user agent anyway. 
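You can check that locally before bothering to mirror anything. A 
sketch using Python's standard robotparser (the robots.txt rules 
below are an invented example of a site that blocks Wget, not any 
real site's policy):

```python
import urllib.robotparser

# Parse a robots.txt body and ask whether a given user agent may fetch a path.
# These rules are illustrative: Wget is shut out entirely, everyone else
# is only kept away from /private/.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: Wget",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("Wget", "http://example.org/archive/"))     # False: blocked
print(rp.can_fetch("Mozilla", "http://example.org/archive/"))  # True: allowed
```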

        The workaround there is to forge the User-Agent string and 
hope that the upstream maintainer doesn't check their logs and spot 
your hits... or that they aren't running mod_throttle on the 
server side to cut you down even further.
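For what it's worth, you don't even need wget for that. A minimal 
Python fetcher that sends a browser-like User-Agent and sleeps 
between requests to stay polite (the two-second delay and the agent 
string are arbitrary choices of mine, not anything Plucker requires):

```python
import time
import urllib.request

def fetch_throttled(urls, delay=2.0,
                    user_agent="Mozilla/4.0 (compatible)"):
    """Fetch each URL in order, pausing `delay` seconds between requests.

    The User-Agent header is forged so the server does not see the
    library's default identifier; both defaults are illustrative.
    """
    pages = {}
    for i, url in enumerate(urls):
        if i:  # no need to sleep before the very first request
            time.sleep(delay)
        req = urllib.request.Request(url,
                                     headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            pages[url] = resp.read()
    return pages
```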

        Why not just ask marc.theaimsgroup.com for the mbox, parse 
it into HTML locally, and handle it that way? That's what I've been 
doing here without much difficulty.
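The mbox-to-HTML step is a few lines with Python's standard mailbox 
module. This is a bare-bones sketch (one page, subjects and plain 
bodies only; multipart messages are skipped), not the archive layout 
marc itself generates:

```python
import html
import mailbox

def mbox_to_html(path):
    """Render each message in an mbox file as a simple HTML section."""
    parts = ["<html><body>"]
    for msg in mailbox.mbox(path):
        subject = msg.get("Subject", "(no subject)")
        sender = msg.get("From", "(unknown)")
        body = msg.get_payload(decode=False)
        if not isinstance(body, str):  # multipart: skipped in this sketch
            body = "(multipart message omitted)"
        parts.append("<h2>%s</h2>" % html.escape(subject))
        parts.append("<p><i>From: %s</i></p>" % html.escape(sender))
        parts.append("<pre>%s</pre>" % html.escape(body))
    parts.append("</body></html>")
    return "\n".join(parts)
```

The escaping matters: list mail is full of raw `<` and `&` that would 
otherwise be swallowed as markup when Plucker renders the result.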



David A. Desrosiers
[EMAIL PROTECTED]
http://gnu-designs.com
_______________________________________________
plucker-list mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list