> I suppose that one way would be to mirror the website with wget
> (which can be throttled) then pluckerizing what was downloaded, or
> making plucker use a local proxy on a non-standard port and
> throttling inbound on that port on that machine with dummynet at the
> (openbsd, in my case) router
	That presumes the site's robots.txt allows Wget in the first
place. Many (most?) sites that serve lots of static HTML will be
blocking Wget anyway.
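You can check this locally before mirroring. Here's a minimal sketch using Python's stdlib robots.txt parser; the robots.txt content and URLs below are made-up examples, not taken from any real site:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that blocks Wget entirely but allows other agents
# everywhere except /private/ (a hypothetical policy for illustration).
robots_txt = """\
User-agent: Wget
Disallow: /

User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Wget is denied outright; an ordinary browser agent is allowed.
print(rp.can_fetch("Wget", "http://example.com/docs/index.html"))
print(rp.can_fetch("Mozilla/5.0", "http://example.com/docs/index.html"))
```

In practice you'd fetch the real file with rp.set_url(...) and rp.read() instead of parsing a literal string.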
	The workaround there is to forge the User-Agent string and
hope the upstream maintainer doesn't check their logs and notice
your hits... or that they aren't using mod_throttle on the server
side to cut you down even further.
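For illustration only (and with the same caveat that this works against the site operator's stated wishes), forging the agent string amounts to setting one header. A sketch with Python's urllib; the URL and browser signature are placeholders:

```python
from urllib.request import Request

# Build a request that identifies itself as an ordinary browser rather
# than Wget/urllib. The User-Agent value is just an example signature.
req = Request(
    "http://example.com/archive/index.html",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux i686)"},
)

# urllib.request.urlopen(req) would perform the actual fetch;
# skipped here so the sketch stays offline.
print(req.get_header("User-agent"))
```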
Why not just ask marc.theaimsgroup.com for the mbox and parse
that into HTML locally, and handle it that way? That's what I've been
doing here without too much difficulty.
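The mbox-to-HTML step is straightforward with Python's stdlib mailbox module. A minimal sketch (the function name and HTML layout are my own choices, not any particular tool's output):

```python
import html
import mailbox

def mbox_to_html(mbox_path):
    """Render each message in an mbox archive as a crude HTML page."""
    parts = ["<html><body>"]
    for msg in mailbox.mbox(mbox_path):
        subject = msg.get("Subject", "(no subject)")
        sender = msg.get("From", "(unknown sender)")
        body = msg.get_payload()
        if isinstance(body, list):
            # Multipart message: just take the first part for this sketch.
            body = body[0].get_payload()
        parts.append("<h2>%s</h2>" % html.escape(subject))
        parts.append("<p><i>From: %s</i></p>" % html.escape(sender))
        parts.append("<pre>%s</pre>" % html.escape(str(body)))
    parts.append("</body></html>")
    return "\n".join(parts)
```

The resulting file can then be fed to the Plucker distiller like any other local HTML.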
David A. Desrosiers
[EMAIL PROTECTED]
http://gnu-designs.com
_______________________________________________
plucker-list mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list