On Mon, Mar 11, 2002 at 07:42:22AM -0800, David A. Desrosiers wrote:
>       So we're back to a few ideas/solutions. I'm still clamoring for a
> --stayondomain argument, which will go from the "dot" domain back up (i.e.
> ARPA specification, like .org.slashdot) and stop there, so *ANYTHING* that
> is on that domain itself (not the hostname, the _DOMAIN_) will be included,
> this means images.slashdot.org, articles.slashdot.org, etc.

I very much like that idea. What I've had to do to emulate it is
create a slashdot.txt file with the contents

0:-:.*
1:+:.+slashdot.org/palm.*
2:-:.*comments.*shtml

then call plucker-build with -E slashdot.txt. Basically, every site I
pluck from a cronjob has a similar setup; it's a bit of a pain to go
and create a new one whenever I want to add a site.
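For what it's worth, here's a rough Python sketch of how I read those priority:action:regex lines -- the matching rule with the highest priority decides whether a URL is fetched. This is my reading of the format, not Plucker's actual code:

```python
import re

def allowed(url, rules):
    """rules is a list of (priority, action, regex) tuples; the
    matching rule with the highest priority decides the verdict.
    '+' means include, '-' means exclude."""
    verdict = True  # default when no rule matches
    best = -1
    for prio, action, pattern in rules:
        if prio > best and re.match(pattern, url):
            best = prio
            verdict = (action == '+')
    return verdict

# The slashdot.txt rules from above:
rules = [(0, '-', '.*'),
         (1, '+', '.+slashdot.org/palm.*'),
         (2, '-', '.*comments.*shtml')]

print(allowed('http://www.slashdot.org/palm/', rules))                # True
print(allowed('http://www.slashdot.org/palm/comments.shtml', rules))  # False
print(allowed('http://www.cnn.com/', rules))                          # False
```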

>       The other idea is coupled to that, and allows a maximum of links to
> be gathered before it stops. --maxlinks=200 for example, would stop the
> parse, roll up the existing data at that point, and pack it into the pdb.
> This could be coupled with a --breadth-first --depth-first option pair,
> depending on your needs (I brought this exact pair of options up about 2
> years ago as well).

Recently, PalmInfoCenter had a problem in their /palm/ directory: a
page there linked back to their main full site. As a result,
plucker-build was building me a 1000+ link .pdb file of every single
story on their site, full graphics and everything. It would be nice to
have a limit like that to prevent runaway links.
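Something like the proposed --maxlinks could just be a breadth-first walk that stops once it has gathered N links. A rough sketch (gather_links and get_links are made-up names for illustration; get_links stands in for whatever fetches a page and returns its links):

```python
from collections import deque

def gather_links(start_url, get_links, max_links=200):
    """Breadth-first link gathering with a hard cap, sketching the
    proposed --maxlinks behaviour."""
    seen = {start_url}
    queue = deque([start_url])
    gathered = []
    while queue and len(gathered) < max_links:
        url = queue.popleft()
        gathered.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return gathered

# Toy link graph in place of real fetching:
site = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': [], 'd': ['e'], 'e': []}
print(gather_links('a', site.get, max_links=3))  # ['a', 'b', 'c']
```

With a runaway site like the PalmInfoCenter case, the cap would stop the parse at 200 links instead of pulling in every story.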

-- 
Adam McDaniel
Array Networks
Calgary, AB, Canada
