On Thu, 29 May 2003, Michael Nordstrom wrote:
> Plucker's exclusion list format allows you to both exclude and
> include links,
> 
>     http://docs.plkr.org/node55.html

It doesn't quite do what I was after.  In iSiloX I can specify a list of
multiple URLs that the fetching starts from, while Plucker only lets me
specify a single one.  Thus, suppose I want to get only two URLs from my
home page: index.html and cv.html.  Now, index.html links to cv.html and
to many other things.  Ideally, I could have an include list of two URLs,
index.html and cv.html, and then set the spidering depth to 1.  But if I
put index.html as the main URL to fetch in Plucker, set the spidering
depth to 1, and then add cv.html to the inclusion list, cv.html never gets
fetched, because the inclusion list is only consulted when the spider
actually attempts to follow a link to something.
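
To make that concrete, here is roughly how I picture the fetching loop
behaving (this is only my reading of the behaviour, not the actual parser
code; spider, allowed and get_links are made-up names):

    def spider(start_url, max_depth, allowed, get_links):
        # allowed(url)   -> True if the include/exclude list lets url through
        # get_links(url) -> URLs found on that page (stand-in for HTML parsing)
        queue = [(start_url, 1)]
        fetched = set()
        while queue:
            url, depth = queue.pop(0)
            if url in fetched:
                continue
            fetched.add(url)
            if depth >= max_depth:
                # At maximum depth the page's links are never examined at all,
                # so the inclusion list is never consulted for them.
                continue
            for link in get_links(url):
                if allowed(link):
                    queue.append((link, depth + 1))
        return fetched

    links = {"index.html": ["cv.html", "other.html"], "cv.html": []}
    got = spider("index.html", 1,
                 lambda u: u in ("index.html", "cv.html"),
                 lambda u: links.get(u, []))
    # got == {"index.html"}: cv.html is never fetched even though it is on
    # the include list, because no links are followed from a page that is
    # already at maximum depth.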

Of course, even if it fetched index.html and cv.html, there would still be
a bit of a problem: I think the current Python code would miss the link
from index.html to cv.html, since once it has reached maximum depth it no
longer checks for further links, even links to files that have already been
fetched (at least that is how it looks from my experiments, but I could be
wrong).  I think the code should do a final pass, checking whether any of
the links in the files fetched at maximum depth can be resolved to
documents that were already fetched.  This would be nice.
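
Something along these lines is what I have in mind; just a sketch with
made-up names (Document, fetched, max_depth_urls), not the parser's real
data structures:

    class Document:
        def __init__(self, url, links):
            self.url = url
            self.links = links      # URLs this document points to
            self.resolved = []      # links kept after the final pass

    def resolve_max_depth_links(fetched, max_depth_urls):
        # fetched maps url -> Document for everything retrieved;
        # max_depth_urls lists the documents fetched at maximum depth,
        # whose links were skipped during spidering.
        for url in max_depth_urls:
            doc = fetched[url]
            for target in doc.links:
                # The target was never followed (maximum depth reached), but
                # if it is in the collection anyway, the link can still be
                # resolved to the fetched copy instead of being lost.
                if target in fetched and target not in doc.resolved:
                    doc.resolved.append(target)

    index = Document("index.html", ["cv.html", "other.html"])
    cv = Document("cv.html", [])
    fetched = {index.url: index, cv.url: cv}
    resolve_max_depth_links(fetched, ["index.html"])
    # index.resolved == ["cv.html"]; the link to other.html stays
    # unresolved because other.html was never fetched.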

Alex

--
Dr. Alexander R. Pruss  || e-mail: [EMAIL PROTECTED]
Philosophy Department   || online papers and home page:
Georgetown University   ||  www.georgetown.edu/faculty/ap85
Washington, DC 20057    ||
U.S.A.                  ||
-----------------------------------------------------------------------------
   "Philosophiam discimus non ut tantum sciamus, sed ut boni efficiamur."
       - Paul of Worczyn (1424)
