On Thu, 29 May 2003, David A. Desrosiers wrote:

> Bookmarks solves these two problems, or seems to, based on these
> explanations. Maybe I don't understand the larger goal here. Alexander?

Bookmarks probably solves the problem. It does require an extra step to go to the desired page, which is a bit of a nuisance, but that's all.

The point was to allow more careful control over what is collected. With multiple root pages for spidering, one could get that. But one can also get that by generating an HTML page that has links to the multiple "root" pages, and then using bookmarks. It does require an extra step, but that can be done.

I suppose the thing is that I use Plucker for rather different purposes from most people. I use it to keep large collections of scholarly texts. (E.g., I just plucked the complete works of John Henry Newman, ~18 MB compressed.) Getting just the right subset of texts from a large website might be difficult. One can do some stuff with inclusion/exclusion lists, but occasionally one might want more precise control.

The ideal would be to input a list of URLs that must be included in the collection, with a separate depth for each item on the list, and with the assumption that there are interlinks between all of these. For instance, a site might have a home page which links to five different pages, A, B, C, D and E. One could then have such careful control that one could include the home page, two levels of links starting with A, three levels starting with C, just page B itself and no links from it, and exclude D and E.

Right now, such precise control is not available. I am not sure yet whether I need it for my purposes or not, so this is entirely theoretical. For now, creating an HTML page of links and using bookmarks will do.

However, I definitely would like more control over the order in which things are put into the pdb--putting things in in the order fetched (with fragments correctly handled) would be OK--and I definitely would like the spider to check over all the files fetched to see if it can resolve any other links "for free" (i.e., without fetching more data from the web).

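To make the idea concrete, here is a rough Python sketch of per-root depth control together with the "for free" link check. It is only an illustration and not how the Plucker spider actually works: the names collect and free_links, the in-memory site dictionary standing in for real fetching and parsing, and the page names A1, C1, etc. are all made up for the example.

```python
from collections import deque


def collect(roots, links, excluded=()):
    """Breadth-first crawl from each (url, depth) root, in the order given.

    roots    -- list of (url, max_depth) pairs; depth 0 means the page alone
    links    -- dict mapping a url to the urls it links to (a stand-in
                for actually fetching and parsing the page)
    excluded -- urls that must never be collected
    Returns the collected urls in the order they were first fetched.
    """
    order = []       # fetch order, which could double as the order in the pdb
    expanded = {}    # largest remaining depth each url has been visited with
    for root, max_depth in roots:
        queue = deque([(root, max_depth)])
        while queue:
            url, remaining = queue.popleft()
            if url in excluded:
                continue
            if url in expanded and expanded[url] >= remaining:
                continue  # already followed at least this deep
            if url not in expanded:
                order.append(url)
            expanded[url] = remaining
            if remaining > 0:
                for target in links.get(url, []):
                    queue.append((target, remaining - 1))
    return order


def free_links(fetched, links):
    """Links whose source and target were both fetched already, so they
    can be resolved without getting any more data from the web."""
    have = set(fetched)
    return [(src, dst)
            for src in fetched
            for dst in links.get(src, [])
            if dst in have]


# Hypothetical site: a home page linking to A, B, C, D and E.
site = {
    "home": ["A", "B", "C", "D", "E"],
    "A": ["A1"], "A1": ["A2"], "A2": ["A3"],
    "B": ["B1"],
    "C": ["C1"], "C1": ["C2"], "C2": ["C3"], "C3": ["C4"],
    "D": ["D1"],
}
roots = [("home", 0), ("A", 2), ("C", 3), ("B", 0)]

order = collect(roots, site, excluded={"D", "E"})
resolved = free_links(order, site)
```

In this sketch the home page is taken at depth 0, so none of its links are followed, yet its links to A, B and C still show up in resolved because those pages were fetched through other roots. The links to D and E stay unresolved.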
The latter feature (resolving links among the files already fetched) would be rather nice, since it would allow one to control precisely what order pages are put into the pdb.

Alex

--
Dr. Alexander R. Pruss  || e-mail: [EMAIL PROTECTED]
Philosophy Department   || online papers and home page:
Georgetown University   || www.georgetown.edu/faculty/ap85
Washington, DC 20057    ||
U.S.A.                  ||
-----------------------------------------------------------------------------
"Philosophiam discimus non ut tantum sciamus, sed ut boni efficiamur."
  - Paul of Worczyn (1424)

_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev