Re: [PLUG] A "spider" or "other"?

Jim Garrison Mon, 05 Mar 2018 07:02:12 -0800

On 3/5/2018 2:52 AM, Richard Owlett wrote:
> On two fora I've posted this question, appropriately phrased to be
> explicitly *on topic*.
> 
> Did not receive any nibbles much less bytes. (YEPP bad puns)
> 
> Am I asking the "right" question?
> 
> Any suggestions gleefully accepted ;/
> TIA
> 
>> By a convoluted set of links, I arrived at
>> <www.debian.org/doc/packaging-manuals/virtual-package-names-list.txt>
>>
>> That page did not have the specific details I wanted, but suggested
>> that the next level up, <www.debian.org/doc/packaging-manuals/>, would
>> be valuable. Many of the listed folders and documents will be.
>>
>> I clicked on "Parent Directory" wishing a list of sister directories
>> of <.../packaging-manuals>. Instead I got an HTML with lots of links.
>> But what I want to see is the structure of the site, NOT content of
>> individual pages. I read about spiders but they are content oriented.


The concept of a site having a static "structure" that you can somehow
access and map no longer has much meaning.  That may have worked in 1995,
when the web was a bunch of static HTML files arranged in a directory tree
and that directory structure was reflected in the path portion of the URL,
but not today.  Most content is now dynamically generated, and bits
of it are fetched with AJAX and inserted into the DOM.

The only way to "map" a site is to fully emulate all the capabilities
of a browser (JavaScript, AJAX, DOM, etc.) and actually fetch the
content.  Each bit of content may in turn cause other bits of content
to be retrieved.  And then, once you've done all that, the next time
you request the same page you could get back something completely
different.
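
To make the limitation concrete, here is a minimal sketch of the static
approach, assuming Python's standard library only.  It pulls the <a href>
links out of one page's HTML and folds the same-host URL paths into a
nested dict, which is about as close to a "directory listing" as you can
get from the outside.  The names LinkCollector, extract_links and
path_tree are hypothetical, made up for this illustration.  It does not
execute JavaScript, so any link inserted into the DOM after page load is
invisible to it, and it only ever sees what pages happen to link to, not
what actually exists on the server.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in static HTML (no JavaScript)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def path_tree(urls, host):
    """Build a nested dict from the path segments of URLs on one host."""
    tree = {}
    for url in urls:
        parts = urlparse(url)
        if parts.netloc != host:
            continue  # ignore links that leave the site
        node = tree
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
    return tree
```

Feeding it the HTML of a fetched page plus that page's URL gives a partial
tree; repeating that for every link found is what a spider does, and the
result is still only a snapshot of what was linked at that moment.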
_______________________________________________
PLUG mailing list
[email protected]
http://lists.pdxlinux.org/mailman/listinfo/plug
