At 04:47 PM 6/26/2001 -0700, you wrote:
>On Mon, 25 Jun 2001, Erick Thompson wrote:
>
> > Hello everyone,
> >
> > I am writing a basic robot in C#. I have had good success in grabbing URLs
> > from the source page, but a lot of sites are using image maps and
> > javascript navigation systems. I think that I can extract URLs out of the
> > javascript, but I'm not sure on the image maps. How are people handling
> > these?
>
>Interesting about the javascript; are you actually implementing
>a JS interpreter? I think there is open-source code at Mozilla.org.
>I see more sites with dynamic scripts using external JS functions,
>and if you're unlucky not bothering to create non-JS menus.

That is an interesting idea: implementing a JavaScript client and seeing how far I can go. However, I think it would be difficult to find all the events that could happen. What I was planning to do for the JavaScript was to parse the source, looking for URL-like strings and grabbing them. Before I visit those URLs, I make sure they are well formed, etc. It won't catch dynamically created URLs, but it should catch most navigation systems (as the URLs are usually defined in some sort of array or collection).

>Re. image maps; most sites I think moved to client-side maps which are
>easily parsable - AREA tag. I mean, all you have to do is recognize that
>there's a link and maybe pick up the ALT text, not actually generate X,Y
>pairs.
>As regards server-side maps, I wrote a robot-friendly imagemap CGI about
>the time that everyone moved to client-side. The idea was that, if you
>didn't send any coordinates that you would get the default link (same as
>if you clicked some X,Y that was out of bounds), and the server would
>return a list of choices instead of the usual dumb "upgrade your browser"
>text that must bug the h*ll out of blind users.

I would think that it does (or did; most image maps are gone, thank god). For the server-side scripts, I guess I could ignore them, see if they have a default link, or click on a grid and see if I can find the links. I have a feeling that I'll probably take the ignore route, as they are not so common anymore.

Thanks,
Erick

--
This message was sent by the Internet robots and spiders discussion list ([EMAIL PROTECTED]). For list server commands, send "help" in the body of a message to "[EMAIL PROTECTED]".
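
The string-scanning approach described in the message (parse the script source, grab URL-like strings, check that they are well formed before visiting) might look roughly like the sketch below. It is written in Python rather than the robot's C#, purely for brevity. The function name `extract_script_urls`, both regular expressions, and the list of file extensions are assumptions made for illustration, not anything taken from the original robot.

```python
import re
from urllib.parse import urljoin, urlparse

# Quoted string literals on a single line. Escaped quotes are not handled
# (a deliberate simplification for this sketch).
STRING_LITERAL = re.compile(r"""(["'])((?:(?!\1)[^\\\n])*)\1""")

# Hypothetical heuristic for "looks like a URL": an absolute http(s) URL,
# a rooted or dot-relative path, or a bare path ending in a page extension.
URL_LIKE = re.compile(
    r"^(?:https?://|/|\.{1,2}/)\S+$"
    r"|^[\w./-]+\.(?:html?|asp|php|cgi)(?:[?#]\S*)?$",
    re.IGNORECASE,
)


def extract_script_urls(script_source, base_url):
    """Return absolute http(s) URLs found in string literals, in order, without duplicates."""
    found = []
    for match in STRING_LITERAL.finditer(script_source):
        candidate = match.group(2).strip()
        if not URL_LIKE.match(candidate):
            continue
        # Resolve relative references against the page the script came from.
        absolute = urljoin(base_url, candidate)
        parts = urlparse(absolute)
        # "Well formed" here only means: http(s) scheme and a host part.
        if parts.scheme in ("http", "https") and parts.netloc and absolute not in found:
            found.append(absolute)
    return found


if __name__ == "__main__":
    script = (
        'var menu = new Array("products/index.html", '
        "'http://example.com/about', \"Hello world\", \"/contact.asp?id=3\");\n"
        'location.href = "page" + n + ".html";'
    )
    for url in extract_script_urls(script, "http://example.com/site/home.html"):
        print(url)
```

As the message notes, a scan like this misses URLs assembled at run time: in the sample script, the fragments `"page"` and `".html"` are skipped because neither looks like a URL on its own.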

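For the client-side image maps mentioned in the quoted reply (recognize that an AREA tag carries a link and maybe pick up the ALT text, without generating X,Y pairs), a minimal sketch using Python's standard `html.parser` module could look like this. The class name `AreaLinkParser` and the helper `extract_area_links` are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AreaLinkParser(HTMLParser):
    """Collects (absolute_url, alt_text) pairs from <area href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <AREA HREF=...> works too.
        if tag != "area":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href:  # skips <area nohref> and areas without a link
            alt_text = attributes.get("alt") or ""
            self.links.append((urljoin(self.base_url, href), alt_text))


def extract_area_links(html, base_url):
    """Parse an HTML page and return the links found in its client-side image maps."""
    parser = AreaLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = """
    <map name="nav">
      <AREA SHAPE="rect" COORDS="0,0,50,20" HREF="news.html" ALT="News">
      <area shape="rect" coords="50,0,100,20" href="/about/" />
      <area shape="default" nohref>
    </map>
    """
    for url, alt in extract_area_links(page, "http://example.com/site/index.html"):
        print(url, alt)
```

The coordinates are ignored entirely; only the link target and its ALT text are kept, which is all a robot needs from the map.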