-Original Message-
> So I'm trying to think like a regular expression (a new concept there...),
> so I try to find things in common:
>
> 1) They all start with http:// :) (not much help there)
> 2) They all end with a '\/.+\/.+\.?' (could this be the key?) (Is #2
> the right regex for sla
In my prior post, I should have added:
> The point of retrieving this data is to hit each link
> (returning the HTTP code), scan it for HTML tags,
> and then check the functionality of the links.
You may want to look into the WWW::Link suite of modules before you try to
reinvent them:
http://
Jeremy Junginger graced perl with these words of wisdom:
> I'm extracting links from an html page (using the HTML tags).
> Woohoo! I'm not having any problems with that part. The data
> that's getting returned is (much to my surprise) formatted exactly
> like I wanted it... again, Woohoo!
The answer to your HTML-parsing-with-a-regex question is: don't do it.
Parsing HTML should be done with an HTML parser. This is hinted at in
perlfaq6 and perlfaq9, but it's not as explicit as it should be.
I would recommend HTML::TokeParser or HTML::LinkExtor for your needs.
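
For illustration, here is a minimal sketch of pulling the links out with
HTML::LinkExtor instead of a regex. It assumes the module is installed from
CPAN, and the sample page and example.com URLs are made up for the example;
in real use $html would hold the page you fetched.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample page standing in for the fetched HTML.
my $html = <<'HTML';
<html><body>
<a href="http://www.example.com/docs/index.html">Docs</a>
<img src="http://www.example.com/img/logo.png">
<a href="http://www.example.com/about/">About</a>
</body></html>
HTML

my @links;

# The callback runs once per link-bearing tag. It receives the tag name
# followed by attribute name/value pairs.
my $extor = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    # Keep only <a href="..."> and skip <img src="...">, etc.
    return unless $tag eq 'a' && defined $attr{href};
    push @links, $attr{href};
});

$extor->parse($html);
$extor->eof;

print "$_\n" for @links;
```

The parser copes with attribute order, quoting style, and line breaks inside
tags, which is where a hand-rolled regex usually goes wrong. If you pass a
base URL as the second argument to new(), relative links should come back
resolved against it.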
--
Mark Thomas