On Fri, 2009-10-23 at 13:23 -0400, Brad Fuller wrote:

> I'm looking for a regular expression to accomplish a specific task.
> I'm hoping someone who's really good at regex patterns can lend a quick hand.
> I need a regex pattern that will grab URLs out of HTML that have a
> certain link text. (i.e. the word "Continue")
> This is what I have so far but it does not work properly (If there are
> other attributes in the <a> tag it returns them as part of the URL.)
> preg_match_all('#<a[\s]+[^>]*href\s*=\s*([\"\']+)([^>]+?)(\1|>)>Continue</a>#i',
> $html, $matches);
> It needs to be able to extract the URL and disregard arbitrary
> attributes in the HTML tag
> Test it with the following examples:
> <a href=/path/to/url.html>Continue</a>
> <a href='/path/to/url.html'>Continue</a>
> <a href="http://example.com/path/to/url.html" class="link">Continue</a>
> <a style="font-size: 12px" href="http://example.com/path/to/url.html"
> onclick="someFunction('foo','bar')">Continue</a>
> Please reply
> Your help is much appreciated.
> Thanks in advance,
> Brad F.

\"\']+?).+?>Continue</a>#i', $html, $matches);

I just changed your regex a bit. Your original pattern matched
everything from the first quote after href= up to the first > it found,
which is usually the one that closes the opening tag, so any other
attributes ended up in the captured URL. You could make it stricter
with a backreference, so that the closing quotation mark has to be the
same character that opened the href's value.
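
To illustrate the backreference idea, here is a rough sketch rather than a
drop-in fix. The pattern and the extract_continue_urls() helper are my own
assumptions, not something from the original post. It captures the opening
quote, refuses to let the value run past that same quote, and falls back
to an unquoted value for the first test case. It assumes no attribute
value contains a literal > character, and it needs PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the href of every <a> whose link text is "Continue".
function extract_continue_urls(string $html): array
{
    // (["\'])            group 1: the opening quote character
    // ((?:(?!\1).)*)\1   group 2: anything up to the SAME quote (backreference)
    // ([^\s"\'>]+)       group 3: fallback for an unquoted href value
    // [^>]*>             skip any remaining attributes up to the end of the tag
    $pattern = '#<a\b[^>]*?\shref\s*=\s*(?:(["\'])((?:(?!\1).)*)\1|([^\s"\'>]+))'
             . '[^>]*>\s*Continue\s*</a>#is';

    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $urls = [];
    foreach ($matches as $m) {
        // Group 2 is filled for quoted values, group 3 for unquoted ones.
        $urls[] = ($m[2] ?? '') !== '' ? $m[2] : ($m[3] ?? '');
    }
    return $urls;
}
```

Calling extract_continue_urls($html) on the four test links from the
original message should give back just the URLs, without the class,
style or onclick attributes.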


