I absolutely hate regular expressions because I suck at writing
them...but I can help you with the logic. I was thinking you could search
for a pattern which matches HREF=" + any number of non-quote characters
+ ". Your match would be HREF="blahblahblah". Then you could chop off the
leading HREF=" and the trailing ", and you are left with just a URL. Then
you can use the built-in URL parser function (I think it's parse_url()).
If there is no host, it's a relative link; otherwise, just check whether
the host matches yours. This should work well. Good luck.
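
A rough sketch of that logic, in case it helps. The function names and
the $siteHosts list are made up for illustration, and the extra scheme
check is my own assumption: something like javascript:... has no host
either, but it is not a relative link.

```php
<?php
// Hypothetical helper: pull the value out of every HREF="..." in $text.
function extract_hrefs(string $text): array
{
    // [^"]* stops at the closing quote; the i flag accepts href/HREF.
    preg_match_all('/href="([^"]*)"/i', $text, $matches);
    return $matches[1];
}

// Hypothetical helper: decide whether a URL points at our own site.
// $siteHosts is an assumed list of hostnames that count as internal.
function is_internal_link(string $url, array $siteHosts): bool
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // unparseable: treat as external to be safe
    }
    if (!isset($parts['host'])) {
        // No host and no scheme means a relative link.
        // A scheme without a host (e.g. javascript:) is not internal.
        return !isset($parts['scheme']);
    }
    return in_array(strtolower($parts['host']), $siteHosts, true);
}
```

You would call extract_hrefs($text) first, then is_internal_link() on
each result with your own hostnames.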

-----Original Message-----
From: Martin Towell [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, February 19, 2002 6:59 PM
To: '[EMAIL PROTECTED]'; php
Subject: RE: [PHP] regexp on user supplied link

reg.ex. something like (not tested):
        "<a[^>]*>"
this would give you the entire anchor tag, then go from there?
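
A minimal sketch of how that pattern might be used with
preg_match_all(). The function name is hypothetical, and the \b is an
addition to the pattern above so that tags such as <abbr> are not
picked up.

```php
<?php
// Hypothetical helper: return every opening anchor tag found in $text.
function find_anchor_tags(string $text): array
{
    // \b keeps <abbr> and <address> out; the i flag accepts <A ...>.
    preg_match_all('/<a\b[^>]*>/i', $text, $matches);
    return $matches[0];
}

$sample = 'See <A HREF="http://www.somesite.com" target="_new">this</a>'
        . ' and <a href="/faq.php">the FAQ</a>.';
$tags = find_anchor_tags($sample);
```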

or what about using the XML parsing routines, get them to find the
anchors and give you their attributes, then go from there?
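
A sketch of the parser route. Note that this swaps in PHP's DOM
extension (DOMDocument) rather than the event-based XML parser
functions, on the assumption that the DOM extension is available and
that user-supplied HTML will rarely be well-formed XML. The function
name is made up.

```php
<?php
// Hypothetical helper: return one attribute array per anchor in $html.
function anchor_attributes(string $html): array
{
    // Collect parser warnings quietly; user HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $attrs = [];
        foreach ($anchor->attributes as $attr) {
            // The HTML parser lowercases attribute names (HREF -> href).
            $attrs[$attr->name] = $attr->value;
        }
        $result[] = $attrs;
    }
    return $result;
}
```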

Martin

-----Original Message-----
From: Justin French [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 20, 2002 10:46 AM
To: php
Subject: [PHP] regexp on user supplied link


Hi,

I have a website which is based purely on user-added content.  The
problem with this is that some areas allow users to use links in the
text, and it's difficult to ensure that they all have a decent knowledge
of attributes such as target="_new", etc etc.

So, I'd like a script that...

1. looks at $text for any link tags, and for each tag, does the
following:

2. throws out everything except the HREF eg:
<A HREF="http://www.somesite.com" target="_new">click</a> becomes
http://www.somesite.com
<A HREF="javascript:something();"> becomes javascript:something();

3. prefix the URL with <A HREF="

4. establish if it's an internal or external link:  so how do we
establish if it's an external link? well it'd be easy if we just say
"anything beginning with http:// is not relative", but because this
content is user-driven, I'd like to be a little safer, and say "anything
that begins with http://www.mysite.com OR http://mysite.com" is an
internal link, and anything else with a host is external.

5. if it's an external link, suffix the URL with " TARGET="_new">, or if
it's internal, suffix it with ">
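
Steps 1 to 5 could be sketched roughly as follows. This is only one
way to do it, under some assumptions: SITE_HOSTS and rewrite_links()
are made-up names, the HREF value is expected to be double-quoted, and
any tag without such an HREF is left untouched.

```php
<?php
// Assumed list of hostnames that count as internal.
const SITE_HOSTS = ['mysite.com', 'www.mysite.com'];

// Hypothetical function: rebuild every opening <A> tag in $text.
function rewrite_links(string $text): string
{
    $result = preg_replace_callback(
        '/<a\b[^>]*>/i', // step 1: each opening anchor tag
        function (array $m): string {
            // Step 2: keep only the HREF value.
            if (!preg_match('/\bhref\s*=\s*"([^"]*)"/i', $m[0], $href)) {
                return $m[0]; // no quoted HREF: leave the tag alone
            }
            $url = $href[1];

            // Step 4: external means "has a host that is not ours".
            $host = parse_url($url, PHP_URL_HOST);
            $external = is_string($host)
                && !in_array(strtolower($host), SITE_HOSTS, true);

            // Steps 3 and 5: rebuild the tag around the URL.
            return '<A HREF="' . $url . '"'
                . ($external ? ' TARGET="_new"' : '') . '>';
        },
        $text
    );
    return $result ?? $text; // fall back to the input on a regex error
}
```

Relative links and javascript: links have no host, so they come out
without a TARGET attribute.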


Anyway, that'd be a great start.  From there, I might like to prefix each
external link so it goes through a program called out.php to log affiliate
activity, and I might like to retain onmouseover, onclick, onmouseout
etc etc properties in the tag, I might like to ensure a session ID is
found within each internal link, and stripped from each external link,
ensure that the <A> has a matching </A> etc etc, but the above would be
a great start.


Any help, especially with steps 1, 2 & 4, would be much appreciated.


Thanks in advance,

Justin French
http://indent.com.au
http://soundpimps.com

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

