Re: [PHP] regexp on user supplied link
I'll offer the following code to get you started on the task, and invite critiques/improvements! It is cribbed from various sources and 'tuned'; note that either apostrophes or double quotes can be used to delimit the URL:

    $bValidity = $iFound = preg_match_all(
        "/(href *= *['\"]?)([^'\" >]*)(['\" >])/i",
        $HTML,
        $aRegExOut
    );
    if ( 0 < $iFound )
    {
        // $aRegExOut[2] holds the URL (2nd bracketed group) of every match
        $aA = $aRegExOut[2];
        if ( DEBUG ) { ShowList( "Located", $aA ); }
    }

BTW, I'm covering the case of finding multiple links in one piece of HTML; this can be dialed back for single cases. The rest I'll leave to you.

Regards,
=dn

> ----- Original Message -----
> From: "SpamSucks86" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: 20 February 2002 21:16
> Subject: RE: [PHP] regexp on user supplied link
>
> I absolutely hate regular expressions because I suck at writing
> them... but I can help you with the logic. I was thinking: search for a
> pattern which matches HREF=" + any number of characters + ". Your match
> would be HREF="blahblahblah". Then you could chop off the HREF="
> and the trailing ", and you are left with just a URL. Then you can
> use that built-in URL parser function (I forget its name, I think it
> might be urlparse()). If there is no host, it's obviously a
> relative link; otherwise, you can just see if the host matches or not.
> This should work well. Good luck
>
> -----Original Message-----
> From: Martin Towell [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 19, 2002 6:59 PM
> To: '[EMAIL PROTECTED]'; php
> Subject: RE: [PHP] regexp on user supplied link
>
> reg.ex. something like (not tested):
> "]*>"
> this would give you the entire anchor tag, then go from there?
>
> or what about using the XML parsing routines, get it to find the anchors
> and give you its attributes, then go from there?
>
> Martin
>
> -----Original Message-----
> From: Justin French [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 20, 2002 10:46 AM
> To: php
> Subject: [PHP] regexp on user supplied link
>
> Hi,
>
> I have a website which is based purely on user-added content. The
> problem with this is that some areas allow users to use links in the
> text, and it's difficult to ensure that they all have a decent knowledge
> of attributes such as target="_new", etc.
>
> So, I'd like a script that...
>
> 1. looks at $text for any link tags, and for each tag, does the
> following:
>
> 2. throws out everything except the HREF, e.g.:
> http://www.somesite.com" target="_new">click becomes
> http://www.somesite.com
> becomes javascript:something();
>
> 3. prefix the url with content is user-driven, I'd like to be a little safer, and say "anything
> that begins with http://www.mysite.com OR http://mysite.com" is an
> external link.
>
> 5. if it's an external link, suffix the URL with " TARGET="_new">, or if
> it's internal, suffix it with ">
>
> Anyway, that'd be a great start. From there, I might like to prefix each
> external link to go thru a program called out.php to log affiliate
> activity, and I might like to retain onmouseover, onclick, onmouseout
> etc. properties in the tag, I might like to ensure a session ID is
> found within each internal link, and stripped from each external link,
> ensure that the has a matching etc., but the above would be
> a great start.
>
> Any help, especially with steps 1, 2 & 4, would be much appreciated.
>
> Thanks in advance,
>
> Justin French
> http://indent.com.au
> http://soundpimps.com
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
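For anyone wanting to try dn's preg_match_all() snippet on its own, here is a minimal self-contained sketch that wraps the same pattern in a function. The function name extractHrefs() and the sample HTML are hypothetical; the only change to the pattern is that the ` *` around the `=` is widened to `\s*` so tabs and newlines are tolerated too. Like any regex approach it has limits: it also picks up href attributes on tags other than <a> (e.g. <link>), and a quoted URL containing a space is cut short at the space.

```php
<?php
// Sketch: collect every href value in a chunk of HTML, using a pattern
// adapted from dn's (" *" around "=" widened to "\s*").
// extractHrefs() is a hypothetical helper name, not part of PHP.
function extractHrefs(string $html): array
{
    // Group 1: href= plus optional opening quote
    // Group 2: the URL itself (stops at a quote, space or ">")
    // Group 3: the character that ended the URL
    $pattern = '/(href\s*=\s*[\'"]?)([^\'" >]*)([\'" >])/i';
    $matches = [];
    // preg_match_all() returns the number of full matches (0 if none).
    if (preg_match_all($pattern, $html, $matches) > 0) {
        return $matches[2];
    }
    return [];
}

$html = '<a href="http://www.somesite.com" target="_new">click</a> '
      . "<a href='/about.php'>about</a> <A HREF=page.php>x</A>";
print_r(extractHrefs($html));
```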
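SpamSucks86's host check can be sketched with PHP's built-in parse_url(), which is the URL parser he half-remembers as urlparse(). This is a minimal sketch under stated assumptions: isInternalLink() is a hypothetical name, mysite.com and www.mysite.com stand in for your own hosts as in Justin's question, and URLs with a non-http(s) scheme (such as javascript:something();) are treated as not internal.

```php
<?php
// Sketch: classify a URL as internal or external with parse_url().
// isInternalLink() and the default host list are hypothetical;
// mysite.com is the placeholder domain from the question.
function isInternalLink(string $url, array $ownHosts = ['mysite.com', 'www.mysite.com']): bool
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // seriously malformed URL: do not trust it
    }
    // javascript:, mailto: and similar schemes never point at one of our pages.
    if (isset($parts['scheme']) && !in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }
    // No host at all means a relative link such as "about.php" or "/news/".
    if (!isset($parts['host'])) {
        return true;
    }
    // Otherwise the host must be one of ours (host names are case-insensitive).
    return in_array(strtolower($parts['host']), $ownHosts, true);
}

foreach (['/about.php', 'http://www.mysite.com/news/', 'http://www.somesite.com', 'javascript:something();'] as $url) {
    echo $url, ' => ', isInternalLink($url) ? 'internal' : 'external', "\n";
}
```

Checking the scheme before the host matters here: parse_url() reports no host for "javascript:something();", so without that check it would be classified as a relative (internal) link.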
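Martin's "use a parser instead of a regex" idea, sketched with PHP's DOM extension (DOMDocument) in place of the XML parsing routines he mentions, since user-supplied HTML is rarely well-formed XML. It covers steps 1, 2 and 5 of Justin's list: every attribute except href is dropped, and links whose host is not one of ours get target="_new". rewriteLinks() and the host list are hypothetical; links without a host (relative or javascript: URLs) are left without a target.

```php
<?php
// Sketch: rewrite user-supplied links with PHP's DOM extension.
// rewriteLinks() and $ownHosts are hypothetical; mysite.com is the
// placeholder domain from the question.
function rewriteLinks(string $text, array $ownHosts = ['mysite.com', 'www.mysite.com']): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; user HTML is often sloppy.
    $prev = libxml_use_internal_errors(true);
    // Wrap the fragment in a full page (with a UTF-8 hint) so it can be
    // pulled back out of <body> afterwards.
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body>' . $text . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // leave named anchors (<a name="...">) alone
        }
        $href = $a->getAttribute('href');

        // Step 2: drop every attribute, then put href back. Names are
        // copied first because removing while iterating the live
        // attribute map would skip entries.
        $names = [];
        foreach ($a->attributes as $attr) {
            $names[] = $attr->nodeName;
        }
        foreach ($names as $name) {
            $a->removeAttribute($name);
        }
        $a->setAttribute('href', $href);

        // Step 5: a link whose host is not ours opens in a new window.
        $host = parse_url($href, PHP_URL_HOST);
        if (is_string($host) && !in_array(strtolower($host), $ownHosts, true)) {
            $a->setAttribute('target', '_new');
        }
    }

    // Serialize only the children of <body>, i.e. the original fragment.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewriteLinks('<a href="http://www.somesite.com" target="_top" onclick="x()">click</a> '
    . 'and <a href="/about.php" class="nav">about</a>'), "\n";
```

The later wishes (routing external links through out.php, keeping onclick and friends, handling session IDs) would slot into the same loop, but they are left out of this sketch.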