Re: [PHP] regexp on user supplied link

2002-02-20 Thread DL Neil

I'll offer the following code to get you started on the task - and invite
critiques/improvements! (Cribbed from various sources and 'tuned' - note that
either apostrophes or double quotes can be used to delimit the URL.)

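 // Capture every href value; $aRegExOut[2] holds the URLs themselves.
 $bValidity = $iFound
   = preg_match_all( "/(href *= *['\"]?)([^'\" >]*)(['\" >])/i", $HTML, $aRegExOut );
 if ( 0 < $iFound )
 {
  $aA = $aRegExOut[2];   // list of the URLs found in $HTML
  // DEBUG and ShowList() are my own debugging helpers (not shown here)
  if ( defined( 'DEBUG' ) && DEBUG ) { ShowList( "Located", $aA ); }
  // ...process each URL in $aA here
 }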

BTW I'm covering the case of finding multiple links in one piece of HTML. This
can be dialed back for single cases.
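
For the single-link case, a minimal sketch might swap preg_match_all() for preg_match(), which stops at the first hit. The helper name firstHref() and the sample HTML are made up for illustration; the pattern is the same one as above.

```php
<?php
// Hypothetical helper: return the first href value in $html, or null if none.
function firstHref(string $html): ?string
{
    // Same pattern as above; capture group 2 is the URL itself.
    $pattern = "/(href *= *['\"]?)([^'\" >]*)(['\" >])/i";
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[2];
    }
    return null;
}

$sample = '<a href="http://www.somesite.com" target="_new">click</a>';
echo firstHref($sample), "\n";
```

preg_match() fills $m with the groups of one match only, so there is no nested array to unpick.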

The rest I'll leave to you.

Regards,
=dn

- Original Message -
From: "SpamSucks86" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: 20 February 2002 21:16
Subject: RE: [PHP] regexp on user supplied link


> I absolutely hate regular expressions because I suck at writing
> them...but I can help you with the logic. I was thinking: search for a
> pattern which matches HREF=" + any number of characters + ". Your match
> would be HREF="blahblahblah". Then you could chop off the leading HREF="
> and the trailing ", and you are left with just a URL. Then you can
> use the built-in URL parser function (I forget its name, I think it
> might be parse_url()). If there is no host, it's obviously a
> relative link; otherwise, you can just see if the host matches or not.
> This should work well. Good luck


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




RE: [PHP] regexp on user supplied link

2002-02-20 Thread SpamSucks86

I absolutely hate regular expressions because I suck at writing
them...but I can help you with the logic. I was thinking: search for a
pattern which matches HREF=" + any number of characters + ". Your match
would be HREF="blahblahblah". Then you could chop off the leading HREF="
and the trailing ", and you are left with just a URL. Then you can
use the built-in URL parser function (I forget its name, I think it
might be parse_url()). If there is no host, it's obviously a
relative link; otherwise, you can just see if the host matches or not.
This should work well. Good luck
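
The host check described above could be sketched roughly like this, using PHP's parse_url(). The helper name classifyLink() and the $myHosts list are invented for the example, not part of any existing code.

```php
<?php
// Hypothetical helper: classify a URL as 'relative', 'internal' or 'external'.
// $myHosts is an assumed list of host names that count as "this site".
function classifyLink(string $url, array $myHosts): string
{
    // parse_url() yields null when the URL has no host, false when it cannot parse it.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return 'relative';
    }
    return in_array(strtolower($host), $myHosts, true) ? 'internal' : 'external';
}

$hosts = ['www.mysite.com', 'mysite.com'];
echo classifyLink('/about.html', $hosts), "\n";
echo classifyLink('http://mysite.com/page', $hosts), "\n";
echo classifyLink('http://www.somesite.com', $hosts), "\n";
```

A javascript: URL carries no host either, so this sketch would not flag it as external; checking PHP_URL_SCHEME as well would probably be prudent.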

-Original Message-
From: Martin Towell [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, February 19, 2002 6:59 PM
To: '[EMAIL PROTECTED]'; php
Subject: RE: [PHP] regexp on user supplied link

reg.ex. something like (not tested):
"<a [^>]*>"
this would give you the entire anchor tag, then go from there?

or what about using the XML parsing routines, get it to find the anchors
and give you their attributes, then go from there?

Martin





RE: [PHP] regexp on user supplied link

2002-02-19 Thread Martin Towell

reg.ex. something like (not tested):
"<a [^>]*>"
this would give you the entire anchor tag, then go from there?

or what about using the XML parsing routines, get it to find the anchors and
give you their attributes, then go from there?
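
For the parser route, one option in current PHP is the DOM extension rather than the expat-based xml_* functions, since user-entered HTML is rarely well-formed XML. A hedged sketch, assuming the DOM extension is installed; anchorAttributes() is an invented name:

```php
<?php
// Hypothetical helper: return the attributes of every <a> tag in $html.
function anchorAttributes(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $attrs = [];
        foreach ($anchor->attributes as $attr) {
            $attrs[$attr->name] = $attr->value;
        }
        $result[] = $attrs;
    }
    return $result;
}

print_r(anchorAttributes('<p>See <a href="http://www.somesite.com" target="_new">click</a></p>'));
```

Each entry in the result is a name => value array for one anchor, so the href and any other attributes can be inspected separately.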

Martin

-Original Message-
From: Justin French [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, February 20, 2002 10:46 AM
To: php
Subject: [PHP] regexp on user supplied link


Hi,

I have a website which is based purely on user-added content.  The
problem with this is that some areas allow users to use links in the
text, and it's difficult to ensure that they all have a decent knowledge
of attributes such as target="_new", etc etc.

So, I'd like a script that...

1. looks at $text for any link tags, and for each tag, does the following:

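2. throws out everything except the HREF eg:
<A HREF="http://www.somesite.com" target="_new">click becomes
http://www.somesite.com
<A HREF="javascript:something();"> becomes javascript:something();

3. prefixes the URL with <A HREF="

4. [...] content is user-driven, I'd like to be a little safer, and say
"anything that begins with http://www.mysite.com OR http://mysite.com" is an
external link.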

5. if it's an external link, suffix the URL with " TARGET="_new">, or if
it's internal, suffix it with ">
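
The steps above could be sketched roughly as below. This is only one possible reading of the list: normalizeLinks(), the $myHosts parameter, and treating host-less URLs (relative, javascript:) as internal are all assumptions, and it rewrites opening tags only.

```php
<?php
// Hypothetical helper: rebuild every opening <a ...> tag in $text so that it
// keeps only its HREF, adding TARGET="_new" when the link is external.
// Assumption: a URL with no host (relative, javascript:, ...) counts as internal.
function normalizeLinks(string $text, array $myHosts): string
{
    $out = preg_replace_callback(
        '/<a\s[^>]*>/i',                       // step 1: each opening link tag
        function (array $m) use ($myHosts): string {
            // step 2: keep only the HREF value (quoted or bare)
            if (!preg_match('/href\s*=\s*["\']?([^"\'\s>]*)/i', $m[0], $h)) {
                return $m[0];                  // no HREF: leave the tag alone
            }
            $url  = $h[1];                     // cannot contain quotes, spaces or >
            $host = parse_url($url, PHP_URL_HOST);
            // step 4: external means it has a host that is not one of ours
            $external = is_string($host)
                && !in_array(strtolower($host), $myHosts, true);
            // steps 3 and 5: prefix with <A HREF=" and add the matching suffix
            return '<A HREF="' . $url . '"' . ($external ? ' TARGET="_new">' : '>');
        },
        $text
    );
    return $out ?? $text;                      // fall back to the input on a regex error
}

$hosts = ['www.mysite.com', 'mysite.com'];
echo normalizeLinks('<a href="http://www.somesite.com" onclick="x()">click</a>', $hosts), "\n";
```

Everything other than the HREF is dropped from the tag, so retaining onmouseover and friends would need the callback extended.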


Anyway, that'd be a great start.  From there, I might like to prefix each
external link so that it goes thru a program called out.php to log affiliate
activity, and I might like to retain onmouseover, onclick, onmouseout
etc etc properties in the tag, I might like to ensure a session ID is
found within each internal link, and stripped from each external link,
ensure that the <A> has a matching </A> etc etc, but the above would be
a great start.
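
The out.php idea might look something like this. The query parameter name "url" and the helper name viaLogger() are invented here; out.php itself would still need to check the target and issue the redirect.

```php
<?php
// Hypothetical helper: route an external URL through out.php for logging.
// The "url" parameter name is an assumption, not something out.php defines.
function viaLogger(string $externalUrl): string
{
    // rawurlencode() keeps the target URL intact as a single query value.
    return 'out.php?url=' . rawurlencode($externalUrl);
}

echo viaLogger('http://www.somesite.com/page?id=1'), "\n";
```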


Any help, especially with steps 1, 2 & 4, would be much appreciated.


Thanks in advance,

Justin French
http://indent.com.au
http://soundpimps.com
