Hi,

I'm trying to use a regex to strip the <A HREF> tags from an HTML text, but keep the http links themselves intact.

Here is an example:

Here is the "dirty" text:

$dirty = "Check <A HREF= \"http://www.opengroup.org/cde/\" TARGET=\"_blank\">The Open Group's Web site</A> for updates.
 <P> For Solaris/Sun OS, use <A HREF=\"http://www.securityfocus.com/archive/1/358426 \" TARGET=\"_blank\"> this workaround</A>
 for protecting the 'dtlogin' service from remote access </A>. Sun also released a patch available at <A HREF=\"http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1\" TARGET= \"_blank\">Sun Alert 57539</A>.";

** The '\' characters were added so that the " (double quotes) are treated as text.

Here is how the text should look:

$clean = "
Check [http://www.opengroup.org/cde/]   {The Open Group's Web site} for updates.
  For Solaris/Sun OS, use [http://www.securityfocus.com/archive/1/358426] this workaround for protecting the 'dtlogin' service from remote access.
  Sun also released a patch available at [http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1] {Sun Alert 57539}";

I'm using this to remove all HTML tags, but it also removes the HREF links:

    # remove all HTML tags
    $solution =~ s/<[^>]*>//g;

    # decode HTML entities such as &gt; and &quot;
    $solution =~ s/&gt;/>/g;
    $solution =~ s/&quot;/"/g;
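To make the goal concrete, here is a minimal sketch of the kind of thing I am after: rewrite each anchor as "[url] {text}" first, and only then strip the remaining tags. The name clean_html is hypothetical, and the sketch assumes every HREF value is wrapped in double quotes and that anchors are not nested. It wraps every link text in braces, as in the first and third links of the desired output above. A real HTML parser such as HTML::TokeParser would probably be more robust than a regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of my current script): convert anchors to
# "[url] {text}", then strip every other tag and decode two entities.
sub clean_html {
    my ($text) = @_;

    # Rewrite <A ... HREF="url" ...>text</A> as "[url] {text}".
    # Assumes the HREF value is double-quoted; whitespace around the URL
    # and around the link text is trimmed. /i matches any tag case and
    # /s lets the link text span several lines.
    $text =~ s|<a\s[^>]*?\bhref\s*=\s*"\s*([^"]*?)\s*"[^>]*>\s*(.*?)\s*</a>|[$1] {$2}|gis;

    # Remove all remaining HTML tags (including stray closing </A> tags).
    $text =~ s/<[^>]*>//g;

    # Decode entities after stripping, so a decoded '>' is never
    # mistaken for the end of a tag.
    $text =~ s/&gt;/>/g;
    $text =~ s/&quot;/"/g;

    return $text;
}

my $dirty = 'Check <A HREF= "http://www.opengroup.org/cde/" TARGET="_blank">The Open Group\'s Web site</A> for updates.';
print clean_html($dirty), "\n";
# prints: Check [http://www.opengroup.org/cde/] {The Open Group's Web site} for updates.
```

The order matters here: the anchors have to be rewritten before the generic tag removal runs, otherwise the URLs are already gone.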

Can you help?
--
Eyal Edri | System & Security Engineer  | [EMAIL PROTECTED] Communication.
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
