I'm trying to use a regex to strip the <A HREF> tags out of some HTML text, but keep the http links intact.
Here is an example of the "dirty" text:
$dirty = "Check <A HREF=\"http://www.opengroup.org/cde/\" TARGET=\"_blank\">The Open Group's Web site</A> for updates.
<P> For Solaris/Sun OS, use <A HREF=\"http://www.securityfocus.com/archive/1/358426 \" TARGET=\"_blank\"> this workaround</A>
for protecting the 'dtlogin' service from remote access </A>. Sun also released a patch available at <A HREF=\"http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1\" TARGET= \"_blank\">Sun Alert 57539</A>.";
** The backslashes (\) were added so that the double quotes (") are treated as literal text.
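As an aside, if the sample lives in Perl source, the backslashes can be avoided entirely. A minimal sketch: a single-quoted here-doc (<<'HTML') keeps every character literally, so the double quotes need no escaping. The variable name and the shortened one-line sample are illustrative only.

```perl
use strict;
use warnings;

# <<'HTML' (note the single quotes) disables interpolation and escapes,
# so the quotes and the apostrophe below are taken as plain text.
my $dirty = <<'HTML';
Check <A HREF="http://www.opengroup.org/cde/" TARGET="_blank">The Open Group's Web site</A> for updates.
HTML

print $dirty;
```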
Here is how the text should look:
$clean = "Check [http://www.opengroup.org/cde/] {The Open Group's Web site} for updates.
For Solaris/Sun OS, use [http://www.securityfocus.com/archive/1/358426] this workaround for protecting the 'dtlogin' service from remote access.
Sun also released a patch available at [http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1] {Sun Alert 57539}";
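One possible approach, offered as a sketch rather than a tested recipe: capture the HREF value and the link text in a single substitution and rewrite the whole anchor as [url] {text}. The pattern assumes the HREF value is double-quoted; single-quoted or unquoted attributes would need extra alternatives.

```perl
use strict;
use warnings;

my $html = q{Check <A HREF="http://www.opengroup.org/cde/" TARGET="_blank">The Open Group's Web site</A> for updates.};

# Rewrite each <A HREF="url" ...>text</A> as "[url] {text}".
#   /i  tag and attribute names may be upper or lower case
#   /s  let .*? match across newlines inside the link text
#   /g  handle every link in the string
# Spaces just inside the quotes and around the link text are trimmed.
(my $converted = $html) =~ s{<a\s[^>]*?href\s*=\s*"\s*([^"]*?)\s*"[^>]*>\s*(.*?)\s*</a>}{[$1] {$2}}gis;

print "$converted\n";
```

The `(my $converted = $html) =~ s///` idiom copies the string first so the original stays untouched; on Perl 5.14 or later the /r modifier does the same thing.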
I'm using this to remove all HTML tags, but it also removes the <A HREF> tags along with the URLs inside them:
# remove all HTML tags
$solution =~ s/<[^>]*>//gs;
# decode HTML entities such as &gt; and &quot;
$solution =~ s/&gt;/>/gs;
$solution =~ s/&quot;/"/gs;
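The underlying issue with the code above is ordering: once s/<[^>]*>//gs runs, the URLs are already gone. A minimal sketch of one way to order the steps, assuming double-quoted HREF values and that every link should become [url] {text}: convert the anchors first, then strip the remaining tags (including the stray </A>), decode entities, and tidy the leftover whitespace. The sample is the message's text with real quotes in a here-doc.

```perl
use strict;
use warnings;

# Sample input: the three links from the message, unescaped.
my $solution = <<'HTML';
Check <A HREF="http://www.opengroup.org/cde/" TARGET="_blank">The Open Group's Web site</A> for updates.
<P> For Solaris/Sun OS, use <A HREF="http://www.securityfocus.com/archive/1/358426 " TARGET="_blank"> this workaround</A>
for protecting the 'dtlogin' service from remote access </A>. Sun also released a patch available at <A HREF="http://sunsolve.sun.com/search/document.do?assetkey=1-26-57539-1" TARGET= "_blank">Sun Alert 57539</A>.
HTML

# 1. Turn every anchor into "[url] {text}" BEFORE any other tag is touched.
$solution =~ s{<a\s[^>]*?href\s*=\s*"\s*([^"]*?)\s*"[^>]*>\s*(.*?)\s*</a>}{[$1] {$2}}gis;

# 2. Remove whatever tags are left (<P>, the stray </A>, ...).
$solution =~ s/<[^>]*>//gs;

# 3. Decode entities; &amp; goes last so "&amp;gt;" is not decoded twice.
$solution =~ s/&gt;/>/g;
$solution =~ s/&lt;/</g;
$solution =~ s/&quot;/"/g;
$solution =~ s/&amp;/&/g;

# 4. Tidy whitespace: drop blanks before . or , then squeeze runs of
#    blanks and trim leading blanks on each line.
$solution =~ s/[ \t]+([.,])/$1/g;
$solution =~ s/[ \t]{2,}/ /g;
$solution =~ s/^[ \t]+//mg;

print $solution;
```

Line breaks follow the input, and "this workaround" also gets braces, so the output differs slightly from the hand-written $clean. For messier real-world HTML, a parser such as HTML::TokeParser or HTML::TreeBuilder is usually more robust than regexes.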
Can you help?
--
Eyal Edri | System & Security Engineer | [EMAIL PROTECTED] Communication.
_______________________________________________
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs