At 05:52 PM 3/14/2002 +0200, Ando Saabas wrote:
>OK, let me explain my problem a bit further. I need a regular expression to
>strip script tags from an HTML page.
>I used: $file = eregi_replace("(<script(.*)>.*</script>)", " ", $file);
>This works fine until there's a web page like:
>
><script something>script data.</script>
>Some webpage data
><script something>another script data </script>
>
>so the regexp above replaces everything between the first <script > and the
>last </script>, i.e. the web page data as well.
>So I thought to change the regexp to something like this: $file =
>eregi_replace("(<script(.*)>NOT(script)</script>)", " ", $file);
>where NOT(script) would match everything that does not contain the word script

I suspect that the POSIX extended regular expression functions will not be
sufficient to do what you want. Most likely you will need to use the PCRE
functions (preg_replace, etc.). I tried to come up with a regex to do what
you are looking for, but it's beyond me. I think it may have something to
do with what is called a "negative lookahead assertion", although I
couldn't personally get it to work. You can read about negative lookahead
assertions here:
http://www.perldoc.com/perl5.6.1/pod/perlre.html
You may be better off asking this question on a Perl newsgroup or mailing
list...
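
For reference, here is a rough sketch of how the PCRE approach might look. It
is only an illustration, not a tested solution for real-world pages: the
sample $file string and the variable names $lazy and $lookahead are made up
here, and both patterns assume the opening tag contains no ">" inside an
attribute value. The first pattern uses a non-greedy quantifier, the second
uses the negative lookahead assertion mentioned above.

```php
<?php
// Hypothetical sample input, modelled on the example in the question.
$file = "<script something>script data.</script>\n"
      . "Some webpage data\n"
      . "<SCRIPT something>another script data </SCRIPT>\n";

// Option 1: non-greedy quantifier. ".*?" stops at the first "</script>"
// instead of running on to the last one.
// Modifiers: i = case-insensitive, s = "." also matches newlines.
$lazy = preg_replace('#<script\b[^>]*>.*?</script>#is', ' ', $file);

// Option 2: negative lookahead. Consume one character at a time, but only
// while the text at that position is not the start of "</script>".
$lookahead = preg_replace(
    '#<script\b[^>]*>(?:(?!</script>).)*</script>#is',
    ' ',
    $file
);

echo $lazy;
```

With the sample input, both calls should replace each script block
separately and leave "Some webpage data" in place. The non-greedy form is
usually the simpler of the two to read.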
--
PHP General Mailing List (http://www.php.net/)