strip_tags() would not solve his problem, although that was my first thought as well.
Skipping tags, content included, when the content contains certain words is possible.
The trouble, as I see it, starts with nested tags. What do you want to do when you meet tables?
Here is an example that solves your example and similar situations, but not much else.
preg_match_all("/<(?!body|script|etc)(\w+)[^>]*>((?>(?!eee|etc|<\/\\1>).)*)<\/\\1>/s",$text,$match);
print_r($match[2]);
will return
[0] => aaa jjjj mmmm dddd yyyy ssss
[1] => aaa hhh mmmm dddd yyyy ssss
[2] => aaa kkkk mmmm dddd yyyy ssss
(?!body|script|etc) is used to filter unwanted tags, and in (?!eee|etc|<\/\\1>) you can put your filter words.
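For anyone who wants to try this out, the same idea can be wrapped up as below. This is only a sketch: the function name extract_clean_text() and the two filter arrays are placeholders made up for illustration, both arrays are assumed to be non-empty, and the sample input is the one from the original question.

```php
<?php
// Hypothetical helper (not part of PHP): returns the content of every simple
// element whose tag name does not start with one of $skipTags and whose
// content contains none of $skipWords. Both arrays are assumed non-empty.
function extract_clean_text($html, array $skipTags, array $skipWords)
{
    // Escape the filter entries so they are matched literally.
    $quote = function ($s) { return preg_quote($s, '~'); };
    $tags  = implode('|', array_map($quote, $skipTags));
    $words = implode('|', array_map($quote, $skipWords));

    $pattern = '~<(?!' . $tags . ')(\w+)[^>]*>'    // opening tag, unless filtered
             . '((?>(?!' . $words . '|</\1>).)*)'  // content, up to a filter word or the closing tag
             . '</\1>~s';                           // closing tag with the same name

    preg_match_all($pattern, $html, $match);
    return $match[2];
}

// Sample input from the original question.
$text = '<body> <p>aaa jjjj mmmm dddd yyyy ssss</p> <b>aaa hhh mmmm dddd yyyy ssss</b> '
      . '<p>aaa eee mmmm dddd yyyy ssss</p> <i>aaa kkkk mmmm dddd yyyy ssss</i> </body>';

$result = extract_clean_text($text, array('body', 'script'), array('eee'));
print_r($result);
```

The atomic group (?>...) is what makes the filter work: once the content scan stops at a filter word, the engine cannot backtrack to a shorter content, so the whole element is dropped. Nested elements with the same tag name are still not handled, because the scan stops at the first closing tag it meets.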
Hope this helps you anyway.
--
Stian
On Wed, 2 Feb 2005 11:36:26 +0100, Mirco Blitz <[EMAIL PROTECTED]> wrote:
Hi, Use strip_tags() instead of regex.
http://www.php-center.de/en-html-manual/function.strip-tags.html
Greetings Mirco
-----Original Message----- From: php [mailto:[EMAIL PROTECTED] Sent: Wednesday, 2 February 2005 09:25 To: php-general@lists.php.net Subject: [PHP] regular expresion
I want to parse an HTML file, for instance
<body> <p>aaa jjjj mmmm dddd yyyy ssss</p> <b>aaa hhh mmmm dddd yyyy ssss</b> <p>aaa eee mmmm dddd yyyy ssss</p> <i>aaa kkkk mmmm dddd yyyy ssss</i> </body>
and I want to create a regular expression which is able to extract the entire text
from enclosed tags WITHOUT a particular word,
for example eee.
Finally, I want to obtain this result:
aaa jjjj mmmm dddd yyyy ssss aaa hhh mmmm dddd yyyy ssss aaa kkkk mmmm dddd yyyy ssss
Any solution?
thank you
Silviu
-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php