Chian Hsieh wrote:
Hi,
I want to extract all content that starts with <embed> or <object>,
with or without closing tags.
My solution uses regular expressions, but there is one case I cannot
handle.
The regexes I use are:
// With closing tag
if (preg_match_all("#(<(object|embed)[^>]+>.*?</\\2>)#is", $str, $matchObjs)) {
    // blahblah
// Without closing tag
} else if (preg_match_all("#(<(?:object|embed)[^>]+>)#", $str, $matchObjs)) {
    // blahblah
}
But it fails when $str mixes tags with and without closing tags:
$str = '<div><div><object type="application/x-shockwave-flash"><param name="zz" value="xx"></object></div><div><embed src="http://sample.com" /></div>';
In this situation, it only matches
<object type="application/x-shockwave-flash"><param name="zz" value="xx"></object>
but I want both results:
<object type="application/x-shockwave-flash"><param name="zz" value="xx"></object>
<embed src="http://sample.com" />
So, is there a good way to handle this with a single regex?
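For reference, a single-pattern version of what is being asked for might look like the sketch below. It makes the closing tag optional, so the same pattern covers both cases. The names $pattern and $matches are only illustrative, and this assumes <object>/<embed> tags of the same kind are not nested inside each other; regex cannot reliably handle that.

```php
<?php
$str = '<div><div><object type="application/x-shockwave-flash"><param name="zz" value="xx"></object></div><div><embed src="http://sample.com" /></div>';

// Opening tag, then optionally everything up to the matching closing tag.
// The (?!<\1\b) guard stops the scan at the next opening tag of the same
// name, so an unclosed <embed> does not swallow a later <embed>...</embed>.
$pattern = '#<(object|embed)\b[^>]*>(?:(?:(?!<\1\b).)*?</\1>)?#is';

preg_match_all($pattern, $str, $matches);

// $matches[0] holds the full matches in document order.
print_r($matches[0]);
```

With the sample string above, $matches[0] should contain the <object>...</object> block followed by the standalone <embed ... /> tag.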
If you're open to using methods other than regex, one way to get
pretty good results is to run the document through HTML Tidy, parse
it into a DOM, and query it with XPath/XQuery. This basically mimics
the way browsers do it (and the way recommended by the
HTML specs).
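A rough sketch of the DOM route, in case it helps. This is only an illustration under a few assumptions: it skips the HTML Tidy step and relies on libxml's own error recovery via DOMDocument::loadHTML(), the function name extractEmbeds is made up for this example, and <object>/<embed> elements nested inside another <object> are deliberately skipped so that only the outermost one is returned.

```php
<?php
// Hypothetical helper, not part of any library.
function extractEmbeds(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them;
    // real-world markup is rarely clean.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $found = [];

    // Outermost <object> and <embed> elements, in document order.
    $query = '//object[not(ancestor::object)] | //embed[not(ancestor::object)]';
    foreach ($xpath->query($query) as $node) {
        // Serialize the element and its children back to markup.
        $found[] = $doc->saveHTML($node);
    }

    return $found;
}

$str = '<div><div><object type="application/x-shockwave-flash"><param name="zz" value="xx"></object></div><div><embed src="http://sample.com" /></div>';
$results = extractEmbeds($str);
print_r($results);
```

Note that the returned markup is re-serialized by the parser, so it may not be byte-for-byte identical to the input (for example, the trailing slash in a self-closing tag may be dropped). If you need the exact original text, this approach is not a direct fit.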
Best,
Nathan
--
PHP General Mailing List (http://www.php.net/)