[PHP] parsing malformed xml documents

2006-04-04 Thread Mariano Guadagnini

Hi guys,
I'm parsing some XML documents and fetching nodes using XPath and the
PHP 5.0 DOM. Unfortunately, some documents have whitespace at the
beginning or are missing some tags. In some situations, the script just
skips that document, or even crashes without notice. I tried loading them
as HTML and disabling validation, but that didn't do the trick, as they
have invalid HTML tags.
I wonder if there is some way (maybe an external class, or something) to
accomplish this. I know that theoretically it would be better to have
well-formed XML (I think so too), but I need to handle these documents in
any case, and they're created by an external source outside my control.


Thanks in advance,

Mariano Guadagnini


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] parsing malformed xml documents

2006-04-04 Thread Rasmus Lerdorf

Mariano Guadagnini wrote:

> Hi guys,
> I'm parsing some XML documents and fetching nodes using XPath and the
> PHP 5.0 DOM. Unfortunately, some documents have whitespace at the
> beginning or are missing some tags. In some situations, the script just
> skips that document, or even crashes without notice. I tried loading them
> as HTML and disabling validation, but that didn't do the trick, as they
> have invalid HTML tags.
> I wonder if there is some way (maybe an external class, or something) to
> accomplish this. I know that theoretically it would be better to have
> well-formed XML (I think so too), but I need to handle these documents in
> any case, and they're created by an external source outside my control.


How about something like this:

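  // Instantiate first: calling loadHTML() statically is an error in PHP 8.
  $xpath = null;
  $dom = new DOMDocument();
  // The HTML parser is lenient; @ hides its warnings about bad markup.
  if (!@$dom->loadHTML($xml)) {
    // Otherwise repair the markup with the tidy extension and retry.
    $xml = tidy_repair_string($xml);
    if (!$xml || !@$dom->loadHTML($xml)) $dom = null;
  }
  if ($dom) $xpath = new DOMXPath($dom);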

-Rasmus
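
If the documents need to stay in XML mode rather than go through the HTML parser, one possible variation is libxml's recovery mode. The following is only a minimal sketch under some assumptions: it needs PHP 5.1 or later (for libxml_use_internal_errors()), the helper name load_lenient_xml() and the sample input are made up for illustration, and how much of a broken document survives recovery depends on the installed libxml2.

```php
<?php
// Hypothetical helper (not part of PHP): parse XML that may have leading
// whitespace or unclosed tags, using libxml's recovery mode.
function load_lenient_xml($xml)
{
    // Whitespace before the XML declaration is a fatal error, so strip it.
    $xml = ltrim($xml);
    if ($xml === '') {
        return null; // nothing to parse
    }

    // Collect parser errors internally instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->recover = true; // ask libxml to keep going after errors
    $ok = $dom->loadXML($xml);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Recovery can still fail outright or leave an empty document.
    if (!$ok || $dom->documentElement === null) {
        return null;
    }
    return new DOMXPath($dom);
}

// Made-up sample: leading whitespace and a missing </item> closing tag.
$sample = "  \n<?xml version=\"1.0\"?><list><item>a</item><item>b</list>";
$xpath = load_lenient_xml($sample);
if ($xpath !== null) {
    foreach ($xpath->query('//item') as $node) {
        echo trim($node->textContent), "\n";
    }
}
```

A null return means the document could not be recovered at all, so the caller can still fall back to the tidy_repair_string() approach above for those cases.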
