Hello there,

I posted the following on my blog and wanted to check with the crowd that I haven't missed anything obvious. <http://lettre13.com/2007/02/07/trapping-errors-with-simplexml-for-not-well-formed-xml/>

Cheers
-Emmanuel


I discovered the hard way that PHP5 offers no obvious way to detect whether some XML is well-formed, especially if you want to deploy on both Unix and Windows platforms and don’t want to call the shell directly.

Adding to this problem, I also discovered that the DOM and simplexml extensions don’t let you use PHP5 exception handling to trap the errors when the XML is not well-formed. When simplexml or DOM loads such XML, the parse errors are not thrown as exceptions; they are emitted as warnings and displayed immediately.
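To make the problem concrete, here is a minimal sketch, assuming a hypothetical malformed string $bad and PHP 5.3 or later (for the closure). It suggests that simplexml_load_string() signals the failure through warnings and a false return value, so the catch block is never reached. The set_error_handler() call is not part of the technique; it is only there to count the warnings instead of printing them.

```php
<?php
// Hypothetical input: the opening "<" of the first tag is missing.
$bad = "tag>hello world</tag>";

// Count warnings instead of letting PHP print them (illustration only).
$warnings = 0;
set_error_handler(function ($severity, $message) use (&$warnings) {
    $warnings++;
    return true; // mark the warning as handled
});

$caught = false;
try {
    $xml = simplexml_load_string($bad);
} catch (Exception $e) {
    $caught = true; // not reached: the parser raises warnings, not exceptions
}
restore_error_handler();

var_dump($xml === false); // the false return value is the failure signal
var_dump($caught);        // whether any exception was caught
var_dump($warnings > 0);  // whether the parser emitted warnings
```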

It’s possible to load not well-formed XML with the DOM or Tidy extensions and then repair it on the fly. But what if you need to detect not well-formed XML and provide a message stating the error?
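As a rough illustration of the repair-on-the-fly route, the sketch below uses the recover property of DOMDocument, which asks libxml to build a tree even when the input is not well-formed. The input string is a hypothetical example, and the exact shape of the repaired markup depends on the libxml version, so treat the output as indicative only.

```php
<?php
// Hypothetical input: the second <item> is never closed.
$bad = "<root><item>one</item><item>two</root>";

$previous = libxml_use_internal_errors(true); // keep parser warnings quiet

$dom = new DOMDocument();
$dom->recover = true;          // ask libxml to build a tree despite errors
$loaded = $dom->loadXML($bad);

$errorCount = count(libxml_get_errors()); // errors are still recorded
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($loaded && $dom->documentElement !== null) {
    echo $dom->saveXML($dom->documentElement), "\n"; // repaired markup
}
echo "Parser errors seen: ", $errorCount, "\n";
```

Recovery hides the problem rather than reporting it, which is why it does not help when the goal is to tell the user what is wrong with the XML.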

Fortunately, after some research, I found that you can use the libxml functions (PHP 5.1 and later) to test XML well-formedness and trap XML errors. So I whipped up this little function called get_xml_object (see (1) for the inspiration) that allows me to trap errors when simplexml is used to parse XML. The function is quite simple: by default, you provide a path to an XML file. If you want to pass a string instead, add a second argument after the first parameter (it can be anything other than “file”, but here I chose “string” for clarity’s sake). You can also replace simplexml with the DOM extension if you prefer it for parsing XML.

The function get_xml_object returns an array with two keys, errors and xml. In this example, $result = get_xml_object($s, "string"), so $result is an array. If there are no errors, $result['errors'] is null and $result['xml'] contains a simplexml object that you can then manipulate with the simplexml extension. If parsing fails, $result['errors'] holds the error text and $result['xml'] stays null.

$s = "tag>hello world</tag>";      // not well-formed: the opening "<" is missing
// $s = "<tag>hello world</tag>";  // well-formed alternative

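function get_xml_object ($xml, $xmlFormat = "file") {

  $result = array ("errors" => null, "xml" => null);

  // Collect libxml errors internally instead of emitting PHP warnings.
  $previous = libxml_use_internal_errors (true);
  $xml_object = ($xmlFormat == "file") ? simplexml_load_file ($xml)
                                       : simplexml_load_string ($xml);

  // Compare with === false: an empty element such as <tag/> is a valid
  // simplexml object that still evaluates to false in a boolean test.
  if ($xml_object === false) {
     $error_msg = "";
     foreach (libxml_get_errors() as $error) {
         // Append (.=) so every error is kept, not only the last one.
         $error_msg .= "Error: line: " . $error->line
                     . ": column: " . $error->column . ": "
                     . trim ($error->message) . "\n";
     }
     libxml_clear_errors();
     $result["errors"] = $error_msg;
  } else {
    $result["xml"] = $xml_object;
  }
  // Restore whatever error mode was active before the call.
  libxml_use_internal_errors ($previous);
  return $result;
}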

$result = get_xml_object ($s, "string");

if ($result['errors']) {
  var_dump ($result['errors']);
} else {
  var_dump ($result['xml']);
}
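
The post mentions that the DOM extension can be swapped in for simplexml. A minimal sketch of what that might look like follows; get_dom_object is a hypothetical name, not something from the original post, and it keeps the same "errors"/"xml" return shape. It relies on DOMDocument::load() and DOMDocument::loadXML() returning false on a parse failure.

```php
<?php
// Hypothetical DOM counterpart of get_xml_object(); same return shape.
function get_dom_object($xml, $xmlFormat = "file") {
    $result = array("errors" => null, "xml" => null);

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $ok = ($xmlFormat == "file") ? $dom->load($xml) : $dom->loadXML($xml);

    if (!$ok) {
        $messages = array();
        foreach (libxml_get_errors() as $error) {
            $messages[] = "Error: line: " . $error->line
                        . ": column: " . $error->column . ": "
                        . trim($error->message);
        }
        $result["errors"] = implode("\n", $messages);
    } else {
        $result["xml"] = $dom;
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $result;
}

$bad  = get_dom_object("tag>hello world</tag>", "string");
$good = get_dom_object("<tag>hello world</tag>", "string");

var_dump($bad["errors"] !== null);                    // errors were collected
echo $good["xml"]->documentElement->nodeName, "\n";   // prints "tag"
```

Here $result['xml'] is a DOMDocument rather than a simplexml object, so the calling code would use DOM methods on it.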

(1) <http://ca3.php.net/manual/en/function.libxml-use-internal-errors.php>