Edit report at https://bugs.php.net/bug.php?id=55127&edit=1

 ID:                 55127
 Comment by:         blasterdrp at gmail dot com
 Reported by:        frederic dot auguste at gmail dot com
 Summary:            SimpleXML and HTML5 microformat
 Status:             Wont fix
 Type:               Feature/Change Request
 Package:            SimpleXML related
 PHP Version:        5.3.6
 Block user comment: N
 Private report:     N

 New Comment:

"I don't really see any point complicating the SimpleXML API to support this, given the workaround is that easy."

The DOMDocument class is disrespectful toward input HTML and makes a lot of assumptions that you can't dissuade it from making. It self-terminates tags you may not want self-terminated, such as <script/>, or leaves open tags you may want closed, such as <li>, depending on whether you put it in quirks mode or strict mode. Furthermore, it arbitrarily adds a DTD if it doesn't like the one you already have (<!DOCTYPE html> is acceptable for HTML 5), does the same with meta tags, and so forth.

DOMDocument should only change what you tell it to change, but instead it changes everything, and there's no way to tell it not to.


Previous Comments:
------------------------------------------------------------------------
[2011-07-04 08:09:30] paj...@php.net

Additionally, you can use tidy to create somewhat valid XHTML out of broken HTML input.

------------------------------------------------------------------------
[2011-07-04 08:04:31] ahar...@php.net

By definition, it's not valid XML.

It's already possible to use SimpleXML to manipulate this markup by using DOMDocument::loadHTML() first; e.g.:

$doc = new DOMDocument;
$doc->loadHTML($xml);
$a = simplexml_import_dom($doc->documentElement);

I don't really see any point complicating the SimpleXML API to support this, given the workaround is that easy.

------------------------------------------------------------------------
[2011-07-04 07:03:44] frederic dot auguste at gmail dot com

Description:
------------
We would like to manipulate and generate HTML5 microformats.
Parsing an HTML5 microformat with SimpleXML is not possible: warnings are generated and the simplexml_load_string() function returns false.

The problem is the itemscope attribute: it has no value.

Our XML is available on this web site: http://schema.org/Person

Can you add these capabilities to the SimpleXML API?
* adding an attribute without a value
* parsing XML that contains an attribute without a value

Thanks.

Test script:
---------------
<?php
$xml = <<<XML
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <img src="janedoe.jpg" itemprop="image" />
  <span itemprop="jobTitle">Professor</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
      20341 Whitworth Institute
      405 N. Whitworth
    </span>
    <span itemprop="addressLocality">Seattle</span>,
    <span itemprop="addressRegion">WA</span>
    <span itemprop="postalCode">98052</span>
  </div>
  <span itemprop="telephone">(425) 123-4567</span>
  <a href="mailto:jane-...@xyz.edu" itemprop="email">
    jane-...@xyz.edu</a>
  Jane's home page:
  <a href="www.janedoe.com" itemprop="url">janedoe.com</a>
  Graduate students:
  <a href="www.xyz.edu/students/alicejones.html" itemprop="colleagues">
    Alice Jones</a>
  <a href="www.xyz.edu/students/bobsmith.html" itemprop="colleagues">
    Bob Smith</a>
</div>
XML;

$a = simplexml_load_string($xml);
// Strict comparison: an empty SimpleXMLElement also compares loosely equal to false.
if ($a === false) {
    echo "XML not valid";
} else {
    echo $a->asXML();
}

------------------------------------------------------------------------
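
For reference, the workaround suggested in the thread can be sketched as below. This is only an illustration, not something proposed by any of the commenters: it assumes PHP 5.4 or later built against libxml 2.7.8 or later, where DOMDocument::loadHTML() accepts an options argument, and it uses a shortened, hypothetical $html sample instead of the full test script. The LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD flags may address part of the complaint about added DTDs and wrapper elements; they do not change how tags are serialized.

```php
<?php
// Hypothetical sample input: an HTML5 fragment with a valueless attribute.
$html = '<div itemscope itemtype="http://schema.org/Person">'
      . '<span itemprop="name">Jane Doe</span></div>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

// Strict XML parsing rejects the valueless itemscope attribute.
$direct = simplexml_load_string($html);

// Lenient HTML parsing, as suggested in the thread.
// Assumption: PHP >= 5.4 and libxml >= 2.7.8, where these flags are available.
// NOIMPLIED skips the implied <html>/<body> wrappers, NODEFDTD skips the default DTD.
$doc = new DOMDocument;
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$person = simplexml_import_dom($doc->documentElement);

// Restore the previous error-handling state.
libxml_clear_errors();
libxml_use_internal_errors($previous);

echo $person->getName(), "\n";       // name of the root element
echo (string) $person->span, "\n";   // text of the first <span>
```

With a single root element, as here, the imported SimpleXMLElement should correspond to the <div> itself rather than to an <html> wrapper. Fragments with several top-level elements may behave differently under LIBXML_HTML_NOIMPLIED.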