Edit report at https://bugs.php.net/bug.php?id=55127&edit=1

 ID:                 55127
 Comment by:         blasterdrp at gmail dot com
 Reported by:        frederic dot auguste at gmail dot com
 Summary:            SimpleXML and HTML5 microformat
 Status:             Wont fix
 Type:               Feature/Change Request
 Package:            SimpleXML related
 PHP Version:        5.3.6
 Block user comment: N
 Private report:     N

 New Comment:

"I don't really see any point complicating the SimpleXML API to support this, 
given the workaround is that easy."

The DOMDocument class does not respect its input HTML and makes a lot of 
assumptions that you can't persuade it not to make. It self-closes tags you 
may not want self-closed, such as <script/>, or leaves open tags you may want 
closed, such as <li>, depending on whether you put it in quirks mode or strict 
mode. Furthermore, it arbitrarily adds a DTD if it doesn't like the one you 
already have (<!DOCTYPE html> is acceptable for HTML 5), does the same with 
meta tags, and so forth.

The DOMDocument should only change what you tell it to change, but instead it 
changes everything, and there's no way to tell it not to.
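
As a rough illustration of the behaviour described above, here is a minimal 
sketch. The sample fragment is made up for this example, and the exact 
serialised output depends on the libxml2 version PHP is built against, so no 
output is quoted here.

```php
<?php
// Hypothetical input: an HTML5 fragment with no doctype, <html> or <body>.
$fragment = '<div itemscope><span itemprop="name">Jane Doe</span></div>';

// Keep libxml parser warnings out of the output.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($fragment);

libxml_clear_errors();
libxml_use_internal_errors($previous);

// Serialise the document again and compare it with what went in.
$output = $doc->saveHTML();

echo "in:  ", $fragment, "\n";
echo "out: ", $output, "\n";

// True when the parser wrapped the fragment in elements it was never given.
$addedWrappers = stripos($output, '<html') !== false
    && stripos($output, '<body') !== false;
var_dump($addedWrappers);
```

Comparing the "in" and "out" lines shows which parts of the document were 
added by the parser rather than by the caller.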


Previous Comments:
------------------------------------------------------------------------
[2011-07-04 08:09:30] paj...@php.net

Additionally, you can use tidy to create somewhat valid XHTML out of broken 
HTML input.
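
A minimal sketch of that idea, assuming the tidy extension is installed. The 
sample markup and the chosen configuration options are assumptions for this 
example, and whether the repaired output loads cleanly depends on the tidy 
version, so the script reports either outcome.

```php
<?php
// Hypothetical input, shortened from the markup in this report.
$html = '<div itemscope itemtype="http://schema.org/Person">'
      . '<span itemprop="name">Jane Doe</span></div>';

$result = null; // stays null when the tidy extension is unavailable

if (function_exists('tidy_repair_string')) {
    $config = array(
        'output-xhtml'     => true, // ask tidy for XHTML output
        'numeric-entities' => true, // avoid named entities XML does not know
    );
    $xhtml = tidy_repair_string($html, $config, 'utf8');

    // Collect parser errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $result = is_string($xhtml) ? simplexml_load_string($xhtml) : false;
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    echo $result === false
        ? "tidy output is still not well-formed XML\n"
        : $result->asXML();
} else {
    echo "tidy extension not loaded\n";
}
```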

------------------------------------------------------------------------
[2011-07-04 08:04:31] ahar...@php.net

By definition, it's not valid XML. It's already possible to use SimpleXML to 
manipulate this markup by loading it with DOMDocument::loadHTML() first, e.g.:

$doc = new DOMDocument;
$doc->loadHTML($xml);
$a = simplexml_import_dom($doc->documentElement);

I don't really see any point complicating the SimpleXML API to support this, 
given the workaround is that easy.
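
A slightly fuller sketch of the same workaround, showing how the microdata 
attributes can be read afterwards. The shortened sample markup and the 
variable names are assumptions for this example. Note that loadHTML() wraps a 
fragment in <html> and <body>, so XPath is used here rather than relying on 
the exact nesting.

```php
<?php
// Hypothetical sample, shortened from the test script in this report.
$html = '<div itemscope itemtype="http://schema.org/Person">'
      . '<span itemprop="name">Jane Doe</span>'
      . '</div>';

// Collect libxml parser warnings instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

libxml_clear_errors();
libxml_use_internal_errors($previous);

// documentElement is the <html> element that the HTML parser added.
$root = simplexml_import_dom($doc->documentElement);

// Select by attribute so the lookup does not depend on the wrapper elements.
$items = $root->xpath('//*[@itemscope]');
$names = $root->xpath('//*[@itemprop="name"]');

$itemType = (string) $items[0]['itemtype'];
$name     = (string) $names[0];

echo $itemType, "\n", $name, "\n";
```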

------------------------------------------------------------------------
[2011-07-04 07:03:44] frederic dot auguste at gmail dot com

Description:
------------
We would like to manipulate and generate HTML5 microformats.

Parsing an HTML5 microformat with SimpleXML is not possible: some warnings are 
generated and the simplexml_load_string() function returns false.

The problem is with the itemscope attribute: it has no value.

Our XML is available on this web site: http://schema.org/Person

Can you add these manipulations to the SimpleXML API?
 * adding an attribute without a value
 * parsing XML with an attribute without a value
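
For comparison, a minimal sketch of what the existing API allows today: XML 
requires every attribute to have a value, so the closest form is an empty 
string, which HTML5 also accepts for a boolean attribute. The element and 
attribute values below are only illustrative.

```php
<?php
// Build the element with an empty-valued itemscope attribute.
$div = new SimpleXMLElement('<div/>');
$div->addAttribute('itemscope', '');
$div->addAttribute('itemtype', 'http://schema.org/Person');
$div->addChild('span', 'Jane Doe')->addAttribute('itemprop', 'name');

// Serialise; the attribute is written as itemscope="".
$markup = $div->asXML();
echo $markup;

// The serialised form is well-formed XML, so it loads again.
$reloaded = simplexml_load_string($markup);
```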

Thanks.

Test script:
---------------
<?php
$xml = <<<XML
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <img src="janedoe.jpg" itemprop="image" />

  <span itemprop="jobTitle">Professor</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">
      20341 Whitworth Institute
      405 N. Whitworth
    </span>
    <span itemprop="addressLocality">Seattle</span>,
    <span itemprop="addressRegion">WA</span>
    <span itemprop="postalCode">98052</span>
  </div>
  <span itemprop="telephone">(425) 123-4567</span>
  <a href="mailto:jane-...@xyz.edu" itemprop="email">
    jane-...@xyz.edu</a>

  Jane's home page:
  <a href="www.janedoe.com" itemprop="url">janedoe.com</a>

  Graduate students:
  <a href="www.xyz.edu/students/alicejones.html" itemprop="colleagues">
    Alice Jones</a>
  <a href="www.xyz.edu/students/bobsmith.html" itemprop="colleagues">
    Bob Smith</a>
</div>
XML;

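$a = simplexml_load_string($xml);

// simplexml_load_string() returns false when parsing fails; compare strictly,
// because a successfully parsed but empty element also compares == false.
if ($a === false) {
        echo "XML not valid";
}
else {
        echo $a->asXML();
}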




------------------------------------------------------------------------


