ID:                 44367
Comment by:         mike at skew dot org
Reported by:        daniel dot oconnor at gmail dot com
Summary:            DOMDocument::baseURI parsing is out of whack
Status:             Not a bug
Type:               Bug
Package:            DOM XML related
Operating System:   Windows
PHP Version:        5.2.5
Assigned To:        rrichards
Block user comment: N
Private report:     N

New Comment:

I submitted a couple of related feature requests:

Request #65364 - In doc not loaded from a URL, baseURI should still be a real URI
Request #65365 - Allow defining baseURI of doc not loaded directly from URL


Previous Comments:
------------------------------------------------------------------------
[2013-07-14 12:53:55] hanskrentel at yahoo dot de

Please note that PHP's DOMDocument does not offer the DOM Core Level 3 feature at all. So whatever the DOM Core Level 3 specification says, nothing - absolutely nothing - allows one to conclude that this (perhaps accidentally same-named) property is an implementation of DOM Core Level 3. PHP's DOMDocument only has the DOM Core Level 1 feature, which does not cover this property.

All references to the XML Infoset in this ticket are therefore completely bogus.

------------------------------------------------------------------------
[2013-02-15 09:52:26] sites at hubmed dot org

A test case which illustrates that baseURI parsing now works correctly (at least in PHP 5.3.15):

<?php
$doc = new DOMDocument();
$doc->load('http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml');
var_dump($doc->baseURI); // "http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml"
var_dump($doc->documentElement->baseURI); // "http://wwww.example.org/"

As http://www.w3.org/TR/xmlbase/ describes, the base URI of a document entity is the URI used to retrieve the document entity. The base URI of an element (including the document element) is determined by a set of rules, starting with the xml:base attribute on the element.
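The test case above needs network access to w3.org. A minimal offline sketch of the same distinction follows; it assumes an inline document instead of inline-rdf6.xml, and http://www.example.com/docs/inline.xml is a made-up retrieval URI assigned through the writable documentURI property to stand in for the URL the document was loaded from.

```php
<?php
// Offline sketch: an inline document whose root carries xml:base.
// The XML string and the retrieval URI below are hypothetical stand-ins.
$xml = '<root xml:base="http://www.example.org/"><child/></root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// Pretend the document was retrieved from this (hypothetical) URL.
$doc->documentURI = 'http://www.example.com/docs/inline.xml';

// Document node: its base URI is the URI used to retrieve it.
var_dump($doc->baseURI);

// Document element: its base URI comes from its own xml:base attribute.
var_dump($doc->documentElement->baseURI);

// Descendants without xml:base inherit the nearest ancestor's xml:base.
var_dump($doc->documentElement->firstChild->baseURI);
```

If libxml2 resolves base URIs as the thread describes, the document node reports the retrieval URI, while the root element and its child both report http://www.example.org/.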
------------------------------------------------------------------------
[2008-03-12 22:30:13] daniel dot oconnor at gmail dot com

:S I hate being pushy / argumentative, sorry if it's coming across that way.

RFC 2396 is "Uniform Resource Identifiers (URI): Generic Syntax". Its section 5.1, "Establishing a Base URI", describes what I've been trying to say, probably a little more clearly.

The XML Base spec @ http://www.w3.org/TR/xmlbase/#rfc2396 says:

Determine a baseURI:
1. The base URI is embedded in the document's content.
2. The base URI is that of the encapsulating entity (message, document, or none).
3. The base URI is the URI used to retrieve the entity.
4. The base URI is defined by the context of the application.

> This is not just how it is implemented in PHP as the other major DOM parsers
> implement it the same way

... and that's why the xml:base GRDDL tests were written - to clarify correct behaviour / check implementations.

------------------------------------------------------------------------
[2008-03-12 17:16:05] rricha...@php.net

Still bogus: what you are describing pertains only to GRDDL, not DOM, so when working with GRDDL and DOM you need to check the base URI of the document element, not of the DOMDocument.

DOM determines the base URI using the XML Base spec:

"The base URI of a document entity or an external entity is determined by RFC 2396 rules, namely, that the base URI is the URI used to retrieve the document entity or external entity."

This is not just how it is implemented in PHP as the other major DOM parsers implement it the same way.

------------------------------------------------------------------------
[2008-03-11 00:03:46] daniel dot oconnor at gmail dot com

See http://www.w3.org/TR/grddl/#base_misc & http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1

The way to determine baseURI is:
1. Look for it on the root document element (HTML: <base>; XML: <foo xml:base="">).
2. Couldn't find that? Use the URL we retrieved the document with.
   * And make sure we follow redirects!
3. Couldn't find that? Application specific (but we don't really have a setBaseURI()).

So, condition #1 is broken in 5.2.5 when you do:

<?php
$doc = new DOMDocument();
$doc->load('http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml');
var_dump($doc->baseURI); // Expected http://wwww.example.org/

which produces:

string(53) "http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml"
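Following rrichards' advice to read the base URI from the document element rather than the DOMDocument, the lookup order daniel lists could be sketched as a small helper. grddlBaseUri() and its $applicationDefault parameter are hypothetical and not part of PHP's DOM API; the sketch does not follow redirects or handle the HTML <base> element, and it relies on libxml2 falling back to the document URI when the root has no xml:base.

```php
<?php
/**
 * Hypothetical helper (not part of PHP's DOM API): the base URI a
 * GRDDL-style consumer would use for the document as a whole.
 *
 * 1. xml:base on the root element
 * 2. otherwise the URI the document was retrieved from (documentURI)
 * 3. otherwise an application-supplied default
 */
function grddlBaseUri(DOMDocument $doc, ?string $applicationDefault = null): ?string
{
    $root = $doc->documentElement;
    if ($root !== null) {
        // Covers steps 1 and 2: the root's baseURI reflects its xml:base,
        // or the document URI when no xml:base is present.
        $base = $root->baseURI;
        if ($base !== null && $base !== '') {
            return $base;
        }
    }

    // Step 2 again, for documents without a root element.
    if ($doc->documentURI !== null && $doc->documentURI !== '') {
        return $doc->documentURI;
    }

    // Step 3: nothing in the document, so use the application's default.
    return $applicationDefault;
}

// Usage: xml:base on the root takes priority over the retrieval URI.
$doc = new DOMDocument();
$doc->loadXML('<root xml:base="http://www.example.org/"/>');
$doc->documentURI = 'http://www.example.com/doc.xml'; // hypothetical retrieval URI
var_dump(grddlBaseUri($doc));
```

The design choice here is simply to delegate steps 1 and 2 to documentElement->baseURI, since that is where libxml2 already applies xml:base; step 3 is only reached when the document has neither a root element nor a documentURI.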