ID:               44367
 User updated by:  daniel dot oconnor at gmail dot com
 Reported By:      daniel dot oconnor at gmail dot com
 Status:           Bogus
 Bug Type:         DOM XML related
 Operating System: Windows
 PHP Version:      5.2.5
 Assigned To:      rrichards
 New Comment:

:S I hate being pushy / argumentative, sorry if it's coming across that
way.


RFC 2396 is "Uniform Resource Identifiers (URI): Generic Syntax"

Section 5.1, "Establishing a Base URI", describes what I've been
trying to say, probably a little more clearly.



XML Base spec @ http://www.w3.org/TR/xmlbase/#rfc2396 says:

Determine a baseURI:
 1. The base URI is embedded in the document's content.
 2. The base URI is that of the encapsulating entity (message,
document, or none).
 3. The base URI is the URI used to retrieve the entity.
 4. The base URI is defined by the context of the application.
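
The precedence above can be sketched as a small helper. This is only an
illustration of the ordering: pick_base_uri() and its parameter names are
hypothetical and not part of PHP's DOM extension, and the sketch ignores
resolving a relative embedded base against the next level down.

```php
<?php
/**
 * Hypothetical helper (not a PHP or libxml2 API): return the first
 * available base URI, checked in the order of the four rules above.
 */
function pick_base_uri($embedded, $encapsulating, $retrieved, $applicationDefault)
{
    $candidates = array($embedded, $encapsulating, $retrieved, $applicationDefault);
    foreach ($candidates as $candidate) {
        // Skip sources that supplied nothing.
        if ($candidate !== null && $candidate !== '') {
            return $candidate;
        }
    }
    return null; // No base URI could be established.
}

// Example values are made up for illustration.
var_dump(pick_base_uri(null, null, 'http://bar.com/example.xml', null));
```

With no embedded base and no encapsulating entity, the helper falls through
to the retrieval URI, which is rule 3.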




> This is not just how it is implemented in PHP as the other major DOM
> parsers implement it the same way

... and that's why the xml:base GRDDL tests were written - to clarify
correct behaviour / check implementations.


Previous Comments:
------------------------------------------------------------------------

[2008-03-12 17:16:05] [EMAIL PROTECTED]

Still bogus, as what you are describing pertains to GRDDL only, not DOM.
So when working with GRDDL and DOM you need to check the base URI of the
document element, not the DOMDocument.
DOM determines the base URI using the XML Base spec:

"The base URI of a document entity or an external entity is determined
by RFC 2396 rules, namely, that the base URI is the URI used to
retrieve the document entity or external entity."

This is not just how it is implemented in PHP, as the other major DOM
parsers implement it the same way.

------------------------------------------------------------------------

[2008-03-11 00:03:46] daniel dot oconnor at gmail dot com

See http://www.w3.org/TR/grddl/#base_misc &
http://www.apps.ietf.org/rfc/rfc3986.html#sec-5.1

The way to determine baseURI is:
 1. Look for it on the root document element (HTML - <base>, XML - <foo
xml:base="">)
 2. Couldn't find that? Use the URL we retrieved the document with
     * And make sure we follow redirects!
 3. Couldn't find that? Application specific (but we don't really have
a setBaseURI())

So, condition #1 is broken in 5.2.5 when you do:

<?php
$doc =
DOMDocument::load('http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml');
var_dump($doc->baseURI);    //Expected http://wwww.example.org/

produces:
string(53) "http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml"

------------------------------------------------------------------------

[2008-03-10 14:09:30] [EMAIL PROTECTED]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php

Don't know about GRDDL, but for DOM trees, the base URI of a DOMDocument is
the URI it's loaded from (or, for a memory-based tree, the current dir).
You need to check on the document element to get the base URI you are
looking for.
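
The difference between the two properties can be seen without any network
access. The following is a minimal sketch, assuming an inline document whose
element names and example.org xml:base values are made up for illustration.
It reads the base URI from the DOMDocument and from its elements:

```php
<?php
// Hypothetical inline document; the xml:base values are examples only.
$xml = '<root xml:base="http://www.example.org/base/">'
     . '<child xml:base="sub/"><leaf/></child>'
     . '</root>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$root  = $doc->documentElement;
$child = $root->firstChild;

// The document node reports where the tree was loaded from;
// for an in-memory tree this is not the xml:base value.
var_dump($doc->baseURI);

// Element nodes take xml:base into account, including a nested
// relative value resolved against the parent's base.
var_dump($root->baseURI);
var_dump($child->baseURI);
```

Here the root element should report its own xml:base value, and the child
should report "sub/" resolved against it.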

------------------------------------------------------------------------

[2008-03-08 22:20:31] [EMAIL PROTECTED]

Rob, please take a look

------------------------------------------------------------------------

[2008-03-08 05:09:06] daniel dot oconnor at gmail dot com

Description:
------------
The W3C clarified a few xml:base issues when publishing the GRDDL
spec.

You can see the tests at
http://www.w3.org/TR/grddl-tests/#ambiguous-infoset.

Basically:
 * DOMDocument::loadXML does not detect xml:base attributes
 * simplexml_load_file does not detect xml:base attributes (or they are
lost during the importNode phase)
 * simplexml_load_string does not detect xml:base attributes (or they
are lost during the importNode phase)
 * DOMDocument does not deal with nested xml:base
 * DOMDocument does not deal with redirected xml:base locations

To clarify on the redirect-xml:base stuff...

If I request http://foo.com/example.xml
and that redirects me to http://bar.com/example.xml
and bar.com/example.xml said xml:base = http://foo.com/example.xml

... then http://bar.com/example.xml's baseURI should be
http://bar.com/example.xml

Reproduce code:
---------------
<?php
$url = 'http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml';
$xml = file_get_contents($url);

//Load a url
$doc = DOMDocument::load($url);
var_dump($doc->baseURI);    //Expected http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml

//Load an xml document with xml:base
$doc = DOMDocument::loadXML($xml);
var_dump($doc->baseURI);    //Expected http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml



//Does it work with importNode?
$sxe = simplexml_load_file($url);
$dom_sxe = dom_import_simplexml($sxe);

$dom = new DOMDocument('1.0');
$dom_sxe = $dom->importNode($dom_sxe, true);
$dom_sxe = $dom->appendChild($dom_sxe);
var_dump($dom->baseURI);    //Expected (maybe) http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml

// Alternative?
$sxe = simplexml_load_string($xml);
$dom_sxe = dom_import_simplexml($sxe);

$dom = new DOMDocument('1.0');
$dom_sxe = $dom->importNode($dom_sxe, true);
$dom_sxe = $dom->appendChild($dom_sxe);
var_dump($dom->baseURI);    //Expected (maybe) http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml



//What about documents with an invalid xml:base (not on the top level element)?
$doc = DOMDocument::load('http://www.w3.org/2001/sw/grddl-wg/td/inline-rdf6.xml');
var_dump($doc->baseURI);    //Expected http://wwww.example.org/

//What about documents with a *redirected xml:base* ?
//Note: this test case is a little broken because of a W3C server
//      change - it *should* redirect to
//      'http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml'
//      and thus have a funky new xml:base value
$doc = DOMDocument::load('http://www.w3.org/2001/sw/grddl-wg/td/xmlWithBase.xml');
var_dump($doc->baseURI);    //Expected http://www.w3.org/2001/sw/grddl-wg/td/base/xmlWithBase.xml

Expected result:
----------------
See reproduce code

Actual result:
--------------
See reproduce code


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=44367&edit=1
