Hi Matthias,

it took a while, but I looked at this today. That is indeed a serious regression; however, your patch needed a bit of work to avoid potential segfaults. I also found the same condition in a couple of other places in the code. The resulting fix is committed to git head:
https://git.gnome.org/browse/libxml2/commit/?id=3daee3f159a1f962278e6f92572b7749b2b2babb

thanks for the report!

Daniel

On Thu, Nov 24, 2016 at 02:20:36PM +0000, Matthias Pigulla wrote:
> Hello libxml2 developers,
>
> TL/DR:
>
> ./testURI --base file:///some/where file
>
> Without patch: file:/some/file
> With patch: file:///some/file
>
> Full report:
>
> I am using PHP to read a simple XML file and see a regression between
> libxml 2.9.1 and 2.9.2, with the problem still present in 2.9.4.
>
> File:
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml">
> <body>
> <div>test © </div>
> </body>
> </html>
>
>
> I am running this on Debian and have installed the w3c-dtd-xhtml package,
> relying on system tooling to generate the catalog definitions in /etc/xml.
> If it helps, I can post a tarball with all relevant files from /etc/xml
> and /usr/share/xml/xhtml somewhere.
>
> The resulting catalog will contain file:///etc/... references to other
> files, eventually pointing to file:///usr/share/xml/somewhere... and
> finally contain relative references like 'uri="xhtml1-strict.dtd"'.
>
>
> Due to a glitch in xmlBuildURI, interpreting this relative URI with a base
> of "file:///usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml" ends up as
> "file:/usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd" with only a
> single slash following the "file:".
>
> Further down the road, this is interpreted as a file path and passed to
> system calls, as can be seen in this strace output:
>
> stat("file:/usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd",
> 0x7fff7f56fda0) = -1 ENOENT (No such file or directory)
> stat("file:/usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd",
> 0x7fff7f56fd50) = -1 ENOENT (No such file or directory)
>
> Immediately after, PHP fails with a PHP Notice: DOMDocument::loadXML():
> failed to load external entity
> "file:/usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd" in Entity,
> line ...
>
> The patch solves the problem for me.
>
>
> Relevant changes:
> https://git.gnome.org/browse/libxml2/commit/?id=8eb55d782a2b9afacc7938694891cc6fad7b42a5
> https://git.gnome.org/browse/libxml2/commit/?id=beb7281055dbf0ed4d041022a67c6c5cfd126f25
> Also see https://mail.gnome.org/archives/xml/2014-December/msg00000.html.
>
>
> Please let me know if a test case for this is needed (would need
> instructions how/where to write this).
>
> Best regards
> Matthias
>
> diff --git a/uri.c b/uri.c
> index 2bd5720..6e09018 100644
> --- a/uri.c
> +++ b/uri.c
> @@ -2024,7 +2024,7 @@ xmlBuildURI(const xmlChar *URI, const xmlChar *base) {
>      }
>      if (bas->authority != NULL)
>          res->authority = xmlMemStrdup(bas->authority);
> -    else if (bas->server != NULL) {
> +    else if ((bas->server != NULL) || (bas->port == -1)) {
>          res->server = xmlMemStrdup(bas->server);
>          if (bas->user != NULL)
>              res->user = xmlMemStrdup(bas->user);
> _______________________________________________
> xml mailing list, project page http://xmlsoft.org/
> xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml

-- 
Daniel Veillard      | Red Hat Developers Tools http://developer.redhat.com/
veill...@redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
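
For readers of the archive, the behaviour reported above comes down to one rule from RFC 3986, section 5.3: when a URI is recomposed, the "//" must be written whenever an authority component is present, even if it is the empty string, as in file:///some/where. Below is a minimal sketch of that rule. It is not libxml2 code and not the committed fix; struct uri_parts and uri_recompose() are hypothetical names invented for this illustration, and treating "" as "empty authority" and NULL as "no authority" is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified URI record for illustration only.
 * This is NOT libxml2's xmlURI structure. */
struct uri_parts {
    const char *scheme;    /* e.g. "file", or NULL if absent */
    const char *authority; /* NULL: no "//" at all; "": empty authority (file:///...) */
    const char *path;      /* e.g. "/some/file", or NULL if absent */
};

/* Recompose a URI following RFC 3986 section 5.3: "//" is written
 * whenever an authority is present, even when it is empty.
 * Returns 0 on success, -1 if the result does not fit in out. */
int uri_recompose(const struct uri_parts *u, char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s%s%s%s",
                     u->scheme ? u->scheme : "",
                     u->scheme ? ":" : "",
                     u->authority ? "//" : "",
                     u->authority ? u->authority : "",
                     u->path ? u->path : "");

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With scheme "file", authority "" and path "/some/file" this produces file:///some/file; with authority NULL it produces file:/some/file, which is the form the report shows being passed to stat().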