Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-23 Thread Vincent Lefevre
On 2021-09-20 12:11:17 +0200, Mattia Rizzolo wrote:
> On Mon, Sep 20, 2021 at 11:41:38AM +0200, Vincent Lefevre wrote:
> > BTW, the error message should be more detailed, e.g. saying which
> > entity and which URI. This would have made debugging so much easier.
> > But that's a separate issue; I'll report a bug upstream if this has
> > not already been done.
> 
> It hasn't been done, so you should raise a bug with them if you think
> they should.

I've now reported the bug about the error message:

  https://gitlab.gnome.org/GNOME/libxml2/-/issues/308

Of course, dropping the error as I suggested in

  https://gitlab.gnome.org/GNOME/libxml2/-/issues/307

would also solve the issue.

-- 
Vincent Lefèvre  - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)



Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
On 2021-09-20 17:50:56 +0200, Thorsten Glaser wrote:
> > > But if this upstream change affects DTDs that were once released, maybe
> > > it should accept, but ignore, this specific wrong redeclaration.
> > 
> > Perhaps. This should probably be discussed with upstream first.
> 
> So indeed. Can one of you bring this to them? (My contributions to
> libxml2 don’t appear to be liked, even if multiple CVEs could have
> been avoided by applying them.)

Done here: https://gitlab.gnome.org/GNOME/libxml2/-/issues/307

I've also reported

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=994795

against w3-dtd-mathml, which has a similar issue (also invalid
redeclarations of the amp and lt entities, though these
redeclarations are different from the w3c-dtd-xhtml ones).

BTW, this doesn't affect only validation, but also entity resolution,
e.g. when using "xmllint --noent", which makes the issue even worse.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Thorsten Glaser
On Mon, 20 Sep 2021, Vincent Lefevre wrote:

> For the 1.1 DTD, w3c-dtd-xhtml 1.1-5 had the *upstream* file
> xhtml-1.1/basic/xhtml-special.ent with the buggy entity definitions

Hmm, now where did t̲h̲a̲t̲ come from?

http://www.w3.org/TR/2001/REC-xhtml11-20010531/xhtml11.tgz
has the flattened DTD.

Apparently XHTML™ Basic 1.1 is a thing, though. This is not XHTML 1.1…
http://www.w3.org/TR/2010/REC-xhtml-basic-20101123/xhtml-basic.tgz does
not contain the entities at all though.

Hah, got it! XHTML™ Basic 1.0 does contain the bogus file:
http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/xhtml-basic.tgz
Its list of errata is empty, so this is not listed upstream as a known bug.

> > But if this upstream change affects DTDs that were once released, maybe
> > it should accept, but ignore, this specific wrong redeclaration.
> 
> Perhaps. This should probably be discussed with upstream first.

So indeed. Can one of you bring this to them? (My contributions to
libxml2 don’t appear to be liked, even if multiple CVEs could have
been avoided by applying them.)

Thanks,
//mirabilos
-- 
Infrastructure expert • tarent solutions GmbH
Am Dickobskreuz 10, D-53121 Bonn • http://www.tarent.de/
Telephone +49 228 54881-393 • Fax: +49 228 54881-235
HRB AG Bonn 5168 • USt-ID (VAT): DE122264941
Managing directors: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg


/⁀\ The UTF-8 Ribbon
╲ ╱ Campaign against  Don't miss anything with the tarent newsletter:
 ╳  HTML eMail! Also, https://www.tarent.de/newsletter
╱ ╲ header encryption!




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
On 2021-09-20 17:08:26 +0200, Thorsten Glaser wrote:
> On Mon, 20 Sep 2021, Vincent Lefevre wrote:
> 
> > Then libxml2 can find the right file on the local file system via
> > catalogs. In my case (which is the *default* setup with Debian
> 
> I never understood this catalogue thing. When I tried it, it didn’t
> work for me (that may admittedly have been multiple releases ago),
> the documentation was as good as Chinese to me, and… meh.

The catalog system was very buggy in the past. I had reported
many bugs in 2004. Things have much improved. The latest bugs
I found were in 2012.

> > Hmm... there seems to be a subtle difference in xhtml-special.ent:
> 
> Interesting.
> 
> I’m working with an XHTML 1.1 DTD, which has the entities inline
> (not sure if that was my doing or if I got it like this) and it
> too has:
> 
> 
>  
>  
>  
>  
> 
>  

For the 1.1 DTD, w3c-dtd-xhtml 1.1-5 had the *upstream* file
xhtml-1.1/basic/xhtml-special.ent with the buggy entity definitions






In w3c-sgml-lib, the xhtml-special.ent file no longer depends on
the XHTML version, and it has correct definitions.

> But if this upstream change affects DTDs that were once released, maybe
> it should accept, but ignore, this specific wrong redeclaration.

Perhaps. This should probably be discussed with upstream first.

> Though you said the bug was introduced in a Debian package only…
> where did the package get the wrong .ent files from?

See my other message: I suppose that Debian took the XHTML 1.1
version (which was buggy) to use it with both XHTML 1.0 and XHTML 1.1
DTDs. This is my only plausible explanation.

> If this is truly Debian-local, I agree nothing other than the conflict
> is probably needed.

The XHTML 1.0 DTD issue seems Debian-local. But the XHTML 1.1 DTD
issue (which I have not tried) is an upstream one, according to the
w3c-dtd-xhtml_1.1.orig.tar.gz file, which is the upstream part I
got from https://snapshot.debian.org/package/w3c-dtd-xhtml/1.1-5/ .




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Thorsten Glaser
On Mon, 20 Sep 2021, Vincent Lefevre wrote:

> Then libxml2 can find the right file on the local file system via
> catalogs. In my case (which is the *default* setup with Debian

I never understood this catalogue thing. When I tried it, it didn’t
work for me (that may admittedly have been multiple releases ago),
the documentation was as good as Chinese to me, and… meh.

> Hmm... there seems to be a subtle difference in xhtml-special.ent:

Interesting.

I’m working with an XHTML 1.1 DTD, which has the entities inline
(not sure if that was my doing or if I got it like this) and it
too has:


 
 
 
 

 

But if this upstream change affects DTDs that were once released, maybe
it should accept, but ignore, this specific wrong redeclaration. Though
you said the bug was introduced in a Debian package only… where did the
package get the wrong .ent files from? If this is truly Debian-local, I
agree nothing other than the conflict is probably needed.

bye,
//mirabilos
-- 
15:41⎜ Somebody write a testsuite for helloworld :-)



Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
On 2021-09-20 15:57:35 +0200, Vincent Lefevre wrote:
> So, if I understand correctly, this was a Debian-specific bug. I
> suspect that the incorrect XHTML 1.1 definitions were retrieved
> from the old w3c-dtd-xhtml source and shared for both XHTML 1.0
> and XHTML 1.1 DTDs. This would explain how the bug has been
> introduced in Debian from 2012 to 2016 (and still now until the
> w3c-dtd-xhtml package is removed from users' machines).

I forgot to add that this means that probably almost all users
will not be affected by the bug after w3c-dtd-xhtml is removed
(i.e. I don't expect buggy files copied locally). So, definitely
no need to announce anything. The Conflicts should be sufficient.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
On 2021-09-20 03:18:46 +0200, Vincent Lefevre wrote:
> Hmm... there seems to be a subtle difference in xhtml-special.ent:
> 
> With the file from w3c-dtd-xhtml:
> 
> 
> 
> 
> 
> 
> But with the file from w3c-sgml-lib:
> 
> 
> 
> 
> 
> 
 
 
 
 

in August 2002.

On https://snapshot.debian.org/package/w3c-dtd-xhtml/1.1-5/
I can see that the Debian package (released on 2004-08-08)
was correct (for the XHTML 1.0 xhtml-special.ent file; the
XHTML 1.1 one was incorrect).

But on https://snapshot.debian.org/package/w3c-sgml-lib/1.2-2/
(which gave the w3c-dtd-xhtml binary package in this version),
released on 2012-04-14, while the upstream part was correct,
the w3c-sgml-lib_1.2-2.debian.tar.gz file has

  debian/legacy/basic/xhtml-special.ent

with the incorrect entity definitions. So, if I understand correctly,
this was a Debian-specific bug. I suspect that the incorrect XHTML 1.1
definitions were retrieved from the old w3c-dtd-xhtml source and
shared for both XHTML 1.0 and XHTML 1.1 DTDs. This would explain
how the bug has been introduced in Debian from 2012 to 2016 (and
still now until the w3c-dtd-xhtml package is removed from users'
machines).




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
On 2021-09-20 12:11:17 +0200, Mattia Rizzolo wrote:
> On Mon, Sep 20, 2021 at 11:41:38AM +0200, Vincent Lefevre wrote:
> > Please also make sure that the NEWS file is up-to-date; see my other
> > message. This is also useful for the user when getting regressions
> > in general (possibly from bug fixes like here).
> 
> I'm not sure I'd like to add such item to the Debian's NEWS.

Note that for this one, I was talking about the upstream NEWS. But
this may be an upstream bug. The NEWS file hasn't been regenerated
in the git repository. I don't know about the tarball. But the
announcement message *does* contain the release notes. So I'm wondering.

Well, there is already an upstream bug for this one:

  https://gitlab.gnome.org/GNOME/libxml2/-/issues/171

This was for 2.9.10, but is still a valid issue; I've added a comment.

> It would stop updates for too many users that most likely are not
> affected. For now, you are really the only one that brought up this
> issue.

Concerning Debian's NEWS, it is difficult to know, as I fear that
this hasn't been tested by most users. I could detect the issue,
because I use a machine more recent than Debian/stable and because
I have a cron job that does a check every day.

> > I'm wondering whether this check for invalid redeclarations of
> > predefined entities should also go to Debian/stable since it fixes
> > an integer overflow at the same time:
> > 
> >   https://gitlab.gnome.org/GNOME/libxml2/-/issues/217
> > 
> > Any security issue related to that?
> 
> AFAIK not yet at least.

This is the opposite: things like integer overflows (in particular
when they occur on untrusted data like here) should be regarded as
security issues by default, but it can be found later that they
have no security implications.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Mattia Rizzolo
On Mon, Sep 20, 2021 at 11:41:38AM +0200, Vincent Lefevre wrote:
> Please also make sure that the NEWS file is up-to-date; see my other
> message. This is also useful for the user when getting regressions
> in general (possibly from bug fixes like here).

I'm not sure I'd like to add such item to the Debian's NEWS.  It would
stop updates for too many users that most likely are not affected.  For
now, you are really the only one that brought up this issue.

> BTW, the error message should be more detailed, e.g. saying which
> entity and which URI. This would have made debugging so much easier.
> But that's a separate issue; I'll report a bug upstream if this has
> not already been done.

It hasn't been done, so you should raise a bug with them if you think
they should.

> I'm wondering whether this check for invalid redeclarations of
> predefined entities should also go to Debian/stable since it fixes
> an integer overflow at the same time:
> 
>   https://gitlab.gnome.org/GNOME/libxml2/-/issues/217
> 
> Any security issue related to that?

AFAIK not yet at least.

-- 
regards,
Mattia Rizzolo

GPG Key: 66AE 2B4A FCCF 3F52 DA18  4D18 4B04 3FCD B944 4540  .''`.
More about me:  https://mapreri.org : :'  :
Launchpad user: https://launchpad.net/~mapreri  `. `'`
Debian QA page: https://qa.debian.org/developer.php?login=mattia  `-




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
(We searched for the commit at about the same time...)

On 2021-09-20 11:15:16 +0200, Mattia Rizzolo wrote:
> I bisected libxml2:
[...]

FYI, I found this commit just by looking at the git logs, with a
search for "predefined" (and "redeclaration" works too). This is
faster than bisecting. It is great that libxml2 has detailed
logs, which is not true of all software...

> > > In the latter case, I think that
> > > there should be a Breaks against w3c-dtd-xhtml.
> 
> On its way.

Thanks.

Please also make sure that the NEWS file is up-to-date; see my other
message. This is also useful for the user when getting regressions
in general (possibly from bug fixes like here).

BTW, the error message should be more detailed, e.g. saying which
entity and which URI. This would have made debugging so much easier.
But that's a separate issue; I'll report a bug upstream if this has
not already been done.

I'm wondering whether this check for invalid redeclarations of
predefined entities should also go to Debian/stable since it fixes
an integer overflow at the same time:

  https://gitlab.gnome.org/GNOME/libxml2/-/issues/217

Any security issue related to that?




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Vincent Lefevre
Concerning the change in the libxml2 code, I found this:

  https://gitlab.gnome.org/GNOME/libxml2/-/commit/01411e7c5ea0fff181271e092f46a2138c3720ec
  "Check for invalid redeclarations of predefined entities"

with the example of the incorrect

   

which was in the old libxml2 testcases, BTW.

Thus this is intentional. But such a major change (since this breaks
official DTDs released in the past, something which should normally
*never* happen) should have at least been announced somewhere.
Otherwise one doesn't know what's going on (even a web search for the
error message led to nothing -- now, there's only my bug report...).

Now, I understand why there's nothing mentioned in the NEWS file,
which is a symlink to the changelog file: this file stops at
"v2.9.9: Jan 03 2019", while this version is 2.9.12.

The upstream release notes of libxml2 2.9.11

  https://mail.gnome.org/archives/xml/2021-May/msg0.html

contain:

  - Check for invalid redeclarations of predefined entities (Nick Wellnhofer)

Note that this change is recent, so that most users (Debian or not)
have not upgraded yet. Whether the issue would be more visible once
most users have upgraded (in particular if the old DTDs have been
archived locally with the XML data), I don't know.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-20 Thread Mattia Rizzolo
On Mon, Sep 20, 2021 at 03:55:39AM +0200, Vincent Lefevre wrote:
> Control: retitle -1 libxml2: XHTML 1.0 validation is broken with w3c-dtd-xhtml's xhtml-special.ent file
> 
> This should be reproducible with w3c-dtd-xhtml's xhtml-special.ent file.
> The summary of the actual issue is below.

Yes, indeed it is.

> > The errors correspond to amp and lt.
> > 
> > Now, I don't know whether the new libxml2 version is too picky,
> > or there was a real issue with the old entity files (ignored
> > by all parsers until now?).

I bisected libxml2:

01411e7c5ea0fff181271e092f46a2138c3720ec is the first bad commit
commit 01411e7c5ea0fff181271e092f46a2138c3720ec
Author: Nick Wellnhofer 
Date:   Mon Feb 8 20:58:32 2021 +0100

Check for invalid redeclarations of predefined entities

https://gitlab.gnome.org/GNOME/libxml2/-/commit/01411e7c5ea0fff181271e092f46a2138c3720ec

So libxml2 is clearly being intentionally stricter now, and it flags
this issue in the old DTD.
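
To make the rule behind that commit concrete: XML 1.0 section 4.6 says
that if lt or amp are declared, they must be internal entities whose
replacement text is a character reference to the character being
escaped (so the literal needs double escaping), while gt, apos and quot
may also use the bare character. The following Python sketch checks a
declaration literal against that rule. It is only an illustration and
not libxml2's code; the function names are made up, and the literals in
the usage note are examples, not the declarations elided from this
archive.

```python
import re

# The five predefined entities and the character each one stands for.
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# A decimal (&#60;) or hexadecimal (&#x3C;) character reference.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _code_point(match):
    """Return the code point named by a matched character reference."""
    hex_digits, dec_digits = match.group(1), match.group(2)
    return int(hex_digits, 16) if hex_digits else int(dec_digits)


def replacement_text(literal):
    """Expand character references in an entity literal exactly once."""
    return _CHAR_REF.sub(lambda m: chr(_code_point(m)), literal)


def is_valid_redeclaration(name, literal):
    """Check a redeclaration of a predefined entity (XML 1.0, section 4.6)."""
    char = PREDEFINED[name]
    text = replacement_text(literal)
    ref = _CHAR_REF.fullmatch(text)
    if ref is not None:
        # Replacement text is itself a character reference: always acceptable
        # if it names the right character.
        return _code_point(ref) == ord(char)
    # The bare character is only allowed for gt, apos and quot.
    return name in ("gt", "apos", "quot") and text == char
```

With this sketch, is_valid_redeclaration("lt", "&#38;#60;") is accepted
because the replacement text is still the reference &#60;, whereas
is_valid_redeclaration("lt", "&#60;") is rejected because the
replacement text is a bare "<".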

> > In the latter case, I think that
> > there should be a Breaks against w3c-dtd-xhtml.

On its way.



Thanks for your help in debugging this issue.



Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Vincent Lefevre
Control: retitle -1 libxml2: XHTML 1.0 validation is broken with w3c-dtd-xhtml's xhtml-special.ent file
Control: tags -1 - unreproducible

This should be reproducible with w3c-dtd-xhtml's xhtml-special.ent file.
The summary of the actual issue is below.

On 2021-09-20 03:18:46 +0200, Vincent Lefevre wrote:
[...]
> So the issue seems to occur when reading xhtml-special.ent.
> 
> Hmm... there seems to be a subtle difference in xhtml-special.ent:
> 
> With the file from w3c-dtd-xhtml:
> 
> 
> 
> 
> 
> 
> But with the file from w3c-sgml-lib:
> 
> 
> 
> 
> 
> 
> 
> The errors correspond to amp and lt.
> 
> Now, I don't know whether the new libxml2 version is too picky,
> or there was a real issue with the old entity files (ignored
> by all parsers until now?). In the latter case, I think that
> there should be a Breaks against w3c-dtd-xhtml.
> 
> One more thing: I've just checked on my Debian/stable machine,
> which just has w3c-sgml-lib installed:
> "xmllint --loaddtd --nonet --noout" works without any error.
> Thus there should be no issue by switching w3c-dtd-xhtml to
> w3c-sgml-lib.

FYI, the change of xhtml-special.ent upstream seems to be in

  https://github.com/w3c/markup-validator/commit/fa78ea2526fe20a89c90c4734f704fb0126186fd

(the diff output by git seems incorrect: one needs to browse the
files from the parent d1431fc to see the old version).




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Vincent Lefevre
On 2021-09-19 22:59:31 +0200, Mattia Rizzolo wrote:
> On Sun, Sep 19, 2021 at 09:45:19PM +0200, Vincent Lefevre wrote:
> > On 2021-09-19 19:15:54 +0200, Mattia Rizzolo wrote:
> > > I can never manage to download DTDs from w3.org (how could you?!), so,
> > > taking your testcase and a copy of the same DTD:
> > 
> > The DTD is provided by Debian, no need to download it.
> 
> But you need to instruct xmllint to use said DTD; it won't decide on
> its own to pick a random DTD from the filesystem.

No, this is not necessary with a correctly configured system.
This is not a random DTD, but the DTD mentioned in the HTML file,
which has the standard public identifier

  "-//W3C//DTD XHTML 1.0 Strict//EN"

Then libxml2 can find the right file on the local file system via
catalogs. In my case (which is the *default* setup with Debian
packages on my system, i.e. I haven't changed anything about that
in /etc):

/etc/xml/catalog contains



so that libxml2 then uses /etc/xml/w3c-dtd-xhtml.xml, which contains



so that libxml2 then uses
/usr/share/xml/xhtml/schema/dtd/1.0/catalog.xml, which contains



so that libxml2 gets the file

  /usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd

There is the same mechanism for the .ent files referenced
by xhtml1-strict.dtd, i.e. via public identifiers.

>  I also know how to
> use apt-file myself:
> | % apt-file search xhtml1-strict.dtd
> | dita-ot: /usr/share/dita-ot/demo/h2d/dtd/xhtml1-strict.dtd
> | erlang-erl-docgen: /usr/lib/erlang/lib/erl_docgen-1.1.1/priv/dtd/xhtml1-strict.dtd
> | kate5-data: /usr/share/katexmltools/xhtml1-strict.dtd.xml
> | libpxp-ocaml-dev: /usr/share/doc/libpxp-ocaml-dev/examples/namespaces/xhtml1-strict.dtd.gz
> | librdf-rdfa-parser-perl: /usr/share/perl5/auto/share/dist/RDF-RDFa-Parser/catalogue/www.w3.org/MarkUp/DTD/xhtml1-strict.dtd
> | w3-recs: /usr/share/doc/w3-recs/html/www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd.gz
> | w3c-sgml-lib: /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xhtml1-20020801/xhtml1-strict.dtd
> | xemacs21-basesupport: /usr/share/xemacs21/xemacs-packages/etc/psgml-dtds/xhtml1-strict.dtd
> | xmlcopyeditor: /usr/share/xmlcopyeditor/dtd/xhtml1-strict.dtd
> | %
> 
> indeed the one I used is the one from xmlcopyeditor (I picked a random
> package, trusting that said .dtd is actually the same as all of the
> above).

The one I'm using is from w3c-dtd-xhtml, apparently no longer
available in Debian (my machine is a Debian/unstable one installed
about 5 years ago, and Debian won't replace the package by
w3c-sgml-lib automatically). In any case, the concerned files
from w3c-sgml-lib seem to be the same with minor differences.

> My system is fine.  That error message is only a red herring due to
> --nonet,

Everything is on the local filesystem. There is no reason to do
any network access! If libxml2 tries to do a network access, this
means that something on your system is broken... perhaps catalogs
that are not set up correctly.

> and indeed the return code of xmllint is 0.

Don't look at the return code of xmllint; it is not reliable.
Even in case of bad usage, it will sometimes return 0:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=727075
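
Since the exit status cannot be trusted, a scripted check (for example
from a cron job) has to look at the diagnostics as well. A minimal
Python sketch of that idea follows; run_and_check is a made-up helper
name, and treating any stderr output as a failure is an assumption
about what such a check wants, not documented xmllint behaviour.

```python
import subprocess


def run_and_check(argv):
    """Run a command and report success only if it exits 0 AND stays silent.

    Returns a (ok, diagnostics) tuple, where diagnostics is whatever the
    command wrote to stderr.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    ok = proc.returncode == 0 and proc.stderr.strip() == ""
    return ok, proc.stderr
```

It could be called as run_and_check(["xmllint", "--loaddtd", "--nonet",
"--noout", "test.html"]), which would flag a run that exits with 0 but
still prints error lines.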

Validation issues are reported on stderr, e.g. with a working libxml2:

$ xmllint --loaddtd --nonet --noout test.html
test.html:6: parser error : EndTag: '
[...]
> If you prefer, I can modify the DOCTYPE and do this instead, so there
> won't be "I/O error"s and the return code is clear:
> 
> mattia@warren /tmp/tmp/xml % cat test.html
> 
>  "file:///tmp/tmp/xml/xhtml1-strict.dtd">
> http://www.w3.org/1999/xhtml;>
> title
> text
> 
> mattia@warren /tmp/tmp/xml % xmllint --noout --nonet test.html ; echo $?
> 0

Wrong test. You forgot to load the DTD!

Please try:

  xmllint --loaddtd --noout --nonet test.html

Note: you may also need to copy the 3 .ent files referenced by
the DTD in the same directory:


%HTMLlat1;


%HTMLsymbol;


%HTMLspecial;
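
To find out which files need copying, one can list the external
parameter entities a DTD declares. The Python sketch below does this
with a regular expression. It is a rough illustration only: the helper
name is made up, the regex covers just the PUBLIC form with both
identifiers in double quotes, and the sample declaration is an assumed
shape, since the declarations themselves are not reproduced in this
archive.

```python
import re

# Matches: <!ENTITY % name PUBLIC "public-id" "system-id">
_PARAM_ENTITY = re.compile(
    r'<!ENTITY\s+%\s+(\S+)\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"\s*>'
)


def external_parameter_entities(dtd_text):
    """Return (name, public_id, system_id) for each PUBLIC parameter entity."""
    return [
        (m.group(1), m.group(2), m.group(3))
        for m in _PARAM_ENTITY.finditer(dtd_text)
    ]


# Assumed example of what such a declaration looks like in a DTD.
SAMPLE_DTD = '''
<!ENTITY % HTMLlat1 PUBLIC
   "-//W3C//ENTITIES Latin 1 for XHTML//EN"
   "xhtml-lat1.ent">
%HTMLlat1;
'''

for name, public_id, system_id in external_parameter_entities(SAMPLE_DTD):
    print(f"{name}: copy {system_id} next to the DTD")
```

The system identifier is the file name to place in the same directory
when catalogs are not used.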

I have tried that:

$ ls -l /tmp/tmp/xml
total 68
-rw-r--r-- 1 vinc17 vinc17 13484 2012-04-24 22:49:16 xhtml-lat1.ent
-rw-r--r-- 1 vinc17 vinc17  4486 2012-04-24 22:49:16 xhtml-special.ent
-rw-r--r-- 1 vinc17 vinc17 13748 2012-04-24 22:49:16 xhtml-symbol.ent
-rw-r--r-- 1 vinc17 vinc17 25473 2012-04-24 22:49:15 xhtml1-strict.dtd

With libxml2 2.9.10+dfsg-6.7, strace shows that every file is loaded
from this directory, and I get no output, as expected.

But with libxml2 2.9.12+dfsg-4, I get:

$ xmllint --loaddtd --noout --nonet test.html
error : xmlAddEntity: invalid redeclaration of predefined entity
error : xmlAddEntity: invalid redeclaration of predefined entity

and strace still shows that every file is loaded from this directory.

Something interesting:

openat(AT_FDCWD, "/tmp/tmp/xml/xhtml-lat1.ent", O_RDONLY) = 5
lseek(5, 0, SEEK_CUR)   = 0
read(5, "




But with the file from w3c-sgml-lib:







The errors correspond to amp and lt.

Now, I don't know whether the new libxml2 version is too picky,
or 

Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Vincent Lefevre
On 2021-09-19 22:33:09 +0200, Thorsten Glaser wrote:
> It probably contains the ones for 1.0, but I found w3c-sgml-lib to
> not be sufficient in many ways and now use local files only…

which has always been the case, AFAIK. And the XHTML 1.0 related files
seem to be identical to the w3c-dtd-xhtml ones, except for comments
and spacing. For instance, there's the following change in the comment
of xhtml-lat1.ent:

  Typical invocation:
 
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent; >
+  "xhtml-lat1.ent" >
%xhtml-lat1;

but /usr/share/xml/xhtml/schema/dtd/1.0/xhtml1-strict.dtd from
w3c-dtd-xhtml is using:


%HTMLlat1;


%HTMLsymbol;


%HTMLspecial;

(which has never had any issue). So, this was probably an old
documentation bug (but it doesn't matter when one uses only
public identifiers and catalogs).

> which means validating involves copying the file, changing the http
> link in the DOCTYPE with a local file:// link, then validating…
> working but suboptimal.

Everything should be available with the public identifiers via
catalogs. Perhaps w3c-sgml-lib doesn't set the catalogs correctly.
For instance, with w3c-dtd-xhtml, /etc/xml/w3c-dtd-xhtml.xml
contains:


















and /usr/share/xml/entities/xhtml/catalog.xml contains:

[...]






[...]

so that libxml2 gets the right files only by using public identifiers.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Mattia Rizzolo
On Sun, Sep 19, 2021 at 09:45:19PM +0200, Vincent Lefevre wrote:
> On 2021-09-19 19:15:54 +0200, Mattia Rizzolo wrote:
> > I can never manage to download DTDs from w3.org (how could you?!), so,
> > taking your testcase and a copy of the same DTD:
> 
> The DTD is provided by Debian, no need to download it.

But you need to instruct xmllint to use said DTD; it won't decide on its
own to pick a random DTD from the filesystem.  I also know how to
use apt-file myself:
| % apt-file search xhtml1-strict.dtd
| dita-ot: /usr/share/dita-ot/demo/h2d/dtd/xhtml1-strict.dtd
| erlang-erl-docgen: /usr/lib/erlang/lib/erl_docgen-1.1.1/priv/dtd/xhtml1-strict.dtd
| kate5-data: /usr/share/katexmltools/xhtml1-strict.dtd.xml
| libpxp-ocaml-dev: /usr/share/doc/libpxp-ocaml-dev/examples/namespaces/xhtml1-strict.dtd.gz
| librdf-rdfa-parser-perl: /usr/share/perl5/auto/share/dist/RDF-RDFa-Parser/catalogue/www.w3.org/MarkUp/DTD/xhtml1-strict.dtd
| w3-recs: /usr/share/doc/w3-recs/html/www.w3.org/TR/2002/REC-xhtml1-20020801/DTD/xhtml1-strict.dtd.gz
| w3c-sgml-lib: /usr/share/xml/w3c-sgml-lib/schema/dtd/REC-xhtml1-20020801/xhtml1-strict.dtd
| xemacs21-basesupport: /usr/share/xemacs21/xemacs-packages/etc/psgml-dtds/xhtml1-strict.dtd
| xmlcopyeditor: /usr/share/xmlcopyeditor/dtd/xhtml1-strict.dtd
| %

indeed the one I used is the one from xmlcopyeditor (I picked a random
package, trusting that said .dtd is actually the same as all of the
above).

> > mattia@warren /tmp/tmp/xml % xmllint --dtdvalid xhtml1-strict.dtd --nonet --noout test.html
> > I/O error : Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
> > test.html:2: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
> > C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
> > 
> >^
> > mattia@warren /tmp/tmp/xml %
> > 
> > which looks good to me.
> 
> An I/O error is not good. Your system appears to be broken.

My system is fine.  That error message is only a red herring due to
--nonet, and indeed the return code of xmllint is 0.

If you prefer, I can modify the DOCTYPE and do this instead, so there
won't be "I/O error"s and the return code is clear:

mattia@warren /tmp/tmp/xml % cat test.html


http://www.w3.org/1999/xhtml;>
title
text

mattia@warren /tmp/tmp/xml % xmllint --noout --nonet test.html ; echo $?
0
mattia@warren /tmp/tmp/xml % dpkg -l libxml2|tail -n1
ii  libxml2:amd64  2.9.12+dfsg-4 amd64        GNOME XML library
mattia@warren /tmp/tmp/xml %



Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Thorsten Glaser
On Sun, 19 Sep 2021, Vincent Lefevre wrote:

> I can see that xhtml1-strict.dtd is provided by the w3c-dtd-xhtml
> package.

Not quite.

https://packages.qa.debian.org/w/w3c-dtd-xhtml/news/20160107T183823Z.html

--- Reason ---
RoQA; superseded by w3c-sgml-lib
--

That’s not entirely true, though:

 * [22]#826217 [n|  |  ] [[23]w3c-sgml-lib] [24]w3c-sgml-lib: XHTML 1.1
   files missing
   Reported by: [25]Thorsten Glaser ; Date: Fri, 3 Jun
   2016 11:21:02 UTC; Severity: normal; Filed 5 years and 109 days ago;
   Modified 5 years and 109 days ago;

It probably contains the ones for 1.0, but I found w3c-sgml-lib to
not be sufficient in many ways and now use local files only… which
means validating involves copying the file, changing the http link
in the DOCTYPE with a local file:// link, then validating… working
but suboptimal.

bye,
//mirabilos
-- 
«MyISAM tables -will- get corrupted eventually. This is a fact of life. »
“mysql is about as much database as ms access” – “MSSQL at least descends
from a database” “it's a rebranded SyBase” “MySQL however was born from a
flatfile and went downhill from there” – “at least jetDB doesn’t claim to
be a database”  (#nosec)‣‣‣ Please let MySQL and MariaDB finally die!



Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Vincent Lefevre
On 2021-09-19 21:45:19 +0200, Vincent Lefevre wrote:
> On 2021-09-19 19:15:54 +0200, Mattia Rizzolo wrote:
> > I can never manage to download DTDs from w3.org (how could you?!), so,
> > taking your testcase and a copy of the same DTD:
> 
> The DTD is provided by Debian, no need to download it.

I can see that xhtml1-strict.dtd is provided by the w3c-dtd-xhtml
package.

You have this package installed, right?

On my machine, it is w3c-dtd-xhtml 1.2-4.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Vincent Lefevre
On 2021-09-19 19:15:54 +0200, Mattia Rizzolo wrote:
> I can never manage to download DTDs from w3.org (how could you?!), so,
> taking your testcase and a copy of the same DTD:

The DTD is provided by Debian, no need to download it.

> mattia@warren /tmp/tmp/xml % l
> total 68
> -rw-r--r-- 1 mattia mattia   260 Sep 19 19:02 test.html
> -rw-r--r-- 1 mattia mattia 26450 Sep  6  2014 xhtml1-strict.dtd
> -rw-r--r-- 1 mattia mattia 12055 Sep  6  2014 xhtml-lat1.ent
> -rw-r--r-- 1 mattia mattia  4293 Sep  6  2014 xhtml-special.ent
> -rw-r--r-- 1 mattia mattia 14167 Sep  6  2014 xhtml-symbol.ent
> mattia@warren /tmp/tmp/xml % cat test.html
> 
>  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd;>
> http://www.w3.org/1999/xhtml;>
> title
> text
> 
> mattia@warren /tmp/tmp/xml % xmllint --dtdvalid xhtml1-strict.dtd --nonet --noout test.html
> I/O error : Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
> test.html:2: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
> C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
>   
>  ^
> mattia@warren /tmp/tmp/xml %
> 
> which looks good to me.

An I/O error is not good. Your system appears to be broken.




Bug#993638: [xml/sgml-pkgs] Bug#993638: libxml2: XHTML 1.0 validation is broken

2021-09-19 Thread Mattia Rizzolo
Control: tag -1 unreproducible

On Sat, Sep 04, 2021 at 03:40:17AM +0200, Vincent Lefevre wrote:
> After the upgrade to 2.9.12+dfsg-3, XHTML 1.0 validation is broken.
> There was no such issue with 2.9.10+dfsg-6.7.

Actually, I can't reproduce it.
And, honestly, I think that if it really didn't work I would have heard
quite a lot of noise by now.

> $ xmllint --noout --loaddtd --valid test.html
> error : xmlAddEntity: invalid redeclaration of predefined entity
> error : xmlAddEntity: invalid redeclaration of predefined entity

I can never manage to download DTDs from w3.org (how could you?!), so,
taking your testcase and a copy of the same DTD:

mattia@warren /tmp/tmp/xml % l
total 68
-rw-r--r-- 1 mattia mattia   260 Sep 19 19:02 test.html
-rw-r--r-- 1 mattia mattia 26450 Sep  6  2014 xhtml1-strict.dtd
-rw-r--r-- 1 mattia mattia 12055 Sep  6  2014 xhtml-lat1.ent
-rw-r--r-- 1 mattia mattia  4293 Sep  6  2014 xhtml-special.ent
-rw-r--r-- 1 mattia mattia 14167 Sep  6  2014 xhtml-symbol.ent
mattia@warren /tmp/tmp/xml % cat test.html

http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd;>
http://www.w3.org/1999/xhtml;>
title
text

mattia@warren /tmp/tmp/xml % xmllint --dtdvalid xhtml1-strict.dtd --nonet --noout test.html
I/O error : Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
test.html:2: warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
   ^
mattia@warren /tmp/tmp/xml %

which looks good to me.


This is with the current 2.9.12+dfsg-4.
