On Thu, Jan 12, 2006 at 01:29:39AM +0000, Gary Coady wrote:
> [EMAIL PROTECTED] wrote:
> > Hi All,
> >
> > I personally believe that it should be based on the DTD being used for the
> > HTML.
> >
> > I use XHTML Strict (-//W3C//DTD XHTML 1.0 Strict//EN) and I would expect any
> > conversion from XML to XHTML to make an XML document that is valid against
> > the DTD. If libxml2 does what you want then this will *not* be the case and
> > hence all my XHTML would be invalid.
> >
> > In the future people will have problems with their XHTML if they do not
> > consider using the strict version. The semantic web and machine to machine
> > communications will need to depend on the documents being as compliant as
> > possible to the standard.
> >
> > "-//W3C//DTD XHTML 1.0 Transitional//EN" is supposed to be for
> > "transitional" use while one is going from HTML 4.0 to XHTML. I believe that
> > XHTML1.0-Strict is the expected standard until it is replaced by the W3C XML
> > Schema version. For this reason I believe Libxml2 should automatically
> > provide XHTML1.0-Strict. If not then should libxml2 be creating HTML 1.0?
> > ;--)
>
> I agree with most of the above, but an alteration would not affect the
> behaviour you're worried about; XHTML should be parsed with the XML
> parser, not the legacy HTML parser - and this issue involves the
> behaviour of the latter.
Right.
> A few months ago, I came across a "bug" where whitespace nodes as a
> direct child of the <body> tag would be removed. The problem is similar
> in that pure whitespace nodes are forbidden by the strict DTD, but
> allowed by the transitional DTD.
>
> In this case, the applied patch checked the DTD in use with code like
>
>     dtd = xmlGetIntSubset(ctxt->myDoc);
>     if (dtd != NULL && dtd->ExternalID != NULL) {
>         if (!xmlStrcasecmp(dtd->ExternalID,
>                            BAD_CAST "-//W3C//DTD HTML 4.01//EN") ||
>             !xmlStrcasecmp(dtd->ExternalID,
>                            BAD_CAST "-//W3C//DTD HTML 4//EN"))
>         {
> (line 2060, HTMLparser.c).
>
> This code assumes that HTML 4 and HTML 4.01 are the only strict non-XML
> DTDs in existence.
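
One way to make that assumption explicit would be to keep the strict public IDs in a table instead of a chain of comparisons. This is only a rough sketch, not the code in HTMLparser.c: the names strict_html_ids, ascii_casecmp and is_strict_html_dtd are made up for illustration, ascii_casecmp merely stands in for xmlStrcasecmp so the fragment compiles without libxml2, and the table holds just the two IDs from the quoted check.

```c
#include <ctype.h>
#include <stddef.h>

/* Public IDs treated as Strict HTML. Only the two IDs from the quoted
 * check are listed; adding more would be a policy decision. */
static const char *const strict_html_ids[] = {
    "-//W3C//DTD HTML 4.01//EN",
    "-//W3C//DTD HTML 4//EN",
};

/* Hypothetical stand-in for xmlStrcasecmp: ASCII case-insensitive compare.
 * Returns 0 when the strings are equal ignoring case. */
static int ascii_casecmp(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/* Returns 1 if external_id matches an entry in the table, 0 otherwise
 * (including when external_id is NULL, i.e. no DOCTYPE public ID). */
static int is_strict_html_dtd(const char *external_id)
{
    size_t i;

    if (external_id == NULL)
        return 0;
    for (i = 0; i < sizeof(strict_html_ids) / sizeof(strict_html_ids[0]); i++) {
        if (ascii_casecmp(external_id, strict_html_ids[i]) == 0)
            return 1;
    }
    return 0;
}
```

With a table like this, recognising another strict DTD would be a one-line change rather than another branch in the condition.
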
>
> Something similar might be useful for this issue - the <p> tags are not
> needed for a Transitional DTD. I'll have a look to see if there's an
> easy fix at the weekend, if nobody's supplied a patch before that :-)
Yes, thanks! That sounds like the right approach to me. I would just
merge that with a new htmlParserOption, HTML_PARSE_STRICT, which could
either be passed by the user to maintain the current behaviour or be
activated by default when the DOCTYPE is read, if it happens to be a
Strict HTML one. Does that make sense?
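
Roughly, the idea could look like the sketch below. Nothing here exists in libxml2 today: SKETCH_HTML_PARSE_STRICT and sketch_effective_options are hypothetical names, the flag value is arbitrary, and the caller is assumed to have already decided whether the DOCTYPE it read was a Strict HTML one.

```c
/* Hypothetical flag standing in for the proposed HTML_PARSE_STRICT.
 * The bit value is arbitrary and chosen only for this sketch. */
enum {
    SKETCH_HTML_PARSE_STRICT = 1 << 0
};

/* Combine the options passed by the user with what the DOCTYPE says.
 * The strict bit ends up set if the user asked for it explicitly, or
 * if the DOCTYPE that was read is a Strict HTML one (non-zero
 * doctype_is_strict). All other option bits pass through unchanged. */
static int sketch_effective_options(int user_options, int doctype_is_strict)
{
    if (doctype_is_strict)
        user_options |= SKETCH_HTML_PARSE_STRICT;
    return user_options;
}
```

The parser would then test the strict bit at the points where it currently applies the strict-only behaviour, instead of inspecting the DTD each time.
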
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
[EMAIL PROTECTED] | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/