On Thu, Feb 04, 2010 at 09:31:11AM -0800, John Clements wrote:
>
> On Feb 4, 2010, at 7:09 AM, Daniel Veillard wrote:
>
> > On Thu, Feb 04, 2010 at 08:53:42AM -0500, Piotr Sipika wrote:
> >> John,
> >> Try parsing the document using:
> >> xmlReadFile(URI, encoding, options)
> >> with options set to XML_PARSE_NOBLANKS (in addition to anything else
> >> you want to use)
> >
> > Honestly, I think it's bad advice in general. The blank nodes
> > used for "formatting" are an integral part of the XML document content
> > and users should rather learn XML and do the right thing than tweak the
> > parser to become non-conformant.
>
> Ah! Got your attention. What is the "right thing" to do? Specifically:
> the DTD contains information about where whitespace is significant;
> how is this information represented in the parsed tree? Duplicating the
> knowledge about where whitespace is significant seems fragile.
Yes, it's fragile, because it depends on the DTD validation step being
done, and 1/ that step is optional (libxml2 doesn't do it by default) and
2/ it may depend on external files that are not available.
Plus, even if the DTD states that an element's content is mixed, allowing
non-blank text nodes, you still don't know whether a given run of blank
characters in a text node at that level is there for indentation or really
is content:
<foo>
some text
<bar/>
more text
</foo>
It's only if the content model is not mixed that you know for sure that
blank nodes should be ignored ... but ... assuming foo's content model is
given as (bar*):
<foo>
<bar/>
oops
<bar/>
</foo>
This will pass parsing but not validation, and sometimes you don't want
to or can't validate, and "oops" may be useful information.
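
As a rough illustration of the (bar*) case above, here is a minimal sketch
using Python's standard xml.dom.minidom as a stand-in for libxml2's tree.
That substitution is an assumption made to keep the example self-contained;
none of the names below are libxml2 API, and text_children is a hypothetical
helper. No DTD is consulted, so the parser has no way to tell indentation
from the stray "oops" and keeps both as text nodes:

```python
from xml.dom import minidom

# The (bar*) example from above, parsed without any DTD or validation.
DOC = "<foo>\n<bar/>\noops\n<bar/>\n</foo>"


def text_children(xml_text):
    """Return the raw data of each text node directly under the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [n.data for n in root.childNodes if n.nodeType == n.TEXT_NODE]


chunks = text_children(DOC)

# Whitespace-only text nodes: most likely indentation, but the tree alone
# cannot prove that.
blank = [c for c in chunks if not c.strip()]

# Text nodes with real characters: "oops" survives a non-validating parse.
non_blank = [c.strip() for c in chunks if c.strip()]

print(len(blank), non_blank)
```

Dropping blank nodes at parse time would not touch "oops" either way; the
point is only that the tree by itself does not say which text is meaningful.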
So in general, the logic for handling text nodes needs to be put at the
application level, and it's highly contextual. It's hard to extract
the DTD's information about the content model (well, it's not trivial),
and sometimes it may not be available either. I would not delegate
that logic purely to the DTD, but this is just my opinion ;-)
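
One way the application-level handling could look, again sketched with
Python's xml.dom.minidom rather than libxml2 (an assumption for the sake of
a runnable example). The ELEMENT_ONLY set and the classify_text helper are
hypothetical: they stand for whatever knowledge the application itself has
about where text is expected, independent of any DTD.

```python
from xml.dom import minidom

# Hypothetical application knowledge: elements whose children are meant to be
# elements only, so whitespace-only text inside them is just indentation.
ELEMENT_ONLY = {"foo"}


def classify_text(xml_text):
    """Split a document's text nodes into (kept, unexpected) lists.

    Whitespace-only text inside an ELEMENT_ONLY element is dropped as
    indentation. Any other text inside such an element is reported as
    unexpected instead of being silently discarded. Text elsewhere is
    kept verbatim, whitespace included.
    """
    kept, unexpected = [], []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                if node.tagName in ELEMENT_ONLY:
                    if child.data.strip():
                        unexpected.append(child.data.strip())
                else:
                    kept.append(child.data)
            elif child.nodeType == child.ELEMENT_NODE:
                walk(child)

    walk(minidom.parseString(xml_text).documentElement)
    return kept, unexpected


kept, unexpected = classify_text("<foo>\n<bar/>\noops\n<bar/>\n</foo>")
print(kept, unexpected)
```

The decision is made per element by the application, so the same blank text
node can be indentation in one place and content in another.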
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
[email protected] | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/