> I still feel that since the way white space should be handled is very
> app-specific the best way of handling this is not to try and second guess
> the app programmer by examining the DTD or applying other rules.  As Tim
> Bray said "a lot of smart people have wasted a lot of time trying to write
> down simple rules".  The implication is that it can't be done.  (I certainly
> didn't like the rule you proposed.  Not all #PCDATA is the same.  I have
> applications where I use #PCDATA in a very restricted way and would want
> "fierce normalization" (including the pruning of blank text nodes) to be
> applied.)

I'm not talking about the normalization of #PCDATA here; I'm talking
about ignoring whitespace when the DTD does not allow #PCDATA. By
definition, if the DTD says no #PCDATA (element content only), then any
non-whitespace characters inside the element make the document invalid,
and any whitespace there is something else: ignorable whitespace.

SAX handles this correctly: whitespace occurring inside element content
is reported through ignorableWhitespace(), and whitespace occurring
inside #PCDATA is reported through characters(). I could not find any
indication that whitespace appearing inside element content should
suddenly become meaningful in the DOM world and create Text nodes when
the DTD expressly says "element content only".
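To make the SAX side concrete, here is a minimal sketch using the standard JAXP API. It is an illustration, not ProjectX or Xerces code; the class name `WhitespaceReport` and the sample DTD are made up. Validation is switched on so the parser reads the content models. Since `root` allows only `item` children, the newline and indentation around `<item>` should arrive through `ignorableWhitespace()`, while the text inside `<item>` (declared #PCDATA) arrives through `characters()`.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceReport {
    // Records what the parser reports through each whitespace-related callback.
    static final class Recorder extends DefaultHandler {
        final StringBuilder text = new StringBuilder();      // delivered via characters()
        final StringBuilder ignorable = new StringBuilder(); // delivered via ignorableWhitespace()

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable.append(ch, start, length);
        }
    }

    // Hypothetical sample: root has element-only content, item holds #PCDATA.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n  <item> a b </item>\n</root>\n";

    static Recorder parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Validation makes the parser consult the DTD content models,
        // which is what lets it classify whitespace in element content.
        factory.setValidating(true);
        SAXParser parser = factory.newSAXParser();
        Recorder recorder = new Recorder();
        // The DefaultHandler also serves as the ErrorHandler; its error()
        // ignores validity errors, while fatal errors are still thrown.
        parser.parse(new InputSource(new StringReader(xml)), recorder);
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = parse(SAMPLE);
        System.out.println("characters: [" + r.text + "]");
        System.out.println("ignorable:  " + r.ignorable.length() + " chars");
    }
}
```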

So currently ProjectX works as I expect it to, Xerces does not, and
selecting a different parser breaks Tomcat. IMHO either one of the
parsers is broken, or the meaning of element content in the DTD is not
clear, or we lack a definition of what an information model is in the
XML world.
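For comparison, the standard JAXP `DocumentBuilderFactory` exposes this exact choice as an explicit switch, `setIgnoringElementContentWhitespace(boolean)`, which only has an effect when validation is on because it depends on the content model. The sketch below is illustrative only; `DomWhitespace`, `rootChildCount` and the sample DTD are hypothetical names. With the switch off, the whitespace around `<item>` should become Text nodes under `root`; with it on, only the `item` element should remain.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DomWhitespace {
    // Hypothetical sample: root has element-only content, item holds #PCDATA.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n  <item> a b </item>\n</root>\n";

    // Returns the number of direct children of the document element.
    static int rootChildCount(boolean ignoreElementContentWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // required for the next setting to take effect
        factory.setIgnoringElementContentWhitespace(ignoreElementContentWhitespace);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // A DefaultHandler ignores validity errors and rethrows fatal ones.
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new InputSource(new StringReader(SAMPLE)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("keep whitespace:   " + rootChildCount(false));
        System.out.println("ignore whitespace: " + rootChildCount(true));
    }
}
```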

At any rate, I will be convinced in a flash that I'm wrong if someone
can just show me an example to the contrary.


> The best we can do is to help the app programmer handle his app-specific
> rules by providing facilities that handle certain common cases for him.
> Lots of people are going to want to strip white-space text nodes.  Xerces or
> DOM should provide the mechanism (via methods or properties) but leave it up
> to the app programmer to decide the policy of whether these mechanisms
> should be applied or not.
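A rough sketch of such an opt-in mechanism, assuming the application calls it explicitly after building the DOM: `stripBlankTextNodes` is a hypothetical helper, not an existing Xerces or DOM method. It removes every whitespace-only Text node, including ones inside #PCDATA elements such as `<item> </item>`. That "fierce" behaviour is exactly why it belongs under application control rather than as a parser default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class StripBlankText {
    // Hypothetical helper: recursively removes Text nodes that contain only whitespace.
    // String.trim() drops chars <= U+0020; in well-formed XML 1.0 text those can only
    // be space, tab, CR and LF, i.e. XML whitespace.
    static void stripBlankTextNodes(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripBlankTextNodes(child);
            }
            child = next;
        }
    }

    // Plain non-validating parse; no DTD is involved here.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<list>\n  <item>a</item>\n  <item> </item>\n</list>");
        System.out.println("before: " + doc.getDocumentElement().getChildNodes().getLength());
        stripBlankTextNodes(doc.getDocumentElement());
        System.out.println("after:  " + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```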

A whitespace-handling filter between the SAX parser and the DOM parser?
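Something along these lines, as a minimal sketch built on the standard SAX2 `XMLFilterImpl`: the filter class `DropIgnorableWhitespace` and the `Recorder` handler are hypothetical, and `Recorder` merely stands in for whatever builds the tree downstream. Running the same document with and without the filter should show the ignorable whitespace reaching the handler only in the unfiltered case.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class WhitespaceFilterDemo {
    // Hypothetical filter: passes every SAX event through except ignorableWhitespace().
    static final class DropIgnorableWhitespace extends XMLFilterImpl {
        DropIgnorableWhitespace(XMLReader parent) {
            super(parent);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            // Deliberately not forwarded to the downstream ContentHandler.
        }
    }

    // Stand-in for a tree builder; it only records what reaches it.
    static final class Recorder extends DefaultHandler {
        int ignorableChars;
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableChars += length;
        }
    }

    // Hypothetical sample: root has element-only content, item holds #PCDATA.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE root [\n"
        + "  <!ELEMENT root (item*)>\n"
        + "  <!ELEMENT item (#PCDATA)>\n"
        + "]>\n"
        + "<root>\n  <item> a b </item>\n</root>\n";

    static Recorder run(String xml, boolean filtered) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // lets the parser classify element-content whitespace
        XMLReader parser = factory.newSAXParser().getXMLReader();
        // Read either straight from the parser or through the filter.
        XMLReader source = filtered ? new DropIgnorableWhitespace(parser) : parser;
        Recorder recorder = new Recorder();
        source.setContentHandler(recorder);
        source.setErrorHandler(recorder); // DefaultHandler ignores validity errors
        source.parse(new InputSource(new StringReader(xml)));
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("unfiltered: " + run(SAMPLE, false).ignorableChars);
        System.out.println("filtered:   " + run(SAMPLE, true).ignorableChars);
    }
}
```

Keeping the policy in a filter means the parser itself still reports everything the DTD implies, and each application decides whether to insert the filter.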

arkin


> 
> -- jP --
