On Tue, Apr 23, 2002 at 12:22:31PM -0400, Tito Burgos wrote:
> I would like to know if there is a way to remove all "ignorable whitespace"
> from an XML document when you parse it to create a document object?
> 
> I thought I was close by using
> DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) however,
> this requires you to also set DocumentBuilderFactory.setValidating(true).
> When validating is turned on it expects to validate against a DTD, I'm not
> using DTD's I just want to eliminate CR's and other unnecessary whitespace.
> 
> For example,
> turn this original xml file:
> <root>
>       <elem1>somevalue</elem1>
>       <elem2>some other value</elem2>
> </root>
> 
> to this document object:
> <root><elem1>somevalue</elem1><elem2>some other value</elem2></root>

I think the problem here is that the parser cannot tell which
whitespace is ignorable unless it is validating. The whitespace that you
want to suppress here would be significant if <root> had mixed
content. For the parser to treat it as ignorable, it would have to
assume that there is no mixed content in your document. That assumption
holds in a lot of cases, but some people do use mixed content, so the
parser cannot safely make it for you.

I wonder how easy it would be to write a filter that assumed that there
was no mixed content and filtered it based on this assumption.
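
Something along these lines might do as a starting point. It is only a
rough sketch: the class and method names (WhitespaceStripper,
stripWhitespace, parse) are made up for illustration, and it assumes
that no element in the document has mixed content. It parses without
validation and then walks the DOM tree, removing every text node that
consists solely of whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXP.
class WhitespaceStripper {

    // Recursively removes whitespace-only text nodes below the given node.
    // ASSUMPTION: the document has no mixed content, so any text node that
    // is blank after trimming is treated as ignorable.
    static void stripWhitespace(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the next sibling before the child is possibly removed.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                stripWhitespace(child);
            }
            child = next;
        }
    }

    // Parses the XML string with a non-validating parser, then strips
    // the whitespace-only text nodes from the resulting tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        stripWhitespace(doc.getDocumentElement());
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n"
                + "      <elem1>somevalue</elem1>\n"
                + "      <elem2>some other value</elem2>\n"
                + "</root>";
        Document doc = parse(xml);
        // Number of child nodes left under <root> after stripping.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

Note that this also empties an element whose only content is
whitespace, such as <elem1> </elem1>, and that String.trim() strips a
few control characters beyond the four XML whitespace characters, so
it is a little more aggressive than the XML definition.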

David
-- 
David Sheldon, Client Services        DecisionSoft Ltd.
Telephone: +44-1865-203192            http://www.decisionsoft.com
