Simon,

You are mostly correct. However, whitespace between elements (e.g. <tag1/> <tag2/>) will be preserved if there is no DTD associated with the document.
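To make that concrete, here is a minimal sketch, assuming the non-validating JAXP SAX parser that ships with the JDK. The class name WhitespaceEvents and the helper textEvents are made up for illustration and are not part of the sax.Writer sample. It records which SAX callback receives the single space between <tag1/> and <tag2/> when the document has no DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
class WhitespaceEvents {

    // Parses the given XML string and returns one entry per text callback,
    // labelled with the name of the SAX method that delivered it.
    public static List<String> textEvents(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("characters:[" + new String(ch, start, length) + "]");
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                events.add("ignorableWhitespace:[" + new String(ch, start, length) + "]");
            }
        };
        try {
            // Default factory settings: non-validating, no DTD supplied.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return events;
    }

    public static void main(String[] args) {
        // No DTD, so the space between the two empty elements should
        // show up as a characters() event.
        System.out.println(textEvents("<root><tag1/> <tag2/></root>"));
    }
}
```

A handler that wants to drop this whitespace when no DTD is present has to filter whitespace-only chunks itself inside characters().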
The parser has no way of knowing what whitespace is "ignorable" if there is no DTD to describe the structure of the XML, and therefore even whitespace between the end tag of one element and the start tag of another is reported in the characters() SAX event. I agree that one usually does not want to preserve this whitespace, and if a DTD is associated, the ignorableWhitespace() event is called instead of characters() for inter-tag whitespace. However, it's incorrect to say that all whitespace other than intra-tag whitespace is ignored by the parser.

As for everything else you mentioned: the XML spec does not guarantee that the spacing of attributes (or, even more to the point, the ordering of attributes) is preserved.

Cheers!
Brion

-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 14, 2002 1:06 AM
To: [EMAIL PROTECTED]
Subject: Re: Transforming XML document

Hi,

Unfortunately, you're out of luck. XML parsing just doesn't work that way.

XML parsers are required to respect the contents of *text nodes* within an XML document, but in every other place whitespace is not significant according to the spec.

You can either treat the input file as a plain text file (e.g. use perl to modify it), or you can treat it as XML, in which case the XML parser will guarantee to preserve the *meaning* of your XML document, but not necessarily its layout.

For example, <x  y="a"> (two spaces) in XML means *exactly* the same thing as <x y="a"> (one space).

You do get to choose the "style" in which the output is generated (indented or not, how much indenting, etc.) but you cannot ask for "the same as the input", because no existing XML parser bothers to keep that information around.

Regards,

Simon

On Thu, 2002-11-14 at 18:36, Wai-Yip Tung wrote:
> I am trying to make simple transformation on a XML document. Let say just
> changing one attribute value. I want to keep everything else the same,
> including white spaces.
>
> My first task is to parse and output a document identical to the input
> document. It seems the sample code sax.Writer is a good example.
> Unfortunately it altered the document in several ways
>
> - white space in an element is changed, e.g.
>   <x y = "a"> becomes <x y="a">
>
> - The empty element becomes two tags, e.g.
>   <x/> becomes <x></x>
>
> Anyone can give me some direction?
>
> Wai yip
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
