Hi Jean-Francois, there is no bug here. You need to modify your application.

The whole problem has to do with round-tripping. One should be able to read in an XML document with an XML parser, put it through the identity transform (as you are doing), and end up with an equivalent XML document on output, with no information lost. Then one should be able to repeat the process by reading that output in with an XML parser, and so on. That is the goal.

An XML parser, however, is required to normalize a carriage-return/line-feed sequence (#xD #xA) to a single line-feed (#xA). Any carriage-return (#xD) not followed by a line-feed is also normalized to a line-feed (#xA). The recommendation says that the reason for doing this is "to simplify the tasks of applications" (such as an XSLT processor). Truth be told, this is probably to get around the ugly fact that Windows uses carriage-return/line-feed at the end of lines and everyone else uses just line-feed. See
http://www.w3.org/TR/REC-xml/#sec-line-ends

Consider the following things:
1) \r\n
2) \r
3) &#13;
4) Your DOM with no XML parser

1) If the input file has this in a text node:
  "a\r\nb"
then when parsed by an XML parser, the XSLT processor will see this:
  "a\nb"
When it is written back out, the XSLT processor must decide how to write out (un-normalize) the '\n'. Xalan decided (a long time ago) to write out the system line separator, so it will write out
  "a\r\nb"
when serializing on Windows. We are back to where we started. No information is lost. Great!

2) If the input text node is this:
  "a\rb"
the XML parser will normalize it and the XSLT processor will see this:
  "a\nb"
As before, when written out Xalan will serialize (and un-normalize) to this on Windows:
  "a\r\nb"
Not exactly perfect, as the output differs from the original.

In all cases the XSLT processor will never see a '\r' if a literal '\r' is in a text node, because an XML parser is required to normalize them away. The XSLT processor will only ever see a '\n'.

3) However, we can go out of our way to force a carriage return to be seen by the XSLT processor.
If the original XML file has this:
  "a&#13;b"
then this will be parsed by an XML parser and the value presented to the XSLT processor will be this:
  "a\rb"
The XSLT processor sees this and thinks "hey, there is a carriage return; they must have gone to special effort to get that one past the normalization requirement". So on output the XSLT processor reasons: if I write this one as any of these:
  "a\rb"
  "a\r\nb"
  "a\nb"
then all of them would be read in by an XML parser and normalized to "a\nb", and the carriage return would be lost. If I write "a\rb" out as:
  "a&#13;b"
then this is back to where we started. The carriage return is not lost, and "a&#13;b" is just as valid as "a\rb" in an XML document anyway.

4) Now for your problem. You don't have an XML parser. You build the document yourself. Your text node is this:
  "a" + System.getProperty("line.separator") + "b"
and on Windows that ends up being:
  "a\r\nb"
and this is presented to the XSLT processor. The processor doesn't know where this came from (presumably an XML parser). There is a carriage return in the middle of the text node, so the processor thinks that special efforts have gone into getting that \r to it, and it will make special efforts not to lose it. The \r is written out as &#13; and the \n is un-normalized to \r\n. The net result is that this is what is serialized:
  "a&#13;\r\nb"
This is perfectly valid XML. If the next application that reads this XML can't handle the &#13;, that is too bad, but there is no bug.

Basically you are very intentionally presenting a \r to the XSLT processor. So why are you calling System.getProperty("line.separator") in the first place? In XML presented to an XSLT processor, things are presumed normalized. Why not just use "\n", the normalized end-of-line sequence? So
  "a" + "\n" + "b"
or
  "a\nb"
will be serialized to:
  "a\r\nb"
on Windows and to:
  "a\nb"
practically everywhere else.

- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail: [EMAIL PROTECTED]


From: Jean-Francois Beaulac <jean-francois.be[EMAIL PROTECTED]>
To: Brian Minchau/Toronto/[EMAIL PROTECTED]
cc: xalan-j-users@xml.apache.org
Date: 06/08/2007 05:15 PM
Subject: RE: Serializing a DOM tree to XML file, customize entities replacement

Hi,

I made a test program which outputs a <, a \n and a System.getProperty("line.separator"). The simple \n comes out fine; the only problem I have is with the System.getProperty("line.separator").

Thank you

import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/*
 * Test.java
 *
 * Created on June 8, 2007, 4:55 PM
 *
 */

/**
 *
 * @author Jean-Francois Beaulac
 */
public class Test {

    public static void main(String args []){
        try{
            // Generate a DOM tree
            /*
            <ROOT>
                <TEXT>
                    Test text
                    < With special character
                </TEXT>
            </ROOT>

            Should result in:
            <ROOT><TEXT>Test text
            < With special character
            after line.separator.</TEXT></ROOT>
            */
            DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
            Document doc = docBuilder.newDocument();

            //<QBXML>
            Element root = doc.createElement("ROOT");
            doc.appendChild(root);

            Element text = doc.createElement("TEXT");
            root.appendChild(text);

            // On Windows, line.separator is "\r\n", so this text node contains a literal '\r'
            text.appendChild(doc.createTextNode("Test text\n< With special character"
                    + System.getProperty("line.separator") + "after line.separator."));

            // Transformation (identity transform used as a serializer)
            TransformerFactory transfac = TransformerFactory.newInstance();
            Transformer trans = transfac.newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            trans.setOutputProperty(OutputKeys.METHOD,"xml");

            //create string from xml tree
            StringWriter sw = new StringWriter();
            StreamResult result = new StreamResult(sw);
            DOMSource source = new DOMSource(doc);
            trans.transform(source, result);

            // Console
            System.out.println(sw.toString());

            // File
            java.io.File file = new java.io.File(System.currentTimeMillis() + "@lastRequest.xml");
            java.io.BufferedWriter writer = new java.io.BufferedWriter(new java.io.FileWriter(file, true));
            writer.write(sw.toString());
            writer.flush();
            writer.close();

        }catch(Exception e){
            // Report the failure instead of silently swallowing it
            e.printStackTrace();
        }
    }
}

-----Original Message-----
From: Brian Minchau [mailto:[EMAIL PROTECTED]
Sent: June 8, 2007 4:53 PM
To: Jean-Francois Beaulac
Cc: xalan-j-users@xml.apache.org
Subject: RE: Serializing a DOM tree to XML file, customize entities replacement

Hi Jean-Francois,

please post a small Java program that creates a small DOM, for example a document with only a root element that has a text node child with, say, a '>' and a '\n' in it. Also post your code to serialize the DOM, so I can see how the &#13; comes about.

I'm willing to investigate, but I'm not willing to spend time trying to create the testcase.

Thanks,
- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail: [EMAIL PROTECTED]


From: Jean-Francois Beaulac <jean-francois.be[EMAIL PROTECTED]>
To: Brian Minchau/Toronto/[EMAIL PROTECTED]
cc:
Date: 06/08/2007 04:43 PM
Subject: RE: Serializing a DOM tree to XML file, customize entities replacement

Hi,

Is that option supposed to change the String the transformer will use to replace line separators? I just tried it and it changes nothing at all; my XML output is still filled with &#13; strings.

What I am looking for would be a way to disable output escaping for everything except the characters I listed in my first post.

If I add the processing instruction in my DOM using:

ProcessingInstruction pi = doc.createProcessingInstruction(Result.PI_DISABLE_OUTPUT_ESCAPING, "");
root.getParentNode().insertBefore(pi, root);

I get the desired result, but then I would need to manually escape all the < > & ' " characters.

Thank you

-----Original Message-----
From: Brian Minchau [mailto:[EMAIL PROTECTED]
Sent: June 8, 2007 4:16 PM
To: Jean-Francois Beaulac
Subject: Re: Serializing a DOM tree to XML file, customize entities replacement

Hi Jean-Francois,

I think there are solutions to this, but all of them are Xalan specific.

I assume that you are running your DOM through the identity transformation in order to serialize it. This is the most portable way to do it. Once you get your Transformer object, even though it is the identity transform, you can set some properties via JAXP.

I suggest you try this:

javax.xml.transform.Transformer t = ...
t.setOutputProperty("{http://xml.apache.org/xalan}line-separator"," ");

If you had a stylesheet this could be done like this:

<xsl:output xalanPrfx:line-separator=" "
  xmlns:xalanPrfx="http://xml.apache.org/xalan" />

but you don't have a stylesheet. Still, JAXP lets you override xsl:output attribute values, and I think this should work even when there is no stylesheet.

So my suggestion is not to output the '\n' but to output a space instead. Of course, if you want something else, like "-EndOfLine-", then do this:

t.setOutputProperty("{http://xml.apache.org/xalan}line-separator","-EndOfLine-");

Hope this does the job for you.

- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail: [EMAIL PROTECTED]


From: Jean-Francois Beaulac <jean-francois.be[EMAIL PROTECTED]>
To: xalan-j-users@xml.apache.org
cc:
Date: 06/08/2007 03:01 PM
Subject: Serializing a DOM tree to XML file, customize entities replacement

Hi,

I am currently building a DOM tree using the Xerces implementation and then writing it to a String using the Xalan transformer.

I currently have a problem with line breaks (I use System.getProperty("line.separator")) in the text nodes being replaced by the entity &#13;. The application I then send the XML message to does not transform that entity back into a line break.

Is there a way to tell Xalan either to use a custom set of entities or to exclude specific entities from this automatic treatment, or am I forced to manually re-parse the result and replace each &#13; with a normal line separator?

Having a way to tell the transformer to use a custom set of entities would be my best option, since the application I communicate with only treats:
- <
- >
- &
- '
- "

Thank you

================================
Jean-Francois Beaulac
[EMAIL PROTECTED]

(See attached file: Test.java)
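
The end-of-line normalization that the reply at the top of this thread walks through (a literal CR LF, a literal CR, and a carriage return written as a character reference) can be checked with a small program. This is only a sketch: the class name LineEndDemo and its helpers are made up for illustration, and it uses whatever JAXP parser the JDK provides rather than a specific Xerces build.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class LineEndDemo {

    /** Parses a tiny document and returns the text content of its root element. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers stay simple
            throw new IllegalStateException(e);
        }
    }

    /** Makes CR and LF visible when printing. */
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Case 1: literal CR LF in the document text
        System.out.println(show(parseText("<r>a\r\nb</r>")));
        // Case 2: literal CR on its own
        System.out.println(show(parseText("<r>a\rb</r>")));
        // Case 3: carriage return written as a character reference
        System.out.println(show(parseText("<r>a&#13;b</r>")));
    }
}
```

With a conforming parser the first two cases should print a\nb and only the third should print a\rb, which is why a '\r' that reaches the serializer is treated as deliberate.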
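
The fix suggested in the reply at the top of this thread (hand the serializer "\n" rather than the platform line separator) might look roughly like the following when the text comes from somewhere that may already contain "\r\n". This is a sketch under assumptions: NormalizedTextDemo, normalizeLineEnds and serialize are hypothetical names, not part of Xalan or JAXP, and the exact bytes written for '\n' depend on the transformer implementation and the platform.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizedTextDemo {

    /** Hypothetical helper: collapse CR LF and lone CR to LF, as an XML parser would. */
    static String normalizeLineEnds(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Builds <ROOT><TEXT>text</TEXT></ROOT> and serializes it with the identity transform. */
    static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("ROOT");
            doc.appendChild(root);
            Element child = doc.createElement("TEXT");
            root.appendChild(child);
            child.appendChild(doc.createTextNode(text));

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            // Wrap checked JAXP exceptions so callers stay simple
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "a" + System.lineSeparator() + "b";
        // No '\r' reaches the serializer, so it has nothing to protect with &#13;
        System.out.println(serialize(normalizeLineEnds(raw)));
    }
}
```

Normalizing at the point where the text node is created keeps the DOM in the same shape a parser would have produced, so the serializer is free to write its usual line separator.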