Hi Jean-Francois,
There is no bug here.  You need to modify your application.

The whole problem has to do with round-tripping.  One should be able to
read in an XML document with an XML parser, put it through the identity
transform (like you are doing) and end up with an equivalent XML document
on output, with no information lost.  Then repeat the process by reading it
in with an XML parser and ...   That is the goal.

An XML parser, however, is required to normalize each carriage-return/line-feed
sequence (#xD #xA) to a single line-feed (#xA).  Any carriage-return (#xD) not
followed by a line-feed is also normalized to a line-feed (#xA).
The recommendation says that the reason for doing this is "to simplify the
tasks of applications" (such as an XSLT processor).  Truth be told this is
probably to get around the ugly fact that Windows uses
carriage-return/line-feed at the end of lines and everyone else uses just
line-feed.

See http://www.w3.org/TR/REC-xml/#sec-line-ends
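
To see this normalization for yourself, here is a minimal sketch using only
the JAXP classes that ship with the JDK. The class name LineEndDemo and its
helper methods are made up for illustration. It parses a one-element document
and prints the text the application actually receives:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LineEndDemo {
    // Parse <t>content</t> and return the text as seen after parsing.
    static String parsedText(String content) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new InputSource(new StringReader("<t>" + content + "</t>")));
        return doc.getDocumentElement().getTextContent();
    }

    // Make carriage returns and line feeds visible when printing.
    static String show(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(show(parsedText("a\r\nb"))); // literal CR LF in the input
        System.out.println(show(parsedText("a\rb")));   // literal lone CR in the input
    }
}
```

With a conforming parser both lines should print a\nb, because the literal
carriage returns never make it past the parser.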

Consider the following things:
1) \r\n
2) \r
3) &#13;
4) Your DOM with no XML parser

1) If the input file has this in a text node:
      "a\r\nb"
then when parsed by an XML parser, the XSLT processor will see this:
      "a\nb"
When written back out by the XSLT processor it must decide how to write out
(un-normalize) the '\n'.  Xalan has decided (a long time ago) to write out
the system line separator, and will write out
      "a\r\nb"
when serializing on Windows.  We are actually back to where we started. No
information is lost. Great!

2) If the input text node is this:
      "a\rb"
the XML parser will normalize it and the  XSLT processor will see this:
      "a\nb"
As before, when written out Xalan will serialize (and un-normalize) to this
on Windows:
      "a\r\nb"
Not exactly perfect, as the output differs from the original.

In all cases the XSLT processor will never see a '\r' if a literal '\r' is
in a text node as an XML parser is required to normalize them away. The
XSLT processor will only ever see a '\n'.

3) However, we can go out of our way to force a carriage return to be seen
by the XSLT processor. If the original XML file has this:
      "a&#13;b"
then this will be parsed by an XML parser and the value presented to the
XSLT processor will be this:
      "a\rb"
The XSLT processor sees this and thinks "hey, there is a carriage-return,
they must have gone to special effort to get that one past the normalization
requirement".
So on output the XSLT processor reasons: if I write this one as any of these:
      "a\rb"
      "a\r\nb"
      "a\nb"
then all of them would be read in by an XML parser and normalized to "a\nb"
and the carriage return would be lost. If I write "a\rb" out as:
      "a&#13;b"
then this is back to where we started.  The carriage return is not lost,
and "a&#13;b" is just as valid as "a\rb" in an XML document anyway.
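
Here is a rough sketch of case 3, again with plain JAXP. CharRefDemo and its
helpers are illustrative names, not anything from Xalan. It parses a document
containing the character reference, checks that the carriage return reaches
the DOM, and then runs the identity transform so you can look at how your
serializer writes it back out:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CharRefDemo {
    // Parse an XML string into a DOM.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serialize a DOM with the identity transform.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<t>a&#13;b</t>");
        String text = doc.getDocumentElement().getTextContent();
        // The character reference is not subject to line-end normalization,
        // so the carriage return is present in the text node.
        System.out.println(text.indexOf('\r') == 1); // true
        // Inspect how the serializer in use writes the carriage return.
        System.out.println(serialize(doc));
    }
}
```

What the second line prints depends on the serializer in use; with Xalan the
expectation from the discussion above is that the \r comes back out as &#13;.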

4) Now for your problem. You don't have an XML parser. You build the
document yourself.  Your text node is this:
      "a" + System.getProperty("line.separator") + "b"
and on Windows that ends up being:
      "a\r\nb"
and this is presented to the XSLT processor.  The processor doesn't know
where this came from and presumes it came from an XML parser.  There is a
carriage return in the middle of the text node, so the processor thinks that
special efforts have gone into getting that \r to it, and it will make special
efforts not to lose it. The \r is written out as &#13; and the \n is
un-normalized to \r\n.  The net result is that this is what is serialized:
      "a&#13;\r\nb"
This is perfectly valid XML. If the next application that reads this XML
can't handle the &#13;, that is too bad, but there is no bug. Basically you
are very intentionally presenting a \r to the XSLT processor.  So why are
you calling System.getProperty("line.separator") in the first place?   In
XML presented to an XSLT processor, things are presumed normalized. Why not
just use "\n", the normalized end-of-line sequence?  So
      "a" + "\n" + "b"
or
      "a\nb"
will be serialized to:
      "a\r\nb"
on Windows and to:
      "a\nb"
practically everywhere else.
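
If some of your text comes from sources that may already contain \r (user
input, files read on Windows), one option is to normalize it yourself before
calling createTextNode. The sketch below is only an illustration: the class
NormalizedText and the helper normalizeLineEnds are hypothetical names, and
the helper simply applies the same rule an XML parser would.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizedText {
    // Hypothetical helper: mimic XML line-end normalization.
    // CR LF becomes LF first, then any remaining lone CR becomes LF.
    static String normalizeLineEnds(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Build a tiny DOM whose text node never contains a carriage return.
    static Document build(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("ROOT");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode(normalizeLineEnds(text)));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        String raw = "a" + System.getProperty("line.separator") + "b";
        Document doc = build(raw);
        String text = doc.getDocumentElement().getTextContent();
        System.out.println(text.equals("a\nb")); // true on platforms using LF, CR LF or CR
    }
}
```

The DOM then looks the same as one produced by a parser, and the serializer
is free to write the line ends in whatever form it chooses for the platform.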

- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail:        [EMAIL PROTECTED]



                                                                           
From:    Jean-Francois Beaulac <jean-francois.be[EMAIL PROTECTED]>
To:      Brian Minchau/Toronto/[EMAIL PROTECTED]
cc:      xalan-j-users@xml.apache.org
Date:    06/08/2007 05:15 PM
Subject: RE: Serializing a DOM tree to XML file, customize entities replacement




Hi,

I made a test program which outputs a < , a \n and a
System.getProperty("line.separator").

The simple \n comes out fine, the only problem I have is with the
System.getProperty("line.separator").

Thank you

import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
/*
 * Test.java
 *
 * Created on June 8, 2007, 4:55 PM
 *
 */

/**
 *
 * @author Jean-Francois Beaulac
 */
public class Test {

    public static void main(String args []){
        try{
            // Generate a DOM tree
            /*
             <ROOT>
                <TEXT>
                    Test text
                    < With special character
                </TEXT>
             </ROOT>

                         Should result in:

                         <ROOT><TEXT>Test text
                         &lt; With special character
                         after line.separator.</TEXT></ROOT>


             */
            DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbfac.newDocumentBuilder();

            Document doc = docBuilder.newDocument();

            // <ROOT>
            Element root = doc.createElement("ROOT");
            doc.appendChild(root);

            Element text = doc.createElement("TEXT");
            root.appendChild(text);

            text.appendChild(doc.createTextNode("Test text\n< With special character"
                    + System.getProperty("line.separator") + "after line.separator."));

            // Transformation
            TransformerFactory transfac = TransformerFactory.newInstance();
            Transformer trans = transfac.newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            trans.setOutputProperty(OutputKeys.METHOD,"xml");

            //create string from xml tree
            StringWriter sw = new StringWriter();
            StreamResult result = new StreamResult(sw);
            DOMSource source = new DOMSource(doc);
            trans.transform(source, result);

            // Console
            System.out.println(sw.toString());
            // File
            java.io.File file = new java.io.File(System.currentTimeMillis()
                    + "@lastRequest.xml");
            java.io.BufferedWriter writer = new java.io.BufferedWriter(
                    new java.io.FileWriter(file, true));
            writer.write(sw.toString());
            writer.flush();
            writer.close();
        }catch(Exception e){
            // Report failures instead of silently swallowing them
            e.printStackTrace();
        }
    }
}

-----Original Message-----
From: Brian Minchau [mailto:[EMAIL PROTECTED]
Sent: June 8, 2007 4:53 PM
To: Jean-Francois Beaulac
Cc: xalan-j-users@xml.apache.org
Subject: RE: Serializing a DOM tree to XML file, customize entities
replacement

Hi Jean-Francois,
please post a small Java program that creates a small DOM, for example a
document with only a root element, that has a text node child with, say,
a '>' and a '\n' in it.  Also include your code to serialize the DOM so I can
see how the &#13; comes about.

I'm willing to investigate, but I'm not willing to spend time trying to
create the testcase.

Thanks,
- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail:        [EMAIL PROTECTED]




From:    Jean-Francois Beaulac <jean-francois.be[EMAIL PROTECTED]>
To:      Brian Minchau/Toronto/[EMAIL PROTECTED]
cc:
Date:    06/08/2007 04:43 PM
Subject: RE: Serializing a DOM tree to XML file, customize entities replacement










Hi,

Is that option supposed to change the String the transformer will use to
replace line separators? I just tried it and it changes nothing at all, my
XML output is still filled with &#13; strings.

What I am looking for would be a way to disable output escaping for
everything, except the characters I listed in my first post.

If I add the processing instruction in my DOM using:

ProcessingInstruction pi =
doc.createProcessingInstruction(Result.PI_DISABLE_OUTPUT_ESCAPING, "");
root.getParentNode().insertBefore(pi, root);

I get the desired result, but then I would need to manually escape all the
< > & ' " characters.


Thank you

-----Original Message-----
From: Brian Minchau [mailto:[EMAIL PROTECTED]
Sent: June 8, 2007 4:16 PM
To: Jean-Francois Beaulac
Subject: Re: Serializing a DOM tree to XML file, customize entities
replacement

Hi Jean-Francois,

I think there are solutions to this, but all of them are Xalan specific.

I assume that you are running your DOM through the identity transformation
in order to serialize it.  This is the most portable way to do it.

Once you get your Transformer object, even though it is the identity
transform, you can set some properties via JAXP.  I suggest you try this:

javax.xml.transform.Transformer t = ...
t.setOutputProperty("{http://xml.apache.org/xalan}line-separator"," ");


If you had a stylesheet this could be done like this:
      <xsl:output xalanPrfx:line-separator=" "
                  xmlns:xalanPrfx="http://xml.apache.org/xalan" />
but you don't have a stylesheet.
Still JAXP lets you over-ride xsl:output attribute values, and I think this
should work even when there is no stylesheet.


So my suggestion is to not output the '\n'  but to output a space.   Of
course if you want something else like  "-EndOfLine-" then do this:

t.setOutputProperty("{http://xml.apache.org/xalan}line-separator","-EndOfLine-");

Hope this does the job for you.


- Brian
- - - - - - - - - - - - - - - - - - - -
Brian Minchau, Ph.D.
XSLT Development, IBM Toronto
e-mail:        [EMAIL PROTECTED]




From:    Jean-Francois Beaulac <jean-francois.be[EMAIL PROTECTED]>
To:      xalan-j-users@xml.apache.org
cc:
Date:    06/08/2007 03:01 PM
Subject: Serializing a DOM tree to XML file, customize entities replacement










Hi,

I am currently building a DOM tree using the Xerces implementation and then
writing it to a String using the Xalan transformer.  I currently have a
problem with line breaks (I use System.getProperty("line.separator")) in the
text nodes being replaced by the entity &#13;. The application I am trying
to then send the XML message to does not transform that entity back into a
line break.

Is there a way to tell Xalan to use either a custom set of entities, or to
remove specific entities from this automatic treatment, or am I forced to
manually reparse the result to replace the &#13; with a normal line
separator? Having a way to tell the transformer to use a custom set of
entities would be my best option, since the application I communicate with
only treats:

- &lt;
- &gt;
- &amp;
- &apos;
- &quot;

Thank you

================================
Jean-Francois Beaulac
[EMAIL PROTECTED]








(See attached file: Test.java)
