DO NOT REPLY [Bug 23147] - Carriage returns in text nodes double after being transformed/parsed.

bugzilla Fri, 20 Feb 2004 07:32:43 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23147>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23147

Carriage returns in text nodes double after being transformed/parsed.

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From [EMAIL PROTECTED]  2004-02-20 15:33 -------
I don't think there's a bug here.

The PreAssessment and PostAssessment elements in the input document look like 
this:

  <PreAssessment>hernia&#13;&#10;g</PreAssessment>
  <PostAssessment>hernia&#10;g</PostAssessment>

Their contents are copied to the result using xsl:value-of instructions, and 
respectively produce textarea elements that look like this:

<textarea name="PreAssessment" rows="5" cols="40">hernia&#13;\r\ng</textarea>
<textarea name="PostAssessment" rows="5" cols="40">hernia\r\ng</textarea>

I've represented the carriage return/line feed pair of actual characters that 
appears in the output as \r\n above.  This distinguishes them from the 
character reference &#13; that appears in the "PreAssessment" textarea.

When the XML parser encounters the character references &#13;&#10; in the 
content of the PreAssessment element, the carriage return and line feed 
characters are passed to Xalan as separate characters, rather than as a single 
end-of-line marker.  It is the fact that they appear as character references 
that forces the parser to treat them that way.  If the input had contained the 
actual ASCII characters \r\n, as opposed to character references, the XML 
parser would have been required to treat them as a single end-of-line marker, 
and would have passed them to Xalan as a single line-feed character.
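
The difference between the two input forms can be observed with any conforming
XML parser. The following is a minimal sketch using Python's standard-library
ElementTree (backed by expat) rather than the parser Xalan actually uses; the
helper name element_text is made up for the illustration.

```python
import xml.etree.ElementTree as ET


def element_text(xml_source: bytes) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_source).text


# Character references: CR and LF reach the application as two characters.
from_references = element_text(
    b"<PreAssessment>hernia&#13;&#10;g</PreAssessment>"
)

# Literal CR LF bytes: the parser normalizes the pair to a single LF.
from_literals = element_text(b"<PreAssessment>hernia\r\ng</PreAssessment>")

print(repr(from_references))  # 'hernia\r\ng'
print(repr(from_literals))    # 'hernia\ng'
```

Only the character-reference form delivers a carriage return to the
application; the literal form never does.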

When Xalan copies those characters to the output, it attempts to write them in 
a way that preserves the fact that they were represented internally as separate 
characters.  In the case of a carriage return, it does that by emitting the 
character as a character reference.  (For XML output that representation is of 
greater significance than it is for HTML.)  The line feed character is emitted 
in a form that is appropriate for an end-of-line marker.  For both XML and HTML 
output, \n or \r\n are equivalent as end-of-line markers.
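
The output rule described above can be sketched as a small function. This is
an illustration of the described behaviour only, not Xalan's serializer code;
the name serialize_text and the eol parameter are assumptions, with eol
standing in for the platform line separator (\r\n here, matching the output
shown earlier).

```python
def serialize_text(text: str, eol: str = "\r\n") -> str:
    """Write text-node content the way the comment above describes."""
    out = []
    for ch in text:
        if ch == "\r":
            # A carriage return is kept distinct by writing a reference.
            out.append("&#13;")
        elif ch == "\n":
            # A line feed is written as an ordinary end-of-line marker.
            out.append(eol)
        elif ch == "&":
            out.append("&amp;")
        elif ch == "<":
            out.append("&lt;")
        elif ch == ">":
            out.append("&gt;")
        else:
            out.append(ch)
    return "".join(out)


print(repr(serialize_text("hernia\r\ng")))  # 'hernia&#13;\r\ng'
print(repr(serialize_text("hernia\ng")))    # 'hernia\r\ng'
```

The two calls correspond to the PreAssessment and PostAssessment values after
parsing, and reproduce the two textarea contents quoted above.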

In the result, you get &#13; for the carriage return character, and \r\n as 
the line break for the line feed character.  An HTML user agent should render 
those as two separate line breaks.
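
As a rough check of that count, the sketch below expands character references
with Python's html.unescape and counts line breaks, treating \r\n, a lone \r,
and a lone \n as one break each. It only approximates what a browser does
with textarea content, and count_line_breaks is a made-up helper name.

```python
import html
import re


def count_line_breaks(textarea_content: str) -> int:
    """Count line breaks after expanding character references."""
    expanded = html.unescape(textarea_content)
    # Try CR LF first so a pair counts once; a lone CR or LF counts once too.
    return len(re.findall(r"\r\n|\r|\n", expanded))


print(count_line_breaks("hernia&#13;\r\ng"))  # 2
print(count_line_breaks("hernia\r\ng"))       # 1
```

The PreAssessment content yields two breaks and the PostAssessment content
one, which matches the doubling described in the report.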
