http://nagoya.apache.org/bugzilla/show_bug.cgi?id=20841

linefeed character not handle properly on Windows.

------- Additional Comments From [EMAIL PROTECTED] 2003-06-26 16:49 -------

As described, the attribute's value gets normalized first by a text-serializer. For characters in the range 0-127 this leaves them alone, except for a newline (NL, decimal 10). Because of the Windows platform, the NL is turned into two characters here: CR,NL, where CR is the carriage return, decimal 13.

This normalized value is later passed to the html-serializer, which takes the attribute value with the CR,NL combination, leaves the CR alone, but normalizes the NL yet again, producing CR,CR,NL.

Neither the text-serializer nor the html-serializer expects a Windows-style CR,NL end-of-line combination. They expect that the XML parser has cleaned that up. Both think that they are writing the final output, and both turn a NL into CR,NL.

Possibility 1: If this NL to CR,NL normalization never happened in either the text or the html serialization, then the NL would stay a NL all the way through.

Possibility 2: Both the text and html serializers could be more suspicious of their input and look for a sequence of characters matching the internal character array m_lineSep. When running on Windows this is a two-character array holding the CR,NL combination. This wouldn't be a performance hit, because the serializers already pause to do special processing when they hit a NL. A sequence that matches m_lineSep could be left alone, without the normalization on output. On other platforms this array is just a NL, so the input NL is left alone by the serializers and it acts like "Possibility 1". I'm just worried that some legitimate form of input might not get normalized properly, either for attributes or for text, but I haven't thought of something that would break.

Possibility 3: Temporarily turning normalization off in the text-serializer. This is tricky because the code sees this serializer as a ContentHandler, which doesn't have a way to do this. Also, we might accidentally turn off this normalization when the output really goes just to a text-serializer and no further.

One might argue that a text-serializer should not be used to do normalization of attribute values, but I think that changes in this area are harder to do than the ones listed in possibility 2 (which I favour).

Regards,
Brian Minchau
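
To make the double normalization and possibility 2 concrete, here is a minimal standalone sketch. It is not the actual serializer code: the class name, the method names and the WINDOWS_LINE_SEP constant are hypothetical, and the constant only stands in for what m_lineSep is described as holding on Windows. The first method turns every NL into the line separator, as both serializers are described as doing. The second leaves a NL alone when the input already ends a sequence matching the line separator.

```java
class LineSepSketch {
    // Hypothetical stand-in for the contents of m_lineSep on Windows: CR,NL.
    static final char[] WINDOWS_LINE_SEP = {'\r', '\n'};

    // Behaviour as described in the comment: every NL becomes the line separator.
    static String naiveNormalize(String in, char[] lineSep) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\n') {
                out.append(lineSep);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Sketch of possibility 2: when a NL is hit, check whether the input up to
    // and including it already matches the line separator; if so, leave it alone.
    static String tolerantNormalize(String in, char[] lineSep) {
        String sep = new String(lineSep);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c != '\n') {
                out.append(c);
                continue;
            }
            // Start of a candidate separator that would end at this NL.
            int start = i - (sep.length() - 1);
            // regionMatches returns false for a negative offset, so no guard is needed.
            if (in.regionMatches(start, sep, 0, sep.length())) {
                out.append('\n'); // earlier separator characters were already copied
            } else {
                out.append(lineSep);
            }
        }
        return out.toString();
    }

    // Makes CR and NL visible when printing.
    static String visible(String s) {
        return s.replace("\r", "<CR>").replace("\n", "<NL>");
    }

    public static void main(String[] args) {
        String value = "a\nb";
        String naiveTwice = naiveNormalize(
                naiveNormalize(value, WINDOWS_LINE_SEP), WINDOWS_LINE_SEP);
        String tolerantTwice = tolerantNormalize(
                tolerantNormalize(value, WINDOWS_LINE_SEP), WINDOWS_LINE_SEP);
        System.out.println(visible(naiveTwice));    // a<CR><CR><NL>b
        System.out.println(visible(tolerantTwice)); // a<CR><NL>b
    }
}
```

With a one-character separator holding just NL, tolerantNormalize always finds a match and passes the NL through unchanged, which is the "acts like Possibility 1" case on other platforms.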
