[jira] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
[ https://issues.apache.org/jira/browse/XALANJ-2419 ]

Joe Kesselman deleted comment on XALANJ-2419:
---
    was (Author: JIRAUSER285361):
Max's alternative does cause a regression in some of the new tests, assuming I applied it correctly. Surprising. Can take a longer look, but may want to merge what we have first since it *is* an improvement over the previous code.

> Astral characters written as a pair of NCRs with the surrogate scalar values
> when using UTF-8
> -
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>   Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>            Assignee: Joe Kesselman
>            Priority: Major
>         Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>     else if (m_encodingInfo.isInEncoding(ch)) {
>         // If the character is in the encoding, and
>         // not in the normal ASCII range, we also
>         // just leave it get added on to the clean characters
>
>     }
>     else {
>         // This is a fallback plan, we should never get here
>         // but if the character wasn't previously handled
>         // (i.e. isn't in the encoding, etc.) then what
>         // should we do? We choose to write out an entity
>         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>
[jira] (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
[ https://issues.apache.org/jira/browse/XALANJ-2419 ]

Joseph Kesselman deleted comment on XALANJ-2419:
---
    was (Author: jkesselm):
(apitest rather than smoketest, but it's there.)
Seeing a few oddities in astrals. Thought I had that running. Investigating.
I still need to look at @max's ToStream buffer-bounds tweak and see if that still applies. And at whether it ought to be replicated in ToXMLStream/ToHTMLStream to replace their handling of the surrogate-pair case; arguably so...?

> Astral characters written as a pair of NCRs with the surrogate scalar values
> when using UTF-8
> -
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>   Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>            Assignee: Joe Kesselman
>            Priority: Major
>         Attachments: XALANJ-2419-fix-v3.txt, XALANJ-2419-tests-v3.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>     else if (m_encodingInfo.isInEncoding(ch)) {
>         // If the character is in the encoding, and
>         // not in the normal ASCII range, we also
>         // just leave it get added on to the clean characters
>
>     }
>     else {
>         // This is a fallback plan, we should never get here
>         // but if the character wasn't previously handled
>         // (i.e. isn't in the encoding, etc.) then what
>         // should we do? We choose to write out an entity
>         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>
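
For readers following the surrogate-pair discussion, here is a minimal Java sketch of the behaviour the issue title implies is expected: when a character has to be written as a numeric character reference (NCR), a high/low surrogate pair is combined into one code point and emitted as a single NCR, not as two NCRs carrying the surrogate values. This is not the code from XALANJ-2419-fix-v3.txt and not the ToStream implementation. The class name AstralNcrSketch and the method escapeNonAscii are hypothetical, and the sketch assumes, purely for illustration, that every non-ASCII character is escaped (with UTF-8 output an astral character is in the encoding and could be written directly). Replacing an unpaired surrogate with U+FFFD is likewise an assumption of this sketch.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of Xalan-J.
public final class AstralNcrSketch {
    private AstralNcrSketch() {}

    /** Escapes each non-ASCII code point as exactly one hexadecimal NCR. */
    public static String escapeNonAscii(CharSequence text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            if (Character.isHighSurrogate(ch)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // Combine the pair into the real (astral) code point
                // and write ONE reference for it.
                int codePoint = Character.toCodePoint(ch, text.charAt(i + 1));
                appendNcr(out, codePoint);
                i += 2;
            } else if (Character.isSurrogate(ch)) {
                // Unpaired surrogate: assumption of this sketch is to
                // substitute U+FFFD rather than emit an illegal reference.
                appendNcr(out, 0xFFFD);
                i++;
            } else if (ch < 0x80) {
                // Plain ASCII passes through unchanged.
                out.append(ch);
                i++;
            } else {
                // Non-ASCII character in the Basic Multilingual Plane.
                appendNcr(out, ch);
                i++;
            }
        }
        return out.toString();
    }

    private static void appendNcr(StringBuilder out, int codePoint) {
        out.append("&#x")
           .append(Integer.toHexString(codePoint).toUpperCase(Locale.ROOT))
           .append(';');
    }
}
```

With this sketch, the string "a\uD835\uDD04b" (U+1D504 between two ASCII letters) comes out as a&#x1D504;b, whereas escaping each char separately would give the pair &#xD835;&#xDD04; that the issue title describes.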