[ https://issues.apache.org/jira/browse/XALANJ-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632468#comment-14632468 ]
Scott Mitchell commented on XALANJ-2419:
----------------------------------------

Hi Gary,

I'm trying to answer that question for you, but so far I've been unable to figure out how to run the test suite. I originally downloaded a source distribution to make my modifications, but that doesn't appear to have the tests included in it. I then tried cloning the Git repo and checking out the SVN repo and neither of those seem to include the tests either. Any clue how I can get my hands on the tests?

FWIW, here's the error I'm getting when I run "ant smoketest" or "minitest":

tests-not-available:
     [echo]  [tests] The tests do not seem to be present in ../test
     [echo]  [tests] You must have checked out from CVS to run the tests,
     [echo]  [tests] it is not included in binary distributions.
     [echo]  [tests] See http://xml.apache.org/xalan-j/test/ for more info.

> Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8
> ---------------------------------------------------------------------------------------------
>
>                 Key: XALANJ-2419
>                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
>             Project: XalanJ2
>          Issue Type: Bug
>          Components: Serialization
>    Affects Versions: 2.7.1
>            Reporter: Henri Sivonen
>         Attachments: XALANJ-2419-fix.txt, XALANJ-2419-tests.txt
>
>
> org.apache.xml.serializer.ToStream contains the following code:
>                     else if (m_encodingInfo.isInEncoding(ch)) {
>                         // If the character is in the encoding, and
>                         // not in the normal ASCII range, we also
>                         // just leave it get added on to the clean characters
>
>                     }
>                     else {
>                         // This is a fallback plan, we should never get here
>                         // but if the character wasn't previously handled
>                         // (i.e. isn't in the encoding, etc.) then what
>                         // should we do?  We choose to write out an entity
>                         writeOutCleanChars(chars, i, lastDirtyCharProcessed);
>                         writer.write("&#");
>                         writer.write(Integer.toString(ch));
>                         writer.write(';');
>                         lastDirtyCharProcessed = i;
>                     }
> This leads to the wrong (latter) if branch running for surrogates, because
> isInEncoding() for UTF-8 returns false for surrogates. It is always wrong
> (regardless of encoding) to escape a surrogate as an NCR.
> The practical effect of this bug is that any document with astral characters
> in it ends up in an ill-formed serialization and does not parse back using an
> XML parser.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@xalan.apache.org
For additional commands, e-mail: dev-h...@xalan.apache.org
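
For anyone following the quoted issue, the difference between the two behaviours can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the attached XALANJ-2419-fix.txt and not the actual ToStream code: the class SurrogateNcrSketch and both methods are hypothetical names, and the sketch escapes every non-ASCII character purely to keep the example small. The first method mirrors the fallback branch quoted in the issue (one NCR per UTF-16 code unit); the second combines a surrogate pair into a single code point before escaping it.

```java
public class SurrogateNcrSketch {

    // Hypothetical helper mirroring the quoted fallback branch:
    // every non-ASCII UTF-16 code unit becomes its own NCR,
    // so an astral character turns into two surrogate NCRs.
    static String escapePerCodeUnit(char[] chars) {
        StringBuilder out = new StringBuilder();
        for (char ch : chars) {
            if (ch < 0x80) {
                out.append(ch);
            } else {
                out.append("&#").append(Integer.toString(ch)).append(';');
            }
        }
        return out.toString();
    }

    // Hypothetical helper showing one possible approach: detect a
    // high/low surrogate pair and escape the combined code point.
    static String escapePerCodePoint(char[] chars) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            if (ch < 0x80) {
                out.append(ch);
            } else if (Character.isHighSurrogate(ch)
                    && i + 1 < chars.length
                    && Character.isLowSurrogate(chars[i + 1])) {
                int codePoint = Character.toCodePoint(ch, chars[i + 1]);
                out.append("&#").append(codePoint).append(';');
                i++; // skip the low surrogate that was just consumed
            } else if (Character.isSurrogate(ch)) {
                // An unpaired surrogate cannot be represented in XML at all.
                throw new IllegalArgumentException(
                        "Unpaired surrogate at index " + i);
            } else {
                out.append("&#").append((int) ch).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is the pair D834 DD1E in UTF-16.
        char[] input = "a\uD834\uDD1Eb".toCharArray();
        System.out.println(escapePerCodeUnit(input));  // a&#55348;&#56606;b
        System.out.println(escapePerCodePoint(input)); // a&#119070;b
    }
}
```

The first output is the ill-formed form described in the issue; the second is a single NCR for the astral character. When the output encoding is UTF-8, writing the character directly instead of escaping it would be an equally valid choice.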