[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811567#comment-17811567 ]

Cédric Damioli commented on XALANJ-2617:

[~kesh...@alum.mit.edu] I think you may also mark this one as resolved, as a duplicate of XALANJ-2419?

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
> When trying to serialize XML with a character consisting of a Unicode surrogate pair, "\uD840\uDC0B", I have tried several and none worked. The XML Transformer creates an XML string in which the two halves of the surrogate pair are escaped separately, which makes the XML unparseable, e.g.: SAXParseException; Character reference "" is an invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character:
> EXPECTED:
> ACTUAL:
> [Fatal Error] :1:50: Character reference
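
Editorial sketch, not part of the original report: the issue says that escaping each half of the pair separately makes the XML unparseable. The class below is a minimal way to check that claim with the JAXP parser bundled in the JDK. The class name `CharRefParseDemo` is hypothetical, and the decimal values are my own arithmetic from "\uD840\uDC0B" (0xD840 = 55360, 0xDC0B = 56331, combined code point U+2000B = 131083); they are not copied from the reporter's output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class CharRefParseDemo {
    /** Returns true if the XML text is well-formed, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            // Well-formedness error, such as a character reference to a lone surrogate.
            return false;
        } catch (Exception e) {
            // Anything else is a setup problem, not a verdict on the input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Each surrogate half escaped on its own: neither value is a legal XML character.
        System.out.println(parses("<a>&#55360;&#56331;</a>"));
        // One reference to the whole code point U+2000B.
        System.out.println(parses("<a>&#131083;</a>"));
    }
}
```

The first call is expected to be rejected because XML 1.0 excludes the surrogate range from its legal characters; the second is a single reference to a legal supplementary character.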
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493637#comment-17493637 ]

Joe Kesselman commented on XALANJ-2617:

Sanity check: What version number does that Xalan report? Either OpenJDK is using a different release, or they're using a different default configuration, or you've found a place where the compiled-mode behavior of Xalan differs from the interpreted mode.

(I don't have time to dig into the details of your issue right now; I'll try to look at it next week. I suspect this is arising because Xalan was originally written with the assumption that text was all going to be UTF16, so it's seeing the pair as two characters rather than one. I thought we'd started addressing that, but it might only have been in the IBM code.)
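
Editorial sketch, not part of the original comment: the "two characters rather than one" distinction can be seen directly with the standard `String` and `Character` APIs. The class `SurrogatePairDemo` and its `escapeNonAscii` helper are hypothetical names for illustration. This is not Xalan's serializer code, and it is not a complete XML escaper (it does not touch `<` or `&`); it only shows escaping per code point instead of per `char`.

```java
final class SurrogatePairDemo {
    /** Writes each non-ASCII code point as ONE decimal character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a valid high/low surrogate pair into one value.
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            // Advance by 1 or 2 UTF-16 code units, depending on the code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B";
        System.out.println(value.length());                           // 2 UTF-16 code units
        System.out.println(value.codePointCount(0, value.length()));  // 1 code point
        System.out.println(Integer.toHexString(
                Character.toCodePoint(value.charAt(0), value.charAt(1))));  // 2000b
        System.out.println(escapeNonAscii(value));                    // &#131083;
    }
}
```

A loop that walks the string one `char` at a time and escapes each unit would emit two references for this input, which is the symptom described in the issue.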
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493626#comment-17493626 ]

Takayuki Nagai commented on XALANJ-2617:

I had the same problem and found a solution. I stopped using org.apache.xalan.processor.TransformerFactoryImpl and moved to com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl, which is contained in OpenJDK 11. I confirmed that the Xalan in OpenJDK does not have this problem.
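
Editorial sketch, not part of the original comment: one way to try the workaround described above is to request the JDK's built-in factory by class name through the standard `TransformerFactory.newInstance(String, ClassLoader)` method, then check that a supplementary character survives a serialize-and-reparse round trip. The class name `JdkTransformerRoundTrip` and the element name `a` are hypothetical, and the sketch assumes a JDK that ships the internal factory class named in the comment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class JdkTransformerRoundTrip {
    // Factory class named in the comment above; bundled with the JDK.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    /** Serializes <a>text</a> with the JDK transformer, reparses it, returns the text read back. */
    static String roundTrip(String text) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            doc.appendChild(doc.createElement("a")).setTextContent(text);

            // Ask for the JDK implementation explicitly instead of relying on classpath lookup.
            Transformer transformer = TransformerFactory.newInstance(JDK_FACTORY, null).newTransformer();
            StringWriter xml = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(xml));

            Document reparsed = builder.parse(new InputSource(new StringReader(xml.toString())));
            return reparsed.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B";
        System.out.println(value.equals(roundTrip(value)));
    }
}
```

Naming the factory explicitly keeps the choice independent of whichever `TransformerFactory` a Xalan jar on the classpath would otherwise register.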
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337488#comment-17337488 ]

Patrick Ferreira commented on XALANJ-2617:

As there have been no updates for a few years, how do you guys deal with this issue? I'm experiencing the same bug right now. Could I switch back to xalan-2.7.0? Or should I use another library?

Thanks
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804568#comment-16804568 ]

Mukul Gandhi commented on XALANJ-2617:

I can see that somewhere in this Jira thread it is mentioned that the "patch contains the fix in java code relative to http://svn.apache.org/repos/asf/xalan/java/trunk". I think the patch should be relative to http://svn.apache.org/repos/asf/xalan/java/branches/xalan-j_2_7_1_maint. I think Xalan-J's next version would be released from this branch. The Xalan team may correct me if I'm wrong.
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774686#comment-16774686 ]

Jason Harrop commented on XALANJ-2617:

Compare https://issues.apache.org/jira/browse/XALANJ-2419 and the fix there [^XALANJ-2419-fix-v3.txt] (elements only).
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ]

Peter De Maeyer commented on XALANJ-2617:

Pull request created. Unfortunately, it only contains the fix in production code and not the tests, because there is no repository on Github for the test code. This confuses me a bit. If anyone has a recommendation of what to do with the tests, I'd be happy to follow it.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on Github: https://github.com/apache/xalan-j. This is what I created a pull request against for my production code patch.
* (x) I did not find an equivalent mirror on Github of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. In order to be able to do this, I recreated an ancient Windows 2000 32-bit system in a VM, capable of running the ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being spoiled with JUnit tests, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I could have compiled this with a JDK 1.6 as well, but -target only applies to the bytecode; it doesn't prevent usage of APIs marked @since > 1.3).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit and Mockito).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry; I was being thorough.
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615284#comment-16615284 ]

ASF GitHub Bot commented on XALANJ-2617:

GitHub user peterdemaeyer opened a pull request:

https://github.com/apache/xalan-j/pull/4

XALANJ-2617 Fixed serializer for high-surrogate UTF-16 characters

Fixed serializer such that it correctly deals with high-surrogate UTF-16 characters. This pull request replaces an earlier one from Daniel Kec, see comments on https://issues.apache.org/jira/browse/XALANJ-2617.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peterdemaeyer/xalan-j trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/xalan-j/pull/4.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4

commit 8a735e58e6804be1e6a125678d1a8d116ad54651
Author: peterdm
Date: 2018-09-14T19:15:32Z

XALANJ-2617 Fixed serializer such that it correctly deals with high-surrogate UTF-16 characters
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612703#comment-16612703 ]

Daniel Kec commented on XALANJ-2617:

Great, thanks. I was afraid it might break something without a proper test suite. Feel free to create a new PR; I'm going to close mine and vote for yours.
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680 ]

Peter De Maeyer commented on XALANJ-2617:

It can be proven with a unit test that Daniel's fix breaks some scenarios that used to work. As I suspected, the "if" has to be an "else if". I've attached my own new patch + unit tests. Note that the patch spans 2 repositories: the fix is relative to http://svn.apache.org/repos/asf/xalan/java/trunk, the unit test is relative to http://svn.apache.org/repos/asf/xalan/test/trunk.

Just in case the patch isn't readable, this is the essence of the test code:

{code:java}
/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value,
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597890#comment-16597890 ]

Peter De Maeyer commented on XALANJ-2617:

[~danielkec], I reviewed your patch, but I wonder if the 'if' statement you added shouldn't have been an 'else if' statement. The reason I wonder is that the 'else if' and 'else' branches that immediately follow your changes now get a different meaning. Especially the 'else' branch that has a code comment "// This is a fallback plan, we should never get here" unsettles me. I can imagine that some _new_ scenario is indeed fixed, but I'm worried that some _existing_ scenarios (which were supposed to trigger this "fallback plan") might be broken. The fact that I did not find any unit tests to illustrate those scenarios does not help take away my concern either. I'm not familiar with the Xalan codebase (I am very familiar with Java code in general though), so maybe I misunderstand a couple of things. I would really appreciate it if you could reassure me that this patch is indeed the right fix.

Then, as a cosmetic side note, I noticed that your patch does not respect the code style of the surrounding code. '{' should not be on a new line, and there are 2 instances of a ',' that aren't followed by a space as they should be. Call me pedantic, but I'm just being thorough here.
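
Editorial sketch, not part of the original comment: the concern about 'if' versus 'else if' can be illustrated with a toy branch chain. The class `BranchChainDemo`, its conditions, and its labels are all hypothetical and are not taken from the Xalan serializer or from either patch. The point is only that starting a new 'if' statement in the middle of a ladder re-attaches the trailing 'else if' and 'else' branches to the new statement, so inputs already handled higher up can also reach the fallback.

```java
import java.util.ArrayList;
import java.util.List;

final class BranchChainDemo {
    /** One if / else-if ladder: exactly one branch runs per character. */
    static String ladder(char ch) {
        List<String> trace = new ArrayList<>();
        if (ch < 0x80) {
            trace.add("ascii");
        } else if (Character.isHighSurrogate(ch)) {
            trace.add("surrogate");
        } else if (ch > 0xFF) {
            trace.add("wide");
        } else {
            trace.add("fallback");
        }
        return String.join("+", trace);
    }

    /** Same tests, but the surrogate check starts a NEW if statement. */
    static String split(char ch) {
        List<String> trace = new ArrayList<>();
        if (ch < 0x80) {
            trace.add("ascii");
        }
        // The branches below now belong to this statement, not to the one above.
        if (Character.isHighSurrogate(ch)) {
            trace.add("surrogate");
        } else if (ch > 0xFF) {
            trace.add("wide");
        } else {
            trace.add("fallback");
        }
        return String.join("+", trace);
    }

    public static void main(String[] args) {
        System.out.println(ladder('A'));            // ascii
        System.out.println(split('A'));             // ascii+fallback
        System.out.println(split((char) 0xD840));   // surrogate
    }
}
```

The surrogate case behaves the same in both versions, which is how a change of this shape can fix the new scenario while quietly altering existing ones. That is the kind of regression a unit test per branch would catch.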