[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated XALANJ-2617:
    Attachment: (was: XALANJ-2617_java.patch)

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML with a character consisting of a Unicode surrogate pair, "\uD840\uDC0B", I have tried several and none worked. The XML Transformer creates an XML string in which the two halves of the surrogate pair are escaped separately, which makes the XML unparseable, e.g.: SAXParseException; Character reference "" is an invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character:
> EXPECTED:
> ACTUAL:
> [Fatal Error] :1:50: Character reference
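The failure mode described in the issue can be reproduced in isolation, without Xalan. The following is only a sketch under stated assumptions: the class `SurrogateEscapeSketch` and its methods are hypothetical names made up for illustration, they are not Xalan's serializer code, and the decimal form of the character references is chosen for the example. It contrasts escaping one reference per UTF-16 code unit with escaping one reference per Unicode code point, then checks both results with the JDK's default XML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Xalan.
final class SurrogateEscapeSketch {

    /** One reference per UTF-16 code unit: a surrogate pair is split in two. */
    static String escapePerCodeUnit(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // ASCII passes through (markup escaping is out of scope here)
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    /** One reference per Unicode code point: a surrogate pair becomes one reference. */
    static String escapePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (codePoint < 0x80) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += Character.charCount(codePoint); // advance by 1 or 2 chars
        }
        return out.toString();
    }

    /** Returns true if the JDK's default XML parser accepts the document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // avoid the default stderr report
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B"; // the surrogate pair from the issue report
        String split = "<a>" + escapePerCodeUnit(value) + "</a>";
        String whole = "<a>" + escapePerCodePoint(value) + "</a>";
        System.out.println(split + " well-formed: " + isWellFormed(split));
        System.out.println(whole + " well-formed: " + isWellFormed(whole));
    }
}
```

A reference to a lone surrogate is not a legal XML character, so the parser rejects the first document, while the single reference to the combined code point parses normally.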
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter De Maeyer updated XALANJ-2617: Attachment: (was: XALANJ-2617_test.patch)
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter De Maeyer updated XALANJ-2617: Attachment: XALANJ-2617_test.patch XALANJ-2617_java.patch
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ]

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 8:00 PM:
--

Pull request created. Unfortunately, it only contains the fix in production code and not the tests, because there is no repository on GitHub for the test code. This confuses me a bit - if anyone has a recommendation of what to do with the tests, I'd be happy to follow it.

What I did:
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on GitHub: https://github.com/apache/xalan-j. This is what I created a pull request against for my production code patch.
* (x) I did not find an equivalent mirror on GitHub of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. In order to be able to do this, I recreated an ancient Windows 2000 32-bit system in a VM, capable of running the ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being spoiled with JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I could have compiled this with a JDK 1.6 as well, but that only applies to the bytecode, it doesn't prevent @Since > 1.3 API usage).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry, I was being thorough. was (Author: peterdm): Pull request created. Unfortunately, it only contains the fix in production code and not the tests, because there is no repository on Github for the test code. This confuses me a bit - if anyone has a recommendation of what to do with the test, I'd be happy to follow them. My understanding of things (correct me if I'm wrong): * (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is where I created my production code patch against. * (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is where I created my test code patch against. * (/) The repository for the production code is mirrored on Github: https://github.com/apache/xalan-j. This is where I created a pull request against for my production code patch. * (x) I did not find an equivalent mirror on Github of the repository for the test code, so I can't create a pull request for my test code patch. To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. In order to be able to do this, I recreated an ancient Windows 2000 32-bit system in a VM, capable of running the ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being spoiled with JUnit, it took some effort to take a step back in time: # Install Windows 2000 32-bit in a VirtualBox VM. # Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I could have compiled this with a JDK 1.6 as well, but that only applies to the bytecode, it doesn't prevent @Since > 1.3 API usage). # Familiarize myself with the really clunky and ancient test harness (being used to JUnit). 
Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry, I was being thorough.
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ] Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 7:58 PM: -- Pull request created. Unfortunately, it only contains the fix in production code and not the tests, because there is no repository on Github for the test code. This confuses me a bit - if anyone has a recommendation of what to do with the test, I'd be happy to follow them. My understanding of things (correct me if I'm wrong): * (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is where I created my production code patch against. * (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is where I created my test code patch against. * (/) The repository for the production code is mirrored on Github: https://github.com/apache/xalan-j. This is where I created a pull request against for my production code patch. * (x) I did not find an equivalent mirror on Github of the repository for the test code, so I can't create a pull request for my test code patch. To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. In order to be able to do this, I recreated an ancient Windows 2000 32-bit system in a VM, capable of running the ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being spoiled with JUnit, it took some effort to take a step back in time: # Install Windows 2000 32-bit in a VirtualBox VM. # Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I could have compiled this with a JDK 1.6 as well, but that only applies to the bytecode, it doesn't prevent @Since > 1.3 API usage). # Familiarize myself with the really clunky and ancient test harness (being used to JUnit). 
Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry, I was being thorough. was (Author: peterdm): Pull request created. Unfortunately, it only contains the fix in production code and not the tests, because there is no repository on Github for the test code. This confuses me a bit - if anyone has a recommendation of what to do with the test, I'd be happy to follow them. My understanding of things (correct me if I'm wrong): * (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is where I created my production code patch against. * (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is where I created my test code patch against. * (/) The repository for the production code is mirrored on Github: https://github.com/apache/xalan-j. This is where I created a pull request against for my production code patch. * (x) I did not find an equivalent mirror on Github of the repository for the test code, so I can't create a pull request for my test code patch. To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. In order to be able to do this, I recreated an ancient Windows 2000 32-bit system in a VM, capable of running the ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being spoiled with JUnit tests, it took some effort to take a step back in time: # Install Windows 2000 32-bit in a VirtualBox VM. # Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I could have compiled this with a JDK 1.6 as well, but that only applies to the bytecode, it doesn't prevent @Since > 1.3 API usage). # Familiarize myself with the really clunky and ancient test harness (being used to JUnit and Mockito). 
Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry, I was being thorough.
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ] Peter De Maeyer commented on XALANJ-2617: - Pull request created. Unfortunately, it only contains the fix in production code and not the tests, because there is no repository on Github for the test code. This confuses me a bit - if anyone has a recommendation of what to do with the test, I'd be happy to follow them. My understanding of things (correct me if I'm wrong): * (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is where I created my production code patch against. * (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is where I created my test code patch against. * (/) The repository for the production code is mirrored on Github: https://github.com/apache/xalan-j. This is where I created a pull request against for my production code patch. * (x) I did not find an equivalent mirror on Github of the repository for the test code, so I can't create a pull request for my test code patch. To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. In order to be able to do this, I recreated an ancient Windows 2000 32-bit system in a VM, capable of running the ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being spoiled with JUnit tests, it took some effort to take a step back in time: # Install Windows 2000 32-bit in a VirtualBox VM. # Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I could have compiled this with a JDK 1.6 as well, but that only applies to the bytecode, it doesn't prevent @Since > 1.3 API usage). # Familiarize myself with the really clunky and ancient test harness (being used to JUnit and Mockito). 
Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry, I was being thorough.
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680 ]

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 7:08 PM:
--

It can be proven with a unit test that Daniel's fix breaks some scenarios that used to work. As I suspected, the "if" has to be an "else if". I've attached my own new patch + unit tests. Note that there are patches spanning 2 repositories:
* {{XALANJ-2617_java.patch}} contains the fix in Java code relative to http://svn.apache.org/repos/asf/xalan/java/trunk,
* {{XALANJ-2617_test.patch}} contains the unit test relative to http://svn.apache.org/repos/asf/xalan/test/trunk.

Here is the essence of the test code:
{code:java}
/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value,
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter De Maeyer updated XALANJ-2617: Attachment: XALANJ-2617_java.patch XALANJ-2617_test.patch
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter De Maeyer updated XALANJ-2617: Attachment: (was: XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch)
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597890#comment-16597890 ] Peter De Maeyer edited comment on XALANJ-2617 at 9/13/18 6:46 AM: -- [~danielkec], I reviewed your patch, but I wonder if the 'if' statement you added shouldn't have been an 'if else' statement. The reason I wonder is that the 'else if' and 'else' branches that immediately follow your changes now get a different meaning. Especially the 'else' branch that has a code comment "// This is a fallback plan, we should never get here" unsettles me. I can imagine that some _new_ scenario is indeed fixed, but I'm worried that some _existing_ scenarios (which were supposed to trigger this "fallback plan") might be broken. The fact that I did not find any unit tests to illustrate those scenarios does not help take away my concern either. I'm not familiar with the Xalan codebase (I am very familiar with Java code in general though), so maybe I misunderstand a couple of things, I would really appreciate it if you could reassure me this patch is indeed the right fix. Then as a cosmetic side note, I noticed that your patch does not respect the code style of the surrounding code. '{' should not be on a new line, and there are 2 instances of a ',' that aren't followed by a space as they should. Call me pedantic, but I'm just being thorough here. was (Author: peterdm): [~danielkec], I reviewed your patch, but I wonder if the 'if' statement you added shouldn't have been an 'if else' statement. The reason I wonder is that the 'else if' and 'else' branches that immediately follow your changes now get a different meaning. Especially the 'else' branch that has a code comment "// This is a fallback plan, we should never get here" unsettles me. I can imagine that some _new_ scenario is indeed fixed, but I'm worried that some _existing_ scenarios (which were supposed to trigger this "fallback plan") might be broken. 
The fact that I did not find any unit tests to illustrate those scenarios does not help take away my concern either. I'm not familiar with the Xalan codebase (I am very familiar with Java code in general though), so maybe I misunderstand a couple of things, it would really appreciate it if you could reassure me this patch is indeed the right fix. Then as a cosmetic side note, I noticed that your patch does not respect the code style of the surrounding code. '{' should not be on a new line, and there are 2 instances of a ',' that aren't followed by a space as they should. Call me pedantic, but I'm just being thorough here.
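The 'if' versus 'else if' concern in the review comment above can be illustrated with a small sketch. This is not the Xalan serializer code: the class `BranchChainSketch`, the three boolean conditions and the branch labels are all hypothetical stand-ins. The sketch only shows how a standalone `if` inserted in front of an existing `else if` / `else` chain cuts the chain in two, so that the trailing `else` (the "fallback plan") can run even though an earlier branch already handled the input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; conditions a, n and b stand in for real checks.
final class BranchChainSketch {

    /** New check added as a standalone 'if': the 'else' now belongs to the new 'if' only. */
    static List<String> withStandaloneIf(boolean a, boolean n, boolean b) {
        List<String> ran = new ArrayList<>();
        if (a) {
            ran.add("A");
        }
        if (n) { // starts a new chain
            ran.add("N");
        } else if (b) {
            ran.add("B");
        } else {
            ran.add("fallback");
        }
        return ran;
    }

    /** New check added as 'else if': at most one branch runs, as in the original chain. */
    static List<String> withElseIf(boolean a, boolean n, boolean b) {
        List<String> ran = new ArrayList<>();
        if (a) {
            ran.add("A");
        } else if (n) {
            ran.add("N");
        } else if (b) {
            ran.add("B");
        } else {
            ran.add("fallback");
        }
        return ran;
    }

    public static void main(String[] args) {
        // Input that the first branch already handles.
        System.out.println(withStandaloneIf(true, false, false));
        System.out.println(withElseIf(true, false, false));
    }
}
```

With the standalone `if`, an input handled by the first branch also falls through to the fallback, which is the kind of regression in existing scenarios that the comment worries about. With `else if`, exactly one branch runs.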
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680 ]

Peter De Maeyer edited comment on XALANJ-2617 at 9/12/18 8:18 PM:
--

It can be proven with a unit test that Daniel's fix breaks some scenarios that used to work. As I suspected, the "if" has to be an "else if". I've attached my own new patch + unit tests. Note that the patch spans 2 repositories: the fix is relative to http://svn.apache.org/repos/asf/xalan/java/trunk, the unit test is relative to http://svn.apache.org/repos/asf/xalan/test/trunk.

Here is the essence of the test code:
{code:java}
/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value,
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680 ]

Peter De Maeyer commented on XALANJ-2617:
-

It can be proven with a unit test that Daniel's fix breaks some scenarios that used to work. As I suspected, the "if" has to be an "else if". I've attached my own new patch + unit tests. Note that the patch spans 2 repositories: the fix is relative to http://svn.apache.org/repos/asf/xalan/java/trunk, the unit test is relative to http://svn.apache.org/repos/asf/xalan/test/trunk.

Just in case the patch isn't readable, this is the essence of the test code:
{code:java}
/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value,
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter De Maeyer updated XALANJ-2617: Attachment: XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597890#comment-16597890 ] Peter De Maeyer commented on XALANJ-2617: - [~danielkec], I reviewed your patch, but I wonder if the 'if' statement you added shouldn't have been an 'if else' statement. The reason I wonder is that the 'else if' and 'else' branches that immediately follow your changes now get a different meaning. Especially the 'else' branch that has a code comment "// This is a fallback plan, we should never get here" unsettles me. I can imagine that some _new_ scenario is indeed fixed, but I'm worried that some _existing_ scenarios (which were supposed to trigger this "fallback plan") might be broken. The fact that I did not find any unit tests to illustrate those scenarios does not help take away my concern either. I'm not familiar with the Xalan codebase (I am very familiar with Java code in general though), so maybe I misunderstand a couple of things, it would really appreciate it if you could reassure me this patch is indeed the right fix. Then as a cosmetic side note, I noticed that your patch does not respect the code style of the surrounding code. '{' should not be on a new line, and there are 2 instances of a ',' that aren't followed by a space as they should. Call me pedantic, but I'm just being thorough here.