[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ]

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 8:00 PM:
--

Pull request created. Unfortunately, it contains only the fix to the production code and not the tests, because there is no repository on GitHub for the test code. This confuses me a bit; if anyone has recommendations on what to do with the tests, I'd be happy to follow them.

What I did:
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on GitHub: https://github.com/apache/xalan-j. This is where I created the pull request for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. To be able to do this, I recreated, in a VM, an ancient Windows 2000 32-bit system capable of running the ancient test harness. Since I work on a modern Ubuntu Linux 64-bit system and am spoiled by JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3, because the Xalan-J sources are -target 1.3 (I know I could have compiled this with JDK 1.6 as well, but -target only affects the bytecode; it doesn't prevent use of APIs marked @since > 1.3).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry; I was being thorough.

was (Author: peterdm):
Pull request created. Unfortunately, it contains only the fix to the production code and not the tests, because there is no repository on GitHub for the test code. This confuses me a bit; if anyone has recommendations on what to do with the tests, I'd be happy to follow them.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on GitHub: https://github.com/apache/xalan-j. This is where I created the pull request for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. To be able to do this, I recreated, in a VM, an ancient Windows 2000 32-bit system capable of running the ancient test harness. Since I work on a modern Ubuntu Linux 64-bit system and am spoiled by JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3, because the Xalan-J sources are -target 1.3 (I know I could have compiled this with JDK 1.6 as well, but -target only affects the bytecode; it doesn't prevent use of APIs marked @since > 1.3).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry; I was being thorough.

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML with a character consisting of the Unicode surrogate pair "\uD840\uDC0B", I have tried several and none worked. XML
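
For readers following the thread, here is a minimal sketch of what the issue title means by emitting one code point rather than two separately escaped surrogates. It is not the Xalan serializer code and not the attached patch; the class name `SurrogateEscapeSketch` and the method `escapeNonAscii` are hypothetical names chosen for illustration, and the choice to escape every non-ASCII character is an assumption made to keep the example short.

```java
final class SurrogateEscapeSketch {

    /**
     * Hypothetical helper: escapes every non-ASCII character as a decimal
     * numeric character reference. A valid surrogate pair is combined into
     * one code point first, so it yields ONE reference instead of two.
     */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)
                    && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // Combine high + low surrogate into the supplementary code point.
                int codePoint = Character.toCodePoint(c, text.charAt(i + 1));
                out.append("&#").append(codePoint).append(';');
                i += 2;
            } else if (Character.isSurrogate(c)) {
                // An unpaired surrogate has no valid XML character reference.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
                i++;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The pair from the issue report, U+2000B, prints: &#131083;
        System.out.println(escapeNonAscii("\uD840\uDC0B"));
    }
}
```

Escaping the two halves independently would instead produce references to the surrogate values themselves, which are not legal XML characters; that is the kind of output a parser rejects.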
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ]

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 7:58 PM:
--

Pull request created. Unfortunately, it contains only the fix to the production code and not the tests, because there is no repository on GitHub for the test code. This confuses me a bit; if anyone has recommendations on what to do with the tests, I'd be happy to follow them.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on GitHub: https://github.com/apache/xalan-j. This is where I created the pull request for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. To be able to do this, I recreated, in a VM, an ancient Windows 2000 32-bit system capable of running the ancient test harness. Since I work on a modern Ubuntu Linux 64-bit system and am spoiled by JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3, because the Xalan-J sources are -target 1.3 (I know I could have compiled this with JDK 1.6 as well, but -target only affects the bytecode; it doesn't prevent use of APIs marked @since > 1.3).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry; I was being thorough.

was (Author: peterdm):
Pull request created. Unfortunately, it contains only the fix to the production code and not the tests, because there is no repository on GitHub for the test code. This confuses me a bit; if anyone has recommendations on what to do with the tests, I'd be happy to follow them.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on GitHub: https://github.com/apache/xalan-j. This is where I created the pull request for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. To be able to do this, I recreated, in a VM, an ancient Windows 2000 32-bit system capable of running the ancient test harness. Since I work on a modern Ubuntu Linux 64-bit system and am spoiled by JUnit tests, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3, because the Xalan-J sources are -target 1.3 (I know I could have compiled this with JDK 1.6 as well, but -target only affects the bytecode; it doesn't prevent use of APIs marked @since > 1.3).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit and Mockito).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry; I was being thorough.

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML with a character consisting of the Unicode surrogate pair
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312 ]

Peter De Maeyer commented on XALANJ-2617:
-

Pull request created. Unfortunately, it contains only the fix to the production code and not the tests, because there is no repository on GitHub for the test code. This confuses me a bit; if anyone has recommendations on what to do with the tests, I'd be happy to follow them.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) authoritative repository for the production code. This is what I created my production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) authoritative repository for the test code. This is what I created my test code patch against.
* (/) The repository for the production code is mirrored on GitHub: https://github.com/apache/xalan-j. This is where I created the pull request for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the test repository before and after my fix. To be able to do this, I recreated, in a VM, an ancient Windows 2000 32-bit system capable of running the ancient test harness. Since I work on a modern Ubuntu Linux 64-bit system and am spoiled by JUnit tests, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3, because the Xalan-J sources are -target 1.3 (I know I could have compiled this with JDK 1.6 as well, but -target only affects the bytecode; it doesn't prevent use of APIs marked @since > 1.3).
# Familiarize myself with the really clunky and ancient test harness (being used to JUnit and Mockito).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate that I didn't make this patch in a hurry; I was being thorough.

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML with a character consisting of the Unicode surrogate pair "\uD840\uDC0B", I have tried several and none worked. XML Transformer creates an XML string with the surrogate pair escaped separately, which makes the XML unparseable, e.g.: SAXParseException; Character reference "" is an invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character:
> EXPECTED:
> ACTUAL:
> [Fatal Error] :1:50: Character reference
[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615284#comment-16615284 ]

ASF GitHub Bot commented on XALANJ-2617:

GitHub user peterdemaeyer opened a pull request:

    https://github.com/apache/xalan-j/pull/4

    XALANJ-2617 Fixed serializer for high-surrogate UTF-16 characters

    Fixed serializer such that it correctly deals with high-surrogate UTF-16 characters.
    This pull request replaces an earlier one from Daniel Kec, see comments on https://issues.apache.org/jira/browse/XALANJ-2617.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/peterdemaeyer/xalan-j trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/xalan-j/pull/4.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #4

commit 8a735e58e6804be1e6a125678d1a8d116ad54651
Author: peterdm
Date: 2018-09-14T19:15:32Z

    XALANJ-2617 Fixed serializer such that it correctly deals with high-surrogate UTF-16 characters

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML with a character consisting of the Unicode surrogate pair "\uD840\uDC0B", I have tried several and none worked. XML Transformer creates an XML string with the surrogate pair escaped separately, which makes the XML unparseable, e.g.: SAXParseException; Character reference "" is an invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character:
> EXPECTED:
> ACTUAL:
> [Fatal Error] :1:50: Character reference
[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680 ]

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 7:08 PM:
--

It can be proven with a unit test that Daniel's fix breaks some scenarios that used to work. As I suspected, the "if" has to be an "else if". I've attached my own new patch + unit tests. Note that the patches span 2 repositories:
* {{XALANJ-2617_java.patch}} contains the fix in Java code relative to [http://svn.apache.org/repos/asf/xalan/java/trunk],
* {{XALANJ-2617_test.patch}} contains the unit test relative to [http://svn.apache.org/repos/asf/xalan/test/trunk].

Here is the essence of the test code:
{code:java}
/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value,
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated XALANJ-2617:

Attachment: XALANJ-2617_java.patch
            XALANJ-2617_test.patch

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML with a character consisting of the Unicode surrogate pair "\uD840\uDC0B", I have tried several and none worked. XML Transformer creates an XML string with the surrogate pair escaped separately, which makes the XML unparseable, e.g.: SAXParseException; Character reference "" is an invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character:
> EXPECTED:
> ACTUAL:
> [Fatal Error] :1:50: Character reference
[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint
[ https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter De Maeyer updated XALANJ-2617:

Attachment: (was: XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch)

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
> Issue Type: Bug
> Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
> Components: Serialization, Xalan
> Affects Versions: 2.7.1, 2.7.2
> Reporter: Daniel Kec
> Assignee: Steven J. Hathaway
> Priority: Major
> Attachments: JI9053942.java, XALANJ-2617_Fix_missing_surrogate_pairs_support.patch
>
>
> When trying to serialize XML with a character consisting of the Unicode surrogate pair "\uD840\uDC0B", I have tried several and none worked. XML Transformer creates an XML string with the surrogate pair escaped separately, which makes the XML unparseable, e.g.: SAXParseException; Character reference "" is an invalid XML character. It looks like a bug introduced in the XALANJ-2271 fix.
>
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:. JI9053942
> Character:
> EXPECTED:
> ACTUAL:
> [Fatal Error] :1:50: Character reference