[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-10-20 Thread Peter De Maeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter De Maeyer updated XALANJ-2617:

Attachment: (was: XALANJ-2617_java.patch)

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
>  Issue Type: Bug
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization, Xalan
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Daniel Kec
>Assignee: Steven J. Hathaway
>Priority: Major
> Attachments: JI9053942.java, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, 
> XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML containing a character made up of a Unicode 
> surrogate pair, "\uD840\uDC0B", I have tried several and none worked. The XML 
> Transformer creates an XML string in which each half of the surrogate pair is 
> escaped separately, which makes the XML unparseable, e.g.: 
> SAXParseException; Character reference "" is an invalid XML character. It 
> looks like a bug introduced in the XALANJ-2271 fix.
>  
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp 
> /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:.
>  JI9053942
> Character: 
> EXPECTED: 
>  ACTUAL: 
> [Fatal Error] :1:50: Character reference 
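
A minimal sketch of why the separately escaped pair reported above is unparseable, assuming the character reference is written once per UTF-16 char instead of once per code point. The class and method names (SurrogateReferenceSketch, escapePerChar, escapePerCodePoint, parses) are hypothetical and are not Xalan code. For simplicity the sketch escapes every character, which a real serializer would not do, and it relies only on the JAXP parser that ships with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration only; not taken from the Xalan sources.
final class SurrogateReferenceSketch {

    /** Writes one numeric character reference per UTF-16 char (the reported behaviour). */
    static String escapePerChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append("&#").append((int) s.charAt(i)).append(';');
        }
        return sb.toString();
    }

    /** Writes one numeric character reference per Unicode code point (the expected behaviour). */
    static String escapePerCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            sb.append("&#").append(cp).append(';');
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return sb.toString();
    }

    /** Returns true if the JDK's default XML parser accepts the document. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. a reference to a lone surrogate
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B"; // one code point, U+2000B, stored as two chars
        System.out.println(escapePerChar(value));      // &#55360;&#56331;
        System.out.println(escapePerCodePoint(value)); // &#131083;
        System.out.println(parses("<a>" + escapePerChar(value) + "</a>"));      // false
        System.out.println(parses("<a>" + escapePerCodePoint(value) + "</a>")); // true
    }
}
```

The per-char form is rejected because XML does not allow a character reference to resolve to a surrogate code unit on its own; only the combined code point is a legal XML character.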

[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-10-20 Thread Peter De Maeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter De Maeyer updated XALANJ-2617:

Attachment: (was: XALANJ-2617_test.patch)


[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-10-20 Thread Peter De Maeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter De Maeyer updated XALANJ-2617:

Attachment: XALANJ-2617_test.patch
XALANJ-2617_java.patch


[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312
 ] 

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 8:00 PM:
--

Pull request created. Unfortunately, it only contains the fix in production 
code and not the tests, because there is no repository on GitHub for the test 
code. This confuses me a bit; if anyone has a recommendation on what to do 
with the tests, I'd be happy to follow it.

What I did:
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) 
authoritative repository for the production code. This is what I created my 
production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) 
authoritative repository for the test code. This is what I created my test 
code patch against.
* (/) The repository for the production code is mirrored on GitHub: 
https://github.com/apache/xalan-j. This is where I created the pull request 
for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the 
test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the 
test repository before and after my fix. In order to be able to do this, I 
recreated an ancient Windows 2000 32-bit system in a VM, capable of running the 
ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being 
spoiled with JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I 
could have compiled this with JDK 1.6 as well, but -target only affects the 
generated bytecode; it doesn't prevent use of APIs introduced after 1.3).
# Familiarize myself with the really clunky and ancient test harness (being 
used to JUnit).
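
To illustrate the point about -target: `Character.isHighSurrogate`, `Character.isLowSurrogate` and `Character.toCodePoint` are documented as "since 1.5", so code that must run against a 1.3 class library cannot call them, even though a newer compiler would accept the calls. Below is a minimal sketch of the same checks done with plain arithmetic. The class and method names are hypothetical and are not taken from the Xalan sources.

```java
// Hypothetical illustration only; avoids the java.lang.Character
// surrogate helpers, which were added in Java 5.
final class Jdk13SurrogateSketch {

    /** True for the first (high) half of a surrogate pair, U+D800..U+DBFF. */
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    /** True for the second (low) half of a surrogate pair, U+DC00..U+DFFF. */
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    /** Combines a high and a low surrogate into the supplementary code point. */
    static int toCodePoint(char high, char low) {
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = '\uD840';
        char low = '\uDC0B';
        if (isHighSurrogate(high) && isLowSurrogate(low)) {
            // prints 2000b
            System.out.println(Integer.toHexString(toCodePoint(high, low)));
        }
    }
}
```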

Forgive me if this explanation is overly verbose, but I'm trying to illustrate 
that I didn't make this patch in a hurry, I was being thorough.




[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312
 ] 

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 7:58 PM:
--

Pull request created. Unfortunately, it only contains the fix in production 
code and not the tests, because there is no repository on GitHub for the test 
code. This confuses me a bit; if anyone has a recommendation on what to do 
with the tests, I'd be happy to follow it.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) 
authoritative repository for the production code. This is what I created my 
production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) 
authoritative repository for the test code. This is what I created my test 
code patch against.
* (/) The repository for the production code is mirrored on GitHub: 
https://github.com/apache/xalan-j. This is where I created the pull request 
for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the 
test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the 
test repository before and after my fix. In order to be able to do this, I 
recreated an ancient Windows 2000 32-bit system in a VM, capable of running the 
ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being 
spoiled with JUnit, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I 
could have compiled this with JDK 1.6 as well, but -target only affects the 
generated bytecode; it doesn't prevent use of APIs introduced after 1.3).
# Familiarize myself with the really clunky and ancient test harness (being 
used to JUnit).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate 
that I didn't make this patch in a hurry, I was being thorough.




[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312
 ] 

Peter De Maeyer commented on XALANJ-2617:
-

Pull request created. Unfortunately, it only contains the fix in production 
code and not the tests, because there is no repository on GitHub for the test 
code. This confuses me a bit; if anyone has a recommendation on what to do 
with the tests, I'd be happy to follow it.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) 
authoritative repository for the production code. This is what I created my 
production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) 
authoritative repository for the test code. This is what I created my test 
code patch against.
* (/) The repository for the production code is mirrored on GitHub: 
https://github.com/apache/xalan-j. This is where I created the pull request 
for my production code patch.
* (x) I did not find an equivalent GitHub mirror of the repository for the 
test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the 
test repository before and after my fix. In order to be able to do this, I 
recreated an ancient Windows 2000 32-bit system in a VM, capable of running the 
ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being 
spoiled with JUnit tests, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I 
could have compiled this with JDK 1.6 as well, but -target only affects the 
generated bytecode; it doesn't prevent use of APIs introduced after 1.3).
# Familiarize myself with the really clunky and ancient test harness (being 
used to JUnit and Mockito).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate 
that I didn't make this patch in a hurry, I was being thorough.


[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680
 ] 

Peter De Maeyer edited comment on XALANJ-2617 at 9/14/18 7:08 PM:
--

It can be proven with a unit test that Daniel's fix breaks some scenarios that 
used to work. As I suspected, the "if" has to be an "else if". I've attached my 
own new patch + unit tests.

Note that there are patches spanning 2 repositories:
* {{XALANJ-2617_java.patch}} contains the fix in Java code relative to 
[http://svn.apache.org/repos/asf/xalan/java/trunk],
* {{XALANJ-2617_test.patch}} contains the unit test relative to 
[http://svn.apache.org/repos/asf/xalan/test/trunk].

Here is the essence of the test code:

 {code:java}

/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value, 

[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter De Maeyer updated XALANJ-2617:

Attachment: XALANJ-2617_java.patch
XALANJ-2617_test.patch


[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter De Maeyer updated XALANJ-2617:

Attachment: (was: 
XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch)

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
>  Issue Type: Bug
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization, Xalan
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Daniel Kec
>Assignee: Steven J. Hathaway
>Priority: Major
> Attachments: JI9053942.java, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch
>
>

[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-13 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597890#comment-16597890
 ] 

Peter De Maeyer edited comment on XALANJ-2617 at 9/13/18 6:46 AM:
--

[~danielkec], I reviewed your patch, but I wonder if the 'if' statement you 
added shouldn't have been an 'else if' statement. The reason I wonder is that 
the 'else if' and 'else' branches that immediately follow your changes now get 
a different meaning. Especially the 'else' branch that has a code comment "// 
This is a fallback plan, we should never get here" unsettles me. I can imagine 
that some _new_ scenario is indeed fixed, but I'm worried that some _existing_ 
scenarios (which were supposed to trigger this "fallback plan") might be 
broken. The fact that I did not find any unit tests to illustrate those 
scenarios does not help take away my concern either. I'm not familiar with the 
Xalan codebase (I am very familiar with Java code in general though), so maybe 
I misunderstand a couple of things. I would really appreciate it if you could 
reassure me this patch is indeed the right fix.
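
The concern about a plain 'if' versus an 'else if' can be sketched in isolation. The chain below is a hypothetical stand-in and not the Xalan serializer code: the conditions, labels and method names are all assumptions chosen only to show how a newly inserted plain 'if' takes over the 'else if' and 'else' branches that used to belong to the preceding 'if'.

```java
// Hypothetical illustration only; the branch conditions are made up
// and do not mirror the real serializer logic.
final class BranchChainSketch {

    /** The chain before any change: exactly one branch runs. */
    static String before(int ch) {
        StringBuilder out = new StringBuilder();
        if (ch < 0x20) {
            out.append("[control]");
        } else if (ch < 0x80) {
            out.append("[ascii]");
        } else {
            out.append("[fallback]");
        }
        return out.toString();
    }

    /** New check added as a plain 'if': the old 'else if'/'else' now hang off it. */
    static String withPlainIf(int ch) {
        StringBuilder out = new StringBuilder();
        if (ch < 0x20) {
            out.append("[control]");
        }
        if (ch >= 0xD800 && ch <= 0xDBFF) {
            out.append("[high-surrogate]");
        } else if (ch < 0x80) {
            out.append("[ascii]");
        } else {
            out.append("[fallback]");
        }
        return out.toString();
    }

    /** New check added as an 'else if': still exactly one branch runs. */
    static String withElseIf(int ch) {
        StringBuilder out = new StringBuilder();
        if (ch < 0x20) {
            out.append("[control]");
        } else if (ch >= 0xD800 && ch <= 0xDBFF) {
            out.append("[high-surrogate]");
        } else if (ch < 0x80) {
            out.append("[ascii]");
        } else {
            out.append("[fallback]");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(before(0x09));        // [control]
        System.out.println(withPlainIf(0x09));   // [control][ascii]
        System.out.println(withElseIf(0x09));    // [control]
        System.out.println(withPlainIf(0xD840)); // [high-surrogate]
    }
}
```

In this sketch both variants handle the new high-surrogate case, but only the 'else if' variant leaves the behaviour for previously handled input unchanged.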

Then, as a cosmetic side note, I noticed that your patch does not respect the 
code style of the surrounding code. '{' should not be on a new line, and there 
are 2 instances of a ',' that aren't followed by a space as they should be. 
Call me pedantic, but I'm just being thorough here.



> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
>  Issue Type: Bug
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization, Xalan
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Daniel Kec
>Assignee: Steven J. Hathaway
>Priority: Major
> Attachments: JI9053942.java, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch
>
>

[jira] [Comment Edited] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-12 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680
 ] 

Peter De Maeyer edited comment on XALANJ-2617 at 9/12/18 8:18 PM:
--

It can be proven with a unit test that Daniel's fix breaks some scenarios that 
used to work. As I suspected, the "if" has to be an "else if". I've attached my 
own new patch + unit tests.

Note that the patch spans 2 repositories: the fix is relative to 
[http://svn.apache.org/repos/asf/xalan/java/trunk], the unit test is relative 
to [http://svn.apache.org/repos/asf/xalan/test/trunk].

Here is the essence of the test code:

 {code:java}

/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value, 

[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-12 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680
 ] 

Peter De Maeyer commented on XALANJ-2617:
-

It can be proven with a unit test that Daniel's fix breaks some scenarios that 
used to work. As I suspected, the "if" has to be an "else if". I've attached my 
own new patch + unit tests.

Note that the patch spans 2 repositories: the fix is relative to 
[http://svn.apache.org/repos/asf/xalan/java/trunk], the unit test is relative 
to [http://svn.apache.org/repos/asf/xalan/test/trunk].

Just in case the patch isn't readable, this is the essence of the test code:

 {code:java}

/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value, 

[jira] [Updated] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-12 Thread Peter De Maeyer (JIRA)


 [ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter De Maeyer updated XALANJ-2617:

Attachment: XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch


[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-08-30 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597890#comment-16597890
 ] 

Peter De Maeyer commented on XALANJ-2617:
-

[~danielkec], I reviewed your patch, but I wonder whether the 'if' statement you 
added should have been an 'else if' statement. The reason I wonder is that 
the 'else if' and 'else' branches that immediately follow your changes now get 
a different meaning. The 'else' branch with the code comment "// This is a 
fallback plan, we should never get here" unsettles me in particular. I can 
imagine that some _new_ scenario is indeed fixed, but I'm worried that some 
_existing_ scenarios (which were supposed to trigger this "fallback plan") 
might be broken. The fact that I did not find any unit tests to illustrate 
those scenarios does not ease my concern either. I'm not familiar with the 
Xalan codebase (though I am very familiar with Java code in general), so maybe 
I misunderstand a couple of things. I would really appreciate it if you could 
reassure me that this patch is indeed the right fix.
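
To make the concern concrete, here is a minimal sketch of the control-flow difference. It is not the Xalan code: the conditions "a", "surrogate" and "b" and the branch labels are placeholders, and the shape of the chain is only my assumption about how the patched code reads.

```java
import java.util.ArrayList;
import java.util.List;

class BranchChainSketch {

    // Variant 1: the new check is a stand-alone 'if' in the middle of the chain.
    static List<String> standaloneIf(boolean a, boolean surrogate, boolean b) {
        List<String> taken = new ArrayList<>();
        if (a) {
            taken.add("A");
        }
        if (surrogate) {            // starts a second, independent chain
            taken.add("surrogate");
        } else if (b) {             // now attached to the surrogate check, not to A
            taken.add("B");
        } else {
            taken.add("fallback");  // reachable even when A already ran
        }
        return taken;
    }

    // Variant 2: the new check joins the existing chain as an 'else if'.
    static List<String> elseIf(boolean a, boolean surrogate, boolean b) {
        List<String> taken = new ArrayList<>();
        if (a) {
            taken.add("A");
        } else if (surrogate) {
            taken.add("surrogate");
        } else if (b) {
            taken.add("B");
        } else {
            taken.add("fallback");
        }
        return taken;
    }

    public static void main(String[] args) {
        System.out.println(standaloneIf(true, false, false)); // [A, fallback]
        System.out.println(elseIf(true, false, false));       // [A]
    }
}
```

With a stand-alone 'if', an input that the first branch already handled also falls through to the fallback, which is the kind of regression in existing scenarios I am worried about.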

As a cosmetic side note, I noticed that your patch does not respect the 
code style of the surrounding code. '{' should not be on a new line, and there 
are 2 instances of a ',' that aren't followed by a space as they should be. 
Call me pedantic, but I'm just being thorough here.
