[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2024-01-27 Thread Jira


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17811567#comment-17811567
 ] 

Cédric Damioli commented on XALANJ-2617:


[~kesh...@alum.mit.edu] I think you may also mark this one as resolved, as a 
duplicate of XALANJ-2419?

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
>  Issue Type: Bug
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization, Xalan
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Daniel Kec
>Assignee: Steven J. Hathaway
>Priority: Major
> Attachments: JI9053942.java, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, 
> XALANJ-2617_java.patch, XALANJ-2617_test.patch
>
>
> When trying to serialize XML containing a character encoded as a Unicode 
> surrogate pair, "\uD840\uDC0B", I have tried several and none worked. The XML 
> Transformer creates an XML string in which the two surrogates are escaped 
> separately, which makes the XML unparseable, e.g.: SAXParseException; 
> Character reference "" is an invalid XML character. It looks like a bug 
> introduced in the XALANJ-2271 fix.
>  
> {code:java|title=Output of Xalan ver. 2.7.2}
> kec@phoebe:~/Downloads$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
> kec@phoebe:~/Downloads$ java -cp 
> /home/kec/.m2/repository/xml-apis/xml-apis/1.4.01/xml-apis-1.4.01.jar:/home/kec/.m2/repository/xalan/xalan/2.7.2/xalan-2.7.2.jar:/home/kec/.m2/repository/xalan/serializer/2.7.2/serializer-2.7.2.jar:.
>  JI9053942
> Character: 
> EXPECTED: 
>  ACTUAL: 
> [Fatal Error] :1:50: Character reference 

[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2022-02-16 Thread Joe Kesselman (Jira)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493637#comment-17493637
 ] 

Joe Kesselman commented on XALANJ-2617:
---


Sanity check: What version number does that Xalan report?  Either 
OpenJDK is using a different release, or they're using a different 
default configuration, or you've found a place where the compiled-mode 
behavior of Xalan differs from the interpreted mode.

(I don't have time to dig into the details of your issue right now; I'll 
try to look at it next week. I suspect this is arising because Xalan was 
originally written with the assumption that text was all going to be 
UTF16, so it's seeing the pair as two characters rather than one. I 
thought we'd started addressing that, but it might only have been in the 
IBM code.)
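To make the "two characters rather than one" point concrete, here is a minimal sketch in plain Java. It is not Xalan code; the class and method names are made up for illustration. It contrasts escaping each UTF-16 code unit on its own (roughly what the issue title describes) with escaping the combined code point.

```java
class SurrogatePairDemo {
    /** Escapes each non-ASCII UTF-16 code unit on its own (the reported behavior). */
    static String escapeByChar(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Escapes each non-ASCII code point as a single numeric character reference. */
    static String escapeByCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // combines a valid surrogate pair into one value
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp); // advances by 2 for supplementary characters
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String value = "\uD840\uDC0B";
        System.out.println(value.length());                          // 2
        System.out.println(value.codePointCount(0, value.length())); // 1
        System.out.println(escapeByChar(value));                     // &#55360;&#56331;
        System.out.println(escapeByCodePoint(value));                // &#131083;
    }
}
```

The first form yields two references to lone surrogates, which are not legal XML characters; the second yields one reference to U+2000B.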



[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2022-02-16 Thread Takayuki Nagai (Jira)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493626#comment-17493626
 ] 

Takayuki Nagai commented on XALANJ-2617:


I had the same problem and found a solution.

I stopped using org.apache.xalan.processor.TransformerFactoryImpl and switched 
to com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl, which is 
included in OpenJDK 11.

I confirmed that the Xalan in OpenJDK does not have this problem.
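For anyone wanting to try the same workaround, a minimal sketch follows. It assumes the JDK-internal factory class named above is present in your runtime; the class name JdkTransformerWorkaround and the serialize helper are hypothetical. Whether this avoids the bug on a given JDK is something to verify locally, since it depends on an internal implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class JdkTransformerWorkaround {
    // Internal JDK class named in the comment above; not a supported public API.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    /** Runs an identity transform with the JDK's built-in factory and returns the result. */
    static String serialize(String xml) {
        try {
            // Request the factory by class name instead of relying on classpath lookup.
            TransformerFactory factory = TransformerFactory.newInstance(JDK_FACTORY, null);
            Transformer identity = factory.newTransformer();
            identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize("<a>\uD840\uDC0B</a>"));
    }
}
```

On Java 9 and later, TransformerFactory.newDefaultInstance() should select the same built-in implementation without naming the internal class.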


[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2021-04-30 Thread Patrick Ferreira (Jira)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17337488#comment-17337488
 ] 

Patrick Ferreira commented on XALANJ-2617:
--

As there have been no updates for a few years, how do you deal with this 
issue? I'm experiencing the same bug right now.

Could I switch back to xalan-2.7.0? Or should I use another library?

Thanks


[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2019-03-28 Thread Mukul Gandhi (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804568#comment-16804568
 ] 

Mukul Gandhi commented on XALANJ-2617:
--

I can see that somewhere in this Jira thread it's mentioned:

"patch contains the fix in java code relative to 
http://svn.apache.org/repos/asf/xalan/java/trunk"

I think the patch should be relative to 
http://svn.apache.org/repos/asf/xalan/java/branches/xalan-j_2_7_1_maint. I 
think Xalan-J's next version will be released from this branch. The Xalan team 
may correct me if I'm wrong.


[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2019-02-21 Thread Jason Harrop (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774686#comment-16774686
 ] 

Jason Harrop commented on XALANJ-2617:
--

Compare https://issues.apache.org/jira/browse/XALANJ-2419 and the fix there 
[^XALANJ-2419-fix-v3.txt] (elements only).  


[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615312#comment-16615312
 ] 

Peter De Maeyer commented on XALANJ-2617:
-

Pull request created. Unfortunately, it only contains the fix in production 
code and not the tests, because there is no repository on GitHub for the test 
code. This confuses me a bit; if anyone has a recommendation for what to do 
with the tests, I'd be happy to follow it.

My understanding of things (correct me if I'm wrong):
* (/) http://svn.apache.org/repos/asf/xalan/java/trunk is the (ancient) 
authoritative repository for the production code. This is what I created my 
production code patch against.
* (/) http://svn.apache.org/repos/asf/xalan/test/trunk is the (ancient) 
authoritative repository for the test code. This is what I created my test 
code patch against.
* (/) The repository for the production code is mirrored on GitHub: 
https://github.com/apache/xalan-j. This is what I created a pull request 
against for my production code patch.
* (x) I did not find an equivalent mirror on GitHub of the repository for the 
test code, so I can't create a pull request for my test code patch.

To complete the story: I successfully ran the minitest and smoketest in the 
test repository before and after my fix. In order to be able to do this, I 
recreated an ancient Windows 2000 32-bit system in a VM, capable of running the 
ancient test harness. Being on a modern Ubuntu Linux 64-bit system, and being 
spoiled with JUnit tests, it took some effort to take a step back in time:
# Install Windows 2000 32-bit in a VirtualBox VM.
# Install 32-bit JDK 1.3 because the Xalan-J sources are -target 1.3 (I know I 
could have compiled this with a JDK 1.6 as well, but that only applies to the 
bytecode, it doesn't prevent @Since > 1.3 API usage).
# Familiarize myself with the really clunky and ancient test harness (being 
used to JUnit and Mockito).

Forgive me if this explanation is overly verbose, but I'm trying to illustrate 
that I didn't make this patch in a hurry, I was being thorough.


[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-14 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615284#comment-16615284
 ] 

ASF GitHub Bot commented on XALANJ-2617:


GitHub user peterdemaeyer opened a pull request:

https://github.com/apache/xalan-j/pull/4

XALANJ-2617 Fixed serializer for high-surrogate UTF-16 characters

Fixed serializer such that it correctly deals with high-surrogate UTF-16 
characters. This pull request replaces an earlier one from Daniel Kec, see 
comments on https://issues.apache.org/jira/browse/XALANJ-2617.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/peterdemaeyer/xalan-j trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/xalan-j/pull/4.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4


commit 8a735e58e6804be1e6a125678d1a8d116ad54651
Author: peterdm 
Date:   2018-09-14T19:15:32Z

XALANJ-2617 Fixed serializer such that it correctly deals with 
high-surrogate UTF-16 characters





[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-12 Thread Daniel Kec (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612703#comment-16612703
 ] 

Daniel Kec commented on XALANJ-2617:


Great, thanks. I was afraid it might break something without a proper test 
suite. Feel free to create a new PR; I'm going to close mine and vote for yours.

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
>  Issue Type: Bug
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization, Xalan
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Daniel Kec
>Assignee: Steven J. Hathaway
>Priority: Major
> Attachments: JI9053942.java, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support_new.patch

[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-09-12 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612680#comment-16612680
 ] 

Peter De Maeyer commented on XALANJ-2617:
-

It can be proven with a unit test that Daniel's fix breaks some scenarios that 
used to work. As I suspected, the "if" has to be an "else if". I've attached my 
own new patch + unit tests.

Note that the patch spans 2 repositories: the fix is relative to 
http://svn.apache.org/repos/asf/xalan/java/trunk, the unit test is relative 
to http://svn.apache.org/repos/asf/xalan/test/trunk.

Just in case the patch isn't readable, this is the essence of the test code:

 {code:java}

/**
 * This test case illustrates the original problem with high-surrogate characters.
 * This is broken in Xalan 2.7.2, hence the need for a fix.
 */
public void serializationOfHighSurrogateCharactersInUtf8() throws Throwable {
    reporter.testCaseInit("serializationOfHighSurrogateCharactersInUtf8");
    try {
        String value = "\uD840\uDC0B";
        serializationOf(value, 

[jira] [Commented] (XALANJ-2617) Serializer produces separately escaped surrogate pair instead of codepoint

2018-08-30 Thread Peter De Maeyer (JIRA)


[ 
https://issues.apache.org/jira/browse/XALANJ-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597890#comment-16597890
 ] 

Peter De Maeyer commented on XALANJ-2617:
-

[~danielkec], I reviewed your patch, but I wonder if the 'if' statement you 
added shouldn't have been an 'else if' statement. The reason I wonder is that 
the 'else if' and 'else' branches that immediately follow your changes now get 
a different meaning. Especially the 'else' branch that has a code comment "// 
This is a fallback plan, we should never get here" unsettles me. I can imagine 
that some _new_ scenario is indeed fixed, but I'm worried that some _existing_ 
scenarios (which were supposed to trigger this "fallback plan") might be 
broken. The fact that I did not find any unit tests to illustrate those 
scenarios does not help take away my concern either. I'm not familiar with the 
Xalan codebase (I am very familiar with Java code in general though), so maybe 
I misunderstand a couple of things. I would really appreciate it if you could 
reassure me this patch is indeed the right fix.
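To illustrate the concern in isolation, here is a hypothetical sketch. It is not the Xalan code or the patch itself; the conditions and names are invented. It shows how a standalone 'if' placed in the middle of a chain starts a new chain, so the trailing 'else' can now run for inputs that an earlier branch already handled.

```java
class BranchChainDemo {
    /** The added branch is a standalone "if", which splits the chain in two. */
    static String separateIf(int x) {
        String result = "unset";
        if (x == 1) {
            result = "first";
        }
        // A standalone "if" starts a new chain here.
        if (x == 2) {
            result = "added";
        } else if (x == 3) {
            result = "third";
        } else {
            result = "fallback"; // now also reached when x == 1
        }
        return result;
    }

    /** The added branch is an "else if", so the original chain stays intact. */
    static String chained(int x) {
        String result;
        if (x == 1) {
            result = "first";
        } else if (x == 2) {
            result = "added";
        } else if (x == 3) {
            result = "third";
        } else {
            result = "fallback";
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(separateIf(1)); // fallback
        System.out.println(chained(1));    // first
    }
}
```

In the first variant the input handled by the first branch is overwritten by the fallback, which is the kind of regression in existing scenarios described above.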

Then, as a cosmetic side note, I noticed that your patch does not respect the 
code style of the surrounding code. '{' should not be on a new line, and there 
are 2 instances of a ',' that aren't followed by a space as they should be. 
Call me pedantic, but I'm just being thorough here.

> Serializer produces separately escaped surrogate pair instead of codepoint
> --
>
> Key: XALANJ-2617
> URL: https://issues.apache.org/jira/browse/XALANJ-2617
> Project: XalanJ2
>  Issue Type: Bug
>  Security Level: No security risk; visible to anyone(Ordinary problems in 
> Xalan projects.  Anybody can view the issue.) 
>  Components: Serialization, Xalan
>Affects Versions: 2.7.1, 2.7.2
>Reporter: Daniel Kec
>Assignee: Steven J. Hathaway
>Priority: Major
> Attachments: JI9053942.java, 
> XALANJ-2617_Fix_missing_surrogate_pairs_support.patch