[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-09 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560988#comment-15560988
 ] 

Ben Fortuna commented on COCOON-2352:
-

I've just created a pull request on GitHub to add support for surrogate pairs.

https://github.com/apache/cocoon/pull/1

Summary of changes:

* Added an instance variable to XMLEncoder to record the first surrogate of the 
pair. NOTE: this means the XMLEncoder is no longer thread safe. This may have 
implications I'm not aware of (e.g. if an instance is shared between threads).
* Added a unit test to demonstrate the behaviour. NOTE: I needed to add the 
serializers project to the test classpath; I'm not sure if there is a better 
way to do this with the Ant config.
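
To illustrate the first point, here is a minimal sketch of an encoder that remembers the first surrogate between calls. It is not the code from the pull request: the class name, the pendingHigh field and the encode(char) signature are all assumptions made for illustration, and markup escaping is left out.

```java
/** Hypothetical sketch only; not the actual Cocoon XMLEncoder. */
final class StatefulSurrogateEncoder {

    // First half of a surrogate pair seen in the previous call; 0 means none pending.
    // This mutable field is what makes a shared instance unsafe across threads.
    private char pendingHigh = 0;

    /** Encodes one UTF-16 code unit; returns "" while waiting for the second half of a pair. */
    String encode(char c) {
        if (Character.isHighSurrogate(c)) {
            pendingHigh = c;
            return "";
        }
        char high = pendingHigh;
        pendingHigh = 0;
        if (Character.isLowSurrogate(c)) {
            if (high == 0) {
                return "&#xFFFD;"; // unpaired low surrogate: emit the replacement character
            }
            return toReference(Character.toCodePoint(high, c));
        }
        // An unpaired high surrogate before c is silently dropped in this sketch.
        if (c < 0x80) {
            return String.valueOf(c); // escaping of <, > and & omitted for brevity
        }
        return toReference(c);
    }

    private static String toReference(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }
}
```

If two threads fed characters into the same instance, one thread's low surrogate could be combined with the other thread's high surrogate, which is the thread-safety concern noted above. Using one encoder instance per thread or per serializer would avoid that.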

I look forward to any feedback or comments.

regards,
ben


> XMLEncoder doesn't support Unicode surrogate pairs
> --
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
>  Issue Type: Bug
>  Components: * Cocoon Core, Blocks: Serializers
>Reporter: Ben Fortuna
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I noticed that the XMLEncoder used by HTMLSerializer doesn't 
> support the Unicode surrogate pairs that represent characters outside the 
> Basic Multilingual Plane.
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15560981#comment-15560981
 ] 

ASF GitHub Bot commented on COCOON-2352:


GitHub user benfortuna opened a pull request:

https://github.com/apache/cocoon/pull/1

Support for Unicode surrogate pairs

This PR adds support for encoding surrogate pairs as a single character in the 
XMLEncoder implementation. See 
[COCOON-2352](https://issues.apache.org/jira/browse/COCOON-2352) for further 
details.
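
As a rough illustration of what "a surrogate pair as a single character" means for the output, the sketch below walks a string by code point rather than by char and writes one numeric character reference per code point. The class and method names are hypothetical and this is not the PR's implementation; it only uses standard java.lang calls.

```java
/** Hypothetical helper for illustration; not part of Cocoon. */
final class CodePointReferences {

    /** Writes every non-ASCII code point as a single hexadecimal character reference. */
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            // codePointAt joins a high/low surrogate pair into one code point
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp); // markup escaping omitted for brevity
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase()).append(';');
            }
            // advance by 2 chars for a supplementary code point, otherwise by 1
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

For example, the emoji U+1F600 is stored in a Java String as the two chars \uD83D and \uDE00, and this sketch emits the single reference &#x1F600; for it rather than one reference per surrogate.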

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/benfortuna/cocoon BRANCH_2_1_X

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cocoon/pull/1.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1


commit 4975a555b8330446089c81e17e8bfaaaee669600
Author: Ben Fortuna 
Date:   2016-10-10T00:11:32Z

Added required folder for build

commit cf2d9b65eb55b9d19a0b0c179e90fe7c7b70b6e6
Author: Ben Fortuna 
Date:   2016-10-10T00:11:58Z

Added support for decoding surrogate pairs

commit cc68b0040c5afc6286dc767810ea2ec7abd58340
Author: Ben Fortuna 
Date:   2016-10-10T01:26:20Z

Added unit test for encoding unicode surrogate pairs



