[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:46 PM:


Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note that it is important for the test to encode both halves of the surrogate 
pair in sequence, which is why I wrote the sequence like this:

char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
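
For readers following along, the relationship between the code point 127808 (U+1F340) and the two surrogates used above can be checked with the standard `java.lang.Character` API. This is only an illustrative sketch: the class name `SurrogatePairs` and its methods are hypothetical and not part of Cocoon. It works with `int` code points because a Java `(char)` cast of 127808 keeps only the low 16 bits (0xF340).

```java
// Hypothetical helper for illustration only; not part of Cocoon.
final class SurrogatePairs {

    /** Returns the UTF-16 code units for a code point (two units above U+FFFF). */
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    /** Recombines a high/low surrogate pair into the code point it represents. */
    static int toCodePoint(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }
}
```

Calling `SurrogatePairs.toUtf16(127808)` yields the two code units `\uD83C` and `\uDF40`, and `SurrogatePairs.toCodePoint('\uD83C', '\uDF40')` gives back 127808.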






> XMLEncoder doesn't support Unicode surrogate pairs
> --------------------------------------------------
>
> Key: COCOON-2352
> URL: https://issues.apache.org/jira/browse/COCOON-2352
> Project: Cocoon
> Issue Type: Bug
> Components: * Cocoon Core, Blocks: Serializers
> Affects Versions: 2.1.12
> Reporter: Ben Fortuna
> Assignee: Francesco Chicchiriccò
> Fix For: 2.1.13
>
>
> Whilst investigating an issue with the Sling project and support for emoji 
> characters, I've come to notice that the XMLEncoder used by HTMLSerializer 
> doesn't support Unicode surrogate pairs, which are needed to represent 
> supplementary Unicode characters (code points above U+FFFF).
> A simple unit test that demonstrates this issue is here:
> https://github.com/micronode/whistlepost/blob/master/whistlepost-rewrite-lib/src/test/groovy/org/apache/cocoon/components/serializers/encoding/XMLEncoderTest.groovy
> More background info here also: SLING-5973
> This seems to have been identified/addressed in other Apache projects also:
> https://issues.apache.org/jira/browse/THRIFT-3403?jql=text%20~%20%22surrogate%20pairs%22
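
As a rough illustration of what surrogate-pair support in a character-at-a-time encoder could look like, here is a minimal sketch. It is not Cocoon's XMLEncoder and not the submitted patch: the class `SurrogateAwareEncoder` is hypothetical, it assumes non-ASCII characters are written as hexadecimal numeric character references, and it ignores markup escaping (`<`, `&`) and proper handling of unpaired surrogates.

```java
// Hypothetical sketch for illustration only; not Cocoon's XMLEncoder.
final class SurrogateAwareEncoder {

    // Buffered high surrogate, or 0 when nothing is pending.
    private char pendingHigh;

    /** Encodes one UTF-16 code unit; returns an empty array while a pair is incomplete. */
    char[] encode(char c) {
        if (Character.isHighSurrogate(c)) {
            // First half of a pair: remember it and emit nothing yet.
            pendingHigh = c;
            return new char[0];
        }
        if (Character.isLowSurrogate(c) && pendingHigh != 0) {
            // Second half: combine both halves into a single code point.
            int codePoint = Character.toCodePoint(pendingHigh, c);
            pendingHigh = 0;
            return reference(codePoint);
        }
        // Any other character silently drops a dangling high surrogate (simplification).
        pendingHigh = 0;
        if (c < 0x80) {
            // ASCII passes through unchanged; markup escaping is omitted in this sketch.
            return new char[] { c };
        }
        return reference(c);
    }

    /** Builds a hexadecimal numeric character reference such as "&#x1F340;". */
    private static char[] reference(int codePoint) {
        return ("&#x" + Integer.toHexString(codePoint).toUpperCase() + ";").toCharArray();
    }
}
```

With this sketch, `encode('\uD83C')` returns an empty array and the following `encode('\uDF40')` returns the characters of `&#x1F340;`, which mirrors the call sequence the unit test relies on.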



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-17 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15582147#comment-15582147
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/17/16 12:45 PM:


Hmm, do you have a link to the source? I checked on BRANCH_2_1_X and it still 
has the old code. I noticed the error is on line 42, but the test I submitted 
only has 33 lines. 

Note that it is important for the test to encode both halves of the surrogate 
pair in sequence, which is why I wrote the sequence like this:

```
char[] expectedValue = encoder.encode((char) 127808);
// surrogate 1/2
assertTrue(encoder.encode('\uD83C').length == 0);
// surrogate 2/2
assertTrue(Arrays.equals(expectedValue, encoder.encode('\uDF40')));
```










[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-16 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15580714#comment-15580714
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/16/16 11:39 PM:


Yes, sorry, I forgot to mention that I had also updated the unit test. See the 
same PR for the changes (3 lines in the test method).

https://github.com/apache/cocoon/pull/2/files#diff-4f5d5b9cb8b320832b3f0dfb8183a1b9R28










[jira] [Comment Edited] (COCOON-2352) XMLEncoder doesn't support Unicode surrogate pairs

2016-10-12 Thread Ben Fortuna (JIRA)

[ 
https://issues.apache.org/jira/browse/COCOON-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15570179#comment-15570179
 ] 

Ben Fortuna edited comment on COCOON-2352 at 10/12/16 11:15 PM:


[~ilgrosso] I am happy to have this issue closed; however, it would be good if 
there were a snapshot JAR available to verify the functionality. Specifically, I 
am hoping this change will make it into this artefact:

http://search.maven.org/#search%7Cgav%7C1%7Cg%3A%22org.apache.cocoon%22%20AND%20a%3A%22cocoon-serializers-charsets%22

Will a new version be produced with the next release? Many thanks for your 
efforts.





