[jira] [Commented] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

ASF GitHub Bot (JIRA) Wed, 04 Mar 2015 08:46:33 -0800

    [ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347083#comment-14347083 ]


ASF GitHub Bot commented on PROTON-576:
---------------------------------------

GitHub user dnwe opened a pull request:

    https://github.com/apache/qpid-proton/pull/10

    PROTON-576: modified UTF-8 encoder fixes

    Commit 5069bb6 applied a modified version of a patch I submitted, to
    ensure that the UTF-8 encoder (and UTF-8 byte length calculator) would
    cope with surrogate pairs. This commit fixes an issue with three byte
    characters in the <= 0xFFFF range being incorrectly detected as invalid
    four byte surrogates.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dnwe/qpid-proton fix-proton-576

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/qpid-proton/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10
    
----
commit 6ed99b97164d1bdb463b3bdbfc0507e0e603949e
Author: Dominic Evans <[email protected]>
Date:   2015-03-04T16:21:46Z

    PROTON-576: modified UTF-8 encoder fixes
    
    Commit 5069bb6 applied a modified version of a patch I submitted, to
    ensure that the UTF-8 encoder (and UTF-8 byte length calculator) would
    cope with surrogate pairs. This commit fixes an issue with three byte
    characters in the <= 0xFFFF range being incorrectly detected as invalid
    four byte surrogates.

----
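
For readers who want to see the distinction the commit message draws, here is a minimal sketch of a UTF-8 encoder over Java chars. It is not the Proton-J code or the patch in this pull request; the class name Utf8Sketch and the method encode are hypothetical. It treats a char as a surrogate only when it falls in 0xD800..0xDFFF, so chars in 0xE000..0xFFFF stay on the three-byte path, and it rejects unpaired surrogates.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the Proton-J encoder.
class Utf8Sketch {

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c < 0x80) {
                // One byte: ASCII.
                out.write(c);
            } else if (c < 0x800) {
                // Two bytes.
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else if (c < 0xD800 || c > 0xDFFF) {
                // Three bytes: any other BMP char, including 0xE000..0xFFFF.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Surrogate range: require a high surrogate followed by a low one.
                if (c > 0xDBFF || i + 1 == s.length()) {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + i);
                }
                int low = s.charAt(++i);
                if (low < 0xDC00 || low > 0xDFFF) {
                    throw new IllegalArgumentException("Unpaired surrogate at index " + (i - 1));
                }
                // Combine the pair into one code point and emit four bytes.
                int cp = 0x10000 + ((c - 0xD800) << 10) + (low - 0xDC00);
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Mixes 1-, 2-, 3- and 4-byte characters, including U+E000 and U+FFFD.
        String sample = "A\u00E9\u20AC\uE000\uFFFD\uD83D\uDCD4";
        byte[] expected = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(sample), expected));
    }
}
```

Comparing against String.getBytes(StandardCharsets.UTF_8) is a convenient cross-check for well-formed input; the JDK encoder substitutes a replacement byte for unpaired surrogates rather than throwing, so the two differ on malformed strings.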


> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> -----------------------------------------------------------------------
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>             Fix For: 0.8
>
>         Attachments: 02_fix_stringtype_encode_decode.patch, PROTON-576.patch
>
>
> It seems like Proton-J has its own custom UTF-8 encoder, but relies on Java 
> String's built-in UTF-8 decoder. However, the code doesn't seem quite right, 
> and characters outside the Basic Multilingual Plane, such as emoji 
> ('📔🚢🍛🍴🍹🏊🏄'), which take four bytes in UTF-8 and a surrogate pair in a 
> Java String, can quite easily fail to parse:
> |   |   Cause:1       :-  java.lang.IllegalArgumentException: Cannot parse String
> |   |   Message:1     :-  Cannot parse String
> |   |   StackTrace:1  :-  java.lang.IllegalArgumentException: Cannot parse String
> |   |         at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:48)
> |   |         at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:36)
> |   |         at org.apache.qpid.proton.codec.DecoderImpl.readRaw(DecoderImpl.java:945)
> |   |         at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:172)
> |   |         at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:124)
> |   |         at org.apache.qpid.proton.codec.DynamicTypeConstructor.readValue(DynamicTypeConstructor.java:39)
> |   |         at org.apache.qpid.proton.codec.DecoderImpl.readObject(DecoderImpl.java:885)
> |   |         at org.apache.qpid.proton.message.impl.MessageImpl.decode(MessageImpl.java:629)
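
As a quick way to see why the emoji in the report stress an encoder, the following sketch inspects one of them (U+1F4D4, the first in the list) using only JDK calls. The class name EmojiRoundTrip and the helper roundTrip are hypothetical and do not touch the Proton-J codec; the point is only that a single such character is two Java chars and four UTF-8 bytes, so an encoder and its byte-length calculator must agree on that.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; uses the JDK's own UTF-8 support.
class EmojiRoundTrip {

    // Encode to UTF-8 and decode again with the JDK.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String notebook = "\uD83D\uDCD4"; // U+1F4D4 as a surrogate pair

        System.out.println("chars:       " + notebook.length());
        System.out.println("code points: " + notebook.codePointCount(0, notebook.length()));
        System.out.println("utf-8 bytes: " + notebook.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("round trip:  " + notebook.equals(roundTrip(notebook)));
    }
}
```

A test message built from strings like this one is a reasonable regression check for any custom encoder, since a mismatch between the computed length and the bytes actually written is one way a decoder can end up unable to parse the payload.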



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
