[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Godfrey updated PROTON-576:
---
    Attachment: PROTON-576.patch

I've changed the test a bit (so that it covers other values and not just the surrogate pairs) and made a couple of small changes to the code which will hopefully get the performance even closer to the old version for strings without surrogate pairs. Unfortunately the benchmarking code attachment seems to have been removed at some point, so I couldn't test that. (My very quick and dirty perf testing showed the original code at about 4.1 million encodes per sec, 4.0 million for this patch and 3.6 million for the previous patch...)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch, PROTON-576.patch
>
>
> It seems like Proton-J has its own custom UTF-8 encoder, but relies on Java
> String's built-in UTF-8 decoder. However, the code doesn't seem quite right
> and complex double byte UTF-8 like emoji ('📔🚢🍛🍴🍹🏊🏄') can quite easily fail to
> parse:
> | | Cause:1 :- java.lang.IllegalArgumentException: Cannot parse String
> | | Message:1 :- Cannot parse String
> | | StackTrace:1 :- java.lang.IllegalArgumentException: Cannot parse String
> | | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:48)
> | | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:36)
> | | at org.apache.qpid.proton.codec.DecoderImpl.readRaw(DecoderImpl.java:945)
> | | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:172)
> | | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:124)
> | | at org.apache.qpid.proton.codec.DynamicTypeConstructor.readValue(DynamicTypeConstructor.java:39)
> | | at org.apache.qpid.proton.codec.DecoderImpl.readObject(DecoderImpl.java:885)
> | | at org.apache.qpid.proton.message.impl.MessageImpl.decode(MessageImpl.java:629)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment: 02_fix_stringtype_encode_decode.patch

Patch refreshed.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment:     (was: 02_fix_stringtype_encode_decode.patch)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment: 02_fix_stringtype_encode_decode.patch

Updated patch that also contains a StringTypeTest.java unit test that loops over all the Unicode characters in some "complex" ranges of the spec and attempts to round-trip them as an AmqpValue. The test case fails against vanilla qpid-proton 0.7, but passes after applying the accompanying patches.

[~clebertsuconic] are you able to review and accept?

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch
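For readers without the attachment, a round-trip test of the kind described above might look roughly like the sketch below. It is not the StringTypeTest.java from the patch: the class and method names are made up for illustration, the code point range is an assumption (the message does not say which ranges were used), and the JDK's own UTF-8 encode/decode stands in for the Proton-J AmqpValue round trip so that the example runs without Proton on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical sketch; not the StringTypeTest.java from the attached patch.
class RoundTripSketch {

    // Returns the first code point in [from, to] that does not survive the
    // given round trip, or -1 if every code point comes back unchanged.
    static int firstFailure(int from, int to, UnaryOperator<String> roundTrip) {
        for (int cp = from; cp <= to; cp++) {
            // Lone surrogates are not valid characters on their own; skip them.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            String in = new String(Character.toChars(cp));
            if (!in.equals(roundTrip.apply(in))) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in codec (assumption): encode and decode with the JDK.
        // A real test would encode and decode through the Proton-J codec here.
        UnaryOperator<String> jdk = s ->
                new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        // Assumed range: U+1F300..U+1F6FF covers the emoji quoted in the report.
        int failed = firstFailure(0x1F300, 0x1F6FF, jdk);
        System.out.println(failed == -1
                ? "all code points round-tripped"
                : "first failure at U+" + Integer.toHexString(failed).toUpperCase());
    }
}
```

Passing the codec in as a function keeps the loop itself independent of which encoder is under test, so the same loop could be pointed at the patched and unpatched encoders.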
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment:     (was: benchmark.jar)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment:     (was: Utf8Samples.txt)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment:     (was: benchmark-src.zip)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment:     (was: 02_fix_stringtype_encode_decode.patch)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment: Utf8Samples.txt
                02_fix_stringtype_encode_decode.patch

Updated patch that retains Proton's built-in, fast, hand-written UTF-8 encoder, but changes it to conform to the UTF-8 spec and to cope with supplementary characters encoded as surrogate pairs, such as emoji.

References:
https://tools.ietf.org/html/rfc3629
Table 3.1b @ http://www.unicode.org/versions/corrigendum1.html

This fixes the problems we were seeing. I've tested it by round-tripping my Utf8Samples.txt and comparing the generated byte[] with the one returned by String#getBytes(); they seem to match.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch, Utf8Samples.txt, benchmark-src.zip, benchmark.jar
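To illustrate the approach described in this message, here is a minimal sketch of a hand-written UTF-8 encoder that combines surrogate pairs into a single code point before emitting bytes, following the byte layouts in RFC 3629. It is not the code from the attached patch: the class name, the method name and the handling of unpaired surrogates (replaced with '?') are assumptions made for this example. The main method does the same kind of check the message mentions, comparing the result with String#getBytes().

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; not the encoder from the attached patch.
class Utf8EncodeSketch {

    // Encodes a Java String (UTF-16 internally) to UTF-8 by hand.
    static byte[] encodeUtf8(String s) {
        // Worst case is 3 bytes per UTF-16 char (a surrogate pair is
        // 2 chars and produces 4 bytes, so it stays within this bound).
        byte[] out = new byte[s.length() * 3];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            int c = ch;
            if (Character.isHighSurrogate(ch) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Combine the pair into one supplementary code point.
                c = Character.toCodePoint(ch, s.charAt(++i));
            } else if (Character.isSurrogate(ch)) {
                c = '?'; // assumption: replace unpaired surrogates
            }
            if (c < 0x80) {                 // 1 byte:  0xxxxxxx
                out[n++] = (byte) c;
            } else if (c < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
                out[n++] = (byte) (0xC0 | (c >> 6));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            } else if (c < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out[n++] = (byte) (0xE0 | (c >> 12));
                out[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            } else {                        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                out[n++] = (byte) (0xF0 | (c >> 18));
                out[n++] = (byte) (0x80 | ((c >> 12) & 0x3F));
                out[n++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                out[n++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        // One character from each length class; the last is U+1F4D4.
        String sample = "A\u00E9\u20AC\uD83D\uDCD4";
        byte[] mine = encodeUtf8(sample);
        byte[] jdk = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println("matches String#getBytes: " + Arrays.equals(mine, jdk));
    }
}
```

An encoder that treats each half of a surrogate pair as its own character would emit two 3-byte sequences instead of one 4-byte sequence, which a strict UTF-8 decoder is entitled to reject.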
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment:     (was: 02_fix_stringtype_encode_decode.patch)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: benchmark-src.zip, benchmark.jar
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment: benchmark-src.zip
                benchmark.jar

Benchmark jar (and src) as requested.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch, benchmark-src.zip, benchmark.jar
[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?
[ https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dominic Evans updated PROTON-576:
-
    Attachment: 02_fix_stringtype_encode_decode.patch

I'm not sure why the code was originally written with a custom UTF-8 encoder. Please advise if there was a valid reason for this?

For now, we are simply using the attached patch to modify the encoders to use Java's built-in support for doing String conversion to UTF-8 encoding. As you'd expect, this will round trip all UTF-8 messages successfully.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
>                 Key: PROTON-576
>                 URL: https://issues.apache.org/jira/browse/PROTON-576
>             Project: Qpid Proton
>          Issue Type: Bug
>          Components: proton-j
>    Affects Versions: 0.7
>            Reporter: Dominic Evans
>         Attachments: 02_fix_stringtype_encode_decode.patch
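As a rough illustration of what delegating to Java's built-in conversion could look like, the sketch below encodes a string with String#getBytes and decodes it with the String(byte[], Charset) constructor. It is not the attached patch and does not use Proton-J classes: the class and method names are hypothetical, and the framing (format code 0xA1 with a 1-byte length, or 0xB1 with a 4-byte length) is written from my reading of the AMQP 1.0 str8-utf8 and str32-utf8 encodings, so treat it as an assumption.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the attached patch and not Proton-J's StringType.
class BuiltinUtf8Sketch {

    // Encodes a string as format code + length + UTF-8 bytes, letting the
    // JDK do the UTF-16 to UTF-8 conversion.
    static ByteBuffer encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf;
        if (utf8.length <= 255) {
            buf = ByteBuffer.allocate(2 + utf8.length);
            buf.put((byte) 0xA1).put((byte) utf8.length);   // assumed str8-utf8
        } else {
            buf = ByteBuffer.allocate(5 + utf8.length);
            buf.put((byte) 0xB1).putInt(utf8.length);       // assumed str32-utf8
        }
        buf.put(utf8);
        buf.flip(); // make the buffer readable from the start
        return buf;
    }

    // Reads back what encode() wrote, using the JDK's UTF-8 decoder.
    static String decode(ByteBuffer buf) {
        int code = buf.get() & 0xFF;
        int len;
        if (code == 0xA1) {
            len = buf.get() & 0xFF;
        } else if (code == 0xB1) {
            len = buf.getInt();
        } else {
            throw new IllegalArgumentException(
                    "Not a string format code: 0x" + Integer.toHexString(code));
        }
        byte[] utf8 = new byte[len];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String in = "caf\u00E9 \uD83D\uDCD4"; // includes U+1F4D4 from the report
        String out = decode(encode(in));
        System.out.println("round trip ok: " + in.equals(out));
    }
}
```

The cost of this approach is an intermediate byte[] per string, which is presumably why encoding speed comes up in the later messages on this issue.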