[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-09-23 Thread Rob Godfrey (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rob Godfrey updated PROTON-576:
---
Attachment: PROTON-576.patch

I've changed the test a bit (so that it covers other values and not just the 
surrogate pairs) and made a couple of small changes to the code which will 
hopefully get the performance even closer to the old version for non-surrogate 
pairs.  Unfortunately the benchmarking code attachment seems to have been 
removed at some point, so I couldn't test against that. 

(My very quick and dirty perf testing showed the original code at about 
4.1 million encodes per second, 4.0 million for this patch and 3.6 million for 
the previous patch...)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch, PROTON-576.patch
>
>
> It seems like Proton-J has its own custom UTF-8 encoder, but relies on Java 
> String's built-in UTF-8 decoder. However, the encoder doesn't seem quite right, 
> and characters that need four bytes in UTF-8 (surrogate pairs in a Java 
> String), such as emoji ('📔🚢🍛🍴🍹🏊🏄'), can quite easily fail to parse:
> |   |   Cause:1   :-  java.lang.IllegalArgumentException: Cannot parse String
> |   |   Message:1 :-  Cannot parse String
> |   |   StackTrace:1  :-  java.lang.IllegalArgumentException: Cannot parse String
> |   | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:48)
> |   | at org.apache.qpid.proton.codec.StringType$1.decode(StringType.java:36)
> |   | at org.apache.qpid.proton.codec.DecoderImpl.readRaw(DecoderImpl.java:945)
> |   | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:172)
> |   | at org.apache.qpid.proton.codec.StringType$AllStringEncoding.readValue(StringType.java:124)
> |   | at org.apache.qpid.proton.codec.DynamicTypeConstructor.readValue(DynamicTypeConstructor.java:39)
> |   | at org.apache.qpid.proton.codec.DecoderImpl.readObject(DecoderImpl.java:885)
> |   | at org.apache.qpid.proton.message.impl.MessageImpl.decode(MessageImpl.java:629)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-09-19 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-
Attachment: 02_fix_stringtype_encode_decode.patch

Patch refreshed.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-09-19 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-
Attachment: (was: 02_fix_stringtype_encode_decode.patch)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-06-30 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: 02_fix_stringtype_encode_decode.patch

Updated patch that also contains a StringTypeTest.java unit test that loops over 
all the Unicode characters in some "complex" ranges of the spec and attempts to 
round-trip them as an AmqpValue.

The test case fails against vanilla qpid-proton 0.7, but passes successfully 
after applying the accompanying patches.
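
For readers without the attachment, the shape of such a round-trip loop can be 
sketched as below. This is only an illustration, not the StringTypeTest.java 
from the patch: the class and method names are hypothetical, and it round-trips 
through the JDK's own UTF-8 codec rather than through an AmqpValue, so it shows 
the looping technique and not Proton's behaviour.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are not taken from the attached patch.
final class RoundTripSketch {

    // Counts the code points in [first, last] that do not survive an
    // encode/decode round trip through the JDK's UTF-8 codec.
    static int countFailures(int first, int last) {
        int failures = 0;
        for (int cp = first; cp <= last; cp++) {
            // Lone surrogates are not valid characters, so skip them.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            String original = new String(Character.toChars(cp));
            byte[] encoded = original.getBytes(StandardCharsets.UTF_8);
            String decoded = new String(encoded, StandardCharsets.UTF_8);
            if (!original.equals(decoded)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // U+1F300..U+1F6FF is an assumed "complex" range that contains the
        // emoji quoted in the issue description.
        System.out.println("failures: " + countFailures(0x1F300, 0x1F6FF));
    }
}
```

To exercise Proton itself, the encode and decode steps in the loop body would 
be replaced by the library's own message encode and decode calls.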

[~clebertsuconic] are you able to review and accept?



> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch

--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-06-30 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: (was: benchmark.jar)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-06-30 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: (was: Utf8Samples.txt)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-06-30 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: (was: benchmark-src.zip)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-06-30 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: (was: 02_fix_stringtype_encode_decode.patch)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-05-02 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: Utf8Samples.txt
02_fix_stringtype_encode_decode.patch

Updated patch to retain Proton's built-in fast hand-written UTF-8 encoder, but 
updated to conform to the UTF-8 spec and cope with high range surrogate pairs 
like Emoji characters.

References:
https://tools.ietf.org/html/rfc3629
Table 3.1b @ http://www.unicode.org/versions/corrigendum1.html

This fixes the problems we were seeing. I've tested it by round tripping my 
Utf8Samples.txt and comparing the generated byte[] with those found by 
String#getBytes() and they seem to match.
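
As a rough illustration of the approach described above (not the code from the 
attached patch), a hand-written encoder that follows the RFC 3629 byte layout 
and combines surrogate pairs into a single four-byte sequence might look like 
the sketch below. The class name, the use of ByteArrayOutputStream, and the 
choice to replace an unpaired surrogate with '?' are all assumptions made for 
the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not the encoder from the attached patch.
final class Utf8EncodeSketch {

    // Encodes a Java (UTF-16) String to UTF-8 by hand.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            int cp = ch;
            if (Character.isHighSurrogate(ch) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Combine the pair into one supplementary code point.
                cp = Character.toCodePoint(ch, s.charAt(++i));
            } else if (Character.isSurrogate(ch)) {
                cp = '?'; // assumption: substitute unpaired surrogates
            }
            if (cp < 0x80) {                 // 1 byte: 0xxxxxxx
                out.write(cp);
            } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                         // 4 bytes: 11110xxx then three 10xxxxxx
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // One sample from each length class: U+0041, U+00E9, U+20AC, U+1F4D4.
        int[] codePoints = {0x41, 0xE9, 0x20AC, 0x1F4D4};
        String sample = new String(codePoints, 0, codePoints.length);
        byte[] expected = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(encode(sample), expected)); // prints true
    }
}
```

The main method mirrors the check described above: it compares the hand-written 
output with the bytes from String#getBytes for the same input.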

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch, Utf8Samples.txt, 
> benchmark-src.zip, benchmark.jar


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-05-02 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: (was: 02_fix_stringtype_encode_decode.patch)

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: benchmark-src.zip, benchmark.jar


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-05-01 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: benchmark-src.zip
benchmark.jar

Benchmark jar (and src) as requested.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch, 
> benchmark-src.zip, benchmark.jar


[jira] [Updated] (PROTON-576) proton-j: codec support for UTF-8 encoding and decoding appears broken?

2014-05-01 Thread Dominic Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/PROTON-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dominic Evans updated PROTON-576:
-

Attachment: 02_fix_stringtype_encode_decode.patch

I'm not sure why the code was originally written with a custom UTF-8 encoder. 
Please advise if there was a valid reason for this.

For now, we are simply using the attached patch to modify the encoders to use 
Java's built-in support for doing String conversion to UTF-8 encoding. As you'd 
expect, this will round trip all UTF-8 messages successfully.

> proton-j: codec support for UTF-8 encoding and decoding appears broken?
> ---
>
> Key: PROTON-576
> URL: https://issues.apache.org/jira/browse/PROTON-576
> Project: Qpid Proton
>  Issue Type: Bug
>  Components: proton-j
>Affects Versions: 0.7
>Reporter: Dominic Evans
> Attachments: 02_fix_stringtype_encode_decode.patch