[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057147#comment-15057147
 ] 

Hudson commented on AVRO-1584:
--

SUCCESS: Integrated in AvroJava #560 (See 
[https://builds.apache.org/job/AvroJava/560/])
AVRO-1584: Java: Escape characters not allowed in JSON in toString.

From the JSON spec: "All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F)."

This uses the existing string escape function. (blue: rev 1720055)
* trunk/CHANGES.txt
* trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
* 
trunk/lang/java/avro/src/test/java/org/apache/avro/generic/TestGenericData.java


> Json output doesn't generate base64 for byte arrays
> ---
>
> Key: AVRO-1584
> URL: https://issues.apache.org/jira/browse/AVRO-1584
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Pure java.
>Reporter: Christophe Lorenz
> Attachments: AVRO-1584-Jackson-Base64-Default-Variant.patch, 
> AVRO-1584.1.patch, AVRO-1584.patch
>
>
> The Json output of java generated code doesn't correctly encode byte arrays.
> Using this simple schema : 
> {"namespace": "example.avro",
>  "type": "record",
>  "name": "ByteArrayEncoding",
>  "fields": [ {"name": "data", "type": "bytes"} ]
> }
> Calling toString():
>   System.out.println(new ByteArrayEncoding(ByteBuffer.wrap(new byte[]{0,31,65,66,67,(byte)255,(byte)182})));
> writes the raw bytes into the JSON string:
> {"data": {"bytes": "  ABC??"}}
> Because a byte array is not guaranteed to be a valid string, it should be 
> converted to and from Base64, as other JSON implementations do:
> {"data": {"bytes": "AB9BQkP/tg=="}}
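
The expected value in the report can be checked with a minimal sketch. It assumes `java.util.Base64` from Java 8, which is not what the attached patches use; the class and method names are made up for illustration.

```java
import java.util.Base64;

// Hypothetical helper, not part of Avro.
class BytesAsBase64Sketch {
    // Encodes raw bytes as standard base64 text with padding.
    static String toBase64(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // The same seven bytes used in the issue description.
        byte[] raw = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        // Prints {"data": {"bytes": "AB9BQkP/tg=="}}
        System.out.println("{\"data\": {\"bytes\": \"" + toBase64(raw) + "\"}}");
    }
}
```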



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057074#comment-15057074
 ] 

Doug Cutting commented on AVRO-1584:


Ryan, your patch looks good to me.  +1



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057087#comment-15057087
 ] 

Ryan Blue commented on AVRO-1584:
-

Committed. Thanks for taking a look, Doug!



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057086#comment-15057086
 ] 

ASF subversion and git services commented on AVRO-1584:
---

Commit 1720055 from [~b...@cloudera.com] in branch 'avro/trunk'
[ https://svn.apache.org/r1720055 ]

AVRO-1584: Java: Escape characters not allowed in JSON in toString.

From the JSON spec: "All Unicode characters may be placed within the
quotation marks except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F)."

This uses the existing string escape function.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057097#comment-15057097
 ] 

Ryan Blue commented on AVRO-1584:
-

[~lemieud], thank you for your work to get this addressed! I think that the fix 
we ended up with should fix the problem you were seeing since the control 
characters will be properly escaped. If moving to base64 is important to you as 
well, then I think the right way forward is to help standardize a different 
JSON encoding, like Doug suggested for the long term. For now, I'm going to 
mark this issue resolved since we've decided the way forward in the short term.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057042#comment-15057042
 ] 

Ryan Blue commented on AVRO-1584:
-

Thanks for the context, Doug.

I agree that we shouldn't change the specified JSON encoding or toString 
behavior. I think we have some flexibility with toString, since it was intended 
for debugging (so it isn't used to encode default values) and doesn't encode 
either bytes or fixed as expected. For bytes, an extra object layer is added 
and fixed is encoded as an array of integers. I think that makes it unlikely 
that anyone would use it to serialize data as JSON, but I have no problem being 
cautious and not breaking anything unless we have a plan for what toString 
should produce.

I'm attaching a patch that uses the fix from AVRO-713 to fix just the escape 
problem. It also adds tests to validate the current behavior of toString for 
bytes and fixed.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread David Lemieux (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057312#comment-15057312
 ] 

David Lemieux commented on AVRO-1584:
-

[~rdblue] My pleasure. I agree that it should fix my problem. Thanks.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056779#comment-15056779
 ] 

Doug Cutting commented on AVRO-1584:


Ryan, I agree this is a bug in the current implementation.  According to 
RFC 4627, control characters must be escaped.
bq. All Unicode characters may be placed within the quotation marks except for 
the characters that must be escaped: quotation mark, reverse solidus, and the 
control characters (U+0000 through U+001F).
I note that this was fixed for strings in AVRO-713 and we can probably share 
this logic.

The difference between toString() JSON and Avro's JSON data encoding is 
longstanding and primarily around the encoding of unions.  For full read/write 
fidelity, many union values must be tagged with their type, so that's what the 
JSON encoding requires.  The toString() encoding was not intended for data 
fidelity but for debugging, so a simpler version was implemented.  (It actually 
pre-dates the specification of the JSON encoding.)  It so happens that default 
values in schemas do not need to be tagged, so the toString() format is 
identical to the default-value format.
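
To make the difference concrete, here is a rough sketch of the two shapes for a value in a union such as ["null", "string"]. The class and method names are hypothetical, the whitespace is illustrative only, and none of this is Avro's encoder code.

```java
// Hypothetical illustration of tagged vs. untagged union values; not Avro code.
class UnionTaggingSketch {
    // JSON data encoding: a non-null union value is wrapped in an object
    // whose single key names the branch type; null stays a bare null.
    static String tagged(String branch, String jsonValue) {
        if ("null".equals(branch)) {
            return "null";
        }
        return "{\"" + branch + "\": " + jsonValue + "}";
    }

    // toString()/default-value style: the value is written as-is, with no tag.
    static String untagged(String jsonValue) {
        return jsonValue;
    }

    public static void main(String[] args) {
        System.out.println(tagged("string", "\"abc\""));   // {"string": "abc"}
        System.out.println(untagged("\"abc\""));           // "abc"
    }
}
```

The untagged form is easier for non-Avro JSON tools to produce, but a reader cannot always tell which branch was meant, which is why the data encoding tags the value.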

However there are frequent requests for a reader that accepts such an untagged 
format, for interaction with other JSON-generating software.  In retrospect, 
the JSON encoding should perhaps not require tagging for unions with null or 
unions between a primitive and a non-primitive, i.e., only tag unions when it's 
required.  We instead opted for simplicity of specification and implementation, to 
ease interoperability between various Avro implementations, when perhaps in 
this case we should have optimized for ease of interoperability with non-Avro 
producers and consumers of JSON.

So long-term we might add an encoder/decoder that doesn't handle unions at all 
or that handles them more parsimoniously, then perhaps implement default values 
and toString() using this encoding.  But I don't think we should alter the 
currently specified JSON encoding, nor change the default or toString() format.





[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056532#comment-15056532
 ] 

Doug Cutting commented on AVRO-1584:


The problem you originally cite (question marks in output) is caused by using a 
non-UTF8 encoding when printing the value of toString(), not with that value 
itself.  So there's not actually a bug here.  The string produced by toString() 
loses no information.  Rather, you seek either an (incompatible) change or a new 
feature.

Changing the format of toString() for binary values incompatibly to base64 
seems likely to break applications, e.g. those that use toString() to 
supply default values to the schema builder API.  I question that this is of 
sufficient benefit to be worth doing even in a release that permits 
incompatibilities.  There is no perfect string format for binary values.  The 
one currently used here (and by the spec for default values) makes textual 
values more legible, while base64 makes non-textual values more tolerant of 
non-UTF8-safe i/o.

Perhaps we should instead add a flag that one can set to change 
GenericData#toString() so that it generates base64?  We should also certainly 
add some tests for the current format if there are none.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-14 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056701#comment-15056701
 ] 

Ryan Blue commented on AVRO-1584:
-

It looks like the conversion used for default values is independent of 
toString. Callers can pass either a JsonNode, which bypasses the problem, or an 
object that gets [converted in 
JacksonUtils|https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java#L73].
 That converts a byte array to a string using ISO-8859-1, which correctly 
implements the spec. When the JSON is written, the characters that aren't 
allowed in JSON strings are escaped by the generator. Changing the output of 
toString won't break the case that Doug mentions, but I think it is a fair 
point that changing what is currently produced could break applications.

However, the JSON currently produced by toString is broken because it doesn't 
convert control characters to escape sequences (0x0a to \n). We could safely 
fix that problem without moving to base64 and I think at a minimum we should do 
that.
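
As a rough illustration of the escaping meant here, the sketch below handles the characters that the JSON spec requires to be escaped. It is not the existing Avro escape function from AVRO-713; the class and method names are assumptions made for the example.

```java
// Hypothetical sketch of JSON string escaping; not Avro's implementation.
class JsonStringEscapeSketch {
    // Escapes quotation mark, reverse solidus, and control characters
    // in the range U+0000 through U+001F. Other characters pass through.
    static String escape(CharSequence in) {
        StringBuilder out = new StringBuilder(in.length() + 8);
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b"); break;
                case '\f': out.append("\\f"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the six-character form.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A newline (0x0a) becomes the two characters backslash and n.
        System.out.println(escape("line1\nline2"));
    }
}
```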

But this still leaves a problem: what do we do about toString not conforming to 
the JSON required by the Avro spec?



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-09 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049388#comment-15049388
 ] 

Ryan Blue commented on AVRO-1584:
-

I think using a helper library would be good. My main concern here is correctly 
representing the data and not performance, but there's a base64 helper lib from 
Jackson you can use that allows you to add a character or byte at a time 
(Base64Variants.getDefaultVariant()) that would work for ByteBuffer. I think a 
helper method would be fine.
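
A helper along these lines might look like the sketch below. It swaps in `java.util.Base64` (Java 8) instead of the Jackson helper mentioned above so that it stays self-contained, and the class and method names are hypothetical. Whether the JDK encoder avoids an internal copy for a given buffer is not something this sketch guarantees.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; uses the JDK encoder rather than Jackson's Base64Variants.
class ByteBufferBase64Sketch {
    static String toBase64(ByteBuffer buf) {
        // duplicate() shares the content but has its own position and limit,
        // so encoding does not consume the caller's buffer.
        ByteBuffer encoded = Base64.getEncoder().encode(buf.duplicate());
        // Base64 output is pure ASCII, so decoding as US-ASCII is lossless.
        return StandardCharsets.US_ASCII.decode(encoded).toString();
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(
            new byte[] {0, 31, 65, 66, 67, (byte) 255, (byte) 182});
        System.out.println(toBase64(buf));
        // The original buffer is still fully readable afterwards.
        System.out.println(buf.remaining());
    }
}
```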

For fixed, I'm referring to the GenericData.Fixed class for generic. That 
corresponds to the "fixed" type in the spec that is a fixed-length byte array. 
Right now, those become a JSON list of integers. Thanks, David!



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-08 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047650#comment-15047650
 ] 

Ryan Blue commented on AVRO-1584:
-

David, you're right. There are actually two different code paths to transform a 
record to JSON: the encoder and the {{GenericData#toString(Object)}} path. I'm 
not entirely sure what the history is there. Maybe [~cutting] or [~tomwhite] 
knows?

I don't think the output of toString here is covered by the spec since it 
doesn't appear to implement the spec, as you can see with the addition of the 
extra object layer with a "bytes" key. I'm a little reluctant to change what 
this currently produces, since the method claims to produce JSON and it is 
probably used by someone to produce data files, but the 1.8.0 release is coming 
up so we can.

Your patch adds a method for converting bytes to base64. Is it possible to use 
a library method there instead of adding an implementation to maintain? Also, 
could you apply a similar change to how Fixed is handled? I think we should 
probably fix both at the same time.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-08 Thread David Lemieux (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047821#comment-15047821
 ] 

David Lemieux commented on AVRO-1584:
-

Ryan,

I am not quite sure what you mean by Fixed. Are you talking about 
org.apache.avro.specific.SpecificFixed ?
If so, I can look into it.

As for using a library method, I can as well.
I opted for that implementation because I could not find a library that would 
work on ByteBuffer without making a copy to a byte[] first.
I think saving a copy is worth it, but I don't have that much time to spend on 
this either.
Your call.

If we keep the currently proposed version and I also look into Fixed, 
it will make sense to extract that method to some common ground.
Without knowing the code base that well, I would put it into 
org.apache.avro.util.Base64
Do you know a better place?

I'll wait for your answer before jumping on it.



[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-03 Thread David Lemieux (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038447#comment-15038447
 ] 

David Lemieux commented on AVRO-1584:
-

Hi Ryan,

Re-reading the bug, I realize there is some confusion. Maybe I hijacked the 
bug, maybe the title is misleading.

The patch does not actually touch the JSON ser/des, only the toString() 
implementation, which outputs JSON-like data.
The current implementation of toString() outputs each byte cast to a char 
without any escaping.
The net result is that logging quickly becomes useless, as some characters will 
corrupt the console or logs.





[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays

2015-12-03 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038210#comment-15038210
 ] 

Ryan Blue commented on AVRO-1584:
-

I agree that this seems like a bug, but while looking at AVRO-1746 recently I 
found out that the spec actually states that [bytes 0-255 should be mapped to 
unicode code points 
0-255|https://avro.apache.org/docs/1.7.7/spec.html#schema_record]. After that, 
several characters need to be escaped as required by the JSON spec, but 
otherwise the Unicode characters are allowed in JSON. So I think what Java does 
currently is the correct behavior, although it does seem odd.
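
The byte-to-code-point mapping described here is what the ISO-8859-1 charset does in Java, so it can be sketched as follows. The class and method names are hypothetical and this is not the Avro code path itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the bytes <-> code points 0-255 mapping; not Avro code.
class BytesToCodePointsSketch {
    // Each byte value 0-255 becomes the Unicode code point with the same number.
    static String bytesToString(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Inverse mapping; only valid for strings whose chars are all <= U+00FF.
    static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = {0, 31, 65, 66, 67, (byte) 255, (byte) 182};
        String mapped = bytesToString(raw);
        // Byte 255 maps to code point 255 (U+00FF).
        System.out.println((int) mapped.charAt(5));
        // The mapping round-trips without losing information.
        System.out.println(Arrays.equals(raw, stringToBytes(mapped)));
    }
}
```

The resulting string still contains control characters such as U+0000 and U+001F, so it needs JSON escaping before it is written out.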
