[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057147#comment-15057147 ]

Hudson commented on AVRO-1584:

SUCCESS: Integrated in AvroJava #560 (See [https://builds.apache.org/job/AvroJava/560/])
AVRO-1584: Java: Escape characters not allowed in JSON in toString.

From the JSON spec: "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."

This uses the existing string escape function. (blue: rev 1720055)
* trunk/CHANGES.txt
* trunk/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java
* trunk/lang/java/avro/src/test/java/org/apache/avro/generic/TestGenericData.java

> Json output doesn't generate base64 for byte arrays
> ---
>
>                 Key: AVRO-1584
>                 URL: https://issues.apache.org/jira/browse/AVRO-1584
>             Project: Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.7.7
>         Environment: Pure java.
>            Reporter: Christophe Lorenz
>         Attachments: AVRO-1584-Jackson-Base64-Default-Variant.patch, AVRO-1584.1.patch, AVRO-1584.patch
>
>
> The Json output of java generated code doesn't correctly encode byte arrays.
> Using this simple schema :
> {"namespace": "example.avro",
>   "type": "record",
>   "name": "ByteArrayEncoding",
>   "fields": [ {"name": "data", "type": "bytes"} ]
> }
> The toString()
> System.out.println(new ByteArrayEncoding(ByteBuffer.wrap(new byte[]{0,31,65,66,67,(byte)255,(byte)182})));
> Returns raw bytes to string in the json :
> {"data": {"bytes": " ABC??"}}
> As a byte array is not tied to be a valid string, it should be converted back and forth to Base64 like other Json implementations :
> {"data": {"bytes": "AB9BQkP/tg=="}}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057074#comment-15057074 ]

Doug Cutting commented on AVRO-1584:

Ryan, your patch looks good to me. +1
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057087#comment-15057087 ]

Ryan Blue commented on AVRO-1584:

Committed. Thanks for taking a look, Doug!
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057086#comment-15057086 ]

ASF subversion and git services commented on AVRO-1584:

Commit 1720055 from [~b...@cloudera.com] in branch 'avro/trunk'
[ https://svn.apache.org/r1720055 ]

AVRO-1584: Java: Escape characters not allowed in JSON in toString.

From the JSON spec: "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."

This uses the existing string escape function.
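The escaping rule quoted in the commit message can be illustrated with a minimal sketch. This is not the escape function the commit reuses; the class and method names below are hypothetical and exist only to show which characters the JSON spec requires to be escaped inside a string.

```java
// Hypothetical helper for illustration only; not Avro's own escape function.
final class JsonEscapeSketch {

    // Escapes the quotation mark, the reverse solidus, and every control
    // character below 0x20, which are the characters the JSON spec says
    // must not appear raw inside a string.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                case '\b': sb.append("\\b");  break;
                case '\f': sb.append("\\f");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a six-character
                        // hexadecimal escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

Characters at or above 0x20, including code points 128 to 255 that come from non-textual bytes, pass through unchanged in this sketch, which matches the quoted rule: only the quotation mark, the reverse solidus, and the control characters need escaping.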
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057097#comment-15057097 ]

Ryan Blue commented on AVRO-1584:

[~lemieud], thank you for your work to get this addressed! I think that the fix we ended up with should fix the problem you were seeing since the control characters will be properly escaped. If moving to base64 is important to you as well, then I think the right way forward is to help standardize a different JSON encoding, like Doug suggested for the long term.

For now, I'm going to mark this issue resolved since we've decided the way forward in the short term.
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057042#comment-15057042 ]

Ryan Blue commented on AVRO-1584:

Thanks for the context, Doug. I agree that we shouldn't change the specified JSON encoding or toString behavior.

I think we have some flexibility with toString, since it was intended for debugging (so it isn't used to encode default values) and doesn't encode either bytes or fixed as expected. For bytes, an extra object layer is added and fixed is encoded as an array of integers. I think that makes it unlikely that anyone would use it to serialize data as JSON, but I have no problem being cautious and not breaking anything unless we have a plan for what toString should produce.

I'm attaching a patch that uses the fix from AVRO-713 to fix just the escape problem. It also adds tests to validate the current behavior of toString for bytes and fixed.
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057312#comment-15057312 ]

David Lemieux commented on AVRO-1584:

[~rdblue] My pleasure. I agree that it should fix my problem. Thanks
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056779#comment-15056779 ]

Doug Cutting commented on AVRO-1584:

Ryan, I agree this is a bug in the current implementation. According to RFC 4627, control characters must be escaped: "All Unicode characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark, reverse solidus, and the control characters (U+0000 through U+001F)."

I note that this was fixed for strings in AVRO-713 and we can probably share this logic.

The difference between toString() JSON and Avro's JSON data encoding is longstanding and primarily around the encoding of unions. For full read/write fidelity, many union values must be tagged with their type, so that's what the JSON encoding requires. The toString() encoding was not intended for data fidelity but for debugging, so a simpler version was implemented. (It actually pre-dates the specification of the JSON encoding.)

It so happens that default values in schemas do not need to be tagged, so the toString() format is identical to the default-value format. However, there are frequent requests for a reader that accepts such an untagged format, for interaction with other JSON-generating software.

In retrospect, the JSON encoding should perhaps not require tagging for unions with null or unions between a primitive and a non-primitive, i.e., only tag unions when it's required. We instead opted for simplicity of specification and implementation, to ease interoperability between various Avro implementations, when perhaps in this case we should have optimized for ease of interoperability with non-Avro producers and consumers of JSON.

So long-term we might add an encoder/decoder that doesn't handle unions at all or that handles them more parsimoniously, then perhaps implement default values and toString() using this encoding.

But I don't think we should alter the currently specified JSON encoding, nor change the default or toString() format.
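To make the tagged versus untagged distinction concrete, here is a minimal sketch. It assumes the rule from the Avro specification's JSON encoding, where a null union value is written as JSON null and any other branch is wrapped in a single-key object named after the branch type. The class and method names are hypothetical and the value is passed in as already-serialized JSON text.

```java
// Hypothetical illustration; these are not Avro APIs.
final class UnionJsonSketch {

    // Tagged form: a non-null union value is wrapped in an object whose
    // single key is the name of the branch type; null stays a bare null.
    static String tagged(String typeName, String jsonValue) {
        if ("null".equals(typeName)) {
            return "null";
        }
        return "{\"" + typeName + "\": " + jsonValue + "}";
    }

    // Untagged form, as used for schema default values and by toString():
    // the value is written directly, with no type wrapper.
    static String untagged(String jsonValue) {
        return jsonValue;
    }
}
```

For a union of null and string holding "a", the tagged form is {"string": "a"} while the untagged form is just "a", which is what most non-Avro JSON producers emit.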
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056532#comment-15056532 ]

Doug Cutting commented on AVRO-1584:

The problem you originally cite (question marks in output) is caused by using a non-UTF8 encoding when printing the value of toString(), not by that value itself. So there's not actually a bug here. The string produced by toString() loses no information. Rather, you seek either an (incompatible) change or a new feature.

Changing the format of toString() for binary values incompatibly to base64 seems likely to break applications, e.g. those that use toString() to supply default values to the schema builder API. I question that this is of sufficient benefit to be worth doing even in a release that permits incompatibilities.

There is no perfect string format for binary values. The one currently used here (and by the spec for default values) makes textual values more legible, while base64 makes non-textual values more tolerant of non-UTF8-safe i/o.

Perhaps we should instead add a flag that one can set to change GenericData#toString() so that it generates base64?

We should also certainly add some tests for the current format if there are none.
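The effect Doug describes can be reproduced with the JDK alone. The sketch below is illustrative, with a hypothetical class name: it pushes a string through a charset and back, the way a console or log writer would, and shows that a character such as U+00FF survives UTF-8 but is replaced when the output charset cannot represent it.

```java
import java.nio.charset.Charset;

// Hypothetical illustration of output-encoding loss; not part of Avro.
final class OutputEncodingSketch {

    // Encodes the text with the given charset and decodes it again, which
    // mimics printing through a stream that uses that charset.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot map.
    static String throughCharset(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }
}
```

With US-ASCII as the output charset, the character for byte 255 comes back as a question mark, while with UTF-8 it comes back unchanged, so the string itself holds all of the information and only the printing step is lossy.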
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056701#comment-15056701 ]

Ryan Blue commented on AVRO-1584:

It looks like the conversion used for default values is independent of toString. Callers can pass either a JsonNode, which bypasses the problem, or an object that gets [converted in JacksonUtils|https://github.com/apache/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/util/internal/JacksonUtils.java#L73]. That converts a byte array to a string using ISO-8859-1, which correctly implements the spec. When the JSON is written, the characters that aren't allowed in JSON strings are escaped by the generator.

Changing the output of toString won't break the case that Doug mentions, but I think it is a fair point that changing what is currently produced could break applications. However, the JSON currently produced by toString is broken because it doesn't convert control characters to escape sequences (0x0a to \n). We could safely fix that problem without moving to base64 and I think at a minimum we should do that.

But this still leaves a problem: what do we do about toString not conforming to the JSON required by the Avro spec?
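The ISO-8859-1 mapping mentioned above is lossless in both directions, which a short JDK-only sketch can show. The class and method names are hypothetical and this is not the JacksonUtils code; it only demonstrates that each byte value 0 to 255 maps to the code point with the same number and back again.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the byte-to-code-point mapping; not Avro code.
final class Latin1BytesSketch {

    // Each byte 0..255 becomes the char with the same numeric value.
    static String toText(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Reverses toText for any string whose chars are all at most 0xFF.
    static byte[] toBytes(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

Applied to the seven bytes from the issue description, toText yields a seven-character string whose sixth character is U+00FF, and toBytes returns the original array. The control characters at the start are the part that still needs JSON escaping when the string is written out.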
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049388#comment-15049388 ]

Ryan Blue commented on AVRO-1584:

I think using a helper library would be good. My main concern here is correctly representing the data and not performance, but there's a base64 helper lib from Jackson you can use that allows you to add a character or byte at a time (Base64Variants.defaultVariant()) that would work for ByteBuffer. I think a helper method would be fine.

For fixed, I'm referring to the GenericData.Fixed class for generic. That corresponds to the "fixed" type in the spec that is a fixed-length byte array. Right now, those become a JSON list of integers.

Thanks, David!
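As a point of comparison for the library-method idea, here is a minimal sketch that encodes a ByteBuffer with the JDK's java.util.Base64 (Java 8 and later) rather than the Jackson Base64Variants helper named above. That substitution is an assumption made to keep the example dependency-free, and the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper using the JDK encoder; not the patch under discussion.
final class ByteBufferBase64Sketch {

    // Encodes the remaining bytes of the buffer as standard Base64 text.
    // Encoder.encode(ByteBuffer) advances the position of the buffer it is
    // given, so a duplicate is passed to leave the caller's position alone.
    static String encode(ByteBuffer buffer) {
        ByteBuffer encoded = Base64.getEncoder().encode(buffer.duplicate());
        // Base64 output is pure ASCII, so decoding as US-ASCII is exact.
        return StandardCharsets.US_ASCII.decode(encoded).toString();
    }
}
```

For the bytes in the issue description this produces AB9BQkP/tg==, the value the reporter expected. Whether the JDK encoder copies the buffer internally is an implementation detail this sketch does not try to control.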
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047650#comment-15047650 ]

Ryan Blue commented on AVRO-1584:

David, you're right. There are actually two different code paths to transform a record to JSON: the encoder, and the {{GenericData#toString(Object)}} path. I'm not entirely sure what the history is there. Maybe [~cutting] or [~tomwhite] knows?

I don't think the output of toString here is covered by the spec since it doesn't appear to implement the spec, as you can see with the addition of the extra object layer with a "bytes" key. I'm a little reluctant to change what this currently produces, since the method claims to produce JSON and it is probably used by someone to produce data files, but the 1.8.0 release is coming up so we can.

Your patch adds a method for converting bytes to base64. Is it possible to use a library method there instead of adding an implementation to maintain? Also, could you apply a similar change to how Fixed is handled? I think we should probably fix both at the same time.
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047821#comment-15047821 ]

David Lemieux commented on AVRO-1584:

Ryan, I am not quite sure what you mean by Fixed. Are you talking about org.apache.avro.specific.SpecificFixed? If so, I can look into it.

As for using a library method, I can do that as well. I opted for that implementation because I could not find a library that would work on ByteBuffer without making a copy to a byte[] first. I think saving a copy is worth it, but I don't have that much time to spend on this either. Your call.

If we keep the currently proposed version and I also look into Fixed, it will make sense to extract that method to some common ground. Without knowing the code base that well, I would put it into org.apache.avro.util.Base64. Do you know a better place?

I'll wait for your answer before jumping on it.
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038447#comment-15038447 ]

David Lemieux commented on AVRO-1584:

Hi Ryan,

Re-reading the bug, I realize there is some confusion. Maybe I hijacked the bug, maybe the title is misleading. The patch is actually not touching the JSON ser/des, only the toString() implementation, which outputs JSON-like data.

The current implementation of toString() will output each byte cast to a char without escaping or anything. The net result is that logging quickly becomes useless, as some characters will corrupt the console or logs.
[jira] [Commented] (AVRO-1584) Json output doesn't generate base64 for byte arrays
[ https://issues.apache.org/jira/browse/AVRO-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15038210#comment-15038210 ]

Ryan Blue commented on AVRO-1584:

I agree that this seems like a bug, but while looking at AVRO-1746 recently I found out that the spec actually states that [bytes 0-255 should be mapped to Unicode code points 0-255|https://avro.apache.org/docs/1.7.7/spec.html#schema_record]. After that, several characters need to be escaped as required by the JSON spec, but otherwise the Unicode characters are allowed in JSON.

So I think what Java does currently is the correct behavior, even though it does seem odd.