[GitHub] avro pull request: Update JsonIO.hh
GitHub user hatemhelal closed the pull request at:

    https://github.com/apache/avro/pull/15

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] avro pull request: AVRO-1593. Use classic locale to escape control...
GitHub user hatemhelal opened a pull request:

    https://github.com/apache/avro/pull/32

    AVRO-1593. Use classic locale to escape control chars

    See https://issues.apache.org/jira/browse/AVRO-1593

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hatemhelal/avro patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/avro/pull/32.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #32

commit 3c87e7cbf2a8f03c570ec763b7d92da80bc86326
Author: hatemhelal hatem.he...@gmail.com
Date: 2015-04-02T09:41:12Z

    AVRO-1593. Use classic locale to escape control chars

    See https://issues.apache.org/jira/browse/AVRO-1593
[jira] [Commented] (AVRO-1593) C++ json encoder assumes C locale and generates invalid UTF-8 sequence
[ https://issues.apache.org/jira/browse/AVRO-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392454#comment-14392454 ]

ASF GitHub Bot commented on AVRO-1593:
--------------------------------------

GitHub user hatemhelal opened a pull request:

    https://github.com/apache/avro/pull/32

    AVRO-1593. Use classic locale to escape control chars

    See https://issues.apache.org/jira/browse/AVRO-1593

C++ json encoder assumes C locale and generates invalid UTF-8 sequence
----------------------------------------------------------------------

Key: AVRO-1593
URL: https://issues.apache.org/jira/browse/AVRO-1593
Project: Avro
Issue Type: Bug
Components: c++
Affects Versions: 1.7.7
Environment: windows-1252 encoding
Reporter: Hatem Helal
Priority: Critical
Fix For: 1.7.8

Encoding a multibyte UTF-8 code point such as:

    \xEF\xBD\x81

incorrectly becomes:

    \xEF\xBD\U0081

when encoded in the service running in the windows-1252 locale. This isn't a valid UTF-8 sequence, so we end up with mojibake when reading back the JSON-encoded string.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AVRO-1593) C++ json encoder assumes C locale and generates invalid UTF-8 sequence
[ https://issues.apache.org/jira/browse/AVRO-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392457#comment-14392457 ]

Hatem Helal commented on AVRO-1593:
-----------------------------------

Just created a new pull request:

    https://github.com/apache/avro/pull/32

which uses a simpler solution based on the std::locale::classic object:

    http://en.cppreference.com/w/cpp/locale/locale/classic
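The failure reported in AVRO-1593 can be modeled in a few lines. What follows is a minimal Java sketch, not the Avro C++ encoder: the class `ControlEscapeSketch`, its methods, and the two predicates are hypothetical names introduced for illustration. It assumes that a windows-1252 locale classifies byte 0x81 as a control character, whereas the classic "C" locale only flags 0x00-0x1F and 0x7F, and it reuses the `\U0081` escape form shown in the report.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.IntPredicate;

/** Hypothetical model of a byte-wise string escaper; not Avro's C++ code. */
final class ControlEscapeSketch {

    /** Control bytes in the classic "C" locale: 0x00-0x1F and 0x7F. */
    static final IntPredicate CLASSIC = b -> b < 0x20 || b == 0x7F;

    /** Assumption: a windows-1252 locale that additionally flags 0x81 as control. */
    static final IntPredicate WINDOWS_1252_LIKE = b -> CLASSIC.test(b) || b == 0x81;

    /** Copies the input, replacing every byte flagged by the predicate with an ASCII escape. */
    static byte[] escape(byte[] utf8, IntPredicate isControl) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte raw : utf8) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (isControl.test(b)) {
                byte[] esc = String.format("\\U%04X", b).getBytes(StandardCharsets.US_ASCII);
                out.write(esc, 0, esc.length);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    /** Returns true when the bytes decode as strict UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The three-byte UTF-8 sequence from the report.
        byte[] input = {(byte) 0xEF, (byte) 0xBD, (byte) 0x81};
        System.out.println(isValidUtf8(escape(input, CLASSIC)));           // prints "true"
        System.out.println(isValidUtf8(escape(input, WINDOWS_1252_LIKE))); // prints "false"
    }
}
```

With the classic predicate the three bytes pass through untouched. With the locale-dependent predicate the trailing continuation byte is replaced by ASCII text, which leaves a truncated multibyte sequence behind; this is consistent with the mojibake described in the issue and with the pull request's choice to classify characters against the classic locale.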
[jira] [Created] (AVRO-1657) Namespaced reader schema w/field aliases can not read non-namespaced writer schema
David Korz created AVRO-1657:
--------------------------------

Summary: Namespaced reader schema w/field aliases can not read non-namespaced writer schema
Key: AVRO-1657
URL: https://issues.apache.org/jira/browse/AVRO-1657
Project: Avro
Issue Type: Bug
Components: java
Affects Versions: 1.7.7
Reporter: David Korz

The writer uses a non-namespaced schema as follows:

{noformat}
{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    { "type": "string", "name": "Name" },
    { "type": "double", "name": "Temperature" }
  ]
}
{noformat}

The reader uses a namespaced schema with a field alias.

{noformat}
{
  "type": "record",
  "name": "MyRecord",
  "namespace": "com.example",
  "aliases": [".MyRecord"],
  "fields": [
    { "type": "string", "name": "Name" },
    { "type": "double", "name": "TemperatureC", "aliases": ["Temperature"] }
  ]
}
{noformat}

The following reading code will fail.

{noformat}
DatumReader<MyRecord> datumReader = new SpecificDatumReader<>(MyRecord.class);
FileReader<MyRecord> fileReader = DataFileReader.openReader(file, datumReader);

MyRecord record = null;
while (fileReader.hasNext()) {
  record = fileReader.next(record);
  CharSequence name = record.getName();
  Double temp = record.getTemperatureC();
  System.out.println(name + " " + temp);
}
{noformat}

The reader's alias is not found.

{noformat}
Exception in thread "main" org.apache.avro.AvroTypeException: Found MyRecord, expecting com.example.MyRecord, missing required field TemperatureC
  at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
  at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
  at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
  at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:176)
  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
  at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
  at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
{noformat}
[jira] [Commented] (AVRO-1657) Namespaced reader schema w/field aliases can not read non-namespaced writer schema
[ https://issues.apache.org/jira/browse/AVRO-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393782#comment-14393782 ]

David Korz commented on AVRO-1657:
----------------------------------

The problem is that in org.apache.avro.Schema.applyAliases(Schema,Map,Map,Map), the call to getFieldAlias() passes the Name of the writer schema, while the fieldAliases map has an entry keyed with the reader's namespaced Name, so no alias is found for the field TemperatureC.
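The lookup mismatch described in the AVRO-1657 comment can be illustrated without Avro at all. The sketch below is a hypothetical model, not the code in org.apache.avro.Schema: the class `FieldAliasLookupSketch` and its methods are invented for illustration, and record names are represented as plain strings rather than Avro Name objects. It only assumes what the comment states, namely that the alias table is keyed by the reader's namespaced name while the lookup uses the writer's name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the alias lookup described in the comment; not Avro's implementation. */
final class FieldAliasLookupSketch {

    /** Builds the reader-side table: record full name -> (writer field name -> reader field name). */
    static Map<String, Map<String, String>> readerFieldAliases() {
        Map<String, String> fields = new HashMap<>();
        fields.put("Temperature", "TemperatureC");
        Map<String, Map<String, String>> byRecord = new HashMap<>();
        byRecord.put("com.example.MyRecord", fields); // keyed by the namespaced reader name
        return byRecord;
    }

    /** Returns the reader's name for a writer field, or the writer's name when no alias matches. */
    static String resolve(Map<String, Map<String, String>> aliases, String recordName, String field) {
        Map<String, String> forRecord = aliases.get(recordName);
        if (forRecord == null) {
            return field; // no entry for this record name, so the field is left unrenamed
        }
        return forRecord.getOrDefault(field, field);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> aliases = readerFieldAliases();
        // Lookup with the writer's non-namespaced record name misses the table.
        System.out.println(resolve(aliases, "MyRecord", "Temperature"));             // prints "Temperature"
        // Lookup with the reader's namespaced record name finds the alias.
        System.out.println(resolve(aliases, "com.example.MyRecord", "Temperature")); // prints "TemperatureC"
    }
}
```

In this model the first lookup silently falls back to the writer's field name, so the reader never sees a field called TemperatureC, which matches the "missing required field TemperatureC" error in the report.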
[jira] [Commented] (AVRO-1341) Allow controlling avro via java annotations when using reflection.
[ https://issues.apache.org/jira/browse/AVRO-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393138#comment-14393138 ]

Ryan Blue commented on AVRO-1341:
---------------------------------

[~sunzhaonan], there isn't an @AvroDoc annotation, but it's a great idea. Could you open an issue to add it? It shouldn't be too much work, either.

Allow controlling avro via java annotations when using reflection.
-------------------------------------------------------------------

Key: AVRO-1341
URL: https://issues.apache.org/jira/browse/AVRO-1341
Project: Avro
Issue Type: New Feature
Components: java
Reporter: Vincenz Priesnitz
Assignee: Vincenz Priesnitz
Fix For: 1.7.5
Attachments: AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch

It would be great if one could control avro with java annotations. As of now, it is already possible to mark fields as Nullable or classes as being encoded as a String. I propose a bigger set of annotations to control the behavior of avro on fields and classes. Such annotations have proven useful with Jackson's JSON serialization and Morphia's MongoDB serialization.

I propose the following additional annotations:

@AvroName("alternativeName")
@AvroAlias(alias="alias", space="space")
@AvroIgnore
@AvroMeta(key="K", value="V")
@AvroEncode(using=CustomEncoding.class)

Java fields with the @AvroName("alternativeName") annotation will be renamed in the induced schema. When reading an avro file via reflection, the reflection reader will look for fields in the schema with alternativeName. For example:

{code}
@AvroName("foo")
int bar;
{code}

is serialized as

{code}
{ "name" : "foo", "type" : "int" }
{code}

The @AvroAlias annotation will add a new alias to the induced schema of a record, enum or field. The space parameter is optional and defaults to the namespace of the named schema the alias is added to.

Fields with the @AvroIgnore annotation will be treated as if they had a transient modifier, i.e. they will not be written to or read from avro files.

The @AvroMeta(key="K", value="V") annotation allows you to store an arbitrary key : value pair at every node in the schema.

{code}
@AvroMeta(key="fieldKey", value="fieldValue")
int foo;
{code}

will create the following schema

{code}
{ "name" : "foo", "type" : "int", "fieldKey" : "fieldValue" }
{code}

Fields can be custom encoded with the @AvroEncode(using=CustomEncoding.class) annotation. This annotation is a generalization of the @Stringable annotation. The @Stringable annotation is limited to classes with string argument constructors. Some classes can be similarly reduced to a smaller class or even a single primitive, but don't fit the requirements for @Stringable. A prominent example is java.util.Date, whose instances can essentially be described with a single long. Such classes can now be encoded with a CustomEncoding, which reads and writes directly from the encoder/decoder. One simply extends the abstract CustomEncoding class by implementing a schema, a read method and a write method. A java field can then be annotated like this:

{code}
@AvroEncode(using=DateAsLongEncoding.class)
Date date;
{code}

The custom encoding implementation would look like

{code}
public class DateAsLongEncoding extends CustomEncoding<Date> {
  {
    schema = Schema.create(Schema.Type.LONG);
    schema.addProp("CustomEncoding", "DateAsLongEncoding");
  }

  @Override
  public void write(Object datum, Encoder out) throws IOException {
    out.writeLong(((Date)datum).getTime());
  }

  @Override
  public Date read(Object reuse, Decoder in) throws IOException {
    if (reuse != null) {
      ((Date)reuse).setTime(in.readLong());
      return (Date)reuse;
    }
    else return new Date(in.readLong());
  }
}
{code}

I implemented said annotations and a custom encoding for java.util.Date as a proof of concept and also extended the @Stringable annotations to fields.

This issue is a followup of AVRO-1328 and AVRO-1330.
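The date-as-long idea and the reuse handling in the proposed DateAsLongEncoding can be tried out with the JDK alone. The following is a rough sketch under stated assumptions: `DateAsLongSketch` and its methods are hypothetical, and java.io.DataOutputStream and DataInputStream stand in for Avro's Encoder and Decoder. The stand-ins write a fixed 8-byte long, which is not Avro's binary encoding of a long, so only the control flow carries over.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Date;

/** Hypothetical stand-in for a custom encoding that stores a Date as one long. */
final class DateAsLongSketch {

    /** Writes the date as milliseconds since the epoch. */
    static void write(Date datum, DataOutputStream out) throws IOException {
        out.writeLong(datum.getTime());
    }

    /** Reads a date, updating and returning the reuse instance when one is supplied. */
    static Date read(Date reuse, DataInputStream in) throws IOException {
        long millis = in.readLong();
        if (reuse != null) {
            reuse.setTime(millis);
            return reuse;
        }
        return new Date(millis);
    }

    /** Serializes the date to a byte buffer and reads it back, optionally into reuse. */
    static Date roundTrip(Date original, Date reuse) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                write(original, out);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return read(reuse, in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Date original = new Date(1427967672000L);
        Date reuse = new Date(0L);
        System.out.println(roundTrip(original, null).equals(original)); // prints "true"
        System.out.println(roundTrip(original, reuse) == reuse);        // prints "true"
    }
}
```

Passing a non-null reuse object avoids allocating a new Date per record, which is the reason the proposed read method checks reuse before constructing a new instance.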
[jira] [Commented] (AVRO-1341) Allow controlling avro via java annotations when using reflection.
[ https://issues.apache.org/jira/browse/AVRO-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393091#comment-14393091 ]

Zhaonan Sun commented on AVRO-1341:
-----------------------------------

Looks like @AvroMeta can't add reserved fields; for example, @AvroMeta("doc", "some doc") throws an exception. Do we have an @AvroDoc annotation or something similar?