[GitHub] avro pull request: Update JsonIO.hh

2015-04-02 hatemhelal
Github user hatemhelal closed the pull request at:

https://github.com/apache/avro/pull/15


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] avro pull request: AVRO-1593. Use classic locale to escape control...

2015-04-02 hatemhelal
GitHub user hatemhelal opened a pull request:

https://github.com/apache/avro/pull/32

AVRO-1593. Use classic locale to escape control chars

See https://issues.apache.org/jira/browse/AVRO-1593

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hatemhelal/avro patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/32.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #32


commit 3c87e7cbf2a8f03c570ec763b7d92da80bc86326
Author: hatemhelal <hatem.he...@gmail.com>
Date:   2015-04-02T09:41:12Z

AVRO-1593. Use classic locale to escape control chars

See https://issues.apache.org/jira/browse/AVRO-1593






[jira] [Commented] (AVRO-1593) C++ json encoder assumes C locale and generates invalid UTF-8 sequence

2015-04-02 ASF GitHub Bot (JIRA)

[ https://issues.apache.org/jira/browse/AVRO-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392454#comment-14392454 ]

ASF GitHub Bot commented on AVRO-1593:
--

GitHub user hatemhelal opened a pull request:

https://github.com/apache/avro/pull/32

AVRO-1593. Use classic locale to escape control chars

See https://issues.apache.org/jira/browse/AVRO-1593

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hatemhelal/avro patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/avro/pull/32.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #32


commit 3c87e7cbf2a8f03c570ec763b7d92da80bc86326
Author: hatemhelal <hatem.he...@gmail.com>
Date:   2015-04-02T09:41:12Z

AVRO-1593. Use classic locale to escape control chars

See https://issues.apache.org/jira/browse/AVRO-1593




 C++ json encoder assumes C locale and generates invalid UTF-8 sequence 
 -

 Key: AVRO-1593
 URL: https://issues.apache.org/jira/browse/AVRO-1593
 Project: Avro
  Issue Type: Bug
  Components: c++
Affects Versions: 1.7.7
 Environment: windows-1252 encoding
Reporter: Hatem Helal
Priority: Critical
 Fix For: 1.7.8


 Encoding a multibyte UTF-8 code point such as:
 \xEF\xBD\x81
 incorrectly produces:
 \xEF\xBD\U0081
 when the service runs in the windows-1252 locale: the trailing byte 0x81 is
 escaped on its own. The result isn't a valid UTF-8 sequence, so we end up
 with mojibake when reading the JSON-encoded string back.
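To make "not a valid UTF-8 sequence" concrete, here is a minimal C++ sketch (not part of Avro; the function name isStructurallyValidUtf8 is hypothetical) that checks only the byte structure of UTF-8: each lead byte must be followed by the right number of 10xxxxxx continuation bytes. It deliberately does not reject overlong forms, surrogates, or code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (10xxxxxx). Overlong forms, surrogates and
// code points above U+10FFFF are NOT rejected; illustration only.
bool isStructurallyValidUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        if (c < 0x80) {
            len = 1;                      // ASCII
        } else if ((c & 0xE0) == 0xC0) {
            len = 2;                      // 110xxxxx
        } else if ((c & 0xF0) == 0xE0) {
            len = 3;                      // 1110xxxx
        } else if ((c & 0xF8) == 0xF0) {
            len = 4;                      // 11110xxx
        } else {
            return false;                 // stray continuation or invalid lead byte
        }
        if (i + len > s.size()) {
            return false;                 // truncated sequence
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) {
                return false;             // expected a continuation byte
            }
        }
        i += len;
    }
    return true;
}
```

With this check, \xEF\xBD\x81 passes, while \xEF\xBD followed by the ASCII text \U0081 fails: the backslash (0x5C) sits where the third byte of the three-byte sequence must be a continuation byte.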



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1593) C++ json encoder assumes C locale and generates invalid UTF-8 sequence

2015-04-02 Hatem Helal (JIRA)

[ https://issues.apache.org/jira/browse/AVRO-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14392457#comment-14392457 ]

Hatem Helal commented on AVRO-1593:
---

Just created a new pull request:

https://github.com/apache/avro/pull/32

which takes a simpler approach, based on the std::locale::classic() object:

http://en.cppreference.com/w/cpp/locale/locale/classic
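A minimal sketch of the idea, assuming the goal is to escape only ASCII control characters: passing std::locale::classic() to std::iscntrl makes the classification independent of the process-wide locale. The helper escapeJsonString is hypothetical (it is not Avro's JsonGenerator), and it emits the standard JSON \u00XX form rather than the \U00XX seen in the report. On common standard libraries the classic locale does not classify bytes 0x80-0xFF as control characters, so UTF-8 continuation bytes pass through.

```cpp
#include <cstdio>
#include <locale>
#include <string>

// Hypothetical helper (not Avro's JsonGenerator): escapes a UTF-8 string for
// a JSON string literal. Control characters are detected with the classic
// "C" locale, so the result does not depend on the global locale.
std::string escapeJsonString(const std::string& s) {
    const std::locale& classic = std::locale::classic();
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        default:
            if (std::iscntrl(c, classic)) {
                // ASCII control byte (0x00-0x1F, 0x7F): emit \u00XX.
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x",
                              static_cast<unsigned>(static_cast<unsigned char>(c)));
                out += buf;
            } else {
                // In the classic locale, bytes >= 0x80 are not control
                // characters, so multibyte UTF-8 sequences are copied as-is.
                out += c;
            }
        }
    }
    return out;
}
```

Because the classification no longer consults the global locale, the trailing 0x81 byte of \xEF\xBD\x81 is copied through rather than escaped, while a genuine control byte such as 0x01 still becomes \u0001.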





[jira] [Created] (AVRO-1657) Namespaced reader schema w/field aliases can not read non-namespaced writer schema

2015-04-02 David Korz (JIRA)
David Korz created AVRO-1657:


 Summary: Namespaced reader schema w/field aliases can not read 
non-namespaced writer schema
 Key: AVRO-1657
 URL: https://issues.apache.org/jira/browse/AVRO-1657
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.7.7
Reporter: David Korz


The writer uses a non-namespaced schema as follows:
{noformat}
{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "type": "string",
      "name": "Name"
    },
    {
      "type": "double",
      "name": "Temperature"
    }
  ]
}
{noformat}
The reader uses a namespaced schema with a record alias and a field alias.
{noformat}
{
  "type": "record",
  "name": "MyRecord",
  "namespace": "com.example",
  "aliases": [".MyRecord"],
  "fields": [
    {
      "type": "string",
      "name": "Name"
    },
    {
      "type": "double",
      "name": "TemperatureC",
      "aliases": ["Temperature"]
    }
  ]
}
{noformat}
The following reading code will fail.
{noformat}
// file: a java.io.File written with the writer schema above;
// MyRecord is the class generated from the reader schema.
DatumReader<MyRecord> datumReader = new SpecificDatumReader<MyRecord>(MyRecord.class);
FileReader<MyRecord> fileReader = DataFileReader.openReader(file, datumReader);

MyRecord record = null;
while (fileReader.hasNext()) {
  record = fileReader.next(record);
  CharSequence name = record.getName();
  Double temp = record.getTemperatureC();
  System.out.println(name + " " + temp);
}
fileReader.close();
{noformat}
The reader's field alias is not found:
{noformat}
Exception in thread "main" org.apache.avro.AvroTypeException: Found MyRecord, expecting com.example.MyRecord, missing required field TemperatureC
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:130)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:176)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
{noformat}





[jira] [Commented] (AVRO-1657) Namespaced reader schema w/field aliases can not read non-namespaced writer schema

2015-04-02 David Korz (JIRA)

[ https://issues.apache.org/jira/browse/AVRO-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393782#comment-14393782 ]

David Korz commented on AVRO-1657:
--

The problem is that in org.apache.avro.Schema.applyAliases(Schema, Map, Map, Map),
the call to getFieldAlias() passes the writer schema's Name, while the
fieldAliases map has its entry keyed by the reader's namespaced Name, so no
alias is found for the field TemperatureC.
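The mismatch can be modelled with plain Java maps. This is a simplified, hypothetical stand-in for the lookup inside Schema.applyAliases (class and method names are invented here, not Avro's code): field aliases are indexed by the reader's full record name, so a lookup with the writer's non-namespaced name finds nothing. The second variant, which first translates the writer name through the record aliases (".MyRecord" denotes MyRecord in the null namespace), is one assumed way to make the lookup succeed, not the committed fix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the alias lookup (not Avro's implementation).
class AliasLookupModel {
    // Record aliases: writer full name -> reader full name.
    // The reader alias ".MyRecord" denotes "MyRecord" in the null namespace.
    static final Map<String, String> recordAliases = new HashMap<>();
    // Field aliases keyed by the READER's full record name:
    // record -> (writer field name -> reader field name).
    static final Map<String, Map<String, String>> fieldAliases = new HashMap<>();

    static {
        recordAliases.put("MyRecord", "com.example.MyRecord");
        Map<String, String> fields = new HashMap<>();
        fields.put("Temperature", "TemperatureC");
        fieldAliases.put("com.example.MyRecord", fields);
    }

    // Looks up a field alias; falls back to the field's own name.
    static String getFieldAlias(String recordName, String field) {
        Map<String, String> m = fieldAliases.get(recordName);
        return (m == null) ? field : m.getOrDefault(field, field);
    }

    // Behaviour described in the comment: the writer's name is used directly.
    static String resolveWithWriterName(String writerRecord, String field) {
        return getFieldAlias(writerRecord, field);
    }

    // Assumed alternative: map the writer name through record aliases first.
    static String resolveViaRecordAlias(String writerRecord, String field) {
        String readerRecord = recordAliases.getOrDefault(writerRecord, writerRecord);
        return getFieldAlias(readerRecord, field);
    }
}
```

With the writer's name, Temperature is not mapped to TemperatureC, which matches the "missing required field TemperatureC" error; after translating MyRecord to com.example.MyRecord, the field alias is found.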



[jira] [Commented] (AVRO-1341) Allow controlling avro via java annotations when using reflection.

2015-04-02 Ryan Blue (JIRA)

[ https://issues.apache.org/jira/browse/AVRO-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393138#comment-14393138 ]

Ryan Blue commented on AVRO-1341:
-

[~sunzhaonan], there isn't an @AvroDoc annotation, but it's a great idea. Could
you open an issue to add it? It shouldn't be too much work, either.
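For illustration only, here is what such an annotation might look like, using plain JDK reflection. The name AvroDoc, its single value() element, the example class Reading and the docFor helper are all assumptions for this sketch; a reflection-based schema builder could read the value and use it as the field's doc.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotation; RUNTIME retention so reflection can read it.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.FIELD})
@interface AvroDoc {
    String value();
}

// Hypothetical example class.
class Reading {
    @AvroDoc("Temperature in degrees Celsius")
    double temperature;

    int unannotated;
}

class DocLookup {
    // Returns the @AvroDoc text of a declared field, or null if it has none.
    static String docFor(Class<?> cls, String fieldName) {
        try {
            Field f = cls.getDeclaredField(fieldName);
            AvroDoc doc = f.getAnnotation(AvroDoc.class);
            return (doc == null) ? null : doc.value();
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("No field " + fieldName, e);
        }
    }
}
```

Keeping the doc text in a dedicated annotation, rather than in @AvroMeta, would sidestep the reserved-property restriction raised in the comment this replies to.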

 Allow controlling avro via java annotations when using reflection. 
 ---

 Key: AVRO-1341
 URL: https://issues.apache.org/jira/browse/AVRO-1341
 Project: Avro
  Issue Type: New Feature
  Components: java
Reporter: Vincenz Priesnitz
Assignee: Vincenz Priesnitz
 Fix For: 1.7.5

 Attachments: AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch, 
 AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch, AVRO-1341.patch


 It would be great if one could control Avro with Java annotations. It is
 already possible to mark fields as Nullable or classes as being encoded as
 a String. I propose a bigger set of annotations to control Avro's behavior
 on fields and classes. Such annotations have proven useful in Jackson's JSON
 serialization and Morphia's MongoDB serialization.
 I propose the following additional annotations: 
 @AvroName("alternativeName")
 @AvroAlias(alias="alias", space="space")
 @AvroIgnore
 @AvroMeta(key="K", value="V")
 @AvroEncode(using=CustomEncoding.class)
 Java fields with the @AvroName("alternativeName") annotation will be renamed
 in the induced schema. When reading an Avro file via reflection, the
 reflection reader will look for fields in the schema named "alternativeName".
 For example:
 {code}
 @AvroName("foo")
 int bar;
 {code}
 is serialized as
 {code}
   { "name" : "foo", "type" : "int" }
 {code}
 The @AvroAlias annotation will add a new alias to the induced schema of a 
 record, enum or field. The space parameter is optional and defaults to the 
 namespace of the named schema the alias is added to.
 Fields with the @AvroIgnore annotation will be treated as if they had a 
 transient modifier, i.e. they will not be written to or read from avro files. 
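The field-selection rule described here can be sketched with plain JDK reflection. AvroIgnore below is a local stand-in defined only so the sketch compiles without Avro on the classpath; Sample and FieldSelector are hypothetical names, and skipping static fields is an added assumption rather than something the proposal states.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the proposed @AvroIgnore, so this compiles without Avro.
// RUNTIME retention is required for reflection to see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AvroIgnore {}

// Hypothetical example class.
class Sample {
    int kept;
    transient int cached;    // excluded: transient
    @AvroIgnore int scratch; // excluded: treated as if transient
    static int counter;      // excluded: static (assumption)
}

class FieldSelector {
    // Names of the declared fields that would appear in an induced schema.
    // Note: getDeclaredFields() does not guarantee any particular order.
    static List<String> schemaFieldNames(Class<?> cls) {
        List<String> names = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                continue;
            }
            if (f.isAnnotationPresent(AvroIgnore.class)) {
                continue;
            }
            names.add(f.getName());
        }
        return names;
    }
}
```

Treating the annotation exactly like the transient modifier keeps the rule in one place: a field is either part of the induced schema or it is neither written nor read.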
 The @AvroMeta(key="K", value="V") annotation allows you to store an arbitrary
 key : value pair at every node in the schema.
 {code}
 @AvroMeta(key="fieldKey", value="fieldValue")
 int foo;
 {code}
 will create the following schema
 {code}
   { "name" : "foo", "type" : "int", "fieldKey" : "fieldValue" }
 {code}
 Fields can be custom encoded with the @AvroEncode(using=CustomEncoding.class)
 annotation. This annotation is a generalization of the @Stringable
 annotation, which is limited to classes with a String-argument constructor.
 Some classes can similarly be reduced to a smaller class or even a single
 primitive, but don't fit the requirements for @Stringable. A prominent
 example is java.util.Date, whose instances can essentially be described by a
 single long. Such classes can now be encoded with a CustomEncoding, which
 writes directly to the encoder and reads directly from the decoder.
 One simply extends the abstract CustomEncoding class by implementing a
 schema, a read method and a write method. A Java field can then be annotated
 like this:
 {code}
 @AvroEncode(using=DateAsLongEncoding.class)
 Date date;
 {code}
 The custom encoding implementation would look like 
 {code}
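 import java.io.IOException;
 import java.util.Date;
 import org.apache.avro.Schema;
 import org.apache.avro.io.Decoder;
 import org.apache.avro.io.Encoder;
 import org.apache.avro.reflect.CustomEncoding;

 public class DateAsLongEncoding extends CustomEncoding<Date> {
   {
     // Store the field as a plain Avro long (milliseconds since the epoch).
     schema = Schema.create(Schema.Type.LONG);
     schema.addProp("CustomEncoding", "DateAsLongEncoding");
   }

   @Override
   public void write(Object datum, Encoder out) throws IOException {
     out.writeLong(((Date) datum).getTime());
   }

   @Override
   public Date read(Object reuse, Decoder in) throws IOException {
     // Reuse the caller's Date instance when one is supplied.
     if (reuse != null) {
       ((Date) reuse).setTime(in.readLong());
       return (Date) reuse;
     }
     return new Date(in.readLong());
   }
 }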
 {code}
 I implemented these annotations and a custom encoding for java.util.Date as a
 proof of concept, and also extended the @Stringable annotation to fields.
 This issue is a follow-up of AVRO-1328 and AVRO-1330.
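For context on what out.writeLong() in DateAsLongEncoding puts on the wire with Avro's binary encoding: the Avro specification encodes a long as a zig-zag, variable-length (base-128) integer, so small magnitudes take few bytes. The stdlib-only Java sketch below re-implements that scheme for illustration; ZigZagLong and its methods are hypothetical names, and this is not Avro's BinaryEncoder or BinaryDecoder.

```java
import java.io.ByteArrayOutputStream;
import java.util.Date;

// Illustrative re-implementation of Avro's binary long encoding
// (zig-zag + base-128 varint); not Avro's own encoder/decoder classes.
class ZigZagLong {
    static byte[] encodeLong(long n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long z = (n << 1) ^ (n >> 63);            // zig-zag: -1 -> 1, 1 -> 2, ...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // 7 payload bits + continuation bit
            z >>>= 7;
        }
        out.write((int) z);                       // last group, no continuation bit
        return out.toByteArray();
    }

    static long decodeLong(byte[] buf) {
        long z = 0;
        int shift = 0;
        for (byte b : buf) {
            z |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;                            // final byte of the varint
            }
            shift += 7;
        }
        return (z >>> 1) ^ -(z & 1);              // undo zig-zag
    }

    // A Date reduces to its millisecond timestamp, as in DateAsLongEncoding.
    static byte[] encodeDate(Date d) { return encodeLong(d.getTime()); }
    static Date decodeDate(byte[] buf) { return new Date(decodeLong(buf)); }
}
```

For example, 0 encodes as the single byte 0x00, -1 as 0x01, and 64 as the two bytes 0x80 0x01.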





[jira] [Commented] (AVRO-1341) Allow controlling avro via java annotations when using reflection.

2015-04-02 Zhaonan Sun (JIRA)

[ https://issues.apache.org/jira/browse/AVRO-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393091#comment-14393091 ]

Zhaonan Sun commented on AVRO-1341:
---

It looks like @AvroMeta can't add reserved properties: for example,
@AvroMeta("doc", "some doc") throws an exception.
Do we have an @AvroDoc annotation or something similar?
