[jira] [Commented] (AVRO-2199) Validate that field defaults have the correct type

2018-10-10 Thread Daniel Orner (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645606#comment-16645606
 ] 

Daniel Orner commented on AVRO-2199:


[~theturtle32] I definitely feel your pain. :( But the fact is that the current 
implementation is incorrect; the bad data you have out there is a result of 
that incorrect implementation. In our case, other Avro consumers not written 
in Ruby had stricter parsers and were crashing because we were sending bad 
data, so IMO it's more important to fix the Ruby behavior so that it matches 
the Avro specification.

>  Validate that field defaults have the correct type
> ---
>
> Key: AVRO-2199
> URL: https://issues.apache.org/jira/browse/AVRO-2199
> Project: Avro
>  Issue Type: Improvement
>  Components: ruby
>Affects Versions: 1.8.2
>Reporter: Daniel Orner
>Priority: Minor
>
> Currently, on the master branch, when a schema is parsed, it is possible to 
> define a field with a type and a default of a totally different type. E.g. if 
> the field has type "string", the default can be set to "null".
> I'd like to open a PR which will fix this by running the default through the 
> SchemaValidator whenever a new Field is created. See 
> [https://github.com/salsify/avro-patches/pull/16]
> cc: [~tjwp]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-1335) C++ should support field default values

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645586#comment-16645586
 ] 

ASF GitHub Bot commented on AVRO-1335:
--

vimota commented on a change in pull request #241: AVRO-1335: Adds C++ support 
for default values in schema serializatio…
URL: https://github.com/apache/avro/pull/241#discussion_r224252562
 
 

 ##
 File path: lang/c++/api/NodeImpl.hh
 ##
 @@ -538,6 +557,16 @@ inline NodePtr resolveSymbol(const NodePtr )
 return symNode->getNode();
 }
 
+template< typename T >
+inline std::string intToHex( T i )
 
 Review comment:
   Done.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> C++ should support field default values
> ---
>
> Key: AVRO-1335
> URL: https://issues.apache.org/jira/browse/AVRO-1335
> Project: Avro
>  Issue Type: Improvement
>  Components: c++
>Affects Versions: 1.7.4
>Reporter: Bin Guo
>Assignee: Victor Mota
>Priority: Major
> Attachments: AVRO-1335.patch
>
>
> We found that resolvingDecoder could not provide bidirectional compatibility 
> between different versions of schemas.
> This is especially true for records. For example:
> {code:title=First schema}
> {
>     "type": "record",
>     "name": "TestRecord",
>     "fields": [
>         {
>             "name": "MyData",
>             "type": {
>                 "type": "record",
>                 "name": "SubData",
>                 "fields": [
>                     { "name": "Version1", "type": "string" }
>                 ]
>             }
>         },
>         { "name": "OtherData", "type": "string" }
>     ]
> }
> {code}
> {code:title=Second schema}
> {
>     "type": "record",
>     "name": "TestRecord",
>     "fields": [
>         {
>             "name": "MyData",
>             "type": {
>                 "type": "record",
>                 "name": "SubData",
>                 "fields": [
>                     { "name": "Version1", "type": "string" },
>                     { "name": "Version2", "type": "string" }
>                 ]
>             }
>         },
>         { "name": "OtherData", "type": "string" }
>     ]
> }
> {code}
> Say node A knows only the first schema and node B knows the second schema, 
> which has more fields.
> Any data generated by node B can be resolved by the first schema, because 
> the additional field is marked as skipped.
> But data generated by node A cannot be resolved by the second schema and 
> throws an exception: *"Don't know how to handle excess fields for reader."*
> This is because data is resolved exactly according to the auto-generated 
> codec_traits, which try to read the excess field.
> The problem is that we cannot simply ignore the excess field in the record, 
> since the data after the troublesome record also needs to be resolved.
> This problem blocked us for a very long time.
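The failure described above comes down to how a reader fills fields the writer never wrote: that only works if those fields carry defaults. A minimal Python sketch, assuming a dict-based record representation (hypothetical names, not the C++ resolvingDecoder API):

```python
NO_DEFAULT = object()  # sentinel: the reader field declares no default

def resolve_record(writer_data, reader_fields):
    """Project the writer's data onto the reader's fields.

    reader_fields is a list of (name, default) pairs. A field absent from the
    writer's data is filled from its default; with no default, resolution
    fails, which is the "excess fields for reader" situation in AVRO-1335.
    """
    out = {}
    for name, default in reader_fields:
        if name in writer_data:
            out[name] = writer_data[name]
        elif default is not NO_DEFAULT:
            out[name] = default
        else:
            raise ValueError(f"no value and no default for reader field {name!r}")
    return out

# Node A wrote only Version1; node B's reader schema also expects Version2.
# With a default on Version2, resolution succeeds; without one, it raises.
written = {"Version1": "a"}
print(resolve_record(written, [("Version1", NO_DEFAULT), ("Version2", "")]))
# → {'Version1': 'a', 'Version2': ''}
```

This is why adding default-value support to the C++ serialization is a prerequisite for the bidirectional compatibility the reporter wants.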





[jira] [Commented] (AVRO-1256) C++ API compileJsonSchema ignores "doc" and custom attributes on a field/record

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645568#comment-16645568
 ] 

ASF GitHub Bot commented on AVRO-1256:
--

vimota commented on a change in pull request #345:  AVRO-1256: C++ API 
compileJsonSchema ignores "doc" and custom attributes on a field/record
URL: https://github.com/apache/avro/pull/345#discussion_r224248006
 
 

 ##
 File path: lang/c++/impl/NodeImpl.cc
 ##
 @@ -125,54 +125,60 @@ NodeUnion::resolve(const Node ) const
 return match;
 }
 
-SchemaResolution 
+SchemaResolution
 NodeFixed::resolve(const Node ) const
 {
 if(reader.type() == AVRO_FIXED) {
 return (
 (reader.fixedSize() == fixedSize()) &&
-(reader.name() == name()) 
-) ? 
+(reader.name() == name())
+) ?
 RESOLVE_MATCH : RESOLVE_NO_MATCH;
 }
 return furtherResolution(reader);
 }
 
-SchemaResolution 
+SchemaResolution
 NodeSymbolic::resolve(const Node ) const
 {
 const NodePtr  = leafAt(0);
 return node->resolve(reader);
 }
 
-// Wrap an indentation in a struct for ostream operator<< 
-struct indent { 
+// Wrap an indentation in a struct for ostream operator<<
+struct indent {
 indent(int depth) :
 d(depth)
 { }
-int d; 
+int d;
 };
 
 /// ostream operator for indent
 std::ostream& operator <<(std::ostream , indent x)
 {
 static const std::string spaces("");
 while(x.d--) {
-os << spaces; 
+os << spaces;
 }
 return os;
 }
 
-void 
+void
 NodePrimitive::printJson(std::ostream , int depth) const
 {
 os << '\"' << type() << '\"';
+if (getDoc().size()) {
+os << ",\n" << indent(depth) << "\"doc\": \"" << getDoc() << "\"";
 
 Review comment:
   Once we add escape() function in 
https://github.com/apache/avro/pull/241/files we should wrap getDoc() in 
escape().




> C++ API compileJsonSchema ignores "doc" and custom attributes on a 
> field/record
> ---
>
> Key: AVRO-1256
> URL: https://issues.apache.org/jira/browse/AVRO-1256
> Project: Avro
>  Issue Type: Improvement
>  Components: c++
>Affects Versions: 1.7.2
> Environment: Running on all platforms (Windows, OSX, Linux)
>Reporter: Tim Menninger
>Priority: Minor
> Attachments: AVRO-1256.patch
>
>
> It appears that when my JSON is compiled into a valid schema object, it 
> ignores all of the "documentation" I am trying to attach to each field in my 
> record. Reading through the Java issues, it seems this was a bug that was 
> fixed (AVRO-601, AVRO-612, AVRO-779), but the C++ implementation has yet to 
> adopt the feature. This is my sample schema; I have attempted to insert both 
> "doc" and "mycustom" in multiple places to see whether it is supported at 
> any level. Please excuse any apparent syntax errors in the JSON, as I 
> hand-tweaked some of it. The schema is valid and parses successfully.
> {
>   "type": "record",
>   "name": "myschema",
>   "doc": "Doc Meta",
>   "mycustom": "My Custom",
>   "fields": [
>     { "name": "field_a", "type": ["string","null"], "doc": "Doc Meta", "mycustom": "My Custom A" },
>     { "name": "field_b", "type": ["string","null"], "doc": "Doc Meta", "mycustom": "My Custom B" },
>     { "name": "field_c", "type": ["string","null"], "doc": "Doc Meta", "mycustom": "My Custom C" }
>   ]
> }
> I looked through the SchemaTests.cc code for 1.7.3 and there was no test 
> case for this there, so I didn't think this was addressed in that version. I 
> am running 1.7.2. When this schema is loaded with compileJsonSchema and a 
> file is then serialized, the file's schema looks like this.
> {
>   "type":"record",
>   "name":"myschema",
>   "fields": [
>   { "name":"field_a","type":["string","null"]},
>   { "name":"field_b","type":["string","null"]},
>   { "name":"field_c","type":["string","null"]}
>   ]
> }
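The requested behavior amounts to round-tripping unrecognized attributes through parsing and serialization. A minimal Python sketch, assuming a dict-based schema representation (hypothetical helpers, not the C++ compileJsonSchema API):

```python
import json

KNOWN = {"type", "name", "fields"}  # attributes the compiler understands

def parse_with_extras(schema_json):
    """Parse a record schema, stashing unknown attributes ("doc", "mycustom",
    ...) so serialization can re-emit them instead of dropping them."""
    node = json.loads(schema_json)
    extras = {k: v for k, v in node.items() if k not in KNOWN}
    return node, extras

def serialize(node, extras):
    """Re-serialize the schema, re-attaching the stashed attributes."""
    out = {k: node[k] for k in ("type", "name") if k in node}
    out.update(extras)  # this is the step the C++ printJson currently omits
    out["fields"] = node.get("fields", [])
    return json.dumps(out)
```

Field-level "doc"/"mycustom" attributes would need the same treatment applied per field; this sketch only shows the record-level case.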





[jira] [Commented] (AVRO-2199) Validate that field defaults have the correct type

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645527#comment-16645527
 ] 

ASF GitHub Bot commented on AVRO-2199:
--

theturtle32 commented on issue #320: AVRO-2199:  Validate that field defaults 
have the correct type
URL: https://github.com/apache/avro/pull/320#issuecomment-428729499
 
 
   This has an unfortunate side effect: it prevents applications from reading 
existing data that was encoded with older versions of schemas that specified 
incompatible default values. The library won't even finish parsing the old 
schemas, so you can't decode existing data.
   
   The Avro specification says that when you have a union type, any default 
value must be of the first type specified in the union. We have millions of 
records encoded using a schema where the default value was of the second or 
third type in the union instead of the first. This always worked fine before, 
but with this more stringent validation our application suddenly fails to 
decode our existing data, because it raises an error while parsing the 
writer's schema.
   
   I'm going to try to write a monkey-patch that relaxes this validation 
temporarily, long enough to read all existing data, migrate it to a newer 
schema, and re-save it. But that's a lot of time and effort to compensate for 
a sudden change in schema validation.
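For concreteness, a hypothetical field of the shape being described: the union's first branch is "null", but the default is a string, matching the second branch. The spec disallows this (the valid default here would be null), even though older parsers accepted it.

{code:title=Hypothetical spec-violating field}
{
    "name": "comment",
    "type": ["null", "string"],
    "default": ""
}
{code}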









[jira] [Comment Edited] (AVRO-2199) Validate that field defaults have the correct type

2018-10-10 Thread Brian McKelvey (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645523#comment-16645523
 ] 

Brian McKelvey edited comment on AVRO-2199 at 10/10/18 8:57 PM:


This has an unfortunate side effect: it prevents applications from reading 
existing data that was encoded with older versions of schemas that specified 
incompatible default values. The library won't even finish parsing the old 
schemas, so you can't decode existing data.

The Avro specification says that when you have a union type, any default value 
must be of the first type specified in the union. We have millions of records 
of data encoded using a schema where the default value was of the second or 
third type in the union instead of the first type. This always worked just fine 
before but with this more stringent validation, our application suddenly fails 
to decode our existing data because it raises an error while parsing the 
writer's schema.

I'm going to try to write a monkey-patch to relax this validation temporarily, 
long enough to read all existing data, migrate it to a newer schema, and 
re-save it. But that's a lot of time and effort to compensate for a sudden 
change in schema validation.


was (Author: theturtle32):
This has an unfortunate side effect: it breaks applications from reading 
existing data that's been encoded with older versions of schemas that had 
incorrect default values specified. The library won't even finish parsing the 
old schemas, so you can't decode existing data.

The Avro specification says that when you have a union type, any default value 
must be of the first type specified in the union. We have millions of records 
of data encoded using a schema where the default value was of the second or 
third type in the union. This always worked before, but with this more 
stringent validation, our application suddenly fails to decode our existing 
data because it raises an error while parsing the old historical schema.

I'm going to try to write a monkey-patch to relax this validation temporarily, 
long enough to read all existing data, migrate it to a newer schema, and 
re-save it. But that's a lot of time and effort to compensate for a sudden 
change in schema validation.






[jira] [Commented] (AVRO-2199) Validate that field defaults have the correct type

2018-10-10 Thread Brian McKelvey (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645523#comment-16645523
 ] 

Brian McKelvey commented on AVRO-2199:
--

This has an unfortunate side effect: it prevents applications from reading 
existing data that was encoded with older versions of schemas that specified 
incorrect default values. The library won't even finish parsing the old 
schemas, so you can't decode existing data.

The Avro specification says that when you have a union type, any default value 
must be of the first type specified in the union. We have millions of records 
of data encoded using a schema where the default value was of the second or 
third type in the union. This always worked before, but with this more 
stringent validation, our application suddenly fails to decode our existing 
data because it raises an error while parsing the old historical schema.

I'm going to try to write a monkey-patch to relax this validation temporarily, 
long enough to read all existing data, migrate it to a newer schema, and 
re-save it. But that's a lot of time and effort to compensate for a sudden 
change in schema validation.






[jira] [Commented] (AVRO-1256) C++ API compileJsonSchema ignores "doc" and custom attributes on a field/record

2018-10-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16645387#comment-16645387
 ] 

ASF GitHub Bot commented on AVRO-1256:
--

aniket486 opened a new pull request #345:  AVRO-1256: C++ API compileJsonSchema 
ignores "doc" and custom attributes on a field/record
URL: https://github.com/apache/avro/pull/345
 
 
   






