[jira] [Commented] (AVRO-1861) Avro Schema parser treats Avro float type as Java Double for default values

2016-06-09 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322882#comment-15322882
 ] 

Ryan Blue commented on AVRO-1861:
-

Sounds reasonable to me. If you want to submit a PR for this, I'll review it. 
Thanks, [~amok]!

> Avro Schema parser treats Avro float type as Java Double for default values
> ---
>
> Key: AVRO-1861
> URL: https://issues.apache.org/jira/browse/AVRO-1861
> Project: Avro
> Issue Type: Bug
> Components: java
> Affects Versions: 1.8.1
> Reporter: Andy Mok
>
> The following code snippet in the [Schema 
> class|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java]
>  shows that we explicitly treat Avro {{FLOAT}} and {{DOUBLE}} as a Java 
> {{Double}}.
> {code:java}
> JsonNode defaultValue = field.get("default");
> if (defaultValue != null
>     && (Type.FLOAT.equals(fieldSchema.getType())
>         || Type.DOUBLE.equals(fieldSchema.getType()))
>     && defaultValue.isTextual())
>   defaultValue =
>       new DoubleNode(Double.valueOf(defaultValue.getTextValue()));
> {code}
> Jackson has support for 
> [FloatNode|https://fasterxml.github.io/jackson-databind/javadoc/2.3.0/com/fasterxml/jackson/databind/node/FloatNode.html]
>  so why don't we use that?
> This is a problem when someone calls 
> [Schema.Field#defaultVal|https://avro.apache.org/docs/1.8.1/api/java/org/apache/avro/Schema.Field.html#defaultVal()]
>  for an Avro field with Avro type {{FLOAT}} and they try to typecast the 
> object to a Java {{float}}.
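
To make the failure mode concrete, here is a minimal, Avro-free sketch. It assumes, as the report describes, that the default of a float field reaches the caller boxed as a java.lang.Double. The names FloatDefaultDemo, parsedDefault and asFloat are made up for illustration and are not part of the Avro API.

```java
final class FloatDefaultDemo {
    // Mirrors what the quoted parser snippet does with a textual default:
    // the value is boxed as a java.lang.Double, whatever the field type is.
    static Object parsedDefault(String text) {
        return Double.valueOf(text);
    }

    // Narrowing through Number works whether the boxed value is a Double or a Float.
    static float asFloat(Object defaultVal) {
        return ((Number) defaultVal).floatValue();
    }

    public static void main(String[] args) {
        Object defaultVal = parsedDefault("1.5");
        try {
            // A Double cannot be cast to Float, so this line throws.
            float broken = (Float) defaultVal;
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("direct cast failed: " + e.getMessage());
        }
        System.out.println(asFloat(defaultVal));
    }
}
```

Until the parser changes, going through Number is one way for callers to avoid depending on which boxed type comes back.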



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AVRO-1848) Can't use null or false defaults in Ruby

2016-06-09 Thread Brian McKelvey (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian McKelvey updated AVRO-1848:
-
Fix Version/s: 1.8.2

> Can't use null or false defaults in Ruby
> 
>
> Key: AVRO-1848
> URL: https://issues.apache.org/jira/browse/AVRO-1848
> Project: Avro
> Issue Type: Bug
> Components: ruby
> Affects Versions: 1.8.0
> Environment: Any
> Reporter: Brian McKelvey
> Priority: Critical
> Labels: easyfix
> Fix For: 1.8.2
>
> Original Estimate: 2h
> Remaining Estimate: 2h
>
> When calling {{to_avro}} on an {{Avro::Schema::Field}} instance (part of 
> calling {{to_avro}} on an instance of {{Avro::Schema::RecordSchema}}), it 
> will not include the default value definition if the default value is falsey.
> The offending code is:
> {code:ruby}
> def to_avro(names=Set.new)
>   {'name' => name, 'type' => type.to_avro(names)}.tap do |avro|
>     avro['default'] = default if default
>     avro['order'] = order if order
>   end
> end
> {code}
> Using the {{if default}} conditional predicate here is inappropriate, as is 
> relying on {{nil}} values to represent no default, because {{null}} in JSON 
> maps to {{nil}} in Ruby.
> This is a critical show-stopper to using AvroTurf with the Confluent Schema 
> Registry because it is quietly uploading incorrect schemas. Downstream 
> readers then behave incorrectly, and the schema registry rejects new schema 
> versions as incompatible even though they would be accepted if the falsey 
> default values were included when submitting the schema to the registry.
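
The underlying pitfall is not specific to Ruby: "no default declared" and "default is null or false" need different representations. The sketch below shows one way to keep them apart, written in Java and using a hypothetical sentinel object. FieldDefaultDemo, NO_DEFAULT and toAvro are illustrative names only, and this is not the fix that was applied to the Ruby library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FieldDefaultDemo {
    // Hypothetical sentinel meaning "no default was declared". It is distinct
    // from null, which stands for an explicit JSON null default.
    static final Object NO_DEFAULT = new Object();

    static Map<String, Object> toAvro(String name, String type, Object defaultValue) {
        Map<String, Object> avro = new LinkedHashMap<>();
        avro.put("name", name);
        avro.put("type", type);
        // Compare against the sentinel instead of testing truthiness or
        // null-ness, so that null and false defaults are still emitted.
        if (defaultValue != NO_DEFAULT) {
            avro.put("default", defaultValue);
        }
        return avro;
    }

    public static void main(String[] args) {
        System.out.println(toAvro("active", "boolean", false));
        System.out.println(toAvro("nickname", "null", null));
        System.out.println(toAvro("id", "long", NO_DEFAULT));
    }
}
```

In this sketch only the third field omits the "default" key; the first two keep their false and null defaults.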





Re: Avro union compatibility mode enhancement proposal

2016-06-09 Thread Matthieu Monsch
Thinking about this a bit more (and a couple months later…), maybe there is a 
simpler alternative.

Currently, a reason why writer evolution is hard (the union issue described 
below is a special case of this) is that aliases are only used on the reader 
side. Why not also allow readers to use the writer’s aliases?

Resolution would first be done on names, then fall back to reader aliases, and 
finally fall back to writer aliases. In the example below, it would be enough 
to add an alias to the base record inside any new records to have evolution 
work.
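
A rough sketch of that resolution order, under the assumption that a named type can be reduced to a name plus a list of aliases. The Named class and the resolve method are hypothetical and do not correspond to any existing Avro API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AliasResolutionDemo {
    // Hypothetical stand-in for a named schema: just a name and its aliases.
    static final class Named {
        final String name;
        final List<String> aliases;

        Named(String name, String... aliases) {
            this.name = name;
            this.aliases = Arrays.asList(aliases);
        }
    }

    // Returns the reader type matched to the writer type, or null if none matches.
    static Named resolve(Named writer, List<Named> readers) {
        // 1. Exact name match.
        for (Named reader : readers) {
            if (reader.name.equals(writer.name)) return reader;
        }
        // 2. Reader aliases (what aliases are used for today).
        for (Named reader : readers) {
            if (reader.aliases.contains(writer.name)) return reader;
        }
        // 3. Writer aliases (the proposed additional fallback).
        for (Named reader : readers) {
            if (writer.aliases.contains(reader.name)) return reader;
        }
        return null;
    }

    public static void main(String[] args) {
        List<Named> readers = Collections.singletonList(new Named("Base"));
        // A new writer-side record that names its base record as an alias.
        Named derived = new Named("Derived", "Base");
        Named match = resolve(derived, readers);
        System.out.println(match == null ? "unresolved" : match.name);
    }
}
```

Here an old reader that only knows Base can still resolve the newer Derived record, because the third step consults the alias declared on the writer side.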

-Matthieu



> On Apr 22, 2016, at 8:42 AM, Matthieu Monsch  wrote:
> 
> The second solution sounds like a great alternative.
> 
> Branch aliases are more straightforward than an implicit order-sensitive 
> policy. They also have the additional benefit of giving users a bit more 
> flexibility: since defaults are specified on the branches’ types, it is 
> possible to have different branches have different defaults inside the same 
> union. There are probably a few edge cases (e.g. allowing multiple such 
> aliases would be useful) but they should be simple to address.
> 
> What would be a good attribute name for this? `baseTypes`?
> 
> -Matthieu
> 
> 
> 
>> On Apr 21, 2016, at 10:52 AM, Doug Cutting  wrote:
>> 
>> On Wed, Apr 20, 2016 at 9:09 PM, Ryan Blue  wrote:
>>> Making the default a property of an
>>> inner schema makes me think that we will have to deal with multiple schemas
>>> with such a label at some point.
>> 
>> On Thu, Apr 21, 2016 at 6:54 AM, Matthieu Monsch  wrote:
>>> Delegating default selection to the branches themselves is a great idea but it
>>> will be tricky to handle reference branches smoothly. More minor but it also
>>> doesn’t feel intuitive to not have the union “own” its default attribute.
>> 
>> If I understand your concerns correctly, I attempted to address this above:
>> 
>> "Note however that, when using a record as the default branch, one
>> could not then
>> use that same record as a non-default branch in another union.  To
>> ameliorate that, we might permit multiple default branches in a union
>> to be specified as default with the convention that the first such is
>> used."
>> 
>> Does that make sense?
>> 
>> This isn't ideal syntax, but it's not terrible, and it doesn't change
>> schema syntax incompatibly, which seems important, especially when it's
>> unlikely that all implementations would implement such a syntax change
>> in a synchronized manner.
>> 
>> Alternately, one might annotate each derived record with the name of
>> its base record, then one wouldn't need to alter union definitions.
>> This would work like an alias.  If a record doesn't exist in the
>> reader's schema, then an alias to the missing record would be added in
>> the reader's schema to the base record it names in the writer's
>> schema.  Aliases work by rewriting the writer's schema at read-time,
>> updating names, including those in unions.  Might that work?  It seems
>> like perhaps a more elegant approach.  It has compatible syntax and
>> only alters behavior of a case that fails today.
>> 
>> Doug
> 



[jira] [Comment Edited] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2016-06-09 Thread Matthieu Monsch (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323752#comment-15323752
 ] 

Matthieu Monsch edited comment on AVRO-1340 at 6/10/16 2:27 AM:


Zoltan's suggestion makes sense.

Alternatively we could allow each enum symbol to optionally declare another 
symbol to use if the original doesn’t exist in the reader’s schema (resolution 
could be transitive). It’s slightly more flexible since it allows aliasing to 
be done per symbol and defined when the new symbol is added (rather than when 
the union is initially created). A downside is that it can be more verbose.

The JSON schema could look something like:

{code:javascript}
{
  "type": "enum",
  "name": "Suit",
  "symbols": ["UNKNOWN", "CLUBS", "HEARTS", "SPADES", "DIAMONDS"],
  "symbolAliases": {
    "DIAMONDS": "UNKNOWN",
    "SPADES": "UNKNOWN"
  }
}
{code}

And the IDL:

{code:java}
enum Suit {
  UNKNOWN,
  CLUBS,
  HEARTS,
  SPADES ? UNKNOWN,
  DIAMONDS ? UNKNOWN
}
{code}
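
For illustration, the reader-side lookup this proposal implies could look roughly like the sketch below. It assumes the hypothetical symbolAliases attribute from the example above, follows aliases transitively, and guards against alias cycles. SymbolAliasDemo and resolve are made-up names, not Avro API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SymbolAliasDemo {
    // Follows the writer's symbol aliases until a symbol known to the reader
    // is found. Throws if the chain ends or loops without reaching one.
    static String resolve(String writerSymbol,
                          Set<String> readerSymbols,
                          Map<String, String> symbolAliases) {
        Set<String> seen = new HashSet<>();
        String current = writerSymbol;
        // seen.add returns false on a revisit, which ends the loop on a cycle.
        while (current != null && seen.add(current)) {
            if (readerSymbols.contains(current)) {
                return current;
            }
            current = symbolAliases.get(current);
        }
        throw new IllegalArgumentException("cannot resolve enum symbol: " + writerSymbol);
    }

    public static void main(String[] args) {
        // An old reader that predates SPADES and DIAMONDS.
        Set<String> readerSymbols = new HashSet<>(Arrays.asList("UNKNOWN", "CLUBS", "HEARTS"));
        Map<String, String> symbolAliases = new HashMap<>();
        symbolAliases.put("DIAMONDS", "UNKNOWN");
        symbolAliases.put("SPADES", "UNKNOWN");

        System.out.println(resolve("CLUBS", readerSymbols, symbolAliases));
        System.out.println(resolve("SPADES", readerSymbols, symbolAliases));
    }
}
```

A symbol with no alias chain leading to a reader symbol still raises an error, which keeps today's behavior for schemas that do not declare aliases.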


> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
> Issue Type: Improvement
> Components: spec
> Environment: N/A
> Reporter: Jim Donofrio
> Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> > error is signalled.
> This makes it difficult to use enums because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values, so that when an old reader encounters an enum ordinal 
> it does not recognize, it can fall back to the default its schema provides? If 
> the old schema does not provide a default then the older reader can continue 
> to fail as it does today.




