[jira] [Commented] (AVRO-3210) how the Avro Schema with Union type can accept the ‘normal JSON’

2021-10-04 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424120#comment-17424120
 ] 

Zoltan Farkas commented on AVRO-3210:
-

This issue is basically a duplicate of AVRO-1582

I have been using a [custom json 
encoder/decoder|https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/ExtendedJsonDecoder.java]
 for this. I think somebody with time on their hands should add a new 
json encoder/decoder...
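For context, Avro's standard JSON encoding tags every non-null union value with its branch name, which is where the wrapped form in the question comes from; a "plain JSON" encoder/decoder has to infer the branch from the schema instead. A minimal sketch of the two forms for a ["null", "string"] union (helper names are hypothetical, not part of the Avro API):

```java
// Sketch of the two JSON forms for a ["null","string"] union field.
// Hypothetical helpers, not the Avro API.
public class UnionJsonSketch {

  // Standard Avro JSON encoding: non-null union values are wrapped
  // in an object keyed by the branch name.
  static String avroJson(String fieldName, String value) {
    if (value == null) {
      return "\"" + fieldName + "\": null";
    }
    return "\"" + fieldName + "\": {\"string\": \"" + value + "\"}";
  }

  // "Plain" JSON form: the branch is left implicit and must be
  // inferred from the schema when decoding.
  static String plainJson(String fieldName, String value) {
    if (value == null) {
      return "\"" + fieldName + "\": null";
    }
    return "\"" + fieldName + "\": \"" + value + "\"";
  }
}
```

The decoder side is the harder part: on reading the plain form it must try each union branch against the JSON value, which is exactly what the linked ExtendedJsonDecoder does.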

> how the Avro Schema with Union type can accept the ‘normal JSON’ 
> -
>
> Key: AVRO-3210
> URL: https://issues.apache.org/jira/browse/AVRO-3210
> Project: Apache Avro
>  Issue Type: Improvement
>Reporter: Ning Chang
>Priority: Major
> Attachments: test2.avsc
>
>
> how the Avro Schema with Union type can accept the ‘normal JSON’
> Avro Schema:
> {
>   "name": "middle_name",
>   "type": [
>     "null",
>     "string"
>   ],
>   "default": null
> }
>  
> How can the schema accept a normal JSON payload like "middle_name": "chang", 
> rather than the union-tagged form:
> "middle_name": {
>   "string": "chang"
> }
> Thanks in advance.
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-3135) Add schema serialization/deserialization hooks, to aid implementation of "schema references"

2021-10-04 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424117#comment-17424117
 ] 

Zoltan Farkas commented on AVRO-3135:
-

I have a PR for this available:

https://github.com/apache/avro/pull/1217

> Add schema serialization/deserialization hooks, to aid implementation of 
> "schema references"
> 
>
> Key: AVRO-3135
> URL: https://issues.apache.org/jira/browse/AVRO-3135
> Project: Apache Avro
>  Issue Type: New Feature
>  Components: java
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> This capability's main use case is to allow easy implementation of schema 
> references. For a more detailed writeup please see: 
> https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences#why-avro-references





[jira] [Created] (AVRO-3135) Add schema serialization/deserialization hooks, to aid implementation of "schema references"

2021-05-11 Thread Zoltan Farkas (Jira)
Zoltan Farkas created AVRO-3135:
---

 Summary: Add schema serialization/deserialization hooks, to aid 
implementation of "schema references"
 Key: AVRO-3135
 URL: https://issues.apache.org/jira/browse/AVRO-3135
 Project: Apache Avro
  Issue Type: New Feature
  Components: java
Reporter: Zoltan Farkas
Assignee: Zoltan Farkas


This capability's main use case is to allow easy implementation of schema 
references. For a more detailed writeup please see: 
https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroReferences#why-avro-references






[jira] [Created] (AVRO-2938) Make Conversion more generic.

2020-10-09 Thread Zoltan Farkas (Jira)
Zoltan Farkas created AVRO-2938:
---

 Summary: Make Conversion more generic.
 Key: AVRO-2938
 URL: https://issues.apache.org/jira/browse/AVRO-2938
 Project: Apache Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas


Currently Conversion<T> is parameterizable only by the Java type.
However, I think there would be a benefit in adding an extra type parameter 
for the logical type:
Conversion<T, L extends LogicalType>

This way the DecimalConversion implementation would become cleaner, without 
the need for casts everywhere the logical type attributes are needed:

```
@Override
public BigDecimal fromBytes(ByteBuffer value, Schema schema, LogicalType type) {
  int scale = ((Decimal) type).getScale();
```

to 

```
@Override
public BigDecimal fromBytes(ByteBuffer value, Schema schema, Decimal type) {
  int scale = type.getScale();
```

I could do a PR for this if there are no objections.
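A self-contained sketch of the proposed shape, using simplified stand-ins for Avro's LogicalType and Decimal classes (not the real API):

```java
// Simplified stand-ins for Avro's logical type hierarchy, to illustrate
// the proposed extra type parameter on Conversion. Not the Avro classes.
class LogicalType {}

class Decimal extends LogicalType {
  private final int scale;
  Decimal(int scale) { this.scale = scale; }
  int getScale() { return scale; }
}

// Conversion parameterized by both the Java type and the logical type:
// subclasses receive the concrete logical type without casting.
abstract class Conversion<T, L extends LogicalType> {
  abstract T fromInt(int value, L type);
}

class DecimalConversion extends Conversion<java.math.BigDecimal, Decimal> {
  @Override
  java.math.BigDecimal fromInt(int value, Decimal type) {
    // No (Decimal) cast needed: the logical type parameter is already typed.
    return java.math.BigDecimal.valueOf(value, type.getScale());
  }
}
```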









[jira] [Created] (AVRO-2936) Small optimization of GenericData.addLogicalTypeConversion

2020-10-05 Thread Zoltan Farkas (Jira)
Zoltan Farkas created AVRO-2936:
---

 Summary: Small optimization of GenericData.addLogicalTypeConversion
 Key: AVRO-2936
 URL: https://issues.apache.org/jira/browse/AVRO-2936
 Project: Apache Avro
  Issue Type: Improvement
Affects Versions: 1.10.0
Reporter: Zoltan Farkas
Assignee: Zoltan Farkas


GenericData.addLogicalTypeConversion can be written a bit more efficiently to 
avoid double lookup.
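The double lookup here is the usual check-then-insert map pattern; Map.putIfAbsent performs the check and the insert in a single lookup. A sketch with a plain map standing in for the conversion registry (illustrative, not the actual GenericData code):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a conversion registry, showing the double-lookup pattern
// vs. the single-lookup fix. Not the real GenericData implementation.
public class RegistrySketch {
  private final Map<String, String> conversions = new HashMap<>();

  // Before: containsKey followed by put hashes and probes the key twice.
  public void addDoubleLookup(String name, String conversion) {
    if (!conversions.containsKey(name)) {
      conversions.put(name, conversion);
    }
  }

  // After: putIfAbsent does the probe and the insert in one pass.
  public void addSingleLookup(String name, String conversion) {
    conversions.putIfAbsent(name, conversion);
  }

  public String get(String name) {
    return conversions.get(name);
  }
}
```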





[jira] [Commented] (AVRO-2278) GenericData.Record field getter not correct

2020-05-09 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17103258#comment-17103258
 ] 

Zoltan Farkas commented on AVRO-2278:
-

[~anhldbk] to clarify, we are discussing here the semantics of 
GenericRecord.get.

Regarding Opt #1, the existence of a field can already be checked with: 
GenericRecord.getSchema().getField(String) != null.

In both instances, leaving get as-is is in my opinion a bad idea: it has the 
potential to hide bugs, and I would think that is not something people want 
to do...
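The ambiguity under discussion can be shown with a minimal stand-in for GenericData.Record (illustrative only, not the Avro classes):

```java
import java.util.List;

// Minimal stand-in for GenericData.Record, to show why returning null
// from get(String) is ambiguous. Not the real Avro classes.
public class RecordSketch {
  private final List<String> fieldNames;
  private final Object[] values;

  RecordSketch(List<String> fieldNames, Object[] values) {
    this.fieldNames = fieldNames;
    this.values = values;
  }

  // Current behavior: an unknown field and a null-valued field
  // both yield null.
  Object getLenient(String name) {
    int pos = fieldNames.indexOf(name);
    return pos < 0 ? null : values[pos];
  }

  // Proposed behavior: an unknown field fails fast, like put already does.
  Object getStrict(String name) {
    int pos = fieldNames.indexOf(name);
    if (pos < 0) {
      throw new IllegalArgumentException("Invalid field " + name);
    }
    return values[pos];
  }

  // The existence check mentioned in the comment above.
  boolean hasField(String name) {
    return fieldNames.indexOf(name) >= 0;
  }
}
```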


> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2, 1.9.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Commented] (AVRO-2278) GenericData.Record field getter not correct

2020-05-06 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100864#comment-17100864
 ] 

Zoltan Farkas commented on AVRO-2278:
-

There is basically one behavior that we are changing here significantly: 
GenericData.Record.get(String) will throw an exception instead of returning 
null when querying a record for an invalid field. The other changes are 
cleanup... (we are throwing AvroRuntimeException instead of NPE).

So if we decide to protect this via a feature flag, we should only do this for 
GenericData.Record.get(String) 

We should also not forget that the two current implementations of 
GenericRecord.get(String):

GenericData.Record.get(String)
SpecificRecordBase.get(String)

are inconsistent in behavior... one returns null for a non-existent field, 
the other throws an NPE... so you can't rely on the current behavior of 
GenericRecord.get...

But since we are talking here about 1.10 and not 1.8.x or 1.9.x, where users 
will not expect API compatibility, I think in this specific scenario a 
feature flag would add more complexity for not much benefit...

How can we get more people to chime in?
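A sketch of what gating the stricter behavior behind a feature flag could look like; the property name and helper class are hypothetical, not an actual Avro flag:

```java
import java.util.Map;

// Sketch of gating the stricter get(String) behavior behind a system
// property. The property name is hypothetical, not an Avro flag.
public class StrictGetFlag {
  static final String PROP = "avro.strict.record.get"; // hypothetical

  static boolean strictGetEnabled() {
    return Boolean.getBoolean(PROP);
  }

  // A map stands in for the record's fields; with the flag on, an
  // unknown field fails fast instead of silently returning null.
  static Object get(Map<String, Object> fields, String key) {
    if (!fields.containsKey(key) && strictGetEnabled()) {
      throw new IllegalArgumentException("Invalid field " + key);
    }
    return fields.get(key);
  }
}
```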

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2, 1.9.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Updated] (AVRO-1607) Minor performance enhancement

2020-05-06 Thread Zoltan Farkas (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1607:

Description: 
In SpecificData.getClass, line 164:

{code}
case UNION:
  List<Schema> types = schema.getTypes(); // elide unions with null
  if ((types.size() == 2) && types.contains(NULL_SCHEMA))
    return getWrapper(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0));
  return Object.class;
{code}

can be written more efficiently as:

{code}
case UNION:
  List<Schema> types = schema.getTypes(); // elide unions with null
  if (types.size() == 2) {
    if (NULL_SCHEMA.equals(types.get(0))) {
      return getWrapper(types.get(1));
    } else if (NULL_SCHEMA.equals(types.get(1))) {
      return getWrapper(types.get(0));
    }
  }
  return Object.class;
{code}



  was:
In SpecificData.getClass, line 164:

case UNION:
  List<Schema> types = schema.getTypes(); // elide unions with null
  if ((types.size() == 2) && types.contains(NULL_SCHEMA))
return getWrapper(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0));
  return Object.class;

can be written more efficiently as:

case UNION:
  List<Schema> types = schema.getTypes(); // elide unions with null
  if (types.size() == 2) {
    if (NULL_SCHEMA.equals(types.get(0))) {
      return getWrapper(types.get(1));
    } else if (NULL_SCHEMA.equals(types.get(1))) {
      return getWrapper(types.get(0));
    }
  }
  return Object.class;





> Minor performance enhancement
> -
>
> Key: AVRO-1607
> URL: https://issues.apache.org/jira/browse/AVRO-1607
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Priority: Minor
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> In SpecificData.getClass, line 164:
> {code}
> case UNION:
>   List<Schema> types = schema.getTypes(); // elide unions with null
>   if ((types.size() == 2) && types.contains(NULL_SCHEMA))
>     return getWrapper(types.get(types.get(0).equals(NULL_SCHEMA) ? 1 : 0));
>   return Object.class;
> {code}
> can be written more efficiently as:
> {code}
> case UNION:
>   List<Schema> types = schema.getTypes(); // elide unions with null
>   if (types.size() == 2) {
>     if (NULL_SCHEMA.equals(types.get(0))) {
>       return getWrapper(types.get(1));
>     } else if (NULL_SCHEMA.equals(types.get(1))) {
>       return getWrapper(types.get(0));
>     }
>   }
>   return Object.class;
> {code}





[jira] [Commented] (AVRO-1603) maven avro plugin to also generate avsc files.

2020-05-06 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100790#comment-17100790
 ] 

Zoltan Farkas commented on AVRO-1603:
-

One other reason for this: instead of keeping the avro schema strings inside 
the generated .class files (without being able to even get to them), we 
could load $SCHEMA from these avsc resources, and reduce the memory 
footprint of our app a bit.
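A sketch of the loading side of this idea, assuming a hypothetical generated classpath resource; plain stdlib resource loading, with the actual schema parsing left out:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: load the schema definition from a classpath .avsc resource at
// runtime instead of embedding the schema string in the generated class.
// The resource path and helper are hypothetical.
public class SchemaResourceSketch {
  static String readResource(String path) throws IOException {
    try (InputStream in = SchemaResourceSketch.class.getResourceAsStream(path)) {
      if (in == null) {
        throw new FileNotFoundException("Missing classpath resource " + path);
      }
      // The returned JSON text would then be fed to a schema parser.
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }
}
```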

> maven avro plugin to also generate avsc files.
> --
>
> Key: AVRO-1603
> URL: https://issues.apache.org/jira/browse/AVRO-1603
> Project: Apache Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Priority: Minor
>
> It would be nice to also be able to generate all avsc schema files during 
> compilation.
> These schema files could then be packaged, versioned, and distributed with 
> maven...





[jira] [Resolved] (AVRO-1580) Use newer version of surefire, 2.17 instead of 2.12

2020-05-06 Thread Zoltan Farkas (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas resolved AVRO-1580.
-
Resolution: Fixed

> Use newer version of surefire, 2.17 instead of 2.12
> ---
>
> Key: AVRO-1580
> URL: https://issues.apache.org/jira/browse/AVRO-1580
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: build, java
>Affects Versions: 1.7.7
>Reporter: Zoltan Farkas
>Priority: Minor
>  Labels: beginner, build
>
> version 2.12 does not work well with Netbeans 8.0 and makes development 
> cumbersome.





[jira] [Comment Edited] (AVRO-2278) GenericData.Record field getter not correct

2020-05-06 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100697#comment-17100697
 ] 

Zoltan Farkas edited comment on AVRO-2278 at 5/6/20, 11:49 AM:
---

[~rskraba] what about controlling this behavior with a System Property? Based 
on the experience with the field default value validation flag, I would not 
bother making this configurable... we would need to at least document this in 
the release notes though.


was (Author: zolyfarkas):
[~rskraba] what about controlling this behavior with a System Property? Based 
on the experience with the default validation flag I would not bother making 
this configurable... we would need to at least document this in the release 
notes though.

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2, 1.9.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Comment Edited] (AVRO-2278) GenericData.Record field getter not correct

2020-05-06 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100697#comment-17100697
 ] 

Zoltan Farkas edited comment on AVRO-2278 at 5/6/20, 11:48 AM:
---

[~rskraba] what about controlling this behavior with a System Property? Based 
on the experience with the default validation flag I would not bother making 
this configurable... we would need to at least document this in the release 
notes though.


was (Author: zolyfarkas):
[~rskraba] what about controlling this behavior with a System Property? 

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2, 1.9.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Commented] (AVRO-2278) GenericData.Record field getter not correct

2020-05-06 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100697#comment-17100697
 ] 

Zoltan Farkas commented on AVRO-2278:
-

[~rskraba] what about controlling this behavior with a System Property? 

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2, 1.9.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Updated] (AVRO-2278) GenericData.Record field getter not correct

2020-04-28 Thread Zoltan Farkas (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-2278:

Affects Version/s: 1.9.2

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2, 1.9.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Assigned] (AVRO-2278) GenericData.Record field getter not correct

2020-04-28 Thread Zoltan Farkas (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas reassigned AVRO-2278:
---

Assignee: Zoltan Farkas

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Commented] (AVRO-2278) GenericData.Record field getter not correct

2020-04-28 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094476#comment-17094476
 ] 

Zoltan Farkas commented on AVRO-2278:
-

updated PR.

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Commented] (AVRO-2278) GenericData.Record field getter not correct

2020-04-28 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094455#comment-17094455
 ] 

Zoltan Farkas commented on AVRO-2278:
-

[~rskraba] Excellent points!
I would make all accesses by index throw IndexOutOfBoundsException 
consistently; I think that exception is a better fit for this scenario.
The SpecificRecord access-by-name NPE behavior needs to be changed to be in 
sync with GenericRecord.

The change in behavior might break apps, which is why we would make this 
change in a release where at least the minor version number changes.

Will update the PR with your suggestions.

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Commented] (AVRO-2278) GenericData.Record field getter not correct

2020-04-23 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090604#comment-17090604
 ] 

Zoltan Farkas commented on AVRO-2278:
-

Created PR: https://github.com/apache/avro/pull/864

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> This will make the behavior consistent with put, which throws an exception 
> when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.





[jira] [Commented] (AVRO-2742) Schema.Parser.parse() does not validate namespace

2020-02-19 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039936#comment-17039936
 ] 

Zoltan Farkas commented on AVRO-2742:
-

[~rskraba] before I went down the path of disabling name validation, I was 
thinking of implementing something similar to percent-encoding (using _ 
instead of %) to convert arbitrary strings into valid identifiers... 

Disabling validation was the path of least resistance...
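A sketch of that _-based encoding idea: escape every character that is not valid in an Avro name as '_' plus its four-digit hex code point. Illustrative only, not an Avro API:

```java
// Sketch of "_-encoding": map an arbitrary string to a valid Avro name
// ([A-Za-z_][A-Za-z0-9_]*) reversibly, percent-encoding style but with
// '_' as the escape character. Illustrative, not part of Avro.
public class NameEscape {
  static String encode(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9');
      // A leading digit is not a valid name start, so escape it too;
      // '_' itself is always escaped, keeping the encoding reversible.
      if (alnum && !(i == 0 && c >= '0' && c <= '9')) {
        sb.append(c);
      } else {
        sb.append('_').append(String.format("%04x", (int) c));
      }
    }
    return sb.toString();
  }
}
```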




> Schema.Parser.parse() does not validate namespace
> -
>
> Key: AVRO-2742
> URL: https://issues.apache.org/jira/browse/AVRO-2742
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.2
>Reporter: radai rosenblatt
>Priority: Major
>
> [the spec|https://avro.apache.org/docs/current/spec.html#names] has the 
> following to say about names:
> {quote}The name portion of a fullname, record field names, and enum symbols 
> must: ... A namespace is a dot-separated sequence of such names.
> {quote}
> and yet the following schema parses just fine for me:
> {code:java}
> {
>   "type": "record",
>   "namespace": "this thing. has spaces.in it?!",
>   "name": "HasInvalidNamespace",
>   "fields": [
> {
>   "name": "stringField",
>   "type": "string"
> }
>   ]
> }
> {code}
> am I misunderstanding the spec? also, even if this is technically a legal 
> schema it will never survive code generation of specific record classes (at 
> least in java?)





[jira] [Commented] (AVRO-2742) Schema.Parser.parse() does not validate namespace

2020-02-18 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039480#comment-17039480
 ] 

Zoltan Farkas commented on AVRO-2742:
-

[~radai] right now, you do restrict yourself to generic records (json works; 
the examples I shared use the json encoder).

Now, the spec is not ambiguous, but since there is code to disable name 
validation, there are probably other use cases...






> Schema.Parser.parse() does not validate namespace
> -
>
> Key: AVRO-2742
> URL: https://issues.apache.org/jira/browse/AVRO-2742
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.2
>Reporter: radai rosenblatt
>Priority: Major
>
> [the spec|https://avro.apache.org/docs/current/spec.html#names] has the 
> following to say about names:
> {quote}The name portion of a fullname, record field names, and enum symbols 
> must: ... A namespace is a dot-separated sequence of such names.
> {quote}
> and yet the following schema parses just fine for me:
> {code:java}
> {
>   "type": "record",
>   "namespace": "this thing. has spaces.in it?!",
>   "name": "HasInvalidNamespace",
>   "fields": [
> {
>   "name": "stringField",
>   "type": "string"
> }
>   ]
> }
> {code}
> am I misunderstanding the spec? also, even if this is technically a legal 
> schema it will never survive code generation of specific record classes (at 
> least in java?)





[jira] [Commented] (AVRO-2742) Schema.Parser.parse() does not validate namespace

2020-02-18 Thread Zoltan Farkas (Jira)


[ 
https://issues.apache.org/jira/browse/AVRO-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17039445#comment-17039445
 ] 

Zoltan Farkas commented on AVRO-2742:
-

Well, the schema parser explicitly allows disabling name validation 
(https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L1331):

{code}
/** Enable or disable name validation. */
public Parser setValidate(boolean validate) {
  this.validate = validate;
  return this;
}
{code}
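As a sketch of how that setter can be used (not from the original comment; it assumes an Avro 1.8/1.9 jar on the classpath, and the class name is illustrative), the parser will then accept the schema from the issue description:

```java
import org.apache.avro.Schema;

public class LenientParse {
  public static void main(String[] args) {
    // Schema whose namespace would normally fail name validation.
    String avsc = "{\"type\":\"record\","
        + "\"namespace\":\"this thing. has spaces.in it?!\","
        + "\"name\":\"HasInvalidNamespace\","
        + "\"fields\":[{\"name\":\"stringField\",\"type\":\"string\"}]}";
    // Disabling validation lets the parser accept it as-is.
    Schema schema = new Schema.Parser().setValidate(false).parse(avsc);
    System.out.println(schema.getFullName());
  }
}
```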

I am not sure why the implementation allows this, but let me give an example of 
where I use Avro with name validation disabled:

When playing around with Avro as a "frontend" for Apache Calcite, I was 
converting Calcite result-set rows to Avro records; the result set of the 
following query would not be easy to represent in Avro:

https://demo.spf4j.org/avql/query?query=select%20originPlanet,count(*)%20as%20nrSpecies,sum(averageLifeSpanYears)/count(*)%20from%20species%20group%20by%20originPlanet

one could work around it with "as":

https://demo.spf4j.org/avql/query?query=select%20originPlanet,count(*)%20as%20nrSpecies,sum(averageLifeSpanYears)/count(*)%20as%20avgLifeSpanYears%20from%20species%20group%20by%20originPlanet

The naming restrictions are useful only when generating specific records, but 
not every use case needs them.

for more background on my experiment see: 
https://github.com/zolyfarkas/jaxrs-spf4j-demo/wiki/AvroCalciteRest

Another thing is aliases: I am not sure what the spec says about them, but they 
are not validated at all...


> Schema.Parser.parse() does not validate namespace
> -
>
> Key: AVRO-2742
> URL: https://issues.apache.org/jira/browse/AVRO-2742
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.9.2
>Reporter: radai rosenblatt
>Priority: Major
>
> [the spec|https://avro.apache.org/docs/current/spec.html#names] has the 
> following to say about names:
> {quote}The name portion of a fullname, record field names, and enum symbols 
> must: ... A namespace is a dot-separated sequence of such names.
> {quote}
> and yet the following schema parses just fine for me:
> {code:java}
> {
>   "type": "record",
>   "namespace": "this thing. has spaces.in it?!",
>   "name": "HasInvalidNamespace",
>   "fields": [
> {
>   "name": "stringField",
>   "type": "string"
> }
>   ]
> }
> {code}
> am I misunderstanding the spec? also, even if this is technically a legal 
> schema it will never survive code generation of specific record classes (at 
> least in java?)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (AVRO-2057) JsonDecoder.skipChildren does not skip map/records correctly

2020-01-31 Thread Zoltan Farkas (Jira)


 [ 
https://issues.apache.org/jira/browse/AVRO-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas resolved AVRO-2057.
-
Resolution: Implemented

Looks like this was resolved by another change.

> JsonDecoder.skipChildren does not skip map/records correctly
> 
>
> Key: AVRO-2057
> URL: https://issues.apache.org/jira/browse/AVRO-2057
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Zoltan Farkas
>Priority: Critical
>
> at 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/JsonDecoder.java#L585
> {code}
>   @Override
>   public JsonParser skipChildren() throws IOException {
> JsonToken tkn = elements.get(pos).token;
> int level = (tkn == JsonToken.START_ARRAY || tkn == 
> JsonToken.END_ARRAY) ? 1 : 0;
> while (level > 0) {
>   switch(elements.get(++pos).token) {
>   case START_ARRAY:
>   case START_OBJECT:
> level++;
> break;
>   case END_ARRAY:
>   case END_OBJECT:
> level--;
> break;
>   }
> }
> return this;
>   }
> {code}
> should be:
> {code}
>   @Override
>   public JsonParser skipChildren() throws IOException {
> JsonToken tkn = elements.get(pos).token;
> int level = (tkn == JsonToken.START_ARRAY || tkn == 
> JsonToken.START_OBJECT) ? 1 : 0;
> while (level > 0) {
>   switch(elements.get(++pos).token) {
>   case START_ARRAY:
>   case START_OBJECT:
> level++;
> break;
>   case END_ARRAY:
>   case END_OBJECT:
> level--;
> break;
>   }
> }
> return this;
>   }
> {code}
> This results in de-serialization failures when the reader schema does not 
> have fields that are present in the serialized object and the writer schema. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (AVRO-2716) Unused local variable

2020-01-23 Thread Zoltan Farkas (Jira)
Zoltan Farkas created AVRO-2716:
---

 Summary: Unused local variable
 Key: AVRO-2716
 URL: https://issues.apache.org/jira/browse/AVRO-2716
 Project: Apache Avro
  Issue Type: Bug
Affects Versions: 1.9.1
Reporter: Zoltan Farkas


there is an unused local variable at:

https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Resolver.java#L233



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (AVRO-2418) BUG in schema resolver, when resolving map<> to union {null, map<>}

2019-06-13 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863292#comment-16863292
 ] 

Zoltan Farkas commented on AVRO-2418:
-

See PR for the fix: https://github.com/apache/avro/pull/543


> BUG in schema resolver, when resolving map<> to union {null, map<>}
> ---
>
> Key: AVRO-2418
> URL: https://issues.apache.org/jira/browse/AVRO-2418
> Project: Apache Avro
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Zoltan Farkas
>Priority: Major
>
> Here is a unit test to reproduce the issue:
> {code}
> package org.apache.avro.io.parsing;
> import java.io.ByteArrayInputStream;
> import java.io.ByteArrayOutputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.io.OutputStream;
> import java.util.HashMap;
> import java.util.Map;
> import org.junit.Assert;
> import org.apache.avro.Schema;
> import org.apache.avro.SchemaBuilder;
> import org.apache.avro.generic.GenericData;
> import org.apache.avro.generic.GenericDatumReader;
> import org.apache.avro.generic.GenericDatumWriter;
> import org.apache.avro.generic.GenericRecord;
> import org.apache.avro.generic.GenericRecordBuilder;
> import org.apache.avro.io.DatumReader;
> import org.apache.avro.io.DatumWriter;
> import org.apache.avro.io.Decoder;
> import org.apache.avro.io.DecoderFactory;
> import org.apache.avro.io.Encoder;
> import org.apache.avro.io.EncoderFactory;
> import org.apache.avro.util.Utf8;
> import org.junit.Test;
> /**
>  *
>  * @author Zoltan Farkas
>  */
> public class TestUnionPromotion {
>   @Test
>   public void testUnionPromotionCollection() throws Exception {
> Schema directFieldSchema = 
> SchemaBuilder.record("MyRecord").namespace("ns").fields().name("field1").type().map()
> .values().stringType().noDefault().endRecord();
> Schema schemaWithField = 
> SchemaBuilder.record("MyRecord").namespace("ns").fields().name("field1").type().nullable()
> .map().values().stringType().noDefault().endRecord();
> Map data = new HashMap<>();
> data.put("a", "someValue");
> GenericData.Record record = new 
> GenericRecordBuilder(directFieldSchema).set("field1", data).build();
> ByteArrayOutputStream bos = new ByteArrayOutputStream();
> writeAvroBin(bos, record);
> Object read = readAvroBin(new ByteArrayInputStream(bos.toByteArray()), 
> directFieldSchema, schemaWithField);
> Map name = (Map) ((GenericRecord) read).get("field1");
> Assert.assertEquals("someValue", name.get(new Utf8("a")).toString());
>   }
>   private static Object readAvroBin(final InputStream input, final Schema 
> writerSchema, final Schema readerSchema)
>   throws IOException {
> DatumReader reader = new GenericDatumReader(writerSchema, readerSchema);
> DecoderFactory decoderFactory = DecoderFactory.get();
> Decoder decoder = decoderFactory.binaryDecoder(input, null);
> return reader.read(null, decoder);
>   }
>   private static void writeAvroBin(final OutputStream out, final 
> GenericRecord req) throws IOException {
> @SuppressWarnings("unchecked")
> DatumWriter writer = new GenericDatumWriter(req.getSchema());
> Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
> writer.write(req, encoder);
> encoder.flush();
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AVRO-2418) BUG in schema resolver, when resolving map<> to union {null, map<>}

2019-06-13 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2418:
---

 Summary: BUG in schema resolver, when resolving map<> to union 
{null, map<>}
 Key: AVRO-2418
 URL: https://issues.apache.org/jira/browse/AVRO-2418
 Project: Apache Avro
  Issue Type: Bug
Affects Versions: 1.9.0
Reporter: Zoltan Farkas


Here is a unit test to reproduce the issue:

{code}
package org.apache.avro.io.parsing;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import org.junit.Assert;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.Utf8;
import org.junit.Test;

/**
 *
 * @author Zoltan Farkas
 */
public class TestUnionPromotion {

  @Test
  public void testUnionPromotionCollection() throws Exception {
Schema directFieldSchema = 
SchemaBuilder.record("MyRecord").namespace("ns").fields().name("field1").type().map()
.values().stringType().noDefault().endRecord();
Schema schemaWithField = 
SchemaBuilder.record("MyRecord").namespace("ns").fields().name("field1").type().nullable()
.map().values().stringType().noDefault().endRecord();
Map data = new HashMap<>();
data.put("a", "someValue");
GenericData.Record record = new 
GenericRecordBuilder(directFieldSchema).set("field1", data).build();
ByteArrayOutputStream bos = new ByteArrayOutputStream();
writeAvroBin(bos, record);
Object read = readAvroBin(new ByteArrayInputStream(bos.toByteArray()), 
directFieldSchema, schemaWithField);
Map name = (Map) ((GenericRecord) read).get("field1");
Assert.assertEquals("someValue", name.get(new Utf8("a")).toString());

  }

  private static Object readAvroBin(final InputStream input, final Schema 
writerSchema, final Schema readerSchema)
  throws IOException {
DatumReader reader = new GenericDatumReader(writerSchema, readerSchema);
DecoderFactory decoderFactory = DecoderFactory.get();
Decoder decoder = decoderFactory.binaryDecoder(input, null);
return reader.read(null, decoder);
  }

  private static void writeAvroBin(final OutputStream out, final GenericRecord 
req) throws IOException {
@SuppressWarnings("unchecked")
DatumWriter writer = new GenericDatumWriter(req.getSchema());
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(req, encoder);
encoder.flush();
  }

}
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AVRO-2412) Improved default value reading

2019-05-31 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2412:
---

 Summary: Improved default value reading
 Key: AVRO-2412
 URL: https://issues.apache.org/jira/browse/AVRO-2412
 Project: Apache Avro
  Issue Type: Improvement
Affects Versions: 1.9.0
Reporter: Zoltan Farkas


at GenericData.getDefaultValue

https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L1164

instead of:

{code}
ByteArrayOutputStream baos = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(baos, null);
Accessor.encode(encoder, field.schema(), json);
encoder.flush();
BinaryDecoder decoder = 
DecoderFactory.get().binaryDecoder(baos.toByteArray(), null);
defaultValue = createDatumReader(field.schema()).read(null, decoder);
{code}

wouldn't it be better to do the following?

{code}
  Schema schema = field.schema();
  if (schema.getType() == Type.UNION) {
schema = schema.getTypes().get(0);
  }
  JsonDecoder decoder = new JsonDecoder(schema, 
json.traverse(Schema.MAPPER));
  defaultValue = createDatumReader(schema).read(null, decoder);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AVRO-1723) Add support for forward declarations in avro IDL

2019-05-24 Thread Zoltan Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1723:

Fix Version/s: 1.9.0

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Apache Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: AVRO-1723.patch
>
>
> Currently, recursive data structures like:
> record SampleNode {
>    int count = 0;
>    array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2019-05-24 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847769#comment-16847769
 ] 

Zoltan Farkas commented on AVRO-1723:
-

[~davidcarltonsumo] should be part of 1.9.0, I see the changes in the 
release-1.9.0 tag.

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Apache Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>Priority: Major
> Attachments: AVRO-1723.patch
>
>
> Currently, recursive data structures like:
> record SampleNode {
>    int count = 0;
>    array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2201) avro-maven-plugin should be able to import schema from generated specificrecord

2019-03-31 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16806129#comment-16806129
 ] 

Zoltan Farkas commented on AVRO-2201:
-

Another way to achieve a similar result is to publish the generated avsc (and 
idl) along with the java classes. Once you do that, you can import them easily 
through the existing methods...

I have some examples here: [https://github.com/zolyfarkas/avro-schema-examples] 

Avoiding the generation of duplicate classes is handled by the plugin itself...

 

> avro-maven-plugin should be able to import schema from generated 
> specificrecord
> ---
>
> Key: AVRO-2201
> URL: https://issues.apache.org/jira/browse/AVRO-2201
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: java
>Reporter: mateusz kanarek
>Priority: Major
>
> Currently SchemaMojo can read avro schemas only from avsc files.
> One can define avro schema and using current SchemaMojo build a jar with 
> generated SpecificRecord(s).
> But there is no easy way to reference this schema from avsc schema in another 
> maven project.
> It would be great if avsc files could reference to avro schema from already 
> generated SpecificRecord.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2137) avro JsonDecoding additional field in array type

2019-02-28 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781145#comment-16781145
 ] 

Zoltan Farkas commented on AVRO-2137:
-

Yup, that is correct; incidentally, I hit the same issue today on an unrelated 
task...

It is actually caused by an incomplete implementation of AVRO-2034.

When I reviewed the PR for the above I noticed the implementation deficiency, 
but nobody did anything about it...

In any case, I pushed a fix to my fork just now, and a quick release to 
bintray (1.8.1.49p).

If anyone volunteers to do a PR with this, see commits:

[[fix] bug with JSON 
parsing.|https://github.com/zolyfarkas/avro/commit/f95688f610876319a6899428e52248eaeb9afcab]
 -> [[fix] properly fix 
AVRO-2137|https://github.com/zolyfarkas/avro/commit/0fc5394db06ecf9efcccdaad3be9ef9a40da9acd]

hope it helps.

cheers.

 

> avro JsonDecoding additional field in array type
> 
>
> Key: AVRO-2137
> URL: https://issues.apache.org/jira/browse/AVRO-2137
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1
>Reporter: Arun sethia
>Priority: Major
>
> I have following avro schema:
> {code:json}
> {
>   "type": "record",
>   "name": "test",
>   "namespace": "test.name",
>   "fields": [
> {
>   "name": "items",
>   "type": {
> "type": "array",
> "items": {
>   "type": "record",
>   "name": "items",
>   "fields": [
> {
>   "name": "name",
>   "type": "string"
> },
> {
>   "name": "state",
>   "type": "string"
> }
>   ]
> }
>   }
> },
> {
>   "name": "firstname",
>   "type": "string"
> }
>   ]
> }
> {code}
> when I am using Json decoder and avro encoder to encode Json data (scala 
> code):
>  {code:scala}
> val writer = new GenericDatumWriter[GenericRecord](schema)
> val reader = new GenericDatumReader[GenericRecord](schema)
> val baos = new ByteArrayOutputStream
> val decoder: JsonDecoder = DecoderFactory.get.jsonDecoder(schema, json)
> val encoder = EncoderFactory.get.binaryEncoder(baos, null)
> val datum = reader.read(null, decoder) writer.write(datum, encoder)
> encoder.flush()
> val avroByteArray = baos.toByteArray
>  {code}
> *scenario1:* when I pass the following JSON to encode, it works fine:
> {code:json}
>  {
>   "items": [
> {
>   "name": "dallas",
>   "state": "TX"
> }
>   ],
>   "firstname": "arun"
> }
> {code}
>  *scenario2:* when I pass an additional attribute in the JSON at root level 
> (lastname), it is able to encode and works fine:
> {code:json}
> {
>   "items": [
> {
>   "name": "dallas",
>   "state": "TX"
> }
>   ],
>   "firstname": "fname",
>   "lastname": "lname"
> }
> {code}
> *scenario3*: when I add an additional attribute in the array record (country), 
> it throws the following exception:
> {code:scala}
> Expected record-end. Got FIELD_NAME org.apache.avro.AvroTypeException: 
> Expected record-end. Got FIELD_NAME at 
> org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:698) { "items": [
> { "name": "dallas", "state": "TX", "country":"USA" }
> ], "firstname":"fname", "lastname":"lname" }
> {code}
>  If there is any additional element in an array-item record, it should work 
> the same way as for a normal record: the decoder should just discard it and 
> decode the JSON data.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2328) Support distinguishing between LocalDateTime and Instant semantics in timestamps

2019-02-28 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780956#comment-16780956
 ] 

Zoltan Farkas commented on AVRO-2328:
-

In my fork (zolyfarkas/avro) I implemented an "instant" logical type that 
converts to the JDK Instant.
(This is the right abstraction for timestamps.)

that can be applied to:

 * string : iso format in Z timezone
 * long: millis since epoch.
 * record:
 {code}
/** an instant type */
@logicalType("instant")
record Instant {
  /** seconds since the UNIX epoch */
  long epochSecond;
  /** nanosecond component */
  int nano;
}
{code}
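For the string and long cases above, the logical type would presumably be attached as an ordinary schema attribute. A hypothetical avsc fragment (the field name is illustrative, and the "instant" logical type exists only in the fork mentioned above, not in upstream Avro):

```json
{"name": "createdAt", "type": {"type": "long", "logicalType": "instant"}}
```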

> Support distinguishing between LocalDateTime and Instant semantics in 
> timestamps
> 
>
> Key: AVRO-2328
> URL: https://issues.apache.org/jira/browse/AVRO-2328
> Project: Apache Avro
>  Issue Type: Task
>Reporter: Zoltan Ivanfi
>Assignee: Nandor Kollar
>Priority: Major
>
> Different SQL engines of the Hadoop stack support different timestamp 
> semantics. The range of supported semantics is about to be extended even 
> further. While some of the new timestamp types can be added to SQL without 
> explicit support from the file formats, others require new physical types. 
> File format support would be beneficial even for timestamp semantics where it 
> is not strictly required, because it would enable correct interpretation 
> without an SQL schema or any other kind of manual configuration.
> This JIRA is about supporting the LocalDateTime and Instant semantics. See 
> [this 
> document|https://docs.google.com/document/d/1E-7miCh4qK6Mg54b-Dh5VOyhGX8V4xdMXKIHJL36a9U/edit#]
>  for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AVRO-2137) avro JsonDecoding additional field in array type

2019-02-28 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780913#comment-16780913
 ] 

Zoltan Farkas edited comment on AVRO-2137 at 2/28/19 9:06 PM:
--

I am not seeing this in my fork (https://github.com/zolyfarkas/avro); could you 
please review my attempt to reproduce the issue?

{code}

package org.apache.avro.io;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.junit.Test;

public class JsonDecoderTest {


  private static final String SCHEMA
= "{\n" +
"  \"type\": \"record\",\n" +
"  \"name\": \"test\",\n" +
"  \"namespace\": \"test.name\",\n" +
"  \"fields\": [\n" +
"{\n" +
"  \"name\": \"items\",\n" +
"  \"type\": {\n" +
"\"type\": \"array\",\n" +
"\"items\": {\n" +
"  \"type\": \"record\",\n" +
"  \"name\": \"items\",\n" +
"  \"fields\": [\n" +
"{\n" +
"  \"name\": \"name\",\n" +
"  \"type\": \"string\"\n" +
"},\n" +
"{\n" +
"  \"name\": \"state\",\n" +
"  \"type\": \"string\"\n" +
"}\n" +
"  ]\n" +
"}\n" +
"  }\n" +
"},\n" +
"{\n" +
"  \"name\": \"firstname\",\n" +
"  \"type\": \"string\"\n" +
"}\n" +
"  ]\n" +
"}";


  private static final String testData = "{ \"items\": [\n" +
"\n" +
"{ \"name\": \"dallas\", \"state\": \"TX\", \"country\":\"USA\" }\n" +
"\n" +
"], \"firstname\":\"fname\", \"lastname\":\"lname\" }";

  @Test
  public void testDecoding() throws IOException {
Schema writerSchema = new Schema.Parser().parse(SCHEMA);
Schema readerSchema = writerSchema;
ByteArrayInputStream bis =
new ByteArrayInputStream(testData.getBytes(StandardCharsets.UTF_8));
Decoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, bis);
GenericDatumReader reader = new GenericDatumReader(writerSchema, 
readerSchema);
GenericRecord testData = (GenericRecord) reader.read(null, decoder);
System.out.println(testData);
  }

}
{code}

this might be caused by the fact that my fork contains the fix for: 
https://issues.apache.org/jira/browse/AVRO-2057...
or that my attempt to reproduce is broken...



> avro JsonDecoding additional field in array type
> 
>
> Key: AVRO-2137
> URL: https://issues.apache.org/jira/browse/AVRO-2137
> Project: Apache Avro
>  Issue Type: Bug
>  

[jira] [Commented] (AVRO-2164) Make Decimal a first class type.

2018-12-29 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730697#comment-16730697
 ] 

Zoltan Farkas commented on AVRO-2164:
-

One more thing I stumbled upon: currently, in org.apache.avro.data.Json.avsc, we 
have:
{code}
{"type": "record", "name": "Json", "namespace":"org.apache.avro.data",
 "fields": [
 {"name": "value",
  "type": [
  "long",
  "double",
  "string",
  "boolean",
  "null",
  {"type": "array", "items": "Json"},
  {"type": "map", "values": "Json"}
  ]
 }
 ]
}
{code}

this Avro representation is lossy when covering numbers ("long", "double"); 
having a decimal type would resolve this...
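The lossiness is just IEEE 754 at work: a double has a 53-bit significand, so not every long survives a round trip through the union above. A self-contained illustration (not from the original comment):

```java
public class LongDoubleLoss {
  public static void main(String[] args) {
    long big = 9007199254740993L;        // 2^53 + 1: not representable as a double
    double d = (double) big;             // rounds to 2^53
    System.out.println(big == (long) d); // false: the value was silently changed
  }
}
```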




> Make Decimal a first class type.
> 
>
> Key: AVRO-2164
> URL: https://issues.apache.org/jira/browse/AVRO-2164
> Project: Apache Avro
>  Issue Type: Improvement
>  Components: logical types
>Affects Versions: 1.8.2
>Reporter: Andy Coates
>Priority: Major
>
> I'd be interested to hear the communities thoughts on making decimal a first 
> class type. 
> The current logical type encodes a decimal into a _bytes_ or _fixed_. This 
> encoding does not include any information about the scale, i.e. this encoding 
> is lossy. 
> There are open issues around the compatibility / evolvability of schemas 
> containing decimal logical types, (e.g. AVRO-2078 & AVRO-1721), that mean 
> reading data that was previously written with a different scale will result 
> in data corruption.
> If these issues were fixed, with suitable compatibility checks put in place, 
> this would then make it impossible to evolve an Avro schema where the scale 
> needs to be changed. This inability to evolve the scale is very restrictive, 
> and can result in high overhead for organizations that _need_ to change the 
> scale, i.e. they may potentially need to copy their entire data set, 
> deserializing with the old scale and re-serializing with the new.
> If _decimal_ were promoted to a first class type, this would allow the scale 
> to be captured in the serialized form, allow for schema evolution support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AVRO-2284) Incorrect EnumSymbol initialization in TestReadingWritingDataInEvolvedSchemas.java

2018-12-13 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2284:
---

 Summary: Incorrect EnumSymbol initialization in 
TestReadingWritingDataInEvolvedSchemas.java
 Key: AVRO-2284
 URL: https://issues.apache.org/jira/browse/AVRO-2284
 Project: Apache Avro
  Issue Type: Bug
Affects Versions: 1.8.2
Reporter: Zoltan Farkas


EnumSymbol is initialized with Record schema instead of Enum schema at:

https://github.com/apache/avro/blob/master/lang/java/avro/src/test/java/org/apache/avro/TestReadingWritingDataInEvolvedSchemas.java#L310



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (AVRO-2278) GenericData.Record field getter not correct

2018-11-27 Thread Zoltan Farkas (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-2278:

Summary: GenericData.Record field getter not correct  (was: 
GenericData.Record field getter no correct)

> GenericData.Record field getter not correct
> ---
>
> Key: AVRO-2278
> URL: https://issues.apache.org/jira/browse/AVRO-2278
> Project: Apache Avro
>  Issue Type: Bug
>Affects Versions: 1.8.2
>Reporter: Zoltan Farkas
>Priority: Major
>
> Currently the get field implementation is not correct in GenericData.Record:
> at: 
> https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209
> {code}
>@Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) return null;
>   return values[field.pos()];
> }
> {code}
> The method returns null when a field is not present, making it impossible to 
> distinguish between:
> field value = null
> and
> field does not exist.
> A more "correct" implementation would be:
> {code}
> @Override public Object get(String key) {
>   Field field = schema.getField(key);
>   if (field == null) {
> throw new IllegalArgumentException("Invalid field " + key);
>   }
>   return values[field.pos()];
> }
> {code}
> this will make the behavior consistent with put, which will throw an 
> exception when setting a non-existent field.
> When I made this change in my fork, some bugs in unit tests showed up.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AVRO-2278) GenericData.Record field getter no correct

2018-11-27 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2278:
---

 Summary: GenericData.Record field getter no correct
 Key: AVRO-2278
 URL: https://issues.apache.org/jira/browse/AVRO-2278
 Project: Apache Avro
  Issue Type: Bug
Affects Versions: 1.8.2
Reporter: Zoltan Farkas


Currently the get field implementation is not correct in GenericData.Record:

at: 
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/generic/GenericData.java#L209

{code}
@Override public Object get(String key) {
  Field field = schema.getField(key);
  if (field == null) return null;
  return values[field.pos()];
}
{code}

The method returns null when a field is not present, making it impossible to 
distinguish between:

field value = null

and

field does not exist.

A more "correct" implementation would be:

{code}
@Override public Object get(String key) {
  Field field = schema.getField(key);
  if (field == null) {
throw new IllegalArgumentException("Invalid field " + key);
  }
  return values[field.pos()];
}
{code}

this will make the behavior consistent with put, which will throw an exception 
when setting a non-existent field.

When I made this change in my fork, some bugs in unit tests showed up.
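The ambiguity described above can be sketched with a minimal map-backed container (hypothetical names, not the actual GenericData.Record), contrasting the current lenient getter with the stricter fail-fast alternative:

```java
// Sketch of the null-ambiguity problem: get() returns null both for a
// missing field and for a field whose value is null.
import java.util.HashMap;
import java.util.Map;

public class NullAmbiguityDemo {
    private final Map<String, Object> values = new HashMap<>();

    public void put(String key, Object value) { values.put(key, value); }

    /** Mirrors the current behavior: null for missing AND null-valued fields. */
    public Object get(String key) { return values.get(key); }

    /** The stricter alternative proposed above: fail fast on unknown fields. */
    public Object getStrict(String key) {
        if (!values.containsKey(key)) {
            throw new IllegalArgumentException("Invalid field " + key);
        }
        return values.get(key);
    }
}
```

With the lenient getter, a caller cannot tell the two cases apart; with the strict one, the unknown-field case surfaces immediately, matching put's behavior.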




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2269) Improve variances seen across Perf.java runs

2018-11-15 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688925#comment-16688925
 ] 

Zoltan Farkas commented on AVRO-2269:
-

Instead of re-inventing the wheel, Perf.java should be rewritten to use JMH 
(https://java-performance.info/jmh/).
See https://github.com/zolyfarkas/benchmarks for an example of how to 
use JMH
(there is also a benchmark for one of my experiments:
https://github.com/zolyfarkas/benchmarks/blob/master/src/test/java/org/spf4j/avro/GenericRecordBenchmark.java)
The example shows how you can run them with a profiler (Java Flight Recorder, 
stack sampler), so that you have data to look at when optimizing.


> Improve variances seen across Perf.java runs
> 
>
> Key: AVRO-2269
> URL: https://issues.apache.org/jira/browse/AVRO-2269
> Project: Apache Avro
>  Issue Type: Test
>  Components: java
>Reporter: Raymie Stata
>Assignee: Raymie Stata
>Priority: Major
>
> In attempting to use Perf.java to show that proposed performance changes 
> actually improved performance, different runs of Perf.java using the exact 
> same code base resulted variances of 5% or greater – and often 10% or greater 
> – for about half the test cases. With variance this high within a code base, 
> it's impossible to tell if a proposed "improved" code base indeed improves 
> performance. I will post to the wiki and elsewhere some documents and scripts 
> I developed to reduce this variance. This JIRA is for changes to Perf.java 
> that reduce the variance. Specifically:
>  * Access the {{reader}} and {{writer}} instance variables directly in the 
> inner-loop for {{SpecificTest}}, as well as switched to a "reuse" object for 
> reading records, rather than constructing fresh objects for each read. Both 
> helped to significantly reduce variance for 
> {{FooBarSpecificRecordTestWrite}}, a major target of recent 
> performance-improvement efforts.
>  * Switched to {{DirectBinaryEncoder}} instead of {{BufferedBinaryEncoder}} 
> for write tests. Although this slowed writer-tests a bit, it reduced variance 
> a lot, especially for performance tests of primitives like booleans, making 
> it a better choice for measuring the performance-impact of code changes.
>  * Started the timer of a test after the encoder/decoder for the test is 
> constructed, rather than before. Helps a little.
>  * Added the ability to output the _minimum_ runtime of a test case across 
> multiple cycles (vs the total runtime across all cycles). This was inspired 
> by JVMSpec, which used to use a minimum.  I was able to reduce the variance 
> of total runtime enough to obviate the need for this metric, but since it's 
> helpful diagnostically, I left it in.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-1124) RESTful service for holding schemas

2018-11-14 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16686565#comment-16686565
 ] 

Zoltan Farkas commented on AVRO-1124:
-

I have recently published an example of what I described in an earlier comment 
(using a Maven repo as a schema-repo REST service). With this approach you get 
more than a schema repo: you get a validation/versioning/release system...

Example is at:

https://github.com/zolyfarkas/avro-schema-examples

any comments welcome.





> RESTful service for holding schemas
> ---
>
> Key: AVRO-1124
> URL: https://issues.apache.org/jira/browse/AVRO-1124
> Project: Apache Avro
>  Issue Type: New Feature
>Reporter: Jay Kreps
>Assignee: Jay Kreps
>Priority: Major
> Attachments: AVRO-1124-can-read-with.patch, AVRO-1124-draft.patch, 
> AVRO-1124-validators-preliminary.patch, AVRO-1124.2.patch, AVRO-1124.3.patch, 
> AVRO-1124.4.patch, AVRO-1124.patch, AVRO-1124.patch
>
>
> Motivation: It is nice to be able to pass around data in serialized form but 
> still know the exact schema that was used to serialize it. The overhead of 
> storing the schema with each record is too high unless the individual records 
> are very large. There are workarounds for some common cases: in the case of 
> files a schema can be stored once with a file of many records amortizing the 
> per-record cost, and in the case of RPC the schema can be negotiated ahead of 
> time and used for many requests. For other uses, though it is nice to be able 
> to pass a reference to a given schema using a small id and allow this to be 
> looked up. Since only a small number of schemas are likely to be active for a 
> given data source, these can easily be cached, so the number of remote 
> lookups is very small (one per active schema version).
> Basically this would consist of two things:
> 1. A simple REST service that stores and retrieves schemas
> 2. Some helper java code for fetching and caching schemas for people using 
> the registry
> We have used something like this at LinkedIn for a few years now, and it 
> would be nice to standardize this facility to be able to build up common 
> tooling around it. This proposal will be based on what we have, but we can 
> change it as ideas come up.
> The facilities this provides are super simple, basically you can register a 
> schema which gives back a unique id for it or you can query for a schema. 
> There is almost no code, and nothing very complex. The contract is that 
> before emitting/storing a record you must first publish its schema to the 
> registry or know that it has already been published (by checking your cache 
> of published schemas). When reading you check your cache and if you don't 
> find the id/schema pair there you query the registry to look it up. I will 
> explain some of the nuances in more detail below. 
> An added benefit of such a repository is that it makes a few other things 
> possible:
> 1. A graphical browser of the various data types that are currently used and 
> all their previous forms.
> 2. Automatic enforcement of compatibility rules. Data is always compatible in 
> the sense that the reader will always deserialize it (since they are using 
> the same schema as the writer) but this does not mean it is compatible with 
> the expectations of the reader. For example if an int field is changed to a 
> string that will almost certainly break anyone relying on that field. This 
> definition of compatibility can differ for different use cases and should 
> likely be pluggable.
> Here is a description of one of our uses of this facility at LinkedIn. We use 
> this to retain a schema with "log" data end-to-end from the producing app to 
> various real-time consumers as well as a set of resulting AvroFile in Hadoop. 
> This schema metadata can then be used to auto-create hive tables (or add new 
> fields to existing tables), or inferring pig fields, all without manual 
> intervention. One important definition of compatibility that is nice to 
> enforce is compatibility with historical data for a given "table". Log data 
> is usually loaded in an append-only manner, so if someone changes an int 
> field in a particular data set to be a string, tools like pig or hive that 
> expect static columns will be unusable. Even using plain-vanilla map/reduce 
> processing data where columns and types change willy nilly is painful. 
> However the person emitting this kind of data may not know all the details of 
> compatible schema evolution. We use the schema repository to validate that 
> any change made to a schema don't violate the compatibility model, and reject 
> the update if it does. We do this check both at run time, and also as part of 
> the ant task that generates specific record code (as an early warning). 
> Some details to 

[jira] [Commented] (AVRO-2254) Unions with 2 records declared downward fail

2018-11-11 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682877#comment-16682877
 ] 

Zoltan Farkas commented on AVRO-2254:
-

[~nkollar] Nandor, can you please review the fix I am referencing and let me 
know if the solution looks good? This way I can create a better PR...

> Unions with 2 records declared downward fail
> 
>
> Key: AVRO-2254
> URL: https://issues.apache.org/jira/browse/AVRO-2254
> Project: Apache Avro
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Zoltan Farkas
>Priority: Major
>
> The following IDL will fail, complaining that the same type is declared twice 
> in the union:
> {code}
> @namespace("org.apache.avro.gen")
> protocol UnionFwd {
> record TestRecord {
>   union {SR1, SR2} unionField;
> }
> record SR1 {
>   string field;
> }
> record SR2 {
>   string field;
> }
> }
> {code}
> the fix for this can be pretty simple:
> https://github.com/zolyfarkas/avro/commit/56b215f73f34cc80d505875c90217916b271abb5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2164) Make Decimal a first class type.

2018-11-05 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16675887#comment-16675887
 ] 

Zoltan Farkas commented on AVRO-2164:
-

I am in favor of making decimal a first class type. 

One extra benefit is that we would be able to properly serialize decimals 
in the JSON format (as numbers instead of binary).

Regarding evolution, we should allow for increases of precision and scale. 

Why can't this be part of Avro 2.0?

Side note:

In my own decimal logical type implementation, I serialize the scale along with 
the unscaled value. So if I have a decimal(16, 8) and I serialize a decimal 
value with a scale <= 8, the value is serialized with its original scale; for 
anything with a larger scale, I either error or round, depending on whether a 
rounding mode is specified via a type attribute...
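A minimal sketch of the scheme described above — the scale travels with the unscaled value, so it survives schema evolution. This is an illustrative byte layout of my own devising, not Avro's actual decimal encoding:

```java
// Sketch: encode a BigDecimal as (scale, unscaled-byte-length, unscaled bytes)
// so the reader never needs the writer's schema to recover the scale.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.math.BigInteger;

public class ScaledDecimalCodec {
    public static byte[] write(BigDecimal value) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(value.scale());                      // scale travels with the value
            byte[] unscaled = value.unscaledValue().toByteArray();
            out.writeInt(unscaled.length);
            out.write(unscaled);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static BigDecimal read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int scale = in.readInt();
            byte[] unscaled = new byte[in.readInt()];
            in.readFully(unscaled);
            return new BigDecimal(new BigInteger(unscaled), scale);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A rounding/error policy for values exceeding the declared (precision, scale) would sit in front of write(), as described above.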




> Make Decimal a first class type.
> 
>
> Key: AVRO-2164
> URL: https://issues.apache.org/jira/browse/AVRO-2164
> Project: Avro
>  Issue Type: Improvement
>  Components: logical types
>Affects Versions: 1.8.2
>Reporter: Andy Coates
>Priority: Major
>
> I'd be interested to hear the communities thoughts on making decimal a first 
> class type. 
> The current logical type encodes a decimal into a _bytes_ or _fixed_. This 
> encoding does not include any information about the scale, i.e. this encoding 
> is lossy. 
> There are open issues around the compatibility / evolvability of schemas 
> containing decimal logical types, (e.g. AVRO-2078 & AVRO-1721), that mean 
> reading data that was previously written with a different scale will result 
> in data corruption.
> If these issues were fixed, with suitable compatibility checks put in place, 
> this would then make it impossible to evolve an Avro schema where the scale 
> needs to be changed. This inability to evolve the scale is very restrictive, 
> and can result in high overhead for organizations that _need_ to change the 
> scale, i.e. they may potentially need to copy their entire data set, 
> deserializing with the old scale and re-serializing with the new.
> If _decimal_ were promoted to a first class type, this would allow the scale 
> to be captured in the serialized form, allow for schema evolution support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2032) Unable to decode JSON-encoded Double.NaN, Double.POSITIVE_INFINITY or Double.NEGATIVE_INFINITY

2018-06-01 Thread Zoltan Farkas (JIRA)


[ 
https://issues.apache.org/jira/browse/AVRO-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16498449#comment-16498449
 ] 

Zoltan Farkas commented on AVRO-2032:
-

I resolved this issue in my fork; it is a fairly small fix, here is the detail:

[https://github.com/zolyfarkas/avro/commit/afe6c04f38c535533c33ed4c303fb011df828606]

I will not have time to work on a PR anytime soon... so any help is appreciated.
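The gist of such a fix can be sketched as a lenient parse that accepts the string tokens a JSON encoder would emit for non-finite doubles (hypothetical helper; the real change lives inside the JsonDecoder):

```java
// Sketch: map the JSON string tokens for non-finite doubles back to their
// IEEE 754 values instead of failing with "Expected double. Got VALUE_STRING".
public class SpecialDoubleParser {
    public static double parse(String token) {
        switch (token) {
            case "NaN":       return Double.NaN;
            case "Infinity":  return Double.POSITIVE_INFINITY;
            case "-Infinity": return Double.NEGATIVE_INFINITY;
            default:          return Double.parseDouble(token);
        }
    }
}
```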

> Unable to decode JSON-encoded Double.NaN, Double.POSITIVE_INFINITY or 
> Double.NEGATIVE_INFINITY
> --
>
> Key: AVRO-2032
> URL: https://issues.apache.org/jira/browse/AVRO-2032
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1
>Reporter: Pieter Dekinder
>Priority: Major
>
> When using the JsonEncoder to serialize Double.NaN, Double.POSITIVE_INFINITY 
> or Double.NEGATIVE_INFINITY to resulting JSON cannot be parsed by the 
> JsonDencoder.
> An AvroTypeException is thrown with the message "Expected double. Got 
> VALUE_STRING".
> When using BinaryEncoder/BinaryDecoder, it works fine.
> This JUnit code snippet will reproduce the issue:
> {code:java}
> @Test
> public void test() throws Exception {
> Schema schema = SchemaBuilder.builder()
> .record("record")
> .fields()
> .optionalDouble("number1")
> .optionalDouble("number2")
> .optionalDouble("number3")
> .endRecord();
> GenericData.Record record = new GenericData.Record(schema);
> record.put("number1", Double.NaN);
> record.put("number2", Double.POSITIVE_INFINITY);
> record.put("number3", Double.NEGATIVE_INFINITY);
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
> new GenericDatumWriter<GenericData.Record>(schema).write(record, encoder);
> encoder.flush();
> System.out.println(out);
> Decoder decoder = DecoderFactory.get().jsonDecoder(schema, out.toString());
> GenericData.Record deserialized = new GenericData.Record(schema);
> new GenericDatumReader<GenericData.Record>(schema).read(deserialized, decoder);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2035) enable validation of default values in schemas by default

2018-02-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373777#comment-16373777
 ] 

Zoltan Farkas commented on AVRO-2035:
-

[~cutting] looks good. thanks!  

> enable validation of default values in schemas by default
> -
>
> Key: AVRO-2035
> URL: https://issues.apache.org/jira/browse/AVRO-2035
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1
>Reporter: radai rosenblatt
>Assignee: Doug Cutting
>Priority: Major
> Fix For: 1.9.0
>
> Attachments: AVRO-2035.patch
>
>
> suppose i have the following schema evolution:
> {code}
> {
>   "name": "Bob",
>   "type": "record",
>   "fields": [
> {"name": "f1", "type": "int"}
>   ]
> }
> {code}
> and then:
> {code}
> {
>   "name": "Bob",
>   "type": "record",
>   "fields": [
> {"name": "f1", "type": "int"},
> {"name": "f2", "type": "boolean", "default": "true"}
>   ]
> }
> {code}
> the default value for "f2" is specified as the _STRING_ "true" (and not the 
> literal boolean true). 
> if this default value is ever accessed (when reading a gen1-serialized object 
> as a gen2) we get this:
> {code}
> org.apache.avro.AvroTypeException: Non-boolean default for boolean: "true"
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.encode(ResolvingGrammarGenerator.java:408)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.getBinary(ResolvingGrammarGenerator.java:307)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.resolveRecords(ResolvingGrammarGenerator.java:285)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.generate(ResolvingGrammarGenerator.java:118)
>   at 
> org.apache.avro.io.parsing.ResolvingGrammarGenerator.generate(ResolvingGrammarGenerator.java:50)
>   at org.apache.avro.io.ResolvingDecoder.resolve(ResolvingDecoder.java:85)
>   at org.apache.avro.io.ResolvingDecoder.<init>(ResolvingDecoder.java:49)
>   at 
> org.apache.avro.io.DecoderFactory.resolvingDecoder(DecoderFactory.java:307)
>   at 
> org.apache.avro.generic.GenericDatumReader.getResolver(GenericDatumReader.java:127)
>   at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
> {code}
> yet Schema.parse() passes for this



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2132) Avro IDL: Support dot ('.') character in property annotation names

2018-02-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373418#comment-16373418
 ] 

Zoltan Farkas commented on AVRO-2132:
-

[~cutting] [~kdrakon]

I just looked at this commit and tried to apply it to my fork... and since my 
fork does not allow invalid default values defined in schemas/IDL, the changes 
made here break the unit tests in the fork.

here is where the issue is:

the new declared fields in simple.avdl do not have default values:

{code}
record TestRecord {
...
  @foo.bar("bar.foo") long l;
  union {null, @foo.foo.bar(42) @foo.foo.foo("3foo") string} nested_properties;
...
}
{code}

and as such the method declaration below is invalid:

{code}
  TestRecord echo(TestRecord `record` = {"name":"bar","kind":"BAR"});
{code}

and, since the newly added fields do not have defaults, to be correct it should 
be something like:

{code}
  TestRecord echo(TestRecord `record` = 
{"name":"bar","kind":"BAR","l":0,"nested_properties":null});
{code}

I thought https://issues.apache.org/jira/browse/AVRO-2035 was resolved?

worthwhile to look into this... cheers.

> Avro IDL: Support dot ('.') character in property annotation names
> --
>
> Key: AVRO-2132
> URL: https://issues.apache.org/jira/browse/AVRO-2132
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Sean Policarpio
>Assignee: Doug Cutting
>Priority: Major
> Fix For: 1.9.0
>
>
> Unless there is a strong reason why names like {{@foo.bar}} can't be used as 
> property annotations in IDL, I propose an enhancement to the IDL parser to 
> allow it.
> The major drive for this change comes from Kafka Connect; for a certain 
> fields – namely timestamps – additional metadata must be present in the 
> schema when certain consumers read the data (e.g. [the JDBC 
> connector|https://github.com/confluentinc/kafka-connect-jdbc]). What I hoped 
> when using IDL was to write the following for a record field:
> {code:java}
> union {null, @connect.version(1) 
> @connect.name("org.apache.kafka.connect.data.Timestamp") long} 
> queryTime;{code}
> so that the following would be available in the schemata:
> {code:java}
> {
>   "name": "queryTime",
>   "type": [
> "null",
> {
>   "type": "long",
>   "connect.version": 1,
>   "connect.name": "org.apache.kafka.connect.data.Timestamp"
> }
>   ],
>   "default": null
> }{code}
> Unfortunately, both {{connect.version}} and {{connect.name}} are unacceptable 
> by the parser.
> The change for this is quite minimal as it can be based on AVRO-1267.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-2150) Improved idl syntax support for "marker properties"

2018-02-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373288#comment-16373288
 ] 

Zoltan Farkas commented on AVRO-2150:
-

Implementation of this is trivial, see: 

https://github.com/zolyfarkas/avro/commit/6659315f5f78feac37f5501bf0057d3f2bc817d0#diff-136e900cc327974ae416a44248e47d0a

for a potential implementation.

> Improved idl syntax support for "marker properties"
> ---
>
> Key: AVRO-2150
> URL: https://issues.apache.org/jira/browse/AVRO-2150
> Project: Avro
>  Issue Type: Improvement
>Reporter: Zoltan Farkas
>Priority: Minor
>
> It would be nice to allow in IDL "marker properties" like:
> {code}
> @MarkerProperty
> record TestRecord {
> 
> }
> {code}
> this would be only a simpler syntax for:
> {code}
> @MarkerProperty("")
> record TestRecord {
> 
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AVRO-2150) Improved idl syntax support for "marker properties"

2018-02-22 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2150:
---

 Summary: Improved idl syntax support for "marker properties"
 Key: AVRO-2150
 URL: https://issues.apache.org/jira/browse/AVRO-2150
 Project: Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas


It would be nice to allow in IDL "marker properties" like:

{code}
@MarkerProperty
record TestRecord {

}
{code}

this would be only a simpler syntax for:

{code}
@MarkerProperty("")
record TestRecord {

}
{code}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2018-01-16 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16327575#comment-16327575
 ] 

Zoltan Farkas commented on AVRO-1340:
-

[~cutting] I believe the best place to define a fallback symbol is at the type 
level (enum): ("fallbackSymbol": "SOME SYMBOL")

A field default value is, in my opinion, conceptually different from an 
"enum unknown value"... and I believe it is error prone to assume the field 
default is the right value when an unknown symbol is received...

Here is an example to explain what I mean:

Let's have, for example, a record Transaction with an enum field transactionType 
(type1, type2) with default value type1...

If somebody later extends this enum with type3, it will be problematic, since 
older versions of Transaction will treat it as type1, which will probably be 
really bad...

To avoid the above we can do:

Transaction with an enum field transactionType (type1, type2, unknown) with 
default value unknown...
This makes extending the enum safer, as long as developers use the right 
default value every time they use this enum — which tells me that the right 
place to define this is at the type level...
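The type-level fallback resolution described above can be sketched as (hypothetical helper, not an Avro API):

```java
// Sketch: resolve a writer-side enum symbol against the reader's symbols,
// falling back to the enum's declared fallback symbol when the writer's
// symbol is unknown to the reader.
import java.util.List;

public class EnumFallbackResolver {
    public static String resolve(String writerSymbol, List<String> readerSymbols,
                                 String fallback) {
        return readerSymbols.contains(writerSymbol) ? writerSymbol : fallback;
    }
}
```

In the Transaction example, an old reader with symbols (type1, type2, unknown) would resolve a new writer's type3 to unknown rather than silently to type1.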






> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enum's because you can never add a enum value 
> and keep old reader's compatible. Why not use the default option to refer to 
> one of enum values so that when a old reader encounters a enum ordinal it 
> does not recognize, it can default to the optional schema provided one. If 
> the old schema does not provide a default then the older reader can continue 
> to fail as it does today.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (AVRO-2118) Rat tool fails over several files.

2017-12-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290829#comment-16290829
 ] 

Zoltan Farkas edited comment on AVRO-2118 at 12/14/17 1:39 PM:
---

created PR: https://github.com/apache/avro/pull/267


was (Author: zolyfarkas):
will create a PR, and add the details to this JIRA.

> Rat tool fails over several files.
> --
>
> Key: AVRO-2118
> URL: https://issues.apache.org/jira/browse/AVRO-2118
> Project: Avro
>  Issue Type: Bug
>Reporter: Niels Basjes
>Assignee: Zoltan Farkas
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-2118) Rat tool fails over several files.

2017-12-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16290829#comment-16290829
 ] 

Zoltan Farkas commented on AVRO-2118:
-

will create a PR, and add the details to this JIRA.

> Rat tool fails over several files.
> --
>
> Key: AVRO-2118
> URL: https://issues.apache.org/jira/browse/AVRO-2118
> Project: Avro
>  Issue Type: Bug
>Reporter: Niels Basjes
>Assignee: Zoltan Farkas
> Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166749#comment-16166749
 ] 

Zoltan Farkas commented on AVRO-1810:
-

[~howellbridger] 
java.lang.Enum's equals, hashCode and compareTo are final and cannot be 
overridden... 

So if one needs to compare generated enums with generic enums, a custom 
comparator would be the way...

What is the use case you are thinking of?
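Such a comparator might look like the following (an assumption on my part: comparing by the symbol's string form is sufficient, since both generated enum constants and generic enum symbols render their symbol via toString()):

```java
// Sketch: compare any two symbol-bearing objects (generated enum constants,
// generic enum symbols) by their string form, sidestepping java.lang.Enum's
// final equals/compareTo.
import java.util.Comparator;

public class EnumSymbolComparators {
    public static final Comparator<Object> BY_SYMBOL =
        Comparator.comparing(Object::toString);
}
```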



> GenericDatumWriter broken with Enum
> ---
>
> Key: AVRO-1810
> URL: https://issues.apache.org/jira/browse/AVRO-1810
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Ryon Day
>Priority: Blocker
> Fix For: 1.9.0, 1.8.4
>
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> Using the GenericDatumWriter with either Generic OR SpecificRecord will break 
> if an Enum is present.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> I have been tracking Avro decoding oddities for a while.
> The tests for this issue can be found 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/src/test/java/com/ryonday/test/Avro180EnumFail.java]
> {panel}
> {panel:title=Notes|titleBGColor=#3AF|bgColor=#DDD}
> Due to the debacle that is the Avro "UTF8" object, we have been avoiding it 
> by using the following scheme:
> * Write incoming records to a byte array using the GenericDatumWriter
> * Read back the byte array to our compiled Java domain objects using a 
> SpecificDatumWriter
> This worked great with Avro 1.7.7, and this is a binary-incompatable breaking 
> change with 1.8.0.
> This would appear to be caused by an addition in the 
> {{GenericDatumWriter:163-164}}:
> {code}
>   if (!data.isEnum(datum))
>   throw new AvroTypeException("Not an enum: "+datum);
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166377#comment-16166377
 ] 

Zoltan Farkas edited comment on AVRO-1810 at 9/14/17 5:58 PM:
--

The way I resolved this in my fork was to make the generated enums implement 
org.apache.avro.generic.GenericEnumSymbol:

https://github.com/zolyfarkas/avro/blob/trunk/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/enum.vm#L29

Also changed GenericEnumSymbol from:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol
extends GenericContainer, Comparable<GenericEnumSymbol> {
  /** Return the symbol. */
  String toString();
}
{code}

to:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol<E extends GenericEnumSymbol<E>>
extends GenericContainer, Comparable<E> {
  /** Return the symbol. */
  String toString();
}
{code}

I can prepare a PR if this approach is OK with everyone.



was (Author: zolyfarkas):
The way I resolved this in my fork was to make the generated enums implement 
org.apache.avro.generic.GenericEnumSymbol:

https://github.com/zolyfarkas/avro/blob/trunk/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/enum.vm#L29

Also changed GenericEnumSymbol from:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol
    extends GenericContainer, Comparable<GenericEnumSymbol> {
  /** Return the symbol. */
  String toString();
}
{code}

to:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol<E extends GenericEnumSymbol<E>>
    extends GenericContainer, Comparable<E> {
  /** Return the symbol. */
  String toString();
}
{code}

I can prepare a PR if this approach is OK with everyone.


> GenericDatumWriter broken with Enum
> ---
>
> Key: AVRO-1810
> URL: https://issues.apache.org/jira/browse/AVRO-1810
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Ryon Day
>Priority: Blocker
> Fix For: 1.9.0, 1.8.4
>
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> Using the GenericDatumWriter with either Generic OR SpecificRecord will break 
> if an Enum is present.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> I have been tracking Avro decoding oddities for a while.
> The tests for this issue can be found 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/src/test/java/com/ryonday/test/Avro180EnumFail.java]
> {panel}
> {panel:title=Notes|titleBGColor=#3AF|bgColor=#DDD}
> Due to the debacle that is the Avro "UTF8" object, we have been avoiding it 
> by using the following scheme:
> * Write incoming records to a byte array using the GenericDatumWriter
> * Read back the byte array to our compiled Java domain objects using a 
> SpecificDatumWriter
> This worked great with Avro 1.7.7, but this is a binary-incompatible 
> breaking change with 1.8.0.
> This would appear to be caused by an addition in the 
> {{GenericDatumWriter:163-164}}:
> {code}
>   if (!data.isEnum(datum))
>   throw new AvroTypeException("Not an enum: "+datum);
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2017-09-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166377#comment-16166377
 ] 

Zoltan Farkas commented on AVRO-1810:
-

The way I resolved this in my fork was to make the generated enums implement 
org.apache.avro.generic.GenericEnumSymbol:

https://github.com/zolyfarkas/avro/blob/trunk/lang/java/compiler/src/main/velocity/org/apache/avro/compiler/specific/templates/java/classic/enum.vm#L29

Also changed GenericEnumSymbol from:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol
    extends GenericContainer, Comparable<GenericEnumSymbol> {
  /** Return the symbol. */
  String toString();
}
{code}

to:

{code}
/** An enum symbol. */
public interface GenericEnumSymbol<E extends GenericEnumSymbol<E>>
    extends GenericContainer, Comparable<E> {
  /** Return the symbol. */
  String toString();
}
{code}

I can prepare a PR if this approach is OK with everyone.


> GenericDatumWriter broken with Enum
> ---
>
> Key: AVRO-1810
> URL: https://issues.apache.org/jira/browse/AVRO-1810
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Ryon Day
>Priority: Blocker
> Fix For: 1.9.0, 1.8.4
>
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> Using the GenericDatumWriter with either Generic OR SpecificRecord will break 
> if an Enum is present.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> I have been tracking Avro decoding oddities for a while.
> The tests for this issue can be found 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/src/test/java/com/ryonday/test/Avro180EnumFail.java]
> {panel}
> {panel:title=Notes|titleBGColor=#3AF|bgColor=#DDD}
> Due to the debacle that is the Avro "UTF8" object, we have been avoiding it 
> by using the following scheme:
> * Write incoming records to a byte array using the GenericDatumWriter
> * Read back the byte array to our compiled Java domain objects using a 
> SpecificDatumWriter
> This worked great with Avro 1.7.7, but this is a binary-incompatible 
> breaking change with 1.8.0.
> This would appear to be caused by an addition in the 
> {{GenericDatumWriter:163-164}}:
> {code}
>   if (!data.isEnum(datum))
>   throw new AvroTypeException("Not an enum: "+datum);
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-09-05 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16154576#comment-16154576
 ] 

Zoltan Farkas commented on AVRO-1340:
-

I am not sure I like an enum defined like:

{code}
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300", "301", "302", "400"],
"symbolAliases": {"300":["301", "302"], "400": ["UNKNOWN"]}
{code}

This would imply that the symbols "300", "301", and "302" are equivalent (an 
alias is just another name), and that "400" is equivalent to "UNKNOWN", which 
does not seem right...

This would allow things like:

{code}
switch (myEnum) {
  case 300:
    doSomething();
  case 301:
    doSomethingElse();
}
{code}

which would be error-prone...

I think enforcing the "symbols" to be distinct (after applying aliases) might 
make sense...
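To make the distinctness argument concrete, here is a hypothetical resolver (illustrative only, not Avro's actual API; the `AliasCheck` name and methods are invented) that treats an alias purely as another name for a symbol and rejects a schema where an alias collides with an existing symbol:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AliasCheck {
  /** Canonical symbol for every accepted spelling of a symbol. */
  private final Map<String, String> canonical = new HashMap<>();

  AliasCheck(List<String> symbols, Map<String, List<String>> symbolAliases) {
    for (String s : symbols) {
      putDistinct(s, s);
    }
    for (Map.Entry<String, List<String>> e : symbolAliases.entrySet()) {
      for (String alias : e.getValue()) {
        putDistinct(alias, e.getKey());
      }
    }
  }

  private void putDistinct(String name, String target) {
    // "301" being both a declared symbol and an alias of "300" is rejected,
    // enforcing that symbols stay distinct after applying aliases.
    if (canonical.put(name, target) != null) {
      throw new IllegalArgumentException("duplicate symbol definition: " + name);
    }
  }

  /** Canonical reader symbol for a writer symbol, or null if unknown. */
  String resolve(String writerSymbol) {
    return canonical.get(writerSymbol);
  }

  public static void main(String[] args) {
    AliasCheck resolver = new AliasCheck(
        List.of("UNKNOWN", "200", "404", "500", "300"),
        Map.of("300", List.of("301", "302")));
    System.out.println(resolver.resolve("302")); // an old "302" reads as "300"
  }
}
```

Under this rule, the "httpResponseCode" schema above would be rejected at parse time, rather than silently making distinct symbols equivalent.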
 

> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum 
> value and keep old readers compatible. Why not use the default option to 
> refer to one of the enum values, so that when an old reader encounters an 
> enum ordinal it does not recognize, it can default to the schema-provided 
> one? If the old schema does not provide a default, then the older reader can 
> continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-09-01 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151176#comment-16151176
 ] 

Zoltan Farkas commented on AVRO-1340:
-

[~cutting] There is https://issues.apache.org/jira/browse/AVRO-1752, which is 
for aliases and is linked to this JIRA. I am fine either way, with 2 PRs or 
1...

> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum 
> value and keep old readers compatible. Why not use the default option to 
> refer to one of the enum values, so that when an old reader encounters an 
> enum ordinal it does not recognize, it can default to the schema-provided 
> one? If the old schema does not provide a default, then the older reader can 
> continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-2072) ResolvingGrammarGenerator doesn't implement schema resolution correctly for unions

2017-08-30 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147312#comment-16147312
 ] 

Zoltan Farkas commented on AVRO-2072:
-

[~nkollar] I missed that part of the spec; the patch's behavior is compliant 
with it. An update to the unit test is needed to comply with the patch.

BTW these unit tests have been added in AVRO-1931: 
https://issues.apache.org/jira/secure/attachment/12832657/AVRO-1931-2.patch 

 


> ResolvingGrammarGenerator doesn't implement schema resolution correctly for 
> unions
> --
>
> Key: AVRO-2072
> URL: https://issues.apache.org/jira/browse/AVRO-2072
> Project: Avro
>  Issue Type: Bug
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: AVRO-2072.patch
>
>
> According to 
> [specification|https://avro.apache.org/docs/current/spec.html#Schema+Resolution],
>  int and long is promotable to float, but when using SchemaValidator, a union 
> with a single int or long branch is not readable by an union with a float 
> branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AVRO-2072) ResolvingGrammarGenerator doesn't implement schema resolution correctly for unions

2017-08-30 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147229#comment-16147229
 ] 

Zoltan Farkas edited comment on AVRO-2072 at 8/30/17 1:22 PM:
--

I executed this against my branch, which is ahead of the official branch in 
certain places and behind in others.
These failures do highlight the need for this patch to contain some tests for 
the functionality.

TestReadingWritingDataInEvolvedSchemas.longWrittenWithUnionSchemaIsConvertedToFloatDoubleUnionSchema

actually highlights an interesting evolution case that the patch does not 
cover:

{code}
int field -> union {float, double} field;
{code}

The test validates that ints are promoted to double, while the patch promotes 
them to float (the first compatible type in the union).

I think the spec should be updated to clarify what needs to be done here...
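A minimal sketch of the first-compatible-branch rule under discussion (illustrative only, not the actual ResolvingGrammarGenerator logic; the class and method names are invented) shows why the two behaviors diverge:

```java
import java.util.List;
import java.util.Map;

public class UnionPromotion {
  // Numeric promotions allowed by the spec (writer type -> promotable
  // reader types), in the order a first-match rule would try them.
  static final Map<String, List<String>> PROMOTABLE = Map.of(
      "int", List.of("long", "float", "double"),
      "long", List.of("float", "double"),
      "float", List.of("double"));

  /** First reader branch the writer type equals or promotes to, else null. */
  static String firstMatch(String writerType, List<String> readerUnion) {
    for (String branch : readerUnion) {
      if (branch.equals(writerType)
          || PROMOTABLE.getOrDefault(writerType, List.of()).contains(branch)) {
        return branch;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // The ambiguous case from the comment: int written, union {float, double}
    // read. A first-compatible-branch rule picks float, while the existing
    // unit test expects double.
    System.out.println(firstMatch("int", List.of("float", "double"))); // prints "float"
  }
}
```

Whether the reader should take the first compatible branch or the "widest" one is exactly the point the spec would need to clarify.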


was (Author: zolyfarkas):
I executed this against my branch, which is ahead of the official branch in 
certain places and behind in others.
These failures do highlight the need for this patch to contain some tests for 
the functionality.

TestReadingWritingDataInEvolvedSchemas.longWrittenWithUnionSchemaIsConvertedToFloatDoubleUnionSchema

actually highlights an interesting evolution case that the patch does not 
cover:

int field -> union {float, double} field;

The test validates that values are promoted to double, while the patch 
promotes them to float (the first compatible type in the union).

I think the spec should be updated to clarify what needs to be done here...

> ResolvingGrammarGenerator doesn't implement schema resolution correctly for 
> unions
> --
>
> Key: AVRO-2072
> URL: https://issues.apache.org/jira/browse/AVRO-2072
> Project: Avro
>  Issue Type: Bug
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: AVRO-2072.patch
>
>
> According to 
> [specification|https://avro.apache.org/docs/current/spec.html#Schema+Resolution],
>  int and long is promotable to float, but when using SchemaValidator, a union 
> with a single int or long branch is not readable by an union with a float 
> branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-2072) ResolvingGrammarGenerator doesn't implement schema resolution correctly for unions

2017-08-30 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147229#comment-16147229
 ] 

Zoltan Farkas commented on AVRO-2072:
-

I executed this against my branch, which is ahead of the official branch in 
certain places and behind in others.
These failures do highlight the need for this patch to contain some tests for 
the functionality.

TestReadingWritingDataInEvolvedSchemas.longWrittenWithUnionSchemaIsConvertedToFloatDoubleUnionSchema

actually highlights an interesting evolution case that the patch does not 
cover:

int field -> union {float, double} field;

The test validates that values are promoted to double, while the patch 
promotes them to float (the first compatible type in the union).

I think the spec should be updated to clarify what needs to be done here...

> ResolvingGrammarGenerator doesn't implement schema resolution correctly for 
> unions
> --
>
> Key: AVRO-2072
> URL: https://issues.apache.org/jira/browse/AVRO-2072
> Project: Avro
>  Issue Type: Bug
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
> Attachments: AVRO-2072.patch
>
>
> According to 
> [specification|https://avro.apache.org/docs/current/spec.html#Schema+Resolution],
>  int and long is promotable to float, but when using SchemaValidator, a union 
> with a single int or long branch is not readable by an union with a float 
> branch.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-27 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143208#comment-16143208
 ] 

Zoltan Farkas commented on AVRO-1340:
-

Removing a value would only be possible if a fallback exists.
This is similar to the removal of fields from records: you can remove a field 
as long as you have a default value defined.
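A hedged sketch of that analogy (the names are illustrative, not Avro API): resolving a writer symbol against a reader enum with a fallback mirrors resolving a removed field against its default.

```java
import java.util.Set;

public class FallbackSketch {
  /**
   * Hypothetical reader-side rule (illustrative only): a writer symbol
   * unknown to the reader resolves to the reader's fallback symbol, just as
   * a field absent from the writer is covered by the reader's field default.
   */
  static String resolveSymbol(String writerSymbol, Set<String> readerSymbols,
                              String fallbackSymbol) {
    if (readerSymbols.contains(writerSymbol)) {
      return writerSymbol;
    }
    if (fallbackSymbol != null) {
      return fallbackSymbol;
    }
    // Without a fallback, resolution fails exactly as it does today.
    throw new IllegalStateException("No match for enum symbol: " + writerSymbol);
  }

  public static void main(String[] args) {
    // A v2 writer sends NEWVAL; a v1 reader that lacks it falls back to UNKNOWN.
    System.out.println(resolveSymbol("NEWVAL",
        Set.of("UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS"), "UNKNOWN"));
  }
}
```

Removing a symbol is then safe for the same reason removing a defaulted field is: every unresolvable value still has a well-defined destination.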


> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum 
> value and keep old readers compatible. Why not use the default option to 
> refer to one of the enum values, so that when an old reader encounters an 
> enum ordinal it does not recognize, it can default to the schema-provided 
> one? If the old schema does not provide a default, then the older reader can 
> continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-27 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143186#comment-16143186
 ] 

Zoltan Farkas commented on AVRO-1340:
-

In your example, if v2's symbols are identical to v1's, converting v1 <-> v2 
will never involve the fallback symbol, as such conversions will be correct. 
But if we have v3:

{code}
"name":"httpResponseCode",
"symbols":["UNKNOWN", "200", "404", "500", "300"],
"fallbackSymbol": "UNKNOWN"
{code}

When converting a v3 "300" to v1 we get "UNKNOWN", and to v2 we would get 
"500"... not pretty... but on the other hand code written against v1 and v2 
might deal with this correctly... but yuck...

In all the use cases I have in mind, changing the fallback symbol does not 
make much sense, so unless somebody has a use case where changing it makes 
sense, I am in favor of enforcing that the fallbackSymbol stay the same... one 
thing less to misuse...



> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum 
> value and keep old readers compatible. Why not use the default option to 
> refer to one of the enum values, so that when an old reader encounters an 
> enum ordinal it does not recognize, it can default to the schema-provided 
> one? If the old schema does not provide a default, then the older reader can 
> continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-27 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143153#comment-16143153
 ] 

Zoltan Farkas commented on AVRO-1340:
-

FELIX> Can v1 have no "fallbackSymbol", v2 define "fallbackSymbol":"SPADES" and 
then v3 define "fallbackSymbol":"HEART"?

Z> Yes. In the case above, if v2 adds a symbol that is not present in v1, a v2 
record will not be convertible to v1, but it will be convertible the other way 
around. Having v2 and v3 with different fallback symbols would be fine, with 
the consequence that when converting records to v2 and v3, unknown symbols 
will be converted to different symbols...

Symbol aliases, the way I described and implemented them above, allow only the 
renaming of an enum symbol in a backwards-compatible manner. (An alias is just 
another name for the same thing, as with field aliases.)
If you have an enum symbol BADNAME, you can rename it in v+1 to CORRECT_NAME 
with symbolAliasses : "CORRECT_NAME" : ["BADNAME"]

An example of how this would work can be seen at: 
https://github.com/zolyfarkas/avro/pull/3/files#diff-e7505abebad7702fa59c473f4e976b0fR41
 

FELIX> Why not name "fallbackSymbol" as "default", to keep it in line with 
regular Avro syntax?

Z> This does not feel like a default value... currently default values apply 
only to fields. For a field we would declare:
Enum field = "SPADE"
but the fallbackSymbol would be "UNKNOWN".
I am not sure fallbackSymbol is the best name; I am open to suggestions for 
"symbol to use when we don't have a match". Default, to me, means "symbol to 
use when no symbol is provided"...

Let me know if this makes sense.
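To make the distinction concrete, a single record schema could carry both notions at once; this is a hedged sketch using the proposed attribute name, where the field default covers a missing value and the fallbackSymbol covers an unrecognized one:

```json
{
  "type": "record",
  "name": "Example",
  "fields": [
    {"name": "enumValue",
     "type": {"type": "enum",
              "name": "Suit",
              "symbols": ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
              "fallbackSymbol": "UNKNOWN"},
     "default": "SPADES"}
  ]
}
```

A reader resolving a record with no enumValue at all would use "SPADES", while a reader handed a symbol it does not know would use "UNKNOWN".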



> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum 
> value and keep old readers compatible. Why not use the default option to 
> refer to one of the enum values, so that when an old reader encounters an 
> enum ordinal it does not recognize, it can default to the schema-provided 
> one? If the old schema does not provide a default, then the older reader can 
> continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-26 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16142833#comment-16142833
 ] 

Zoltan Farkas commented on AVRO-1340:
-

Any opinions on my previous suggestion?
I have implemented the above at: https://github.com/zolyfarkas/avro/pull/3/files
(this PR is against my fork, which is a bit out of sync with the official 
repo, but close enough)
Thanks!


> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum 
> value and keep old readers compatible. Why not use the default option to 
> refer to one of the enum values, so that when an old reader encounters an 
> enum ordinal it does not recognize, it can default to the schema-provided 
> one? If the old schema does not provide a default, then the older reader can 
> continue to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137220#comment-16137220
 ] 

Zoltan Farkas edited comment on AVRO-1340 at 8/22/17 7:29 PM:
--

[~cutting] what do you think about the following:

# Symbol aliases, to correct misspellings (similar to field aliases):

v1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUS"]
}
{code}

v2 (correcting CLUS):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUB"],
  "symbolAliasses" : {
    "CLUB" : ["CLUS"]
  }
}
{code}

v3 (correcting CLUB):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  }
}
{code}

sample conversion:

v1: { "val" : "CLUS" } <-> v3: { "val" : "CLUBS" }

the following enum schema would be illegal:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB", "SPADES"]
  }
}
{code}

since it contains a duplicate definition ("SPADES", "CLUBS").

# Evolution example with a fallback symbol:

V1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

and a record using the above enum:

{code}
{
  "type": "record",
  "name": "Example",
  "fields" : [
    {"name": "enumValue",
     "type": "Suit",
     "default" : "SPADES"}
  ]
}
{code}

V2 (adding the "NEWVAL" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

V3 (removing the "SPADES" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

sample conversions:

v2: { "enumValue" : "NEWVAL" } -> v1: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "UNKNOWN" } -> v2: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "SPADES" } -> v3: { "enumValue" : "UNKNOWN" }

would this be acceptable?




was (Author: zolyfarkas):
[~cutting] what do you think about the following:

# Symbol aliases, to correct misspellings (similar to field aliases):

v1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUS"]
}
{code}

v2 (correcting CLUS):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUB"],
  "symbolAliasses" : {
    "CLUB" : ["CLUS"]
  }
}
{code}

v3 (correcting CLUB):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  }
}
{code}

sample conversion:

v1: { "val" : "CLUS" } <-> v3: { "val" : "CLUBS" }

the following enum schema would be illegal:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB", "SPADES"]
  }
}
{code}

since it contains a duplicate definition ("SPADES", "CLUBS").

# Evolution example with a fallback symbol:

V1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

and a record using the above enum:

{code}
{
  "type": "record",
  "name": "Example",
  "fields" : [
    {"name": "enumValue",
     "type": "Suit",
     "default" : "SPADES"}
  ]
}
{code}

V2 (adding the "NEWVAL" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

V3 (removing the "SPADES" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

sample conversion:

v2: { "enumValue" : "NEWVAL" } -> v1: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "UNKNOWN" } -> v2: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "SPADES" } -> v3: { "enumValue" : "UNKNOWN" }

would this be acceptable?



> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: 

[jira] [Comment Edited] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137220#comment-16137220
 ] 

Zoltan Farkas edited comment on AVRO-1340 at 8/22/17 7:29 PM:
--

[~cutting] what do you think about the following:

1) Symbol aliases, to correct misspellings (similar to field aliases):

v1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUS"]
}
{code}

v2 (correcting CLUS):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUB"],
  "symbolAliasses" : {
    "CLUB" : ["CLUS"]
  }
}
{code}

v3 (correcting CLUB):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  }
}
{code}

sample conversion:

v1: { "val" : "CLUS" } <-> v3: { "val" : "CLUBS" }

the following enum schema would be illegal:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB", "SPADES"]
  }
}
{code}

since it contains a duplicate definition ("SPADES", "CLUBS").

2) Evolution example with a fallback symbol:

V1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

and a record using the above enum:

{code}
{
  "type": "record",
  "name": "Example",
  "fields" : [
    {"name": "enumValue",
     "type": "Suit",
     "default" : "SPADES"}
  ]
}
{code}

V2 (adding the "NEWVAL" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

V3 (removing the "SPADES" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

sample conversions:

v2: { "enumValue" : "NEWVAL" } -> v1: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "UNKNOWN" } -> v2: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "SPADES" } -> v3: { "enumValue" : "UNKNOWN" }

would this be acceptable?




was (Author: zolyfarkas):
[~cutting] what do you think about the following:

# Symbol aliases, to correct misspellings (similar to field aliases):

v1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUS"]
}
{code}

v2 (correcting CLUS):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUB"],
  "symbolAliasses" : {
    "CLUB" : ["CLUS"]
  }
}
{code}

v3 (correcting CLUB):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  }
}
{code}

sample conversion:

v1: { "val" : "CLUS" } <-> v3: { "val" : "CLUBS" }

the following enum schema would be illegal:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB", "SPADES"]
  }
}
{code}

since it contains a duplicate definition ("SPADES", "CLUBS").

# Evolution example with a fallback symbol:

V1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

and a record using the above enum:

{code}
{
  "type": "record",
  "name": "Example",
  "fields" : [
    {"name": "enumValue",
     "type": "Suit",
     "default" : "SPADES"}
  ]
}
{code}

V2 (adding the "NEWVAL" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "SPADES", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

V3 (removing the "SPADES" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["UNKNOWN", "HEARTS", "DIAMONDS", "CLUBS", "NEWVAL"],
  "symbolAliasses" : {
    "CLUBS" : ["CLUS", "CLUB"]
  },
  "fallbackSymbol" : "UNKNOWN"
}
{code}

sample conversions:

v2: { "enumValue" : "NEWVAL" } -> v1: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "UNKNOWN" } -> v2: { "enumValue" : "UNKNOWN" }

v1: { "enumValue" : "SPADES" } -> v3: { "enumValue" : "UNKNOWN" }

would this be acceptable?



> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> 

[jira] [Comment Edited] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137220#comment-16137220
 ] 

Zoltan Farkas edited comment on AVRO-1340 at 8/22/17 7:26 PM:
--

[~cutting] what do you thing about the following:

#  Symbol aliases example to correct misspellings. (similar to field aliasses): 
 

v1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUS"]
}
{code}

v2 (correcting CLUS):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUB”]
  “symbolAliasses” : {
  “CLUB” : [“CLUS”]
  }
}
{code}

v3 (correcting CLUB):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”]
  }
}
{code}

sample conversion:

v1: { "val" : "CLUS" } <-> v3: { "val" : "CLUBS" }

the following enum schema would be illegal:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”, “SPADES”]
  }
}
{code}

since "SPADES" is defined twice: as a symbol and as an alias of "CLUBS".
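The uniqueness rule implied here can be sketched as follows, assuming that the symbols plus all aliases must form one collision-free set. `AliasValidator` is a hypothetical helper, not Avro code, and `symbolAliases` is only a proposed attribute.

```java
import java.util.*;

// Hypothetical validity check for the proposed "symbolAliases" attribute:
// no symbol may be duplicated, and no alias may collide with a symbol or
// with another alias.
public class AliasValidator {

    public static boolean isValid(List<String> symbols,
                                  Map<String, List<String>> aliases) {
        Set<String> seen = new HashSet<>(symbols);
        if (seen.size() != symbols.size()) {
            return false; // duplicate symbol
        }
        // (A fuller check would also require each alias key to be a symbol.)
        for (List<String> aliasList : aliases.values()) {
            for (String alias : aliasList) {
                if (!seen.add(alias)) {
                    return false; // alias collides with a symbol or another alias
                }
            }
        }
        return true;
    }
}
```

Under this rule the schema above fails validation because "SPADES" appears both as a symbol and as an alias.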


# Evolution example with a fallback symbol:

V1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : [“UNKNOWN”, ”SPADES", "HEARTS", "DIAMONDS", "CLUBS”],
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”]
  },
  “fallbackSymbol” : “UNKNOWN”
}
{code}

and a record using the above enum:

{code}
{
  "type": "record",
  "name": "Example",
  "fields" : [
    {"name": "enumValue",
      "type": "Suit",
      "default" : "SPADES"}
  ]
}
{code}

V2 (adding "NEWVAL" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : [“UNKNOWN”, ”SPADES", "HEARTS", "DIAMONDS", "CLUBS”, “NEWVAL”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”],
  },
  “fallbackSymbol” : “UNKNOWN”
}
{code}

V3 (removing "SPADES" symbol):

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : [“UNKNOWN”,  "HEARTS", "DIAMONDS", "CLUBS”, “NEWVAL”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”],
  },
  “fallbackSymbol” : “UNKNOWN”
}
{code}


sample conversions:

v2 : { "enumValue" : "NEWVAL" } -> v1 : {"enumValue" : "UNKNOWN"}

v1: {"enumValue" : "UNKNOWN"} -> v2 : {"enumValue" : "UNKNOWN"}

v1: {"enumValue" : "SPADES"} -> v3 : {"enumValue" : "UNKNOWN"}

would this be acceptable?







> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enum's 

[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2017-08-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137220#comment-16137220
 ] 

Zoltan Farkas commented on AVRO-1340:
-

[~cutting] what do you think about the following:

 1) Symbol aliases example to correct misspellings (similar to field 
aliases):  

v1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUS"]
}
{code}

v2:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUB”]
  “symbolAliasses” : {
  “CLUB” : [“CLUS”]
  }
}
{code}

v3:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”]
  }
}
{code}

sample conversion:

v1: { "val" : "CLUS" } <-> v3: { "val" : "CLUBS" }

the following enum schema would be illegal:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : ["SPADES", "HEARTS", "DIAMONDS", "CLUBS”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”, “SPADES”]
  }
}
{code}

since "SPADES" is defined twice: as a symbol and as an alias of "CLUBS".


 2) Evolution example with a fallback symbol:

V1:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : [“UNKNOWN”, ”SPADES", "HEARTS", "DIAMONDS", "CLUBS”],
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”]
  },
  “fallbackSymbol” : “UNKNOWN”
}
{code}

and a record using the above enum:

{code}
{
  "type": "record",
  "name": “Example”,
  "fields" : [
{"name": “enumValue”,
  "type": “Suit”,
  “default” : “SPADES”},
  ]
}
{code}

V2:

{code}
{ "type": "enum",
  "name": "Suit",
  "symbols" : [“UNKNOWN”, ”SPADES", "HEARTS", "DIAMONDS", "CLUBS”, “NEWVAL”]
  “symbolAliasses” : {
  “CLUBS” : [“CLUS”, “CLUB”],
  },
  “fallbackSymbol” : “UNKNOWN”
}
{code}

sample conversions:

v2 : { "enumValue" : "NEWVAL" } -> v1 : {"enumValue" : "UNKNOWN"}
v1: {"enumValue" : "UNKNOWN"} -> v2 : {"enumValue" : "UNKNOWN"}

would this be acceptable?




> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values, so that when an old reader encounters an enum ordinal 
> it does not recognize, it can default to the optional, schema-provided one. If 
> the old schema does not provide a default then the older reader can continue 
> to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AVRO-2068) Improve EnumSchema constructor performance

2017-08-22 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2068:
---

 Summary: Improve EnumSchema constructor performance
 Key: AVRO-2068
 URL: https://issues.apache.org/jira/browse/AVRO-2068
 Project: Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas
Priority: Trivial


at 
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L745
 :
{code}

  private static class EnumSchema extends NamedSchema {
    private final List<String> symbols;
    private final Map<String, Integer> ordinals;
    public EnumSchema(Name name, String doc,
        LockableArrayList<String> symbols) {
      super(Type.ENUM, name, doc);
      this.symbols = symbols.lock();
      this.ordinals = new HashMap<String, Integer>();
      int i = 0;
      for (String symbol : symbols)
        if (ordinals.put(validateName(symbol), i++) != null)
          throw new SchemaParseException("Duplicate enum symbol: " + symbol);
    }

{code}

should be changed to:

{code}

  private static class EnumSchema extends NamedSchema {
    private final List<String> symbols;
    private final Map<String, Integer> ordinals;
    public EnumSchema(Name name, String doc,
        LockableArrayList<String> symbols) {
      super(Type.ENUM, name, doc);
      this.symbols = symbols.lock();
      this.ordinals = new HashMap<String, Integer>(symbols.size());
      int i = 0;
      for (String symbol : symbols)
        if (ordinals.put(validateName(symbol), i++) != null)
          throw new SchemaParseException("Duplicate enum symbol: " + symbol);
    }

{code}
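One side note on the suggested fix (illustrative, not part of the proposal): `HashMap` resizes once it exceeds capacity times its default load factor of 0.75, so an initial capacity of exactly `symbols.size()` can still trigger one resize while all entries are inserted. A fully resize-free construction folds the load factor into the capacity:

```java
import java.util.*;

// Illustrative helper: compute an initial capacity large enough that
// `expectedEntries` insertions never trigger a rehash at the default
// 0.75 load factor (threshold = capacity * 0.75 must be >= expectedEntries).
public class Presize {
    public static <K, V> HashMap<K, V> presized(int expectedEntries) {
        return new HashMap<>((int) (expectedEntries / 0.75f) + 1);
    }
}
```

Either way, passing a size-derived capacity is already an improvement over the default 16-entry table.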



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (AVRO-2057) JsonDecoder.skipChildren does not skip map/records correctly

2017-07-18 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2057:
---

 Summary: JsonDecoder.skipChildren does not skip map/records 
correctly
 Key: AVRO-2057
 URL: https://issues.apache.org/jira/browse/AVRO-2057
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.8.2
Reporter: Zoltan Farkas
Priority: Critical


at 
https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/io/JsonDecoder.java#L585

{code}
  @Override
  public JsonParser skipChildren() throws IOException {
    JsonToken tkn = elements.get(pos).token;
    int level = (tkn == JsonToken.START_ARRAY || tkn == JsonToken.END_ARRAY) ? 1 : 0;
    while (level > 0) {
      switch (elements.get(++pos).token) {
      case START_ARRAY:
      case START_OBJECT:
        level++;
        break;
      case END_ARRAY:
      case END_OBJECT:
        level--;
        break;
      }
    }
    return this;
  }
{code}

should be:

{code}
  @Override
  public JsonParser skipChildren() throws IOException {
    JsonToken tkn = elements.get(pos).token;
    int level = (tkn == JsonToken.START_ARRAY || tkn == JsonToken.START_OBJECT) ? 1 : 0;
    while (level > 0) {
      switch (elements.get(++pos).token) {
      case START_ARRAY:
      case START_OBJECT:
        level++;
        break;
      case END_ARRAY:
      case END_OBJECT:
        level--;
        break;
      }
    }
    return this;
  }
{code}
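The effect of the one-token fix can be checked on a toy token stream. This model mirrors only the level-counting loop above, not the real JsonDecoder:

```java
import java.util.*;

// Toy model of the level-counting loop in skipChildren(): with the corrected
// initialization (START_OBJECT instead of END_ARRAY), starting on an object
// or array token advances `pos` to the matching end token; starting on any
// other token leaves `pos` unchanged.
public class SkipChildrenDemo {
    static int skipChildren(List<String> tokens, int pos) {
        String tkn = tokens.get(pos);
        int level = (tkn.equals("START_ARRAY") || tkn.equals("START_OBJECT")) ? 1 : 0;
        while (level > 0) {
            switch (tokens.get(++pos)) {
                case "START_ARRAY":
                case "START_OBJECT":
                    level++;
                    break;
                case "END_ARRAY":
                case "END_OBJECT":
                    level--;
                    break;
            }
        }
        return pos; // index of the matching end token (or the start position)
    }
}
```

With the buggy `END_ARRAY` test, `level` stayed 0 for a `START_OBJECT` token and the object's children were never skipped, which is exactly the failure seen when the reader schema lacks a record/map field present in the data.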

This results in de-serialization failures when the reader schema does not have 
fields that are present in the serialized object and the writer schema. 




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (AVRO-1575) Platform specific end of line hardcoded in unit test causes test failure on Windows.

2017-05-30 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029363#comment-16029363
 ] 

Zoltan Farkas commented on AVRO-1575:
-

This can be closed.

> Platform specific end of line hardcoded in unit test causes test failure on 
> Windows.
> 
>
> Key: AVRO-1575
> URL: https://issues.apache.org/jira/browse/AVRO-1575
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
> Environment: Windows
>Reporter: Zoltan Farkas
>Priority: Trivial
> Attachments: Avro-TestSchemaCompatibility.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> diff --git 
> a/lang/java/avro/src/test/java/org/apache/avro/TestSchemaCompatibility.java 
> b/lang/java/avro/src/test/java/org/apache/avro/TestSchemaCompatibility.java
> index 10d87df..6114981 100644
> --- 
> a/lang/java/avro/src/test/java/org/apache/avro/TestSchemaCompatibility.java
> +++ 
> b/lang/java/avro/src/test/java/org/apache/avro/TestSchemaCompatibility.java
> @@ -246,8 +246,8 @@
>  reader,
>  WRITER_SCHEMA,
>  String.format(
> -"Data encoded using writer schema:\n%s\n"
> -+ "will or may fail to decode using reader schema:\n%s\n",
> +"Data encoded using writer schema:%n%s%n"
> ++ "will or may fail to decode using reader schema:%n%s%n",
>  WRITER_SCHEMA.toString(true),
>  reader.toString(true)));
>  
> @@ -271,8 +271,8 @@
>  invalidReader,
>  STRING_ARRAY_SCHEMA,
>  String.format(
> -"Data encoded using writer schema:\n%s\n"
> -+ "will or may fail to decode using reader schema:\n%s\n",
> +"Data encoded using writer schema:%n%s%n"
> ++ "will or may fail to decode using reader schema:%n%s%n",
>  STRING_ARRAY_SCHEMA.toString(true),
>  invalidReader.toString(true)));
>  
> @@ -299,8 +299,8 @@
>  INT_SCHEMA,
>  STRING_SCHEMA,
>  String.format(
> -"Data encoded using writer schema:\n%s\n"
> -+ "will or may fail to decode using reader schema:\n%s\n",
> +"Data encoded using writer schema:%n%s%n"
> ++ "will or may fail to decode using reader schema:%n%s%n",
>  STRING_SCHEMA.toString(true),
>  INT_SCHEMA.toString(true)));
>  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2017-05-14 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009886#comment-16009886
 ] 

Zoltan Farkas commented on AVRO-1723:
-

[~thiru_mg] I have updated the PR to rework the guava code and shade it just 
like the main java lib.

build.sh test passes now, please review, and let me know if this looks good. 

thank you

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2017-05-04 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997230#comment-15997230
 ] 

Zoltan Farkas commented on AVRO-1723:
-

Let me know if I need to do anything else.

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AVRO-2031) GenericData.writeEscapedString should be static

2017-05-01 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-2031:
---

 Summary: GenericData.writeEscapedString should be static
 Key: AVRO-2031
 URL: https://issues.apache.org/jira/browse/AVRO-2031
 Project: Avro
  Issue Type: Improvement
Affects Versions: 1.8.1
Reporter: Zoltan Farkas
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2017-05-01 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990923#comment-15990923
 ] 

Zoltan Farkas commented on AVRO-1723:
-

Looks good! thank you!

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2017-04-30 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990376#comment-15990376
 ] 

Zoltan Farkas commented on AVRO-1723:
-

[~thiru_mg] Thanks for review.

your cleanup + fixes looks good.

Regarding the order of definition of named schema elements: it will no longer 
matter.

This implementation resolves all named elements, not only records.

I just tested the following:

{code}
   /* Name Value record */
   record ANameValue {
      /** the name */
      string name;
      /** the value */
      string value;
      /** is the value a json object */
      ValueType type = "PLAIN";
   }
   enum ValueType {JSON, BASE64BIN, PLAIN}
{code}

This is not supported by Avro without this change; from now on it will work 
just fine. It is worthwhile adding a unit test.

We should also add a JIRA to move the generic schema walker we created for 
this into the core Avro lib, and see if there are other places it can be 
re-used, as [~rdblue] suggested:

"Also, you might be able to simplify what's happening here by using the visitor 
pattern. I've been thinking that it would be helpful to have an internal 
SchemaVisitor for this kind of operation. That would cut down on the size of 
these methods and separate the recursion from the logic you're trying to 
implement. This may be a good time to add it to the core library."

This is the reason the CloningVisitor exists: to add one more use case for 
this generic schema walker... 
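The visitor separation described in the quote can be sketched on a toy tree. This is illustrative only; the names are not Avro's actual internal API, and a production schema walker would additionally need to track already-visited schemas, since Avro schemas can be recursive:

```java
import java.util.*;

// Illustrative sketch of the visitor pattern [~rdblue] describes: the
// traversal (walk) is written once, and each operation is just a Visitor.
public class SchemaWalkDemo {

    // Toy stand-in for a schema node: a name plus child nodes.
    static final class Node {
        final String name;
        final List<Node> children;
        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    interface Visitor {
        void visit(Node node); // invoked once per node, pre-order
    }

    // The generic recursion lives here, once, instead of in every operation.
    static void walk(Node root, Visitor visitor) {
        visitor.visit(root);
        for (Node child : root.children) {
            walk(child, visitor);
        }
    }

    // Example operation: collect names in visit order, with no recursion here.
    static List<String> names(Node root) {
        List<String> out = new ArrayList<>();
        walk(root, n -> out.add(n.name));
        return out;
    }
}
```

An operation like cloning then only implements `Visitor`, keeping its methods small and free of traversal logic.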

thank you!

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AVRO-1723) Add support for forward declarations in avro IDL

2017-04-26 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1723:

Release Note: 
With this patch recursive data structures will be supported in IDL:

record SampleNode {
  int count = 0;
  array<SamplePair> samples = []; 
}
record SamplePair {
 string name;
 SampleNode node; 
}

Also, the order of declaration of named types will no longer matter (similar 
to other languages like Java).

Compatibility: 

All schemas that are compilable with older versions of AVRO will continue 
working without issues.
IDLs that take advantage of the features introduced with this patch will not 
be compilable with older versions of AVRO;
please see the dependent JIRAs for more detail on related issues.

  was:
With this patch recursive data structures will be supported in IDL:

{code}
record SampleNode {
  int count = 0;
  array<SamplePair> samples = []; 
}
record SamplePair {
 string name;
 SampleNode node; 
}
{code}

Also, the order of declaration of named types will no longer matter (similar 
to other languages like Java).

Compatibility: 

All schemas that are compilable with older versions of AVRO will continue 
working without issues.
IDLs that take advantage of the features introduced with this patch will not 
be compilable with older versions of AVRO;
please see the dependent JIRAs for more detail on related issues.


> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AVRO-1723) Add support for forward declarations in avro IDL

2017-04-26 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1723:

Release Note: 
With this patch recursive data structures will be supported in IDL:

{code}
record SampleNode {
  int count = 0;
  array<SamplePair> samples = []; 
}
record SamplePair {
 string name;
 SampleNode node; 
}
{code}

Also, the order of declaration of named types will no longer matter (similar 
to other languages like Java).

Compatibility: 

All schemas that are compilable with older versions of AVRO will continue 
working without issues.
IDLs that take advantage of the features introduced with this patch will not 
be compilable with older versions of AVRO;
please see the dependent JIRAs for more detail on related issues.

  was:
Pull request available:

https://github.com/apache/avro/pull/79


> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2017-04-25 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15983499#comment-15983499
 ] 

Zoltan Farkas commented on AVRO-1723:
-

Can somebody review the pull request?

thank you

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AVRO-1723) Add support for forward declarations in avro IDL

2017-04-06 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1723:

Release Note: 
Pull request available:

https://github.com/apache/avro/pull/79
  Status: Patch Available  (was: Open)

Pull request available:

https://github.com/apache/avro/pull/79

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (AVRO-1723) Add support for forward declarations in avro IDL

2017-04-06 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas reassigned AVRO-1723:
---

Assignee: Zoltan Farkas

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently Recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> It is not possible to declare in IDL,
> however it is possible to declare in avsc (with fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1997) Avro Field.defaultVal broken for Fixed fields.

2017-02-28 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888614#comment-15888614
 ] 

Zoltan Farkas commented on AVRO-1997:
-

[~busbey]

Here is the release note to describe the original issue:

"Schema.Field.defaultVal() returns null for fields of type fixed even if the 
field has a default value."

let me know if this is enough or not.

thank you

> Avro Field.defaultVal broken for Fixed fields.
> --
>
> Key: AVRO-1997
> URL: https://issues.apache.org/jira/browse/AVRO-1997
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1, 1.8.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>  Labels: pull-request-available
> Fix For: 1.9.0, 1.8.2
>
>
> here is a unit test to reproduce the issue:
> {code}
> package org.apache.avro;
> import java.nio.ByteBuffer;
> import org.junit.Assert;
> import org.junit.Test;
> public class TestFixed {
>   @Test
>   public void testFixedDefaultValueDrop() {
> Schema md5 = SchemaBuilder.builder().fixed("MD5").size(16);
> Schema frec = SchemaBuilder.builder().record("test")
> .fields().name("hash").type(md5).withDefault(ByteBuffer.wrap(new 
> byte[16])).endRecord();
> Schema.Field field = frec.getField("hash");
> Assert.assertNotNull(field.defaultVal());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (AVRO-1997) Avro Field.defaultVal broken for Fixed fields.

2017-02-27 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1997:

   Labels: pull-request-available  (was: )
Fix Version/s: 1.8.2
 Release Note: https://github.com/apache/avro/pull/194
   Status: Patch Available  (was: Open)

https://github.com/apache/avro/pull/194

> Avro Field.defaultVal broken for Fixed fields.
> --
>
> Key: AVRO-1997
> URL: https://issues.apache.org/jira/browse/AVRO-1997
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1, 1.8.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>  Labels: pull-request-available
> Fix For: 1.8.2
>
>
> here is a unit test to reproduce the issue:
> {code}
> package org.apache.avro;
> import java.nio.ByteBuffer;
> import org.junit.Assert;
> import org.junit.Test;
> public class TestFixed {
>   @Test
>   public void testFixedDefaultValueDrop() {
> Schema md5 = SchemaBuilder.builder().fixed("MD5").size(16);
> Schema frec = SchemaBuilder.builder().record("test")
> .fields().name("hash").type(md5).withDefault(ByteBuffer.wrap(new 
> byte[16])).endRecord();
> Schema.Field field = frec.getField("hash");
> Assert.assertNotNull(field.defaultVal());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1997) Avro Field.defaultVal broken for Fixed fields.

2017-02-15 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867909#comment-15867909
 ] 

Zoltan Farkas commented on AVRO-1997:
-

I see this JIRA has been assigned to me, however I am not sure what I have to 
do next... See my previous comment with the pull request... Somebody should 
review the pull request and merge it in...

> Avro Field.defaultVal broken for Fixed fields.
> --
>
> Key: AVRO-1997
> URL: https://issues.apache.org/jira/browse/AVRO-1997
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1, 1.8.2
>Reporter: Zoltan Farkas
>Assignee: Zoltan Farkas
>
> here is a unit test to reproduce the issue:
> {code}
> package org.apache.avro;
> import java.nio.ByteBuffer;
> import org.junit.Assert;
> import org.junit.Test;
> public class TestFixed {
>   @Test
>   public void testFixedDefaultValueDrop() {
> Schema md5 = SchemaBuilder.builder().fixed("MD5").size(16);
> Schema frec = SchemaBuilder.builder().record("test")
> .fields().name("hash").type(md5).withDefault(ByteBuffer.wrap(new 
> byte[16])).endRecord();
> Schema.Field field = frec.getField("hash");
> Assert.assertNotNull(field.defaultVal());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1997) Avro Field.defaultVal broken for Fixed fields.

2017-02-07 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856544#comment-15856544
 ] 

Zoltan Farkas commented on AVRO-1997:
-

I have created a pull request with a potential fix:

https://github.com/apache/avro/pull/194

> Avro Field.defaultVal broken for Fixed fields.
> --
>
> Key: AVRO-1997
> URL: https://issues.apache.org/jira/browse/AVRO-1997
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.1, 1.8.2
>Reporter: Zoltan Farkas
>
> here is a unit test to reproduce the issue:
> {code}
> package org.apache.avro;
> import java.nio.ByteBuffer;
> import org.junit.Assert;
> import org.junit.Test;
> public class TestFixed {
>   @Test
>   public void testFixedDefaultValueDrop() {
> Schema md5 = SchemaBuilder.builder().fixed("MD5").size(16);
> Schema frec = SchemaBuilder.builder().record("test")
> .fields().name("hash").type(md5).withDefault(ByteBuffer.wrap(new 
> byte[16])).endRecord();
> Schema.Field field = frec.getField("hash");
> Assert.assertNotNull(field.defaultVal());
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2017-02-06 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854413#comment-15854413
 ] 

Zoltan Farkas commented on AVRO-1723:
-

"Cloning" a field will not work in certain situations when defaultVal() is 
broken.

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently, recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> cannot be declared in IDL;
> however, they can be declared in avsc (with the fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (AVRO-1997) Avro Field.defaultVal broken for Fixed fields.

2017-02-06 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-1997:
---

 Summary: Avro Field.defaultVal broken for Fixed fields.
 Key: AVRO-1997
 URL: https://issues.apache.org/jira/browse/AVRO-1997
 Project: Avro
  Issue Type: Bug
  Components: java
Affects Versions: 1.8.1, 1.8.2
Reporter: Zoltan Farkas


here is a unit test to reproduce the issue:

{code}
package org.apache.avro;

import java.nio.ByteBuffer;
import org.junit.Assert;
import org.junit.Test;

public class TestFixed {


  @Test
  public void testFixedDefaultValueDrop() {
Schema md5 = SchemaBuilder.builder().fixed("MD5").size(16);
Schema frec = SchemaBuilder.builder().record("test")
.fields().name("hash").type(md5).withDefault(ByteBuffer.wrap(new 
byte[16])).endRecord();
Schema.Field field = frec.getField("hash");
Assert.assertNotNull(field.defaultVal());
  }

}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AVRO-1950) Better Json serialization for Avro decimal logical types?

2016-11-07 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644224#comment-15644224
 ] 

Zoltan Farkas commented on AVRO-1950:
-

I have been using string encoding for decimal in my fork for a while; "3.14" is 
an improvement over bytes, but still not the ideal JSON representation... The 
more I think about it, making decimal a core type seems like the right 
approach...
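For concreteness, a sketch of the three JSON shapes under discussion, for a hypothetical field carrying the decimal logical type (the field name and values are illustrative, not from the issue):

```json
{"name": "price",
 "type": {"type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 2}}
```

With the stock JsonEncoder, a value like 3.14 is written as the underlying bytes (the unscaled value escaped as an ISO-8859-1 string), something like `"price": "\u0001:"`; string encoding gives `"price": "3.14"`; a first-class decimal type could emit the natural `"price": 3.14`.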

> Better Json serialization for Avro decimal logical types?
> -
>
> Key: AVRO-1950
> URL: https://issues.apache.org/jira/browse/AVRO-1950
> Project: Avro
>  Issue Type: Improvement
>Reporter: Zoltan Farkas
>Priority: Minor
>
> Currently, as I understand it, decimal logical types are encoded on top of 
> the bytes and fixed Avro types. This makes them a bit "unnatural" in the JSON 
> encoding...
> I worked around this with a hack in my fork to encode them naturally as JSON 
> decimals. A good starting point to look at is: 
> https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/DecimalEncoder.java
>  
> My approach is a bit hacky, so I would be interested in suggestions to bring 
> this closer to something we can integrate into Avro...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AVRO-1950) Better Json serialization for Avro decimal logical types?

2016-11-05 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-1950:
---

 Summary: Better Json serialization for Avro decimal logical types?
 Key: AVRO-1950
 URL: https://issues.apache.org/jira/browse/AVRO-1950
 Project: Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas
Priority: Minor


Currently, as I understand it, decimal logical types are encoded on top of the 
bytes and fixed Avro types. This makes them a bit "unnatural" in the JSON 
encoding...

I worked around this with a hack in my fork to encode them naturally as JSON 
decimals. A good starting point to look at is: 
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/DecimalEncoder.java
 

My approach is a bit hacky, so I would be interested in suggestions to bring 
this closer to something we can integrate into Avro...





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AVRO-1934) Avro test resources reference old avro dev versions

2016-10-14 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-1934:
---

 Summary: Avro test resources reference old avro dev versions
 Key: AVRO-1934
 URL: https://issues.apache.org/jira/browse/AVRO-1934
 Project: Avro
  Issue Type: Bug
Affects Versions: 1.8.1
Reporter: Zoltan Farkas
Priority: Minor


For example:
https://github.com/apache/avro/blob/master/lang/java/maven-plugin/src/test/resources/unit/idl/pom.xml
 

references 1.7.3-SNAPSHOT:

{code}
  
  <parent>
    <artifactId>avro-parent</artifactId>
    <groupId>org.apache.avro</groupId>
    <version>1.7.3-SNAPSHOT</version>
    <relativePath>../../../../../../../../../</relativePath>
  </parent>
  
{code}

this does not seem right.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1810) GenericDatumWriter broken with Enum

2016-10-13 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573691#comment-15573691
 ] 

Zoltan Farkas commented on AVRO-1810:
-

I hit this issue when serializing a SpecificRecord (generated from IDL) with a 
GenericDatumWriter.

Everything works fine when serializing with the SpecificDatumWriter, but since 
all generated Avro records implement GenericRecord, I don't see why they should 
not be serializable with a GenericDatumWriter...

I see 2 ways to fix this:

1) make GenericDatumWriter handle java enums.
2) make the generated enums (like Reuben suggested) implement GenericEnumSymbol.

I used approach 1 to fix my fork.

I am not sure the separation between GenericRecord and SpecificRecord 
readers/writers is ideal...

For example, I wrote some utilities to generate generic data on the fly:

https://github.com/zolyfarkas/spf4j/blob/master/spf4j-avro/src/main/java/org/spf4j/avro/GenericRecordBuilder.java

how to use:

https://github.com/zolyfarkas/spf4j/blob/master/spf4j-avro/src/test/java/org/spf4j/avro/GenericRecordBuilderTest.java

This is still beta quality, but it produces slightly more efficient (~10%) 
GenericRecord implementations. (See the JMH benchmark: 
https://github.com/zolyfarkas/spf4j/blob/master/spf4j-benchmarks/src/test/java/org/spf4j/avro/GenericRecordBenchmark.java)
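Fix approach 1 boils down to treating any java.lang.Enum's name() as the schema symbol, instead of requiring a GenericEnumSymbol. A minimal self-contained sketch of that mapping (plain Java, no Avro dependency; the class, method, and enum names are made up for illustration and are not Avro's actual code):

```java
public class EnumSymbolSketch {

    // Stand-in for a SpecificRecord-style generated enum.
    enum Suit { UNKNOWN, CLUBS, HEARTS }

    /**
     * Return the schema symbol for a datum that may be either a generated
     * Java enum or a generic symbol object: Java enums map via name(),
     * anything else (e.g. a GenericEnumSymbol) via toString().
     */
    static String symbolOf(Object datum) {
        if (datum instanceof Enum) {
            return ((Enum<?>) datum).name();
        }
        return datum.toString();
    }

    public static void main(String[] args) {
        System.out.println(symbolOf(Suit.CLUBS));  // generated enum -> CLUBS
        System.out.println(symbolOf("HEARTS"));    // plain symbol passes through
    }
}
```

Approach 2 would instead make the code generator emit enums implementing GenericEnumSymbol, so the existing writer check passes unchanged.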



> GenericDatumWriter broken with Enum
> ---
>
> Key: AVRO-1810
> URL: https://issues.apache.org/jira/browse/AVRO-1810
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Ryon Day
>Priority: Blocker
>
> {panel:title=Description|titleBGColor=#3FA|bgColor=#DDD}
> Using the GenericDatumWriter with either Generic OR SpecificRecord will break 
> if an Enum is present.
> {panel}
> {panel:title=Steps To Reproduce|titleBGColor=#8DB|bgColor=#DDD}
> I have been tracking Avro decoding oddities for a while.
> The tests for this issue can be found 
> [here|https://github.com/ryonday/avroDecodingHelp/blob/master/src/test/java/com/ryonday/test/Avro180EnumFail.java]
> {panel}
> {panel:title=Notes|titleBGColor=#3AF|bgColor=#DDD}
> Due to the debacle that is the Avro "UTF8" object, we have been avoiding it 
> by using the following scheme:
> * Write incoming records to a byte array using the GenericDatumWriter
> * Read back the byte array to our compiled Java domain objects using a 
> SpecificDatumReader
> This worked great with Avro 1.7.7, and this is a binary-incompatible breaking 
> change with 1.8.0.
> This would appear to be caused by an addition in the 
> {{GenericDatumWriter:163-164}}:
> {code}
>   if (!data.isEnum(datum))
>   throw new AvroTypeException("Not an enum: "+datum);
> {code}
> {panel}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2016-10-13 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571571#comment-15571571
 ] 

Zoltan Farkas commented on AVRO-1340:
-

Agreed, fallback and alias are different concepts. An alias implies that it is 
the same thing under a different name.

The more I think about it, using the default value declared for the field is 
the better way.

for example in v1:

{code}
enum Suit {
  UNKNOWN,
  CLUBS,
  HEARTS,
}
...
Suit field = UNKNOWN;
...
{code}

And v2:

{code}
enum Suit {
  UNKNOWN,
  CLUBS,
  HEARTS,
  SPADES,
  DIAMONDS
}
...
Suit field = UNKNOWN;
...
{code}

Originally I thought of using something specific for the fallback, but thinking 
through all the use cases, you always end up with an enum that has an "UNKNOWN" 
value, which is always the right default value...

Aliases should be used for the same purpose as field aliases (improving 
names...).
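In avsc terms, the field-default idea above would read something like this (a sketch with invented names; whether the reader actually falls back to the field default on an unknown symbol is exactly what this issue proposes):

```json
{
  "name": "suit",
  "type": {
    "type": "enum",
    "name": "Suit",
    "symbols": ["UNKNOWN", "CLUBS", "HEARTS"]
  },
  "default": "UNKNOWN"
}
```

Under this proposal, a v1 reader receiving SPADES from a v2 writer would resolve the field to UNKNOWN instead of signalling an error.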



> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values, so that when an old reader encounters an enum ordinal 
> it does not recognize, it can default to the optional schema-provided one? If 
> the old schema does not provide a default, then the older reader can continue 
> to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

2016-10-11 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566400#comment-15566400
 ] 

Zoltan Farkas commented on AVRO-1340:
-

Makes sense. I created a JIRA a while ago to add alias support for enum 
symbols, and linked it... somebody still needs to do the work...

> use default to allow old readers to specify default enum value when 
> encountering new enum symbols
> -
>
> Key: AVRO-1340
> URL: https://issues.apache.org/jira/browse/AVRO-1340
> Project: Avro
>  Issue Type: Improvement
>  Components: spec
> Environment: N/A
>Reporter: Jim Donofrio
>Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enums, because you can never add an enum value 
> and keep old readers compatible. Why not use the default option to refer to 
> one of the enum values, so that when an old reader encounters an enum ordinal 
> it does not recognize, it can default to the optional schema-provided one? If 
> the old schema does not provide a default, then the older reader can continue 
> to fail as it does today.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1582) Json serialization of nullable fields and fields with default values improvement.

2016-09-22 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513391#comment-15513391
 ] 

Zoltan Farkas commented on AVRO-1582:
-

Hi Sean, I will provide an update from my side. I am currently still stuck on 
getting AVRO-1723 in (working on Ryan's suggestions... he should get some code 
to review soon), after which I was planning to tackle this JIRA...

I will provide some detail on the implementation in case somebody wants to work 
on this.

My implementation is currently:
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/specific/ExtendedSpecificDatumWriter.java
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/reflect/ExtendedReflectDatumWriter.java
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/generic/ExtendedGenericDatumWriter.java
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/ExtendedJsonDecoder.java
https://github.com/zolyfarkas/avro/blob/trunk/lang/java/avro/src/main/java/org/apache/avro/io/ExtendedJsonEncoder.java

Here is what needs to be considered:

1) The current implementation a) optimizes union {null, something}, and b) 
omits/infers fields that equal their default values. b) is very useful in a 
world that uses schemas, since it reduces the size of the payload, but I can 
see issues for the schema-less crowd, who need the fields because they don't 
have the schema... which is why some people suggested separating a) from b).
2) I still need to move over unit tests that I have outside of the library.
3) There is more potential for improvement here. For example, union {null, int, 
string} and union {double, record}... can also be jsonized better, which I have 
on my todo list and will be in my implementation sometime in the next 6 
months... this might change the approach the current implementation takes...

Unfortunately my time available for this is limited... and since our use cases 
are covered in the fork we use, this is currently low priority on my list...
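To make a) and b) concrete with a hypothetical schema (record and field names invented for illustration):

```json
{"type": "record", "name": "Person", "fields": [
  {"name": "name", "type": "string"},
  {"name": "nick", "type": ["null", "string"], "default": null}
]}
```

For a datum with name "joe" and nick "j", the stock JsonEncoder writes `{"name":"joe","nick":{"string":"j"}}`; with a) the union wrapper is dropped: `{"name":"joe","nick":"j"}`; and with b), when nick is null (its default), the field is omitted entirely: `{"name":"joe"}`.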

> Json serialization of nullable fields and fields with default values 
> improvement.
> -
>
> Key: AVRO-1582
> URL: https://issues.apache.org/jira/browse/AVRO-1582
> Project: Avro
>  Issue Type: Improvement
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
> Attachments: AVRO-1582-PATCH
>
>
> Currently serializing a nullable field of type union like:
> "type" : ["null","some type"]
> when serialized as JSON results in:  
> "field":{"some type":"value"}
> when it could be:
> "field":"value"
> Also, fields that equal the default value can be omitted from the 
> serialized data. This is possible because the reader will have the writer's 
> schema and can infer the field values. This reduces the size of the json 
> messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AVRO-1911) for avro HTTP content type instead of avro/binary, application/octet-stream;fmt=avro might be more appropriate?

2016-09-08 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-1911:
---

 Summary: for avro HTTP content type instead of avro/binary, 
application/octet-stream;fmt=avro might be more appropriate?
 Key: AVRO-1911
 URL: https://issues.apache.org/jira/browse/AVRO-1911
 Project: Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas


the content type is defined in:

{code}
/** An HTTP-based {@link Transceiver} implementation. */
public class HttpTransceiver extends Transceiver {
  static final String CONTENT_TYPE = "avro/binary";
{code}

I suggest using, for Avro binary:
application/octet-stream;fmt=avro
and for Avro JSON:
application/json;fmt=avro

This would take advantage of standard MIME types...




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1854) NPE on recursive datatype in JSON encoder

2016-05-27 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304627#comment-15304627
 ] 

Zoltan Farkas commented on AVRO-1854:
-

This is probably a duplicate of: 
https://issues.apache.org/jira/browse/AVRO-1667 


> NPE on recursive datatype in JSON encoder
> -
>
> Key: AVRO-1854
> URL: https://issues.apache.org/jira/browse/AVRO-1854
> Project: Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.7.7
>Reporter: Douglas Kaminsky
>Priority: Critical
> Attachments: RecursiveFlattenBugRegression.java, symbolstack.png
>
>
> When trying to encode to JSON a record whose schema contains a recursive type 
> embedded in another type (ie. an array), the "flatten" method in the 
> {{Parser}} leaves a hole in the middle of the symbol stack which causes a NPE 
> to occur. My best guess is that it's leaving space for the embedded recursion 
> but never populating the symbol, but that code is a bit obtuse so I am having 
> trouble getting to the root cause (otherwise I would have provided a patch to 
> solve the problem as well).
> Attached is a class with a short main method that replicates the problem with 
> anonymized versions of my datatypes/record and a screenshot of the symbol 
> stack at runtime right before the error occurs.
> Not yet tested in 1.8.0 but should be easy to verify.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (AVRO-1852) Make org.apache.avro.Schema serializable (java.io.Serializable)

2016-05-24 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas updated AVRO-1852:

Description: 
here is a commit describing the implementation: 
https://github.com/zolyfarkas/avro/commit/867f4d6a0f2e65a4ca8084f02b0d704a3acdb9d0


> Make org.apache.avro.Schema serializable (java.io.Serializable)
> ---
>
> Key: AVRO-1852
> URL: https://issues.apache.org/jira/browse/AVRO-1852
> Project: Avro
>  Issue Type: Improvement
>Reporter: Zoltan Farkas
>Priority: Minor
>
> here is a commit describing the implementation: 
> https://github.com/zolyfarkas/avro/commit/867f4d6a0f2e65a4ca8084f02b0d704a3acdb9d0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AVRO-1852) Make org.apache.avro.Schema serializable (java.io.Serializable)

2016-05-24 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-1852:
---

 Summary: Make org.apache.avro.Schema serializable 
(java.io.Serializable)
 Key: AVRO-1852
 URL: https://issues.apache.org/jira/browse/AVRO-1852
 Project: Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1603) maven avro plugin to also generate avsc files.

2016-05-11 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280647#comment-15280647
 ] 

Zoltan Farkas commented on AVRO-1603:
-

To elaborate more on my intention and provide an example...:

Currently I am able to generate the avsc files with avro-tools, like:

https://github.com/zolyfarkas/spf4j/blob/master/spf4j-core/pom.xml#L170

it is a bit of a pain, but it is functional.

I use the avsc files to generate "documentation" like:

https://zolyfarkas.github.io/spf4j/spf4j-core/avrodoc.html

detail on how at: 

https://github.com/zolyfarkas/spf4j/blob/master/spf4j-core/pom.xml#L192






> maven avro plugin to also generate avsc files.
> --
>
> Key: AVRO-1603
> URL: https://issues.apache.org/jira/browse/AVRO-1603
> Project: Avro
>  Issue Type: New Feature
>  Components: java
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>Priority: Minor
>
> It would be nice to also be able to generate all avsc schema files during 
> compilation.
> These schema files could then be packaged, versioned, and distributed with Maven...
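The avro-tools workaround referenced in the comment above boils down to invoking `idl2schemata` from the build. A hedged sketch of the wiring (plugin configuration, paths, and phase are illustrative, and it assumes avro-tools is on the plugin's classpath; `org.apache.avro.tool.Main` is the avro-tools entry point):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <executions>
    <execution>
      <id>generate-avsc</id>
      <phase>generate-resources</phase>
      <goals><goal>java</goal></goals>
      <configuration>
        <!-- "idl2schemata" emits one .avsc file per named type in the IDL -->
        <mainClass>org.apache.avro.tool.Main</mainClass>
        <arguments>
          <argument>idl2schemata</argument>
          <argument>src/main/avro/core.avdl</argument>
          <argument>${project.build.directory}/avro-schemata</argument>
        </arguments>
      </configuration>
    </execution>
  </executions>
</plugin>
```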



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (AVRO-1831) Take advantage of JSR 305 annotations in the generated Java classes.

2016-04-18 Thread Zoltan Farkas (JIRA)
Zoltan Farkas created AVRO-1831:
---

 Summary: Take advantage of JSR 305 annotations in the generated 
Java classes.
 Key: AVRO-1831
 URL: https://issues.apache.org/jira/browse/AVRO-1831
 Project: Avro
  Issue Type: Improvement
Reporter: Zoltan Farkas
Priority: Minor


it would be nice if the generated records would take advantage of:

@javax.annotation.Nullable
@javax.annotation.Nonnull

to annotate the fields that can be null...

This would have a documenting role and, more importantly, would allow static 
analysis tools (e.g. FindBugs) to detect incorrect use of the fields.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (AVRO-1708) Memory leak with WeakIdentityHashMap?

2016-04-17 Thread Zoltan Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/AVRO-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Farkas resolved AVRO-1708.
-
Resolution: Won't Fix

I am not sure the issue was related to the WeakIdentityHashMap implementation; 
it was more likely the fact that the GC overhead of weak references is high.
In any case, it would be useful to review the uses of these caches.

> Memory leak with WeakIdentityHashMap?
> -
>
> Key: AVRO-1708
> URL: https://issues.apache.org/jira/browse/AVRO-1708
> Project: Avro
>  Issue Type: Bug
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>
> The WeakIdentityHashMap used in GenericDatumReader has only weak keys; 
> it seems to grow, and values remain in the map, which looks like a memory leak...
> Java's WeakHashMap has weak entries, which allows the GC to collect an entire 
> entry and prevents leaks...
> the javadoc of this class claims: "Implements a combination of WeakHashMap 
> and IdentityHashMap.", which is not really the case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1707) Java serialization readers/writers in generated Java classes

2016-04-17 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245015#comment-15245015
 ] 

Zoltan Farkas commented on AVRO-1707:
-

No problem with the practice, if the readers are fixed to stop keeping 
references to a thread (which was the cause of large memory waste in our 
apps).
 
In our use cases, Java serialization is not common, so it seemed wasteful to 
have these readers and writers initialized without ever being used... and it 
is so simple to make this lazy, initialized only when needed...
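The lazy alternative can be sketched with the standard initialization-on-demand holder idiom (plain Java, no Avro dependency; ModelState stands in for the SpecificDatumReader/SpecificDatumWriter pair and is purely illustrative):

```java
public class LazyHolderSketch {

    /** Stand-in for the expensive reader/writer pair built from SCHEMA$. */
    static final class ModelState {
        static int constructions = 0;
        ModelState() { constructions++; }
    }

    // The JVM initializes a nested class on first use, and class
    // initialization is thread-safe, so INSTANCE is built at most once,
    // and only if Java serialization is actually exercised.
    private static final class Holder {
        static final ModelState INSTANCE = new ModelState();
    }

    static ModelState model() { return Holder.INSTANCE; }

    public static void main(String[] args) {
        System.out.println(ModelState.constructions); // nothing built yet: 0
        model();
        model();
        System.out.println(ModelState.constructions); // built once, on demand: 1
    }
}
```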



> Java serialization readers/writers in generated Java classes
> 
>
> Key: AVRO-1707
> URL: https://issues.apache.org/jira/browse/AVRO-1707
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
>
> the following static instances are declared in the generated classes:
>   private static final org.apache.avro.io.DatumWriter
> WRITER$ = new org.apache.avro.specific.SpecificDatumWriter(SCHEMA$);  
>   private static final org.apache.avro.io.DatumReader
> READER$ = new org.apache.avro.specific.SpecificDatumReader(SCHEMA$);  
>  the reader/writer hold on to a reference to the "Creator Thread":
> "private final Thread creator;"
> which inhibits GC-ing thread locals... for this thread...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AVRO-1723) Add support for forward declarations in avro IDL

2016-04-10 Thread Zoltan Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/AVRO-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234067#comment-15234067
 ] 

Zoltan Farkas commented on AVRO-1723:
-

Ryan, I have updated the pull request based on your comments.
Please take a look and let me know if its current state is acceptable.

thank you

> Add support for forward declarations in avro IDL
> 
>
> Key: AVRO-1723
> URL: https://issues.apache.org/jira/browse/AVRO-1723
> Project: Avro
>  Issue Type: Improvement
>Affects Versions: 1.8.0
>Reporter: Zoltan Farkas
> Attachments: AVRO-1723.patch
>
>
> Currently, recursive data structures like:
> record SampleNode {
>int count = 0;
>array<SamplePair> samples = [];
> }
> record SamplePair {
>  string name;
>  SampleNode node;
> }
> cannot be declared in IDL;
> however, they can be declared in avsc (with the fix from 
> https://issues.apache.org/jira/browse/AVRO-1667 )
> It is actually not complicated to implement, here is some detail on a 
> possible implementation:
> https://github.com/zolyfarkas/avro/commit/210c50105717149f3daa39b8d4160b8548b8e363
> This would close a capability gap with google protocol buffers...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

