[jira] [Updated] (AVRO-2299) Get Plain Schema

2019-01-26 Thread Rumeshkrishnan (JIRA)


 [ 
https://issues.apache.org/jira/browse/AVRO-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rumeshkrishnan updated AVRO-2299:
-
Issue Type: Bug  (was: Improvement)

> Get Plain Schema
> 
>
> Key: AVRO-2299
> URL: https://issues.apache.org/jira/browse/AVRO-2299
> Project: Apache Avro
>  Issue Type: Bug
>  Components: java
>Affects Versions: 1.8.2
>Reporter: Rumeshkrishnan
>Priority: Critical
>  Labels: features
> Fix For: 1.9.0, 1.8.2, 1.8.3, 1.8.4
>
>
> {panel:title=Avro Schema Reserved Keys:}
> "doc", "fields", "items", "name", "namespace",
>  "size", "symbols", "values", "type", "aliases", "default"
> {panel}
> Avro also supports user-defined properties for both Schema and Field.
> Is there a way to get the schema with only the reserved properties (key, value)?
> Input Schema: 
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id",
>       "user_field_prop": "x"
>     }
>   ],
>   "user_schema_prop": "xx"
> }
> {code}
> Expected Plain Schema:
> {code:java}
> {
>   "name": "testSchema",
>   "namespace": "com.avro",
>   "type": "record",
>   "fields": [
>     {
>       "name": "email",
>       "type": "string",
>       "doc": "email id"
>     }
>   ]
> }
> {code}
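
For illustration only, the requested "plain schema" could be sketched as a
recursive filter over an already-parsed schema. This is not an Avro API: the
class PlainSchemaSketch and its strip method are hypothetical, the schema is
assumed to be held as nested java.util Map/List values rather than an Avro
Schema object, and the reserved-key list is copied as-is from the panel in the
issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Avro: drops every key that is not in the
// reserved list quoted in the issue.
final class PlainSchemaSketch {

    // Reserved keys exactly as listed in the issue description.
    static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "doc", "fields", "items", "name", "namespace",
            "size", "symbols", "values", "type", "aliases", "default"));

    // Keys whose values may themselves contain schemas. Other values
    // (for example "default") are copied untouched so their content is kept.
    static final Set<String> NESTED = new HashSet<>(Arrays.asList(
            "type", "fields", "items", "values"));

    static Object strip(Object node) {
        if (node instanceof Map) {
            Map<String, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = String.valueOf(e.getKey());
                if (!RESERVED.contains(key)) {
                    continue; // user-defined property: drop it
                }
                Object value = e.getValue();
                out.put(key, NESTED.contains(key) ? strip(value) : value);
            }
            return out;
        }
        if (node instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) node) {
                out.add(strip(item)); // e.g. each field, or each union branch
            }
            return out;
        }
        return node; // strings, numbers, booleans, null
    }
}
```

Applied to the input schema above (parsed into maps and lists), this sketch
would drop "user_schema_prop" and "user_field_prop" and keep everything else.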



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (AVRO-2307) Opt-in setting to improve GC behavior during deserialization?

2019-01-26 Thread Felix GV (JIRA)
Felix GV created AVRO-2307:
--

 Summary: Opt-in setting to improve GC behavior during 
deserialization?
 Key: AVRO-2307
 URL: https://issues.apache.org/jira/browse/AVRO-2307
 Project: Apache Avro
  Issue Type: Bug
Affects Versions: 1.7.7
Reporter: Felix GV


We have a performance-sensitive project that leverages Avro for an online data 
store, and I wanted to report on a couple of Avro deserialization optimizations 
we have implemented. On one hand, it is great that Avro’s code is clean and 
modular enough to have allowed us to achieve this easily. But on the other 
hand, we are leveraging parts of the API which are probably not typically used 
by most users, and thus we are exposing ourselves to ongoing maintenance costs 
as those “ambiguously-public” APIs might change in future versions. For this 
reason, I wanted to gauge the appetite of the Avro community for taking in 
those optimizations upstream into the main project.

The minor challenge is that the optimizations we’ve made are not completely 
invisible, and therefore should probably be presented as opt-in settings, 
rather than new defaults. Below is a summary of both changes.

1. Re-use of byte arrays when instantiating ByteBuffers

When deserializing a byte array that contains a ByteBuffer field, the relevant 
portion of the input byte array is copied into a new byte array, which is then 
used as the backing array of a new ByteBuffer.

In our case, we have a few schemas which contain some general metadata and an 
opaque byte array payload, which often ends up being a significant portion of 
the total byte length. Recopying these bytes results in up to 2x the byte 
allocation. The ByteBuffer API, however, provides an alternative behavior where 
the backing array can be larger than needed, with an offset and length provided 
to indicate the internal boundaries of the payload. In our implementation, we 
re-use the input byte array as the ByteBuffer’s backing array, thereby 
avoiding a copy.
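
As a rough illustration of the difference, here is a minimal sketch using only
the standard java.nio API. The class and method names are hypothetical and are
not taken from Avro or from the implementation described in this issue.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical names; only ByteBuffer.wrap and Arrays.copyOfRange are real APIs.
final class ByteBufferReuseSketch {

    // Copying approach: allocates a second array the size of the payload.
    static ByteBuffer copyPayload(byte[] input, int offset, int length) {
        byte[] copy = Arrays.copyOfRange(input, offset, offset + length);
        return ByteBuffer.wrap(copy);
    }

    // Zero-copy approach: the input array becomes the backing array, and the
    // buffer's position and limit mark the payload boundaries inside it.
    static ByteBuffer wrapPayload(byte[] input, int offset, int length) {
        return ByteBuffer.wrap(input, offset, length);
    }

    public static void main(String[] args) {
        byte[] input = {9, 9, 1, 2, 3, 9};
        ByteBuffer copied = copyPayload(input, 2, 3);
        ByteBuffer shared = wrapPayload(input, 2, 3);

        // Both buffers expose the same remaining bytes.
        System.out.println(copied.equals(shared));

        // Mutating the input is visible through the shared buffer only.
        input[2] = 42;
        System.out.println(shared.get(shared.position()));
        System.out.println(copied.get(0));
    }
}
```

With the wrapped buffer, consumers have to honor position() and remaining()
instead of assuming that array() contains only the payload.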

The caveat in this case is that this only works properly for use cases that 
don’t mutate the content of the bytes (neither the input nor the deserialized 
object). In our case this assumption is valid.

If this were implemented in the open-source project, there are a few ways this 
could be achieved:
 # There could be a config flag on the decoder or elsewhere that allows a user 
to opt in to this mode. In this case, it may be safer to return a special 
read-only ByteBuffer implementation that throws an exception if any mutation is 
attempted, indicating that the flag ought to be turned off to support mutations.
 # It could be the default mode, but wrapped in a modified ByteBuffer 
implementation which defers the copy of the content lazily until (and only if) 
one of the mutation APIs is called.

Either way, this requires a custom ByteBuffer implementation with special 
behavior in order to be fully clean and safe. In the first approach, however, 
the default behavior would still return regular ByteBuffer instances.
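
For the first option, one possible approximation that needs no custom class is
the JDK's own read-only view. The sketch below assumes that a plain
ReadOnlyBufferException is an acceptable failure mode; it cannot carry the
custom "turn the flag off" message suggested above. The class and method names
are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ReadOnlyBufferException;

// Hypothetical names; asReadOnlyBuffer() is the standard java.nio API.
final class ReadOnlyPayloadSketch {

    // Shares the input array but rejects writes made through the buffer.
    static ByteBuffer readOnlyView(byte[] input, int offset, int length) {
        return ByteBuffer.wrap(input, offset, length).asReadOnlyBuffer();
    }

    // Returns true if a write through the buffer is refused.
    static boolean rejectsWrites(ByteBuffer buffer) {
        try {
            buffer.put(buffer.position(), (byte) 0);
            return false;
        } catch (ReadOnlyBufferException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] input = {9, 9, 1, 2, 3, 9};
        ByteBuffer view = readOnlyView(input, 2, 3);
        System.out.println(rejectsWrites(view));
        System.out.println(view.hasArray()); // read-only views hide the array
    }
}
```

Two limitations of this approach: the view only guards against writes made
through the buffer, so mutating the original input array still changes what
readers see, and hasArray() returns false on a read-only view, so code that
relies on array() would need to change.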

2. Primitive (non-boxing) implementation of lists

Another challenge we’ve come across is that lists of primitive types (floats in 
our case) are always boxed into Object Floats by Avro. In our case, this 
resulted in millions of Objects / second getting allocated, causing 
pathological GC pressure.

To solve this, we have implemented an alternative version of the Avro Array 
class which, instead of holding an array of generically-typed Objects, 
internally holds an array of primitive floats. This avoids boxing at 
deserialization time. There is a further challenge, however: since Avro array 
fields are exposed as Java Lists, the regular functions of the API all return 
Objects, which merely defers the boxing to a slightly later point in time. To 
get around this, we have added a getPrimitive(i) function which returns 
primitive items directly. To use this function, we have to cast the list to 
our own type, since the List interface does not expose it. The end result is 
quite dramatic, performance-wise, reducing our p99 latencies to a quarter to 
a third of their original values.
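
To make the shape of this concrete, here is a minimal sketch of such a list.
It is not the implementation described in this issue and it does not extend
Avro's Array class: PrimitiveFloatListSketch, addPrimitive and sum are
hypothetical names, and only getPrimitive(i) comes from the description above.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a List<Float> backed by a float[] so that elements are
// stored unboxed, with primitive accessors next to the regular List API.
final class PrimitiveFloatListSketch extends AbstractList<Float> {
    private float[] elements;
    private int size;

    PrimitiveFloatListSketch(int initialCapacity) {
        elements = new float[Math.max(initialCapacity, 1)];
    }

    // Appends without boxing; doubles the backing array when it is full.
    void addPrimitive(float value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, size * 2);
        }
        elements[size++] = value;
        modCount++;
    }

    // Reads without boxing.
    float getPrimitive(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size);
        }
        return elements[index];
    }

    // Regular List API: boxing happens here, only when a caller asks for it.
    @Override public Float get(int index) { return getPrimitive(index); }
    @Override public boolean add(Float value) { addPrimitive(value); return true; }
    @Override public int size() { return size; }

    // Caller side: cast to the concrete type to reach the primitive accessor.
    static float sum(List<Float> list) {
        float total = 0f;
        if (list instanceof PrimitiveFloatListSketch) {
            PrimitiveFloatListSketch primitives = (PrimitiveFloatListSketch) list;
            for (int i = 0; i < primitives.size(); i++) {
                total += primitives.getPrimitive(i);
            }
        } else {
            for (Float value : list) {
                total += value;
            }
        }
        return total;
    }
}
```

The cast in sum is the cost mentioned above: callers must know the concrete
type to avoid boxing, while code that only sees a List still works through
get(i).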

One challenge here is that the “PrimitiveFloatArray” class is an almost 
complete copy of the Array class, basically just stripping away the generics. 
If we were to contribute this upstream to the open-source project, I imagine we 
might want to do this not only for floats but for boolean, int, long and double 
arrays as well. This would mean roughly 5x the same copy-pasted implementation, 
which is not ideal from a maintenance standpoint. The generic types are nicer 
in that sense, but unfortunately, Java generics do not support primitives. In 
our case, we are willing to pay that maintenance cost in exchange for the 
dramatic GC reduction it gives 

Re: [ANNOUNCE] Avro works now with Java 11 too

2019-01-26 Thread Sean Busbey
Is this in a release or just project internal still?

On Tue, Jan 22, 2019 at 3:41 AM Ismaël Mejía wrote:
>
> Hello,
>
> I just wanted to announce that we merged today the last fix to support
> Java 11 in our build system. Notice that Avro is still Java 8
> compatible and will remain so for a long time, but validating that we are
> still 'forward' compatible with Java 11 is important for the health of
> the project even if we don't use anything Java 11 specific.
>
> This does not cover yet the modularization of Avro to be compatible
> with Java's new module system. I filed AVRO-2305 to track this if
> some brave soul wants to try it.
>
> Finally, I would like to thank Dan Kulp and Fokko Driesprong for the
> help to bring this to fruition.
>
> Regards,
> Ismaël Mejía



-- 
busbey