Hi people,
Is there interest in a custom Scala API for Avro records and protocols?
I am currently working on a schema compiler for Scala, but before I go
deeper, I would really like to have external feedback.
I would especially like to hear from anyone who has opinions on how to map
Avro types onto Scala types.
Here are a few hints on what I've been trying so far:
- Records are compiled into two forms: mutable and immutable.
- To avoid collisions with Java-generated classes, Scala classes are
generated in a .scala sub-package.
- Avro arrays are translated to Seq/List when immutable and
Buffer/ArrayBuffer when mutable.
- Avro maps are translated to immutable or mutable Map/HashMap.
- Bytes/Fixed are translated to Seq[Byte] when immutable and
Buffer[Byte] when mutable.
- Avro unions are currently translated into Any, but I plan to:
    - translate union {null, X} into Scala Option[X]
    - compile union {T1, T2, T3} into custom case classes to have
      proper type checking and pattern matching.
- Scala records provide a method encode(encoder) to serialize as binary
into a byte stream (appears ~30% faster than SpecificDatumWriter).
- Scala mutable records provide a method decode(decoder) to deserialize
a byte stream (appears ~25% faster than SpecificDatumReader).
- Scala records implement the SpecificRecord Java interface (with some
overhead), so one may still use the SpecificDatumReader/Writer when the
custom encoder/decoder methods cannot be used.
- Mutable records can be converted to immutable ones (i.e. they can act
as builders).
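
A rough sketch of what the mapping above could look like for one small record. The User schema and every name in it (UserId, MutableUser, build, describe) are hypothetical, chosen only for illustration; this is not the compiler's actual output, and the Avro encode/decode methods and the SpecificRecord interface are left out so the snippet stands alone.

```scala
import scala.collection.mutable

// Hypothetical Avro schema, for illustration only:
//   record User {
//     string name;
//     union { null, int } age;
//     array<string> tags;
//     union { int, string } id;
//   }

// union { int, string } as a sealed hierarchy of case classes, so that
// pattern matches on it are checked for exhaustiveness by the compiler.
sealed trait UserId
object UserId {
  final case class IntValue(value: Int) extends UserId
  final case class StringValue(value: String) extends UserId
}

// Immutable form: Seq for Avro arrays, Option for union { null, X }.
final case class User(
    name: String,
    age: Option[Int],
    tags: Seq[String],
    id: UserId
)

// Mutable form: Buffer for Avro arrays; also acts as a builder
// for the immutable form.
final class MutableUser(
    var name: String = "",
    var age: Option[Int] = None,
    var tags: mutable.Buffer[String] = mutable.ArrayBuffer.empty[String],
    var id: UserId = UserId.IntValue(0)
) {
  // Copy the buffer so later mutations do not leak into the immutable record.
  def build(): User = User(name, age, tags.toList, id)
}

object Example {
  // Pattern matching over the union cases instead of inspecting an Any.
  def describe(id: UserId): String = id match {
    case UserId.IntValue(n)    => s"int:$n"
    case UserId.StringValue(s) => s"string:$s"
  }
}
```

With this shape, a caller fills in a MutableUser field by field, calls build() to obtain an immutable User, and matches on the union cases rather than on Any. Whether the immutable copy should be taken eagerly, as done here with toList, is one of the design choices that is open to feedback.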
Thanks,
Christophe