This would be fantastic, and I would be excited to see it. A Scala language addition to the project would be very welcome if you wish to contribute.
I believe there have been a few other Scala Avro attempts by others over time. I recall one where all records were case classes (but this broke at 22 fields).
Another thing to look at is:
http://code.google.com/p/avro-scala-compiler-plugin/

Perhaps we can get a few of the other people who have developed Scala Avro tools to review/comment or contribute as well?

On 5/29/12 11:04 PM, "Christophe Taton" <[email protected]> wrote:

> Hi people,
>
> Is there interest in a custom Scala API for Avro records and protocols?
> I am currently working on an schema compiler for Scala, but before I go
> deeper, I would really like to have external feedback.
> I would especially like to hear from anyone who has opinions on how to map
> Avro types onto Scala types.
> Here are a few hints on what I've been trying so far:
> * Records are compiled into two forms: mutable and immutable.

Very nice.

> * To avoid collisions with Java generated classes, scala classes are generated
> in a .scala sub-package.
> * Avro arrays are translated to Seq/List when immutable and Buffer/ArrayBuffer
> when mutable.
> * Avro maps are translated to immutable or mutable Map/HashMap.
> * Bytes/Fixed are translated to Seq[Byte] when immutable and Buffer[Byte] when
> mutable.
> * Avro unions are currently translated into Any, but I plan to:
>> * translate union{null, X} into Scala Option[X]
>> * compile union {T1, T2, T3} into a custom case classes to have proper type
>> checking and pattern matching.

If you have a record R1, it compiles to a Scala class. If you put it in a union of {T1, String}, what does the case class for the union look like? Is it basically a wrapper like a specialized Either[T1, String] ?
Maybe Scala will get Union types later to push this into the compiler instead of object instances :)

> * Scala records provide a method encode(encoder) to serialize as binary into a
> byte stream (appears ~30% faster than SpecificDatumWriter).
> * Scala mutable records provide a method decode(decoder) to deserialize a byte
> stream (appears ~25% faster than SpecificDatumReader).

I have some plans to improve {Generic,Specific}Datum{Reader,Writer} in Java, I would be interested in seeing how the Scala one here works. The Java one is slowed by traversing too many data structures that represent decisions that could be pre-computed rather than repeatedly parsed for each record.

> * Scala records implement the SpecificRecord Java interface (with some
> overhead), so one may still use the SpecificDatumReader/Writer when the custom
> encoder/decoder methods cannot be used.
> * Mutable records can be converted to immutable (ie. can act as builders).
> Thanks,
> Christophe
>
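
To make the union question above concrete, here is a rough Scala sketch of the two mappings being discussed: union{null, X} as Option[X], and a two-branch union as a sealed wrapper that behaves like a specialized Either. This is only an illustration under assumptions, not what the proposed compiler generates. The names UnionSketch, R1, R1OrString, FromR1, FromString, fromNullable and describe are all hypothetical.

```scala
object UnionSketch {
  // Hypothetical stand-in for the Scala class compiled from an Avro record R1.
  final case class R1(id: Int)

  // union {null, X}: Option(...) yields None for null and Some(x) otherwise.
  def fromNullable[X](value: X): Option[X] = Option(value)

  // One possible shape for union {R1, string}: a sealed hierarchy with one
  // case class per branch, i.e. an Either[R1, String] with named branches.
  sealed trait R1OrString
  final case class FromR1(value: R1) extends R1OrString
  final case class FromString(value: String) extends R1OrString

  // Since the trait is sealed, the compiler can warn about missing branches.
  def describe(u: R1OrString): String = u match {
    case FromR1(r)     => s"record with id ${r.id}"
    case FromString(s) => s"string '$s'"
  }
}
```

With a wrapper like this, a union of N branches costs one extra object per value, which is the overhead a compiler-level union type would avoid.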
