Cool. Thanks for the response. Quick update:
I've had early success reading Avro files with the Avro C library from Go through cgo. It was relatively straightforward, if a tad tedious: the new "value" interface in the C library relies heavily on macros, and cgo cannot (AFAIK) call macros directly, so I needed to create C wrapper functions for each macro. I did this for about 8 or so macros (just the ones I needed as a proof of concept). That covered most everything you'd expect on the reading side: generic readers, retrieving the writer schema, iterating over record values, teasing out unions (discriminant and current branch), retrieving string and long values, getting fields by index and by name, and the corresponding incref/decref calls.

Aside from the macros, integrating with C from Go is straightforward and, based on some quick tests, performance is comparable to C. I tested with a simple program that reads through an Avro file, extracts two fields (a string and a long), and sums the longs across all records (the strings are just dropped on the floor). The input was a ~900 MB Avro file (compressed blocks) with about 25M records. On my machine, the simple C version I built runs through it in about 42 seconds. The Go version, which does essentially the same thing through cgo, takes about 51 seconds. A more common (in my domain) input size, a ~270 MB Avro file containing ~7.5M records, runs in ~15s in C and ~18s in Go. We regularly process 100s of files of that size/shape.

This is not taking advantage of goroutines or any other Go concurrency features, and the Go code is largely just the C code in Go clothing. But I was pleased to see such modest overhead (roughly 20% in both tests).

Looking down the road, an idiomatic library should follow a similar pattern to the Go "encoding/json" package. That shouldn't be too difficult. The only real barrier is time ;-) I currently have a task at hand and have enough pieces to accomplish it.
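To make the "encoding/json" comparison a bit more concrete, here is a rough sketch of what such an entry point might look like. Everything in it is hypothetical: the Unmarshal function, the `avro` struct tag, and the Event type are names made up for illustration, and the map simply stands in for field values that would really come from the C library. It is pure Go and only shows the reflection-based pattern, not a real decoder.

```go
package main

import (
	"fmt"
	"reflect"
)

// Event is a hypothetical target struct. The `avro` tag is an assumed
// convention modeled on the `json` tag used by encoding/json.
type Event struct {
	Name  string `avro:"name"`
	Count int64  `avro:"count"`
}

// Unmarshal copies fields from an already-decoded record into the struct
// that v points to. The map is a stand-in for values read via the C library.
func Unmarshal(record map[string]interface{}, v interface{}) error {
	rv := reflect.ValueOf(v)
	if rv.Kind() != reflect.Ptr || rv.IsNil() || rv.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("avro: Unmarshal needs a non-nil pointer to a struct, got %T", v)
	}
	st := rv.Elem()
	for i := 0; i < st.NumField(); i++ {
		field := st.Type().Field(i)
		name, ok := field.Tag.Lookup("avro")
		if !ok || !st.Field(i).CanSet() {
			continue // untagged or unexported fields are skipped in this sketch
		}
		raw, present := record[name]
		if !present {
			continue // fields missing from the record keep their zero value
		}
		val := reflect.ValueOf(raw)
		if !val.IsValid() || !val.Type().AssignableTo(field.Type) {
			return fmt.Errorf("avro: field %q: cannot assign %T to %s", name, raw, field.Type)
		}
		st.Field(i).Set(val)
	}
	return nil
}

func main() {
	record := map[string]interface{}{"name": "click", "count": int64(3)}
	var e Event
	if err := Unmarshal(record, &e); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", e) // prints {Name:click Count:3}
}
```

A real version would presumably walk the C library's values directly rather than go through an intermediate map, and would need to handle unions, nested records, and numeric conversions, none of which this sketch attempts.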
I will circle back on this, though, as I get a little more comfortable with Go idioms and idiosyncrasies. I wanted to share the above because I view these quick results as promising.

P.S. I also tested using C to convert a record to a JSON char* and passing that to a Go function that unmarshals it into a Go struct. This worked fine but, as one would expect, adds a considerable amount of overhead: 12 minutes for the same 52-second test noted above. It does work, though, as a quick approach.

On Mar 20, 2014 4:33 PM, "Doug Cutting" <[email protected]> wrote:
>
> I have not heard of any work on an implementation of Avro in go. It
> would make a great addition, even if only data file support.
>
> Doug
>
> On Sat, Mar 15, 2014 at 5:59 AM, Mike Stanley <[email protected]> wrote:
> > Anyone know of any avro libraries for go? I haven't had much luck finding
> > anything. Either Cgo or pure go is fine by me. I'm a long time user of
> > avro and have a considerable amount of data in it. (Avro is our
> > serialization format of choice for all archive data, event logs, and other
> > data stored on s3, and in hdfs). Go is quickly becoming a core technology
> > in our stack as well and avro support is one of the impeding areas for wider
> > adoption.
> >
> > Worst case scenario this may be something I take on. I'd much rather pick
> > up where someone else left off though. I don't need any RPC functionality.
> > Just read/write (with compression support).
