Hi,
Avro provides a schema for the data and can be used to serialize individual
records in a binary format.
It does not compress the data (although compression can be layered on top),
but it is more space-efficient than text formats because of the binary
serialization.
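To illustrate why a schema-based binary encoding is more compact, here is a minimal sketch. It is not Avro's actual wire format (Avro uses variable-length zigzag integers, for instance); `struct` just stands in for the idea that field names and types live in the schema, so each record carries only the values:

```python
import json
import struct

# A sample record as it might appear in a stream.
record = {"id": 12345, "temperature": 21.5}

# Text encoding (e.g. JSON) repeats the field names in every record.
text_bytes = json.dumps(record).encode("utf-8")

# A schema-based binary encoding stores only the values, because the
# field names and types are defined once in the schema. Here we mimic
# that with struct: one 8-byte long and one 8-byte double.
binary_bytes = struct.pack("<qd", record["id"], record["temperature"])

print(len(text_bytes), len(binary_bytes))  # the binary form is smaller
```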
I think you can implement a Writer for the BucketingSink that writes
records in Avro format.
If you want to write in batches from a streaming source, you will always
need some state, i.e. a state database (here a NoSQL database such as a
key-value store makes sense). Then you can grab the data at certain points
in time and convert it to Avro. You need to make sure that the state is
logically consistent.
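The batching idea above can be sketched as follows. This is only an illustration, with a Python dict standing in for the key-value store and `BatchingState`/`take_batch` being made-up names; in production the state would live in an external store and the flush would be tied to checkpoints:

```python
from collections import defaultdict

class BatchingState:
    """Buffer streaming records in key-value state and hand them out
    as a batch at certain points in time (a dict stands in for the
    real NoSQL store here)."""

    def __init__(self):
        self._store = defaultdict(list)  # key -> buffered records

    def add(self, key, record):
        # On each streaming element, append it to the state for its key.
        self._store[key].append(record)

    def take_batch(self):
        # At a flush point, take the whole buffered state as one batch
        # and reset it, so every record lands in exactly one batch --
        # this is the "logically consistent" requirement.
        batch = dict(self._store)
        self._store = defaultdict(list)
        return batch

state = BatchingState()
state.add("sensor-1", {"t": 1, "v": 20.0})
state.add("sensor-1", {"t": 2, "v": 21.0})
state.add("sensor-2", {"t": 1, "v": 5.0})

batch = state.take_batch()  # this batch would then be converted to Avro
print(sorted(batch))
```

Each call to `take_batch` drains the state atomically from the caller's point of view, so no record is written twice or dropped between batches.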