I'm storing data generated by my web application in Apache Avro format. The data is serialized and sent to an Amazon Kinesis Firehose delivery stream, which buffers it and writes it to Amazon S3 every 300 seconds or so. Since I have multiple web servers, multiple blobs of Avro data are sent to Kinesis, which concatenates them and periodically writes the result to S3.
When I grab the file from S3, I can't use the normal Avro tools to decode it, since it's actually multiple files in one. I could add a delimiter, I suppose, but that seems risky in case the data being logged contains the same delimiter. What's the best way to deal with this? I couldn't find anything in the Avro specification that supports multiple Avro files concatenated into a single file. -- Chris Miller
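
For reference, here is a rough sketch of the kind of splitting I have in mind, assuming each blob is a complete Avro object container file (magic bytes `Obj\x01`, a metadata map, a 16-byte sync marker, then data blocks that each end with that sync marker). It walks the container structure instead of searching for a delimiter, so payload bytes that happen to look like a header are skipped by length. The names `read_long`, `skip_header`, and `split_concatenated_avro` are my own, not part of any Avro library, and I haven't tried this against real Firehose output.

```python
MAGIC = b"Obj\x01"   # first four bytes of every Avro object container file
SYNC_SIZE = 16       # length of the per-file sync marker


def read_long(buf, pos):
    """Decode one zigzag, variable-length Avro long; return (value, new_pos)."""
    raw = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1), pos


def skip_header(buf, pos):
    """Skip one container header; return (sync_marker, position of first block)."""
    if buf[pos:pos + len(MAGIC)] != MAGIC:
        raise ValueError(f"no Avro magic at offset {pos}")
    pos += len(MAGIC)
    # The metadata is an Avro map: blocks of key/value pairs, ended by a 0 count.
    while True:
        count, pos = read_long(buf, pos)
        if count == 0:
            break
        if count < 0:
            # A negative count is followed by the block's byte size (unused here).
            count = -count
            _, pos = read_long(buf, pos)
        for _ in range(count):
            for _ in range(2):  # key, then value; both are length-prefixed
                length, pos = read_long(buf, pos)
                pos += length
    sync = buf[pos:pos + SYNC_SIZE]
    return sync, pos + SYNC_SIZE


def split_concatenated_avro(buf):
    """Split a buffer of back-to-back container files into one bytes object each."""
    files = []
    pos = 0
    while pos < len(buf):
        start = pos
        sync, pos = skip_header(buf, pos)
        # Read data blocks until the buffer ends or a new header begins.
        while pos < len(buf) and buf[pos:pos + len(MAGIC)] != MAGIC:
            _count, pos = read_long(buf, pos)   # number of objects in the block
            size, pos = read_long(buf, pos)     # byte length of the block data
            pos += size
            if buf[pos:pos + SYNC_SIZE] != sync:
                raise ValueError(f"sync marker mismatch at offset {pos}")
            pos += SYNC_SIZE
        files.append(buf[start:pos])
    return files
```

Each element of the returned list could then be passed to a regular Avro reader on its own. Checking for the magic bytes only at a block boundary should be safe, as far as I can tell, because a block's object count beginning with the byte `O` would decode to a negative number, which a data block count cannot be.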
