Hi all,

I am using a typical Avro->Kafka setup: data is serialized with Avro before it is written to Kafka, and each message is prepended with a schema ID that can be looked up in my schema repository.
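For reference, each message value is framed roughly like this before it goes to Kafka. The magic byte and field widths are my own convention, and MessageFraming is just an illustrative name, not a real class in my codebase:

import java.nio.ByteBuffer;

// Per-message layout: [1-byte magic][4-byte schema ID, big-endian][Avro binary payload].
public final class MessageFraming {
    private static final byte MAGIC = 0x0;

    public static byte[] frame(int schemaId, byte[] avroPayload) {
        return ByteBuffer.allocate(1 + 4 + avroPayload.length)
                .put(MAGIC)
                .putInt(schemaId)        // big-endian by default in ByteBuffer
                .put(avroPayload)
                .array();
    }

    public static int schemaIdOf(byte[] framed) {
        return ByteBuffer.wrap(framed, 1, 4).getInt();  // skip the magic byte
    }
}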
Now I want to move the data into long-term storage by writing it from Kafka to S3. I know the usual way to store Avro is in Avro container files, but a container file can only hold messages encoded with a single Avro schema. In my case the messages may be encoded with different schemas, and I need to retain their order so they can be replayed into Kafka later. A single file in S3 therefore has to contain messages encoded with different schemas, which rules out Avro container files.

What would be a good solution to this? What format could I use so that a single data file can contain messages encoded with different schemas? Should I store each message with a prepended schema ID, similar to what I do in Kafka? If so, how would I separate the messages within the file? A rough sketch of the kind of framing I have in mind is below.
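To make the question concrete, this is the kind of length-delimited layout I am imagining for the S3 files. The record structure and the MultiSchemaFile name are a straw man, not something I have settled on:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

// Straw-man layout: the file is a sequence of
// [4-byte schema ID][4-byte payload length][Avro binary payload]
// records, written in Kafka order so the file can be replayed as-is.
public final class MultiSchemaFile {

    public static void writeRecord(DataOutputStream out, int schemaId, byte[] avroPayload)
            throws IOException {
        out.writeInt(schemaId);            // same schema-ID convention as in Kafka
        out.writeInt(avroPayload.length);  // length prefix separates the messages
        out.write(avroPayload);
    }

    public static void readAll(DataInputStream in) throws IOException {
        while (true) {
            final int schemaId;
            try {
                schemaId = in.readInt();
            } catch (EOFException eof) {
                return;                    // clean end of file
            }
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            // schemaId would be looked up in the schema repository and the
            // payload decoded with the matching Avro reader schema.
            System.out.printf("record: schema %d, %d bytes%n", schemaId, payload.length);
        }
    }
}

The length prefix means a reader never has to scan for delimiters inside the Avro binary data, but I am not sure whether this hand-rolled framing is a good idea compared to some existing format.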
Thanks for any advice,
Josh