Hi all,

I am using a typical Avro->Kafka setup where data is serialized to Avro
before it is written to Kafka, and each message is prepended with a schema
ID that can be looked up in my schema repository.
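To make the framing concrete, here is a minimal sketch of that kind of per-message envelope. The exact header layout (a 4-byte big-endian schema ID, no magic byte) is an assumption for illustration; your schema repository's convention may differ.

```python
import struct

def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Prepend a 4-byte big-endian schema ID to an Avro-encoded payload.

    Header layout (ID width, endianness, presence of a magic byte) is an
    assumption here -- match whatever your schema repository expects.
    """
    return struct.pack(">I", schema_id) + avro_payload

def parse_message(framed: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, avro_payload)."""
    (schema_id,) = struct.unpack_from(">I", framed, 0)
    return schema_id, framed[4:]

framed = frame_message(42, b"\x02\x06foo")
assert parse_message(framed) == (42, b"\x02\x06foo")
```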

Now, I want to store the data in long-term storage by writing data from
Kafka->S3.

I know that the usual way to store Avro data is in Avro container files;
however, a container file can only contain messages encoded with a single
Avro schema. In my case, the messages may be encoded with different
schemas, and I need to retain the order of the messages (so that they can
be replayed into Kafka, in order). A single file in S3 therefore needs to
contain messages encoded with different schemas, so I can't use Avro
container files.

What would be a good solution to this? What format could I use to store my
Avro data, such that a single data file can contain messages encoded with
different schemas? Should I store the messages with a prepended schema ID,
similar to what I do in Kafka? In that case, how could I separate the
messages in the file?
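One possible answer to the separation question, sketched here under assumed field widths: length-prefix each record, so a reader can walk the file without any delimiter byte (which could collide with arbitrary binary Avro data). Each record is `[4-byte payload length][4-byte schema ID][payload]`; both widths and the big-endian byte order are illustrative choices, not a standard.

```python
import io
import struct

def write_records(stream, records):
    """Append framed records: [4-byte length][4-byte schema ID][payload].

    Length-prefixing lets a reader split variable-length messages without
    a delimiter that could appear inside binary Avro data. Field widths
    and endianness are assumptions; pick whatever suits your pipeline.
    """
    for schema_id, payload in records:
        stream.write(struct.pack(">II", len(payload), schema_id))
        stream.write(payload)

def read_records(stream):
    """Yield (schema_id, payload) pairs in their original write order."""
    while True:
        header = stream.read(8)
        if not header:
            break
        length, schema_id = struct.unpack(">II", header)
        yield schema_id, stream.read(length)

buf = io.BytesIO()
write_records(buf, [(1, b"first"), (2, b"second")])
buf.seek(0)
assert list(read_records(buf)) == [(1, b"first"), (2, b"second")]
```

Because records are read back in write order, replaying them into Kafka in the original order falls out naturally.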

Thanks for any advice,
Josh
