Hi,

A key thing with Avro is that, in order to deserialize a record from a byte array back into a usable form, you need the schema that was used to create those bytes in the first place.
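
To make that concrete, here is a minimal Python sketch. It does not use the Avro library; it hand-implements only two rules from the Avro specification's binary encoding (zig-zag varints for int/long, and length-prefixed UTF-8 for string). Avro's binary encoding writes no field names or type tags, so the same four bytes decode "successfully" into completely different values depending on which schema the reader assumes. The field-type lists used here are hypothetical stand-ins for real JSON schemas.

```python
# Minimal sketch: Avro binary data carries no type information, so the
# reader must know the writer's schema. Only two encoding rules are modeled:
#   long:   zig-zag encoded, then written as a base-128 varint
#   string: a long byte-length, followed by that many UTF-8 bytes
# "Schemas" here are hypothetical lists of field types, not Avro JSON.

def write_long(n):
    """Zig-zag + varint encode a signed 64-bit integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

def read_long(buf, pos):
    """Decode one zig-zag varint at pos; return (value, new_pos)."""
    shift = acc = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

def read_string(buf, pos):
    length, pos = read_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length

READERS = {"long": read_long, "string": read_string}

def decode_record(buf, field_types):
    """Decode buf as a record whose fields have the given types, in order."""
    pos, values = 0, []
    for t in field_types:
        value, pos = READERS[t](buf, pos)
        values.append(value)
    if pos != len(buf):
        raise ValueError("trailing bytes left over")
    return values

# Written with a record schema containing one string field.
data = write_long(len(b"abc")) + b"abc"

print(decode_record(data, ["string"]))     # ['abc']
print(decode_record(data, ["long"] * 4))   # [3, -49, 49, -50]
```

Neither decode raises an error; only knowledge of the writer's schema tells you which result is the real record.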

An Avro file (an Object Container File) is essentially a (large) set of records that all adhere to the same schema. In such a file you will find the complete schema, stored once in the file header, followed by the binary representation of each of the records. This is a practical way of storing records for batch processing, and because the schema is part of the file, you can always read all records in that file.

The Avro message format was created for the streaming use case. If you want to stream records into Kafka (where they persist until the retention period expires), then you need a way to know the schema ... for each record. Because a schema may change over time, we need to record which schema was used for EACH record. But because the schema can be quite big (several KiB is common), you do not want to store the same schema with every message. So in the message format you will find the ID of the schema together with the actual record. Looking at the API, there is a hook included behind which you can put a database of all versions of all your schemas.

Does this clarify it for you?

Niels Basjes

On Tue, May 16, 2017 at 8:30 AM, kant kodali <[email protected]> wrote:
> Hi All,
>
> I am new to Avro so I was wondering what is the difference between Avro
> message vs Avro Object Container files? Are they related at all? What are
> the use cases for each?
>
> Thanks!

--
Best regards / Met vriendelijke groeten,

Niels Basjes
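
A minimal Python sketch of the message idea, assuming the single-object encoding from the Avro specification: a two-byte marker C3 01, then the 8-byte little-endian CRC-64-AVRO fingerprint of the writer's schema (this plays the role of the schema ID), then the binary-encoded record. The `SCHEMA_STORE` dict is a hypothetical in-memory stand-in for the schema database mentioned above, and the `User` schema string is hypothetical and assumed to already be in Parsing Canonical Form (the sketch does not compute that form itself).

```python
import struct

# Hypothetical writer schema, assumed to be in Parsing Canonical Form.
USER_SCHEMA = ('{"name":"User","type":"record","fields":['
               '{"name":"name","type":"string"},{"name":"age","type":"long"}]}')
USER_FIELDS = ["string", "long"]  # field types in declaration order

# CRC-64-AVRO (Rabin) fingerprint, following the table-driven
# algorithm given in the Avro specification.
EMPTY = 0xC15D213AA4D7A795

def _make_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY & -(fp & 1))
        table.append(fp)
    return table

FP_TABLE = _make_table()

def fingerprint64(data):
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp

def write_long(n):
    """Zig-zag + varint encode a signed 64-bit integer."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)

def read_long(buf, pos):
    shift = acc = 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos

def encode_user(name, age):
    raw = name.encode("utf-8")
    return write_long(len(raw)) + raw + write_long(age)

def decode_fields(buf, pos, field_types):
    values = []
    for t in field_types:
        if t == "long":
            value, pos = read_long(buf, pos)
        else:  # "string": length prefix, then UTF-8 bytes
            n, pos = read_long(buf, pos)
            value, pos = buf[pos:pos + n].decode("utf-8"), pos + n
        values.append(value)
    return values

MAGIC = b"\xc3\x01"  # single-object encoding marker, version 1

# Hypothetical schema store: fingerprint -> reader's view of that schema.
SCHEMA_STORE = {fingerprint64(USER_SCHEMA.encode("utf-8")): USER_FIELDS}

def to_message(schema_json, payload):
    """Prefix an encoded record with the marker and its schema fingerprint."""
    fp = fingerprint64(schema_json.encode("utf-8"))
    return MAGIC + struct.pack("<Q", fp) + payload

def from_message(msg):
    """Look up the writer's schema by fingerprint, then decode the record."""
    if msg[:2] != MAGIC:
        raise ValueError("not an Avro single-object message")
    (fp,) = struct.unpack("<Q", msg[2:10])
    fields = SCHEMA_STORE.get(fp)
    if fields is None:
        raise KeyError(f"unknown schema fingerprint {fp:#018x}")
    return decode_fields(msg, 10, fields)

msg = to_message(USER_SCHEMA, encode_user("alice", 30))
print(msg[:2].hex(), len(msg), from_message(msg))  # c301 17 ['alice', 30]
```

Each message pays a fixed 10 bytes of overhead instead of carrying a multi-KiB schema, which is the trade-off described above; the price is that every reader needs access to the schema store.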
