Re: Difference between Avro message vs Avro Object Container files?

Niels Basjes Tue, 16 May 2017 08:03:02 -0700

Hi,

A key thing with Avro is that in order to deserialize a record from the
byte array back into a usable form you need the schema that was used to
create the bytes in the first place.
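
To illustrate the point, here is a minimal sketch in Python (standard
library only, not the Avro library). It hand-encodes one record using
Avro's binary encoding for `string` and `int`; the record layout and the
helper names (`encode_long`, `decode_fields`, ...) are assumptions made up
for this illustration. The bytes carry no field names or types, so a
reader with the wrong schema silently gets wrong values.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long encoding: zigzag, then a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf: bytes, pos: int):
    """Return (value, next_position) for a zigzag varint starting at pos."""
    shift = z = 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1), pos


def encode_string(s: str) -> bytes:
    """Avro string encoding: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_fields(buf: bytes, field_types):
    """Decode a record given only the ordered field types of some schema."""
    pos, values = 0, []
    for t in field_types:
        if t == "int":
            v, pos = decode_long(buf, pos)
        elif t == "string":
            n, pos = decode_long(buf, pos)
            v = buf[pos:pos + n].decode("utf-8")
            pos += n
        else:
            raise ValueError("unsupported type in this sketch: " + t)
        values.append(v)
    return values


# Hypothetical record with fields (name: string, age: int).
encoded = encode_string("Ann") + encode_long(27)
print(encoded.hex())                              # 06416e6e36
print(decode_fields(encoded, ["string", "int"]))  # ['Ann', 27]
print(decode_fields(encoded, ["int", "int"]))     # [3, -33]
```

The last call decodes the same bytes with a different schema and raises no
error; it just produces meaningless values.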

An Avro file is essentially a (large) set of records that all adhere to the
same schema.
In such a file you will find the complete schema and for each of the
records the binary representation of that record.
This is one possible way of storing records so that they can later be used
for batch processing, and because the schema is part of the file you can
always read all records in that file.
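
As a rough illustration of "schema once, then many records", the sketch
below writes and reads a single-block container using only the Python
standard library. It follows the layout described in the Avro
specification (magic bytes, metadata map holding `avro.schema`, a 16-byte
sync marker, then data blocks), but it has not been validated against the
Avro library, it only supports the `null` codec, and the reader assumes
positive map block counts. The helper names and the sample `User` schema
are assumptions for this example.

```python
import io
import json
import os

MAGIC = b"Obj\x01"


def write_long(out, n):
    """Write an Avro long (zigzag varint) to a binary stream."""
    z = (n << 1) ^ (n >> 63)
    while z > 0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read an Avro long (zigzag varint) from a binary stream."""
    shift = z = 0
    while True:
        b = inp.read(1)[0]
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1)


def write_bytes(out, data):
    write_long(out, len(data))
    out.write(data)


def read_bytes(inp):
    return inp.read(read_long(inp))


def write_container(out, schema_json, encoded_records):
    """Write header (schema stored once) followed by one data block."""
    sync = os.urandom(16)
    meta = {"avro.schema": schema_json.encode("utf-8"), "avro.codec": b"null"}
    out.write(MAGIC)
    write_long(out, len(meta))            # one map block with all entries
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)                    # a zero count ends the map
    out.write(sync)
    body = b"".join(encoded_records)
    write_long(out, len(encoded_records)) # number of records in the block
    write_long(out, len(body))            # size of the block in bytes
    out.write(body)
    out.write(sync)


def read_header(inp):
    """Return (metadata dict, sync marker) from the start of a container."""
    if inp.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(inp)
        if count == 0:
            break
        for _ in range(count):
            key = read_bytes(inp).decode("utf-8")
            meta[key] = read_bytes(inp)
    return meta, inp.read(16)


schema = json.dumps({"type": "record", "name": "User", "fields": [
    {"name": "name", "type": "string"}, {"name": "age", "type": "int"}]})
# Two records already encoded in Avro binary form: ("Ann", 27), ("Bob", 40).
records = [bytes.fromhex("06416e6e36"), bytes.fromhex("06426f6250")]

buf = io.BytesIO()
write_container(buf, schema, records)
buf.seek(0)

meta, sync = read_header(buf)
count, size = read_long(buf), read_long(buf)
body = buf.read(size)
trailer = buf.read(16)
print(json.loads(meta["avro.schema"])["name"], count, trailer == sync)
```

The schema is written exactly once in the header, no matter how many
records follow, which is why the per-record overhead stays small in a file.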

The Avro message format was created for the streaming use case.
If you want to stream records into Kafka (where they will persist until the
TTL expires) then you need a way to know the schema ... for each record.
Because a schema may change over time, we need to know which schema was
used for EACH record.
Because the schema can be quite big (several KiB is common) you do not want
to store the same schema with every message.
So in the message format you will find the ID of the schema together with
the actual record.
Looking at the API, there is a mechanism included behind which you can
build a database of all versions of all your schemas.
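
A minimal sketch of this idea in Python (standard library only): each
message carries a 2-byte marker (C3 01), the 8-byte little-endian
CRC-64-AVRO fingerprint of the schema, and then the binary record, which
is the single-object encoding described in the Avro specification. The
`InMemorySchemaStore` class is hypothetical; it only loosely mirrors the
role of the `SchemaStore` interface in the Avro Java API. As a further
assumption, the schema text is fingerprinted as-is, where a real
implementation would first convert it to Parsing Canonical Form.

```python
import struct

MARKER = b"\xc3\x01"
EMPTY = 0xC15D213AA4D7A795  # CRC-64-AVRO seed from the Avro specification


def _build_table():
    table = []
    for i in range(256):
        fp = i
        for _ in range(8):
            fp = (fp >> 1) ^ (EMPTY if fp & 1 else 0)
        table.append(fp)
    return table


FP_TABLE = _build_table()


def fingerprint64(data: bytes) -> int:
    """CRC-64-AVRO fingerprint of the given bytes (unsigned 64-bit)."""
    fp = EMPTY
    for b in data:
        fp = (fp >> 8) ^ FP_TABLE[(fp ^ b) & 0xFF]
    return fp


class InMemorySchemaStore:
    """Hypothetical store: maps schema fingerprint -> schema text."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema_json: str) -> int:
        fp = fingerprint64(schema_json.encode("utf-8"))
        self._schemas[fp] = schema_json
        return fp

    def find(self, fp: int) -> str:
        return self._schemas[fp]  # KeyError if the schema is unknown


def encode_message(schema_json: str, payload: bytes) -> bytes:
    """Prefix an already-encoded record with marker and schema fingerprint."""
    fp = fingerprint64(schema_json.encode("utf-8"))
    return MARKER + struct.pack("<Q", fp) + payload


def decode_message(message: bytes, store: InMemorySchemaStore):
    """Return (writer schema text, record bytes) for one message."""
    if message[:2] != MARKER:
        raise ValueError("not a single-object encoded message")
    (fp,) = struct.unpack("<Q", message[2:10])
    return store.find(fp), message[10:]


# Two versions of a hypothetical User schema.
v1 = '{"name":"User","type":"record","fields":[{"name":"name","type":"string"}]}'
v2 = ('{"name":"User","type":"record","fields":['
      '{"name":"name","type":"string"},{"name":"age","type":"int"}]}')

store = InMemorySchemaStore()
store.register(v1)
store.register(v2)

payload = bytes.fromhex("06416e6e36")  # ("Ann", 27) in Avro binary form
message = encode_message(v2, payload)
writer_schema, body = decode_message(message, store)
print(len(message) - len(payload))  # 10 bytes of overhead per message
```

The overhead per message is a fixed 10 bytes, regardless of how large the
schema is, and old messages stay readable as long as their schema version
remains in the store.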

Does this clarify it for you?

Niels Basjes

On Tue, May 16, 2017 at 8:30 AM, kant kodali <[email protected]> wrote:

> Hi All,
>
> I am new to Avro so I was wondering what is the difference between Avro
> message vs Avro Object Container files? Are they related at all? What are
> the use cases for each?
>
> Thanks!
>

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
