Hello Niels,

I guess you are talking about the Schema Registry service from Kafka/Confluent when you describe the Avro message format?
The Schema Registry (SR) service defines its own Avro wire format as

"| 1 magic byte | 4-byte *schema id* identifying the schema stored in the SR service | actual Avro datum |"
http://docs.confluent.io/current/schema-registry/docs/serializer-formatter.html#wire-format

whereas the Avro message described in the Avro specification is something like

> - a series of *buffers*, where each buffer consists of:
>   - a four-byte, big-endian *buffer length*, followed by
>   - that many bytes of *buffer data*.
>
> https://avro.apache.org/docs/1.8.1/spec.html#Message Framing

and it does not mention the schema at all.

I am not convinced these two are the same thing, and I am not sure whether the Avro community and the Kafka/Confluent community have reached an agreement on this.

Can you or someone else explain more about this? Much appreciated!

Thanks,
Yang

2017-05-16 11:02 GMT-04:00 Niels Basjes <[email protected]>:

> Hi,
>
> A key thing with Avro is that in order to deserialize a record from the
> byte array back into a usable form you need the schema that was used to
> create the bytes in the first place.
>
> An Avro file is essentially a (large) set of records that all adhere to
> the same schema.
> In such a file you will find the complete schema and, for each of the
> records, the binary representation of that record.
> This is a possible way of storing records that can then be used for batch
> processing, and because the schema is part of the file you can always read
> all records in that file.
>
> The Avro message format was created for the streaming use case.
> If you want to stream records into Kafka (where they will persist until
> the TTL expires) then you need a way to know the schema ... for each record.
> A schema may change over time, so we need to record the schema with EACH
> record.
> Because the schema can be quite big (several KiB is common) you do not
> want to store the same schema with every message.
> So in the Message format you will find the ID of the schema in
> conjunction with the actual record.
> Looking at the API, there is a system included behind which you can create
> a database for all versions of all your schemas.
>
> Does this clarify it for you?
>
> Niels Basjes
>
>
> On Tue, May 16, 2017 at 8:30 AM, kant kodali <[email protected]> wrote:
>
>> Hi All,
>>
>> I am new to Avro so I was wondering what is the difference between Avro
>> message vs Avro Object Container files? Are they related at all? What are
>> the use cases for each?
>>
>> Thanks!
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>
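
P.S. To make the difference between the two byte layouts concrete, here is a minimal Python sketch of both framings as I read them from the two pages linked in this thread. It uses only the standard library, not the Avro or Confluent client libraries. The function names are my own (hypothetical), the "datum" is just placeholder bytes rather than a real Avro-encoded record, and the magic byte value 0 and the terminating zero-length buffer are taken from my reading of the Confluent and Avro documentation respectively.

```python
import struct

MAGIC_BYTE = 0  # value stated in the Confluent wire-format docs


def confluent_frame(schema_id: int, datum: bytes) -> bytes:
    """Schema Registry layout: 1 magic byte, 4-byte big-endian schema id, datum."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + datum


def confluent_unframe(message: bytes):
    """Split a Schema Registry message into (schema_id, datum)."""
    if len(message) < 5:
        raise ValueError("message shorter than the 5-byte header")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[5:]


def avro_frame(buffers) -> bytes:
    """Avro spec 'Message Framing': length-prefixed buffers, then a zero-length buffer."""
    out = b""
    for buf in buffers:
        if buf:  # an empty buffer would be read as the terminator, so skip it
            out += struct.pack(">I", len(buf)) + buf
    return out + struct.pack(">I", 0)


def avro_unframe(data: bytes):
    """Read buffers until the zero-length terminator is reached."""
    buffers = []
    pos = 0
    while True:
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        if length == 0:
            return buffers
        buffers.append(data[pos:pos + length])
        pos += length


if __name__ == "__main__":
    payload = b"abc"  # placeholder for an Avro-encoded datum
    print(confluent_frame(42, payload))
    print(avro_frame([payload]))
```

Note that only the first layout carries a schema identifier. The second is plain length-prefixed framing with no reference to a schema, which is the mismatch I am asking about.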
