I'm planning to stream it via Kafka.

I think I've sorted it out. The Object Container File format is not an essential part of Avro. Instead of using DataFileReader/DataFileWriter, I can use DatumReader/DatumWriter to encode and decode the binary data, and store the schema separately.

Wai Yip



Monday, January 27, 2014 11:50 AM
If you're using Avro's RPC mechanism, schemas are only sent when the
client and server do not already have each other's schema. Each
client request is preceded by a hash of the client's schema and the
schema it thinks the server is using. If the server already has the
client's schema, and the client already has the server's, then the
server can directly respond. If they do not have the other's schema
then schemas are transmitted and cached. This way the server's schema
is only transmitted for the first request from a given client, and the
client's schema is only transmitted to the server the first time a
client with that schema connects.
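To make the caching behavior concrete, here is an illustrative sketch of the server side of that exchange. This is not the real avro.ipc API, and it omits the half of the handshake where the client tracks the server's schema; the class and method names are invented for illustration:

```python
# Hypothetical sketch of handshake schema caching: a schema travels over
# the wire only until the server has cached it under its hash.
import hashlib

class HandshakeServer:
    def __init__(self, server_schema_json):
        self.schema = server_schema_json
        self.known_clients = {}  # client schema hash -> client schema JSON

    def handshake(self, client_hash, client_schema=None):
        if client_hash not in self.known_clients:
            if client_schema is None:
                # Unknown hash and no schema attached: ask the client to
                # resend with its full schema, and send it ours.
                return {"match": "NONE", "serverSchema": self.schema}
            # First request carrying this schema: cache it.
            self.known_clients[client_hash] = client_schema
        # Both sides now have each other's schema; respond directly.
        return {"match": "BOTH"}

client_schema = '{"type": "string"}'
client_hash = hashlib.md5(client_schema.encode()).digest()

server = HandshakeServer('{"type": "string"}')
first = server.handshake(client_hash)                  # hash unknown
second = server.handshake(client_hash, client_schema)  # schema sent once
third = server.handshake(client_hash)                  # cached, hash only
```

After the first exchange, every subsequent request from a client with that schema carries only the hash.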

Avro Python does support RPC.

If you're not using Avro RPC but some other messaging mechanism, then
AVRO-1124 as you mention might be useful, but it also has not yet been
released.

If you're storing Avro data in a file, then the Schema is included in
the file, as you mention.

Doug

Monday, January 27, 2014 11:00 AM
I found Deepesh's question back in December. I joined the mailing list later, so I don't have the message in my inbox and don't know the proper way to reply. Anyway, I have included the original message below.

I have a similar issue. In addition, I'm interested in finding out about Python and Node.js library support.

From what I understand, the Avro specification requires avro.schema. So I am quite unsure of the status of having the schema in an external repository.

-  avro.schema contains the schema of objects stored in the file, as JSON data (required).

http://avro.apache.org/docs/1.7.6/spec.html#Object+Container+Files

Wai Yip
