Re: embed avro data in an envelope

Martin Kleppmann Wed, 14 May 2014 02:02:33 -0700

If you want to include records from JSON avsc files, you can pass multiple 
filenames to the code generation task. Files later in the command line can 
refer to types defined in files earlier in the command line. For example:


java -jar avro-tools-$VERSION.jar compile schema meta.avsc my_app.avsc $OUTPUT_DIR

(assuming that meta.avsc is standalone, and my_app.avsc refers to a type 
defined in meta.avsc)
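As a sketch of what the two files might contain, `my_app.avsc` can refer to the record defined in `meta.avsc` by its full name instead of redefining it. That is why `meta.avsc` has to come first on the command line. The namespace `com.example`, the record names `Meta` and `MyApp`, their fields, and the helper `write_schemas` are all hypothetical. The Python below only writes the two files with the standard `json` module; it does not call avro-tools.

```python
import json
from pathlib import Path

# Hypothetical infrastructure-owned schema: a standalone named record.
META = {
    "type": "record",
    "name": "Meta",
    "namespace": "com.example",
    "fields": [
        {"name": "timestamp", "type": "long"},
        {"name": "hostname", "type": "string"},
    ],
}

# Hypothetical application schema. The "meta" field names the record above
# by its full name rather than redefining it, so this file depends on
# meta.avsc being passed earlier on the command line.
MY_APP = {
    "type": "record",
    "name": "MyApp",
    "namespace": "com.example",
    "fields": [
        {"name": "meta", "type": "com.example.Meta"},
        {"name": "item_id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}


def write_schemas(directory: Path) -> list[Path]:
    """Write meta.avsc and my_app.avsc into `directory`, in dependency order."""
    paths = []
    for filename, schema in (("meta.avsc", META), ("my_app.avsc", MY_APP)):
        path = directory / filename
        path.write_text(json.dumps(schema, indent=2))
        paths.append(path)
    return paths
```

The returned paths are already in the order the `compile schema` command above expects. Listing `my_app.avsc` first would presumably fail, because `com.example.Meta` would not yet be defined when it is parsed.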

Martin

On 11 May 2014, at 14:36, Eric Wasserman <[email protected]> wrote:
You can import .avdl files in other .avdl files to do what you describe. 
They're also a lot more human-readable IMO.

On May 11, 2014, at 1:38 AM, "Wai Yip Tung" <[email protected]> wrote:

Thank you. This should work.

I wonder if there is a way, in the JSON .avsc format, to reference or compose 
a schema from other files. That way, app developers would create `my_app.avsc` 
and the infrastructure developers would create `meta.avsc`. How do we embed one 
schema in another? I guess we could do it programmatically: since we already 
have an infrastructure library that injects the metadata, we could join the 
schemas at runtime. I wonder if there are other built-in ways to do it as well.
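If the schemas are joined at runtime, one option is to inline the application's record as the type of a field inside the metadata record. This works because an Avro schema is plain JSON, and a named record may be defined inline wherever a type is expected. Below is a minimal sketch using the standard `json` and `copy` modules. The function `wrap_in_envelope`, the `Envelope` record name, the `payload` field, and the metadata fields are hypothetical.

```python
import copy
import json

# Hypothetical metadata fields owned by the infrastructure team.
META_FIELDS = [
    {"name": "timestamp", "type": "long"},
    {"name": "hostname", "type": "string"},
]


def wrap_in_envelope(app_schema: dict, name: str = "Envelope",
                     namespace: str = "com.example") -> dict:
    """Build a new record schema: the metadata fields, followed by a
    `payload` field whose type is the application's record, inlined."""
    return {
        "type": "record",
        "name": name,
        "namespace": namespace,
        # Deep copies, so later edits to the envelope never touch the inputs.
        "fields": copy.deepcopy(META_FIELDS)
        + [{"name": "payload", "type": copy.deepcopy(app_schema)}],
    }


# Hypothetical application schema, as it might be loaded from my_app.avsc.
app = json.loads("""
{"type": "record", "name": "MyApp", "namespace": "com.example",
 "fields": [{"name": "item_id", "type": "long"},
            {"name": "metric", "type": "double"}]}
""")
envelope = wrap_in_envelope(app)
print(json.dumps(envelope, indent=2))
```

The resulting JSON can be handed to whichever Avro library the infrastructure code already uses. This sketch does not validate it. A name clash, such as an application record also called `Envelope` in the same namespace, would make the combined schema invalid.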

Wai Yip



Eric Wasserman <[email protected]>
Thursday, May 01, 2014 2:41 PM
We've been happy with the second approach you mentioned. Our usage looks like 
this (in Avro IDL):

record Datum {
  Header header;
  Body body;
}

where Header contains the metadata and Body is specific to the particular 
application. Body looks something like:

record Body {
  union { SpecificType1, SpecificType2, ... } body;
}
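The composite record can be decoded in one pass because of how Avro's binary encoding works. A record is its fields encoded back to back, in schema order. A union is a zig-zag varint branch index followed by the chosen value. The pure-Python sketch below hand-encodes a `Datum`. The Header fields (`timestamp`, `hostname`) and the `SpecificType1` field (`item_id`) are hypothetical, and a real application would use an Avro library rather than this encoder.

```python
def encode_long(n: int) -> bytes:
    """Avro int/long: zig-zag encode, then little-endian 7-bit groups."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_header(timestamp: int, hostname: str) -> bytes:
    # Hypothetical Header { long timestamp; string hostname; }
    return encode_long(timestamp) + encode_string(hostname)


def encode_body(branch: int, value: bytes) -> bytes:
    # Body { union { SpecificType1, SpecificType2, ... } body; }:
    # the branch index comes first, then the encoded branch value.
    return encode_long(branch) + value


def encode_datum(header: bytes, body: bytes) -> bytes:
    # Datum { Header header; Body body; }: fields concatenated in order,
    # with no field names, tags or separators in between.
    return header + body


header = encode_header(1, "h1")
# Branch 0 = hypothetical SpecificType1 { long item_id; } with item_id 42.
body = encode_body(0, encode_long(42))
datum = encode_datum(header, body)
```

The header bytes always sit at the front of the datum, so a reader that only wants the metadata knows exactly where they are. The actual savings from reading only the header will depend on the library and the data.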

One of the nice side effects is that you can take data written with the 
composite Datum schema and let Avro transform it into what you need by 
specifying a different reader's schema. (Note: you still have to give Avro 
*exactly* the schema the data were originally written with, the "writer's 
schema", for it to be able to parse the Datum records.)

So if all you care about is the application-specific part, use the following 
reader's schema in your parser:

record HeaderFreeDatum {
  Body body;
}

Conversely, if you care about the header bits, use this as the reader's schema 
in your parser:

record BodyFreeDatum {
  Header header;
}
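The reader's-schema trick works because resolution is driven by the writer's schema. The decoder walks the bytes using the writer's field list, and keeps only the fields the reader's schema names, matching them by name. The binary form carries no field names, which is why the exact writer's schema is required. Below is a heavily simplified pure-Python model of that behaviour that supports only `long`, `string`, records and unions. The schema dicts and field names are hypothetical. Real Avro resolution also handles defaults, type promotion and aliases, and the specification expects the reader's and writer's record names to match (or be related through aliases), which this sketch does not check.

```python
from io import BytesIO


def read_long(buf: BytesIO) -> int:
    """Decode a zig-zag varint (Avro int/long)."""
    shift, z = 0, 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (z >> 1) ^ -(z & 1)


def read(schema, buf: BytesIO):
    """Decode one value that was written with `schema` (the writer's schema)."""
    if schema == "long":
        return read_long(buf)
    if schema == "string":
        return buf.read(read_long(buf)).decode("utf-8")
    if isinstance(schema, list):  # union: branch index, then the value
        return read(schema[read_long(buf)], buf)
    # Record: fields are read strictly in the writer's order.
    return {f["name"]: read(f["type"], buf) for f in schema["fields"]}


def resolve(writer: dict, reader: dict, buf: BytesIO) -> dict:
    """Read a record with the writer's schema, then keep only the top-level
    fields that the reader's schema asks for, matched by name."""
    full = read(writer, buf)
    return {f["name"]: full[f["name"]] for f in reader["fields"]}


HEADER = {"type": "record", "name": "Header",
          "fields": [{"name": "timestamp", "type": "long"},
                     {"name": "hostname", "type": "string"}]}
SPECIFIC1 = {"type": "record", "name": "SpecificType1",
             "fields": [{"name": "item_id", "type": "long"}]}
SPECIFIC2 = {"type": "record", "name": "SpecificType2",
             "fields": [{"name": "metric", "type": "string"}]}
BODY = {"type": "record", "name": "Body",
        "fields": [{"name": "body", "type": [SPECIFIC1, SPECIFIC2]}]}
DATUM = {"type": "record", "name": "Datum",
         "fields": [{"name": "header", "type": HEADER},
                    {"name": "body", "type": BODY}]}
HEADER_FREE = {"type": "record", "name": "HeaderFreeDatum",
               "fields": [{"name": "body", "type": BODY}]}
BODY_FREE = {"type": "record", "name": "BodyFreeDatum",
             "fields": [{"name": "header", "type": HEADER}]}

# Datum(header=(1, "h1"), body=SpecificType1(item_id=42)), binary-encoded.
data = b"\x02\x04h1\x00\x54"
print(resolve(DATUM, BODY_FREE, BytesIO(data)))
# {'header': {'timestamp': 1, 'hostname': 'h1'}}
```

This model decodes everything and then discards what the reader does not need. A real resolving decoder can instead skip unwanted writer fields without building objects for them.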

In our use, we found a significant speedup when reading just the headers 
(YMMV). You can also use Avro-generated classes for BodyFreeDatum, which 
never really change as long as Header doesn't change. This lets you revise 
the schemas for Header and the SpecificTypeX types on different schedules.

One final piece of advice: think about how you will handle the inevitable 
evolution the schemas will undergo.
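One concrete part of planning for evolution is checking, before deploying a new reader's schema, that it can still read data written with the old one. Among Avro's rules for records: a field present in the reader's schema but absent from the writer's must have a default, while writer fields the reader lacks are simply skipped. The pure-Python sketch below checks only that one rule, for flat records. It is a simplified stand-in for a real compatibility checker. The `missing_defaults` helper and the `datacenter` field are hypothetical.

```python
def missing_defaults(writer: dict, reader: dict) -> list[str]:
    """Names of reader fields that the writer lacks and that have no default.

    An empty result means this one rule is satisfied; type changes, nested
    records, unions and aliases are not checked here.
    """
    writer_names = {f["name"] for f in writer["fields"]}
    return [
        f["name"]
        for f in reader["fields"]
        if f["name"] not in writer_names and "default" not in f
    ]


HEADER_V1 = {"type": "record", "name": "Header", "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "hostname", "type": "string"},
]}
# Adds a field with a default: v1 data can still be read with this schema.
HEADER_V2_OK = {"type": "record", "name": "Header", "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "hostname", "type": "string"},
    {"name": "datacenter", "type": "string", "default": "unknown"},
]}
# Same addition without a default: a v2 reader has no value to fill in.
HEADER_V2_BAD = {"type": "record", "name": "Header", "fields": [
    {"name": "timestamp", "type": "long"},
    {"name": "hostname", "type": "string"},
    {"name": "datacenter", "type": "string"},
]}
```

Running such a check in both directions (old writer with new reader, and new writer with old reader) shows whether producers and consumers can be upgraded independently. That is what lets Header and the application types change on different schedules.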

________________________________________
From: Wai Yip Tung <[email protected]>
Sent: Tuesday, April 29, 2014 6:14 PM
To: [email protected]
Subject: embed avro data in an envelope

I am looking for some Avro usage advice. We have created various schemas
for different applications, say to represent item ID, name, metric,
etc. On the other hand, our infrastructure group wants to include some
metadata on all messages. This should include things like timestamp,
hostname, etc. This metadata is the same for all application messages.

One way to do it is to have a metadata schema that has timestamp,
hostname and a binary content field for the application data. This way,
each message needs to be decoded twice, using two schemas.
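For comparison, the first option (an envelope with a `bytes` field) means two decoding steps. First decode the envelope, then decode the `content` bytes again with the application's schema. Below is a minimal pure-Python sketch of that flow that handles only `long`, `string` and `bytes`. The helper names, the envelope fields, and the application record (a single `item_id` long) are hypothetical.

```python
from io import BytesIO


def write_long(n: int) -> bytes:
    z = (n << 1) ^ (n >> 63)  # zig-zag
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def write_bytes(b: bytes) -> bytes:
    # Avro bytes (and strings): length as a long, then the raw bytes.
    return write_long(len(b)) + b


def read_long(buf: BytesIO) -> int:
    shift, z = 0, 0
    while True:
        byte = buf.read(1)[0]
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1)


def read_bytes(buf: BytesIO) -> bytes:
    return buf.read(read_long(buf))


def encode_message(item_id: int, timestamp: int, hostname: str) -> bytes:
    """Producer: encode the app record, then wrap it in the envelope."""
    app_bytes = write_long(item_id)                   # hypothetical App { long item_id; }
    return (write_long(timestamp)                     # Meta { long timestamp;
            + write_bytes(hostname.encode("utf-8"))   #        string hostname;
            + write_bytes(app_bytes))                 #        bytes content; }


def decode_message(data: bytes) -> tuple[int, str, int]:
    """Consumer: pass 1 decodes the envelope, pass 2 decodes `content`."""
    outer = BytesIO(data)
    timestamp = read_long(outer)
    hostname = read_bytes(outer).decode("utf-8")
    content = read_bytes(outer)            # still opaque application bytes
    item_id = read_long(BytesIO(content))  # second decode, with the app schema
    return timestamp, hostname, item_id
```

Compared with a composite schema, the extra cost is the second decode plus a length prefix on the payload. In return, the envelope code never needs to know the application's schema.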

Another way is to have a composite schema that includes both the
metadata and the application-specific data, so each message is decoded
just once and automatically includes the needed metadata. I wonder
whether this can be done and whether it is a good idea. Have other people
considered similar usage?

Wai Yip
