Re: Schema import dependencies

Wai Yip Tung Wed, 28 May 2014 15:41:27 -0700

Let's say we want to keep two schema files because they come from two separate organizations. When we generate a data file, they need to be merged into one standalone schema. The Maven plugin does this. Otherwise we have to merge them ourselves. The merge is not too hard; I just want to make sure I'm not missing some existing tool or API.
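A minimal sketch of such a hand-rolled merge, assuming the schemas have already been loaded as Python dicts (for example with `json.load`). The helper `inline_named_types` is hypothetical and not part of the Avro Python API. It matches types by simple name only and ignores namespaces, which a real merge would need to handle. The example schemas are shortened versions of the ones quoted further down.

```python
import copy
import json


def inline_named_types(schema, definitions, seen=None):
    """Return a copy of `schema` in which the first reference to each name
    in `definitions` is replaced by the full definition.

    Later references to the same name stay as plain names, because a named
    type may only be defined once inside a standalone schema.
    """
    seen = set() if seen is None else seen
    if isinstance(schema, str):
        if schema in definitions and schema not in seen:
            seen.add(schema)
            return inline_named_types(
                copy.deepcopy(definitions[schema]), definitions, seen)
        return schema
    if isinstance(schema, list):  # a union: process each branch
        return [inline_named_types(b, definitions, seen) for b in schema]
    if isinstance(schema, dict):
        out = dict(schema)
        kind = out.get("type")
        if kind == "record":
            seen.add(out["name"])
            out["fields"] = [
                dict(f, type=inline_named_types(f["type"], definitions, seen))
                for f in out.get("fields", [])
            ]
        elif kind == "array":
            out["items"] = inline_named_types(out["items"], definitions, seen)
        elif kind == "map":
            out["values"] = inline_named_types(out["values"], definitions, seen)
        return out
    return schema


# Shortened stand-ins for mailing_address.avsc and userInfo.avsc.
mailing_address = {
    "type": "record",
    "name": "mailing_address",
    "fields": [
        {"name": "street", "type": "string", "default": "NONE"},
        {"name": "city", "type": "string", "default": "NONE"},
    ],
}
user_info = {
    "type": "record",
    "name": "userInfo",
    "namespace": "my.example",
    "fields": [
        {"name": "username", "type": "string", "default": "NONE"},
        {"name": "address", "type": "mailing_address"},
    ],
}

merged = inline_named_types(user_info, {"mailing_address": mailing_address})
print(json.dumps(merged, indent=2))
```

The printed result is a single JSON document with the `mailing_address` record nested inside the `address` field, which could then be handed to whatever schema parser is in use.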


Wai Yip

Doug Cutting
Wednesday, May 28, 2014 12:09 PM
Your userInfo.avsc is not a standalone schema since it depends on mailing_address already being defined. A schema included in a data file is always standalone, and would include the mailing_address schema definition within the userInfo schema's "address" field.
Some tools will process such non-standalone schemas in separate files. For example, the Java schema compiler will accept multiple schema files on the command line, and those later on the command line may reference types defined earlier. Java's maven tasks also permit references to other files, but these are probably not of interest to a Python developer.
The IDL tool uses the JVM as its runtime but is not Java-specific.

Doug



Wai Yip Tung
Wednesday, May 28, 2014 11:53 AM
I want to extend this question somewhat. I am beginning to realize that Avro has an accommodation for composing a schema from user-defined types. I want to check whether I understand it correctly, and what the proper way to use it is.
I took a single, two-level nested schema from the web (see "using an embedded record").
http://docs.oracle.com/cd/E26161_02/html/GettingStartedGuide/avroschemas.html
I broke it down into two separate records, the main `userInfo` record and the embedded `mailing_address` record, as two separate JSON objects.
------------------------------------------------------------------------
userInfo.avsc

{
"type" : "record",
"name" : "userInfo",
"namespace" : "my.example",
"fields" : [{"name" : "username",
             "type" : "string",
             "default" : "NONE"},

            {"name" : "age",
             "type" : "int",
             "default" : -1},

             {"name" : "phone",
              "type" : "string",
              "default" : "NONE"},

             {"name" : "housenum",
              "type" : "string",
              "default" : "NONE"},

             {"name" : "address",
              "type" : "mailing_address", <--- user defined type
              "default" : "NONE"}
]
}

------------------------------------------------------------------------
mailing_address.avsc

{
 "type" : "record",
 "name" : "mailing_address", <--- defined here
 "fields" : [
    {"name" : "street",
     "type" : "string",
     "default" : "NONE"},

    {"name" : "city",
     "type" : "string",
     "default" : "NONE"},

    {"name" : "state_prov",
     "type" : "string",
     "default" : "NONE"},

    {"name" : "country",
     "type" : "string",
     "default" : "NONE"},

    {"name" : "zip",
     "type" : "string",
     "default" : "NONE"}
    ]
}
------------------------------------------------------------------------

Is this a valid composite Avro schema definition?
The second question is how we can actually use this in practice. If we have two separate files, is there a standard API that loads them both? Hrishikesh P mentions the Avro Maven plugin. I mainly use the Python API, so I am unfamiliar with it. Does a comparable API exist?
I understand the IDL form has explicit linking of schema files. I will look into it next.
Wai Yip


Doug Cutting
Thursday, May 22, 2014 2:57 PM
You might instead use Avro IDL to define your schemas. It permits you to
define multiple schemas in a single file, so that you can determine
the order they're defined in. It also permits ordered inclusion of
types from other files, both IDL files and schema files.

Doug

On Thu, May 22, 2014 at 10:46 AM, Hrishikesh P

Hrishikesh P
Thursday, May 22, 2014 10:46 AM
I have a few Avro schemas that I am generating code from using the Avro Maven plugin. The schemas have dependencies, which I was able to resolve by putting the schemas in separate folders and/or prefixing the schema file names with 01-, 02-, etc. so that the dependencies get compiled first. However, this only works on Mac and not on RHEL (probably because directories are read in a different order on each?). Does anybody know the best way to handle schema dependencies? If I specify individual schema names in the imports section of the POM, the schemas get compiled, but I have listed the folders and I would like to avoid listing individual files if possible.
Here's a related issue: https://issues.apache.org/jira/browse/AVRO-1367
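If directory read order is indeed the cause, one way to stop relying on file-name prefixes is to compute the order from the schemas themselves. The following is a rough sketch, not something the Maven plugin does: `referenced_names` and `compile_order` are hypothetical helpers, each file is assumed to define exactly one top-level named type, namespaces are ignored, and `graphlib` requires Python 3.9 or later. The resulting list could then be used to generate the individual import entries instead of writing them by hand.

```python
from graphlib import TopologicalSorter


def referenced_names(schema):
    """Yield every string that appears in a type position of `schema`."""
    if isinstance(schema, str):
        yield schema
    elif isinstance(schema, list):  # union
        for branch in schema:
            yield from referenced_names(branch)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "record":
            for field in schema.get("fields", []):
                yield from referenced_names(field["type"])
        elif kind == "array":
            yield from referenced_names(schema["items"])
        elif kind == "map":
            yield from referenced_names(schema["values"])


def compile_order(schemas_by_file):
    """Return file names ordered so that each file comes after the files
    defining the types it references. Raises graphlib.CycleError if the
    files depend on each other in a cycle."""
    defined_in = {s["name"]: fname for fname, s in schemas_by_file.items()}
    graph = {}
    for fname, schema in schemas_by_file.items():
        # Primitive names such as "string" are not in defined_in and drop out.
        deps = {defined_in[n] for n in referenced_names(schema)
                if n in defined_in}
        deps.discard(fname)  # a recursive record is not a file dependency
        graph[fname] = deps
    return list(TopologicalSorter(graph).static_order())


# Hypothetical in-memory stand-ins for two schema files.
schemas = {
    "userInfo.avsc": {
        "type": "record",
        "name": "userInfo",
        "fields": [
            {"name": "username", "type": "string"},
            {"name": "address", "type": "mailing_address"},
        ],
    },
    "mailing_address.avsc": {
        "type": "record",
        "name": "mailing_address",
        "fields": [{"name": "street", "type": "string"}],
    },
}

order = compile_order(schemas)
print(order)  # ['mailing_address.avsc', 'userInfo.avsc']
```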

Thanks in advance.
