On a quick pass, this looks sane; it nests the many-to-one sets of data
within the parent record.
A few things to think about:
* A double in MySQL will be a double in Avro, not a long.
* Each field that can be null in the database should be a union of null
and the field type. For example, if the schema were
//pseudo syntax
CREATE TABLE "Foo" (
  id int PRIMARY KEY,
  product_id int NOT NULL,
  message varchar(100))
-----
then the record would need three fields -- the first two are integers that
are not nullable, and the last one is a string that may be null:
{"type":"record",
"name":"Foo",
"fields":{
"id":"int",
"product_id":"int",
"message":[null, "string"]
}
}
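
For illustration, here is a minimal sketch of building and writing such
a record with the Avro Java API. The class name, output file name, and
field values are placeholders, not anything from your importer:

//minimal sketch, assuming the Avro Java library; names and values
//below are placeholders
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

import java.io.File;
import java.io.IOException;

public class FooExample {
    public static void main(String[] args) throws IOException {
        // Parse the schema shown above.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Foo\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"int\"},"
            + "{\"name\":\"product_id\",\"type\":\"int\"},"
            + "{\"name\":\"message\",\"type\":[\"null\",\"string\"]}]}");

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("id", 1);
        rec.put("product_id", 42);
        // null is only legal here because "message" is a union with
        // null; writing null to a plain "string" field fails at
        // serialization time.
        rec.put("message", null);

        // Append the record to an Avro container file.
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
            writer.create(schema, new File("foo.avro"));
            writer.append(rec);
        }
    }
}

The same pattern applies to the nested records in your schema; each
nullable column just becomes a ["null", <type>] union.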
-Scott
On 11/24/12 5:23 AM, "Bart Verwilst" <[email protected]> wrote:
>Hello!
>
>I'm currently writing an importer to import our MySQL data into hadoop
>( as Avro files ). Attached you can find the schema i'm converting to
>Avro, along with the corresponding Avro schema i would like to use for
>my imported data. I was wondering if you guys could go over the schema
>and determine if this is sane/optimal, and if not, how i should improve
>it.
>
>As a sidenote, i converted bigints to long, and had 1 occurrence of
>double, which i also converted to long in the avro, not sure if that's
>the correct type?
>
>Thanks in advance for your expert opinions! ;)
>
>Kind regards,
>
>Bart