I'm very new to Drill and just learning how everything works. I have a
question about a query where one of the fields is an array (or list) of
values. To simplify, I have a Parquet file of records where each record has
just two fields. "name" is a string value and "lastName" is an array of
strings. The Parquet file was created by writing Avro records to the
Parquet file format. The Avro schema looks like this:
{ "type": "record",
  "name": "Animal",
  "fields": [
    {"name": "name", "type": ["null", "string"]},
    {"name": "lastName", "type": [{"type": "array", "items": "string"}, "null"]}
  ]
}
Now when I read the records in Drill using "select * from
dfs.`animal.parquet`", the result looks like:
+---------+-------------------------+
| name | lastName |
+---------+-------------------------+
| HOBSON | {"array":["Staley"]} |
| CASEY | {"array":["Barber"]} |
| MANDY | {"array":["Locher"]} |
| TED | {"array":["Schilder"]} |
| Bokkie | {"array":["Hagler"]} |
+---------+-------------------------+
It looks a little weird to me with the "array" there. But maybe that's how
Drill displays array/list column values. So let's say I want to get the
first value from each of those arrays. The Drill documentation seems to say
that I do it like this:
SELECT name, lastName[0] FROM dfs.`animal.parquet`;
That throws a big old nasty-looking error, though. Instead, through some
trial and error, I found I have to do this:
SELECT d.name, d.lastName.`array`[0] FROM dfs.`animal.parquet` d;
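For what it's worth, here is how I currently picture the data, as a rough
Python sketch. This is only my mental model based on the output above, not
anything Drill actually does internally; the names `row` and
`first_last_name` are made up for illustration. If "lastName" is really a
map wrapping a list under the key "array", then indexing it directly would
fail, and I would have to step through "array" first:

```python
# Hypothetical model of one row as Drill displays it (an assumption based
# on the query output, not Drill's internal representation).
row = {"name": "HOBSON", "lastName": {"array": ["Staley"]}}


def first_last_name(record):
    """Return the first element of the wrapped list, or None if it is absent."""
    wrapper = record.get("lastName")
    if not wrapper:
        return None
    values = wrapper.get("array") or []
    return values[0] if values else None


# Indexing the wrapper directly fails because it is a map, not a list.
try:
    row["lastName"][0]
    direct_index_works = True
except KeyError:
    direct_index_works = False

print(first_last_name(row))
print(direct_index_works)
```

If that picture is right, it would explain why the second query works and
the first one does not.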
Could someone help me understand if that is expected or if I've done
something wrong when creating the Avro record or writing the Parquet file?
Thanks,
Dave