I'm very new to Drill and just learning how everything works. I have a
question about querying a field whose value is an array (or list) of
values. To simplify, I have a Parquet file of records where each record has
just two fields: "name" is a string value and "lastName" is an array of
strings. The Parquet file was created by writing Avro records in the
Parquet file format. The Avro schema looks like this:

{ "type": "record",
  "name": "Animal",
  "fields": [
    {"name": "name", "type": ["null", "string"]},
    {"name": "lastName", "type": [{"type": "array", "items": "string"}, "null"]}
  ]
}

Now when I read the records in Drill using

SELECT * FROM dfs.`animal.parquet`;

the result looks like:

+---------+-------------------------+
|  name   |        lastName         |
+---------+-------------------------+
| HOBSON  | {"array":["Staley"]}    |
| CASEY   | {"array":["Barber"]}    |
| MANDY   | {"array":["Locher"]}    |
| TED     | {"array":["Schilder"]}  |
| Bokkie  | {"array":["Hagler"]}    |
+---------+-------------------------+

It looks a little weird to me with the "array" there, but maybe that's how
Drill displays array/list column values. So let's say I want to get the
first value from each of those arrays. The Drill documentation seems to say
that I do it like this:

SELECT name, lastName[0] FROM dfs.`animal.parquet`;

That throws a big, nasty-looking error, though. Instead, through some
trial and error, I found I have to do this:

SELECT name, d.lastName.`array`[0] FROM dfs.`animal.parquet` d;
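To make the nesting concrete for myself, here's a rough Python analogy
(not Drill code) that models each row the way the output above displays it:
"lastName" is a map whose only key is "array", and that key holds the
actual list. The helper names first_last_name and naive_first are just
made up for illustration. The point is only that indexing the map
directly fails, while stepping into "array" first and then indexing
works, which matches what I saw with the two queries.

```python
# Hypothetical model of the rows as Drill displays them: each lastName value
# is a map with a single key "array" that holds the actual list of strings.
rows = [
    {"name": "HOBSON", "lastName": {"array": ["Staley"]}},
    {"name": "CASEY", "lastName": {"array": ["Barber"]}},
    {"name": "MANDY", "lastName": {"array": ["Locher"]}},
    {"name": "TED", "lastName": {"array": ["Schilder"]}},
    {"name": "Bokkie", "lastName": {"array": ["Hagler"]}},
]


def first_last_name(row):
    """Analogy for d.lastName.`array`[0]: step into the map, then index the list."""
    last_name = row.get("lastName")
    if last_name is None:  # lastName is nullable in the Avro schema
        return None
    values = last_name.get("array") or []
    return values[0] if values else None


def naive_first(row):
    """Analogy for lastName[0]: index the map as if it were a list (raises KeyError)."""
    return row["lastName"][0]


for row in rows:
    print(row["name"], first_last_name(row))

try:
    naive_first(rows[0])
except KeyError as exc:
    print("lastName[0] fails:", repr(exc))
```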

Could someone help me understand whether that is expected, or whether I've
done something wrong when creating the Avro records or writing the Parquet
file?

Thanks,

Dave
