Thanks, Kunal. Yes, my issue looks very similar to that one, only I'm
reading from Parquet files. I am creating Avro records which I am writing
into the Parquet files, but in the end Drill is just reading the Parquet
files. Should I file a Jira issue that is more specific to the problem I'm
seeing?
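
To make the symptom concrete, here is a rough sketch in Python, using plain dicts as a stand-in for the rows Drill shows in my original message below. It is only an illustration of the nesting, not of how Drill or Parquet work internally. The helper name `first_last_name` is made up, and the "array" wrapper key is simply copied from the query output.

```python
# Illustrative only: plain dicts standing in for the rows Drill returned.
# The wrapper key "array" is copied from the query output quoted below.
rows = [
    {"name": "HOBSON", "lastName": {"array": ["Staley"]}},
    {"name": "CASEY", "lastName": {"array": ["Barber"]}},
]


def first_last_name(row):
    """Hypothetical helper: step through the "array" wrapper, then index."""
    wrapper = row["lastName"]
    if wrapper is None:  # the Avro union in the schema allows null
        return None
    values = wrapper["array"]
    return values[0] if values else None


first_names = [(r["name"], first_last_name(r)) for r in rows]
print(first_names)
```

With this shape, indexing `lastName` directly fails because it is a map-like wrapper and not a list, which matches what I see with `lastName[0]`.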

- Dave

On Mon, Jun 27, 2016 at 1:06 AM, Kunal Khatua <[email protected]> wrote:

> Hi Dave
>
> I think you might be hitting one of these recent issues:
> https://issues.apache.org/jira/browse/DRILL-4594
>
> From a quick glance, the 'array' word appears to be treated as a key in a
> map within the lastName field. But, your schema otherwise looks fine. Could
> you post a set of the sample records as well?
>
> Thanks
> Kunal
>
> Kunal Khatua
> Engineering
> MapR
>
> www.mapr.com
> On Thu 23-Jun-2016 1:52:48 PM, David Kincaid <[email protected]>
> wrote:
> I'm very new to Drill and just learning how everything works. I had a
> question about a query when one of the fields is an array (or list) of
> values. To simplify, I have a Parquet file of records where each record has
> just two fields. "name" is a string value and "lastName" is an array of
> strings. The Parquet file was created by writing Avro records to the
> Parquet file format. The Avro schema looks like this:
>
> { "type": "record",
>   "name": "Animal",
>   "fields": [
>     {"name": "name", "type": ["null", "string"]},
>     {"name": "lastName", "type": [{"type": "array", "items": "string"}, "null"]}
>   ]
> }
>
> Now when I read the records in Drill using "select * from
> dfs.`animal.parquet`" the result looks like:
>
> +---------+-------------------------+
> | name    | lastName                |
> +---------+-------------------------+
> | HOBSON  | {"array":["Staley"]}    |
> | CASEY   | {"array":["Barber"]}    |
> | MANDY   | {"array":["Locher"]}    |
> | TED     | {"array":["Schilder"]}  |
> | Bokkie  | {"array":["Hagler"]}    |
> +---------+-------------------------+
>
> It looks a little weird to me with the "array" there. But maybe that's how
> Drill displays array/list column values. So let's say I want to get the
> first value from each of those arrays. The Drill documentation seems to say
> that I do it like this:
>
> SELECT name, lastName[0] FROM dfs.`animal.parquet`;
>
> That throws a big old nasty-looking error, though. Instead, through some
> trial and error, I found I have to do this:
>
> SELECT name, d.lastName.`array`[0] FROM dfs.`animal.parquet` d;
>
> Could someone help me understand if that is expected or if I've done
> something wrong when creating the Avro record or writing the Parquet file?
>
> Thanks,
>
> Dave
>
