Thanks, Kunal. Yes, my issue looks very similar to that one, only I'm reading from Parquet files. I am creating Avro records which I am writing into the Parquet files, but in the end Drill is just reading the Parquet files. Should I file a Jira issue that is more specific to the problem I'm seeing?
- Dave

On Mon, Jun 27, 2016 at 1:06 AM, Kunal Khatua <[email protected]> wrote:

> Hi Dave
>
> I think you might be hitting one of these recent issues:
> https://issues.apache.org/jira/browse/DRILL-4594
>
> From a quick glance, the 'array' word appears to be treated as a key in a
> map within the lastName field. But, your schema otherwise looks fine. Could
> you post a set of the sample records as well?
>
> Thanks
> Kunal
>
> Kunal Khatua
> Engineering
> MapR
> www.mapr.com
>
> On Thu 23-Jun-2016 1:52:48 PM, David Kincaid <[email protected]> wrote:
> I'm very new to Drill and just learning how everything works. I had a
> question about a query when one of the fields is an array (or list) of
> values. To simplify, I have a Parquet file of records where each record has
> just two fields. "name" is a string value and "lastName" is an array of
> strings. The Parquet file was created by writing Avro records to the
> Parquet file format. The Avro schema looks like this:
>
> { "type": "record",
>   "name": "Animal",
>   "fields": [{"name": "name", "type": ["null","string"] },
>              {"name": "lastName", "type" : [{"type": "array", "items": "string"}, "null"]}
>             ]
> }
>
> Now when I read the records in Drill using "select * from
> dfs.`animal.parquet`" the result looks like:
>
> +---------+-------------------------+
> | name    | lastName                |
> +---------+-------------------------+
> | HOBSON  | {"array":["Staley"]}    |
> | CASEY   | {"array":["Barber"]}    |
> | MANDY   | {"array":["Locher"]}    |
> | TED     | {"array":["Schilder"]}  |
> | Bokkie  | {"array":["Hagler"]}    |
> +---------+-------------------------+
>
> It looks a little weird to me with the "array" there. But maybe that's how
> Drill displays array/list column values. So let's say I want to get the
> first value from each of those arrays. The Drill documentation seems to say
> that I do it like this:
>
> SELECT name, lastName[0] FROM dfs.`animal.parquet`;
>
> That throws a big old nasty looking error though. Instead, through some
> trial and error, I found I have to do this:
>
> SELECT name, d.lastName.`array`[0] FROM dfs.`animal.parquet` d;
>
> Could someone help me understand if that is expected or if I've done
> something wrong when creating the Avro record or writing the Parquet file?
>
> Thanks,
>
> Dave
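
For anyone following the thread, here is a rough sketch of why the extra `array` step is needed, assuming Drill really does see lastName as a map with a single key "array" (as Kunal suggests above). It is only a plain Python model of one row from the output, not Drill code; the names `row`, `first_last_name` and `direct_index` are made up for illustration.

```python
# Hypothetical in-memory model of one row as Drill displays it:
# lastName is a map whose only key, "array", holds the list of strings.
row = {"name": "HOBSON", "lastName": {"array": ["Staley"]}}


def first_last_name(record):
    """Mirror d.lastName.`array`[0]: step into the map, then index the list."""
    return record["lastName"]["array"][0]


def direct_index(record):
    """Mirror lastName[0]: index the column as though it were a plain list."""
    return record["lastName"][0]


print(first_last_name(row))

try:
    direct_index(row)
except KeyError as exc:
    # The map has no entry 0, so the direct index fails in this model.
    print("direct index failed:", repr(exc))
```

In this model the two-step path succeeds and the direct index does not, which matches what I saw with the two queries.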
