Hi,

Have you tried:

    select column['list'][0]['element'] from ...

That should return "My first value".
Or try:

    select flatten(column['list'])['element'] from ...

Hope it helps. In our data we have a column that looks like this:

    [{"NAME:":"Aname", "DATA":"thedata"},{"NAME:":"Aname2", "DATA":"thedata2"},.....]

We ended up writing a custom function to do the lookup instead of using the costly flatten technique.

Francois

On Sat, Jun 17, 2017 at 10:04 PM, David Kincaid <kincaid.d...@gmail.com> wrote:
> I'm having a problem querying Parquet files that were created from Spark
> and have columns that are array or list types. When I do a SELECT on these
> columns they show up like this:
>
> {"list": [{"element": "My first value"}, {"element": "My second value"}]}
>
> which Drill does not recognize as a REPEATED column and is not really
> workable to hack around like I did in DRILL-5183 (
> https://issues.apache.org/jira/browse/DRILL-5183). I can get to one value
> using something like t.columnName.`list`.`element` but that's not really
> feasible to use in a query.
>
> The little I could find on this by Googling around led me to this document
> on the Parquet format Github page -
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md. This
> seems to say that Spark is writing these files correctly, but Drill is not
> interpreting them properly.
>
> Is there a workaround that anyone can help me to turn these columns into
> values that Drill understands as repeated values? This is a fairly urgent
> issue for us.
>
> Thanks,
>
> Dave
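
For anyone following the thread, here is a minimal Python sketch of the shapes involved. It does not call Drill; it only models the nested value from the question as plain dicts and lists, to show what the positional access, the flatten-style access, and a lookup-by-name would each return. The `lookup` helper and its parameter names are hypothetical and are not the actual custom function mentioned in the reply.

```python
# Plain-Python model of the nested value quoted in this thread.
# Illustration only: this does not use Drill or read Parquet.
row = {"list": [{"element": "My first value"}, {"element": "My second value"}]}

# Analogue of column['list'][0]['element']: one element, chosen by position.
first = row["list"][0]["element"]

# Analogue of flattening the list and reading 'element' from each entry.
flattened = [entry["element"] for entry in row["list"]]


def lookup(pairs, name, name_key="NAME:", data_key="DATA"):
    """Hypothetical helper: return the DATA of the first pair whose NAME matches.

    The default key names are copied from the sample column in the reply.
    Returns None when no pair matches.
    """
    for pair in pairs:
        if pair.get(name_key) == name:
            return pair.get(data_key)
    return None


pairs = [
    {"NAME:": "Aname", "DATA": "thedata"},
    {"NAME:": "Aname2", "DATA": "thedata2"},
]

print(first)
print(flattened)
print(lookup(pairs, "Aname2"))
```

The lookup scans the list once and stops at the first match, so it never expands each array entry into its own row the way a flatten does.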