Thanks, but I'm still not clear on what this will mean to the user querying
a hive table with complex types.
Will they be able to query the map in the way they expect based on their
experience with Hive, or will they have to be aware of the parquet schema?
Today, the complex types in hive-created parquet look like this in my
example:
select * from dfs.`/user/hive/warehouse/complex_parquet`;
+------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
| firstname  | lastname  | children                                                     | parents                                                                  |
+------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
| Vince      | Gonzalez  | {"bag":[{"array_element":"son1"},{"array_element":"son2"}]}  | {"map":[{"key":"Mother","value":"mom"},{"key":"Father","value":"dad"}]}  |
+------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
Will the user be able to query the parents column like "select
parents.Mother from hive.complex_parquet" or will they have to deal with
the more complicated structure above?
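
To make the gap concrete, here is a small Python sketch (not Drill code; the helper names are hypothetical and only for illustration) of the reshaping a user currently has to do when reading the hive-created parquet directly. It assumes the column values have exactly the "bag"/"array_element" and "map"/"key"/"value" layout shown in the output above.

```python
import json


def unwrap_hive_list(col):
    """Turn {"bag": [{"array_element": x}, ...]} into [x, ...]."""
    return [item["array_element"] for item in col["bag"]]


def unwrap_hive_map(col):
    """Turn {"map": [{"key": k, "value": v}, ...]} into {k: v, ...}."""
    return {entry["key"]: entry["value"] for entry in col["map"]}


# Column values copied from the query output above.
children = json.loads(
    '{"bag":[{"array_element":"son1"},{"array_element":"son2"}]}'
)
parents = json.loads(
    '{"map":[{"key":"Mother","value":"mom"},{"key":"Father","value":"dad"}]}'
)

flat_children = unwrap_hive_list(children)
flat_parents = unwrap_hive_map(parents)

print(flat_children)           # ['son1', 'son2']
print(flat_parents["Mother"])  # mom
```

The second shape is what I'd expect something like parents.Mother to address; the first is what the files expose today.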
On Thu, Aug 27, 2015 at 12:30 PM, Venki Korukanti <[email protected]> wrote:
> I started looking into this a few weeks back, but haven't made much progress
> on the implementation.
>
> Hive MAP type and Drill MAP type are different. Hive MAP is a pure (key,
> value) structure. Drill MAP is more like Hive STRUCT type. Both Hive types
> MAP and STRUCT are going to be mapped to Drill MAP type. Hive UNION type is
> another one which needs some discussion on how to handle it. Hive LIST type
> is straightforward to map to Drill repeated types.
>
> We may not get to work on this in 1.2.0. Please vote on the JIRA, which will
> help us plan it for a future release.
>
> On Thu, Aug 27, 2015 at 7:43 AM, Vince Gonzalez <[email protected]>
> wrote:
>
> > Drill 3290 aims to add support for complex Hive types, and looks to me like
> > it's targeted for 1.2.0.
> >
> > The way I'm understanding it, supporting hive complex types means that if I
> > create a hive table, stored say as parquet with a MAP column, I should be
> > able to query it in Drill in the way we'd expect.
> >
> > Currently, when I create a Hive table with complex types, Drill fails to
> > query the table using the hive plugin because it lacks support for these
> > types.
> >
> > 0: jdbc:drill:> select * from hive.complex_parquet;
> > Error: SYSTEM ERROR: RuntimeException: Unsupported Hive data type LIST.
> > Following Hive data types are supported in Drill for querying: BOOLEAN,
> > BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, DECIMAL,
> > STRING, and VARCHAR
> >
> > Fragment 0:0
> >
> > [Error Id: f783df3d-7f77-4170-b0e7-aee9ba7d27c7 on ip-172-16-2-200:31010]
> > (state=,code=0)
> >
> >
> > I can go around Hive and query the files directly, but the hive-created
> > parquet has a schema that's not as intuitive to query:
> >
> > 0: jdbc:drill:> select * from dfs.`/user/hive/warehouse/complex_parquet`;
> > +------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
> > | firstname  | lastname  | children                                                     | parents                                                                  |
> > +------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
> > | Vince      | Gonzalez  | {"bag":[{"array_element":"son1"},{"array_element":"son2"}]}  | {"map":[{"key":"Mother","value":"mom"},{"key":"Father","value":"dad"}]}  |
> > +------------+-----------+--------------------------------------------------------------+--------------------------------------------------------------------------+
> > 1 row selected (0.162 seconds)
> >
> > Can I interpret "support for Hive complex types" to mean that Drill would
> > be able to query the above hive table without having to deal with the "bag"
> > and "map" keys?
> >
> > Can anyone say how likely this is to actually be in 1.2.0?
> >
> > I put the hive DDL for the above example here:
> > https://gist.github.com/vicenteg/d48fb1a9cb70b1b592f4
> >
>