[ 
https://issues.apache.org/jira/browse/DRILL-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Victoria Markman updated DRILL-3961:
------------------------------------
    Priority: Critical  (was: Major)

> CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT types
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3961
>                 URL: https://issues.apache.org/jira/browse/DRILL-3961
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Victoria Markman
>            Priority: Critical
>
> We disabled the SMALLINT and TINYINT SQL types because of bugs that we could 
> not fix before 1.0.0. Some things were left in a half-implemented state, and 
> we should either fix them up or re-implement the TINYINT/SMALLINT types.
> * We can't cast to TINYINT/SMALLINT:
> {code}
> 0: jdbc:drill:schema=dfs> select cast(1 as tinyint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: TINYINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: c86bb4f5-4ea9-47cc-856e-ce1e96463477 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> 0: jdbc:drill:schema=dfs> select cast(1 as smallint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: SMALLINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: 14cc008a-532b-4f02-a9ad-6a54cf5dc6bc on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> * We hacked our implementation to read Hive TINYINT/SMALLINT columns as INTEGER: 
> https://issues.apache.org/jira/browse/DRILL-3263
>    which works: if you query a hive table with a TINYINT/SMALLINT 
> attribute, you get INTEGER in Drill.
> Hive table, notice c9/c10 columns are TINYINT/SMALLINT respectively:
> {code}
> hive> describe alltypes;
> OK
> c1                      int
> c2                      boolean
> c3                      double
> c4                      string
> c5                      array<int>
> c6                      map<int,string>
> c7                      map<string,string>
> c8                      struct<r:string,s:int,t:double>
> c9                      tinyint
> c10                     smallint
> c11                     float
> c12                     bigint
> c13                     array<array<string>>
> c14                     map<int,map<int,int>>
> c15                     struct<r:int,s:struct<a:int,b:string>>
> c16                     array<struct<m:map<string,string>,n:int>>
> c17                     timestamp
> c18                     decimal(10,0)
> c19                     binary
> c20                     date
> Time taken: 0.094 seconds, Fetched: 20 row(s)
> {code}
> Same table accessed from Drill:
> {code}
> 0: jdbc:drill:schema=dfs> use hive;
> +-------+-----------------------------------+
> |  ok   |              summary              |
> +-------+-----------------------------------+
> | true  | Default schema changed to [hive]  |
> +-------+-----------------------------------+
> 1 row selected (0.337 seconds)
> 0: jdbc:drill:schema=dfs> describe alltypes;
> +--------------+--------------------+--------------+
> | COLUMN_NAME  |     DATA_TYPE      | IS_NULLABLE  |
> +--------------+--------------------+--------------+
> | c1           | INTEGER            | YES          |
> | c2           | BOOLEAN            | YES          |
> | c3           | DOUBLE             | YES          |
> | c4           | CHARACTER VARYING  | YES          |
> | c5           | ARRAY              | NO           |
> | c6           | MAP                | NO           |
> | c7           | MAP                | NO           |
> | c8           | ROW                | NO           |
> | c9           | INTEGER            | YES          |
> | c10          | INTEGER            | YES          |
> | c11          | FLOAT              | YES          |
> | c12          | BIGINT             | YES          |
> | c13          | ARRAY              | NO           |
> | c14          | MAP                | NO           |
> | c15          | ROW                | NO           |
> | c16          | ARRAY              | NO           |
> | c17          | TIMESTAMP          | YES          |
> | c18          | DECIMAL            | YES          |
> | c19          | BINARY VARYING     | YES          |
> | c20          | DATE               | YES          |
> +--------------+--------------------+--------------+
> 20 rows selected (1.379 seconds)
> {code}
> * However, if you try reading a parquet file with TINYINT/SMALLINT types 
> (generated by Impala, for instance), the CONVERT_FROM function still returns 
> the SMALLINT/TINYINT types.
> I can successfully read a SMALLINT field from an Impala-generated parquet file:
> {code}
> 0: jdbc:drill:schema=dfs> select asset_id from `impala.parquet` limit 1;
> +--------------+
> |   asset_id   |
> +--------------+
> | [B@22a3aec1  |
> +--------------+
> 1 row selected (0.298 seconds)
> {code}
> Using CONVERT_FROM function:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') from 
> `impala.parquet` limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | 19535   |
> +---------+
> 1 row selected (0.283 seconds)
> {code}
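> For reference, CONVERT_FROM(..., 'SMALLINT') here simply decodes the 2-byte 
> binary value as a little-endian 16-bit integer (the layout Parquet uses for 
> fixed-width integers). A minimal Python sketch of that decoding, using the 
> two-byte buffer 'OL' that Drill reports for this row in the errors below:

```python
import struct

# The SMALLINT column arrives as a 2-byte little-endian buffer.
# b'OL' (bytes 0x4F, 0x4C) is the raw buffer Drill reports for this row.
raw = b'OL'
value = struct.unpack('<h', raw)[0]  # '<h' = little-endian signed 16-bit
print(value)  # 19535, matching the CONVERT_FROM output above
```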
> Use of this function in an arithmetic expression results in an error:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') + 1 from 
> `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castINT(SMALLINT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: b5e063ee-85e0-48f6-9ca2-09dd9d6a84e6 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> An explicit cast to INTEGER fails for the same reason:
> {code}
> 0: jdbc:drill:schema=dfs> select CAST(CONVERT_FROM(asset_id, 'SMALLINT') as 
> integer) from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: 
> [castBIGINT(SMALLINT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 70820c35-f5fe-4229-8897-cafe0db5bdef on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> Trying to 'cheat' by reading the column as INT:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'INT') from 
> `impala.parquet` limit 1;
> Error: SYSTEM ERROR: IllegalArgumentException: Wrong length 2(2-0) in the 
> buffer 'OL', expected 4.
> Fragment 0:0
> [Error Id: 47e0a0d7-fc40-4f3f-8d68-4bed63ac0a60 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
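> The "Wrong length 2 ... expected 4" error is a plain buffer-size check: 
> CONVERT_FROM(..., 'INT') expects a 4-byte buffer, but the column holds only 
> 2 bytes. The same mismatch, reproduced in Python:

```python
import struct

raw = b'OL'  # the 2-byte buffer named in the error message above
try:
    struct.unpack('<i', raw)  # '<i' = little-endian signed 32-bit, needs 4 bytes
except struct.error as exc:
    print(exc)  # struct reports the same 2-vs-4 byte mismatch that Drill does
```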
> The only way I can make use of this column in a query is to write it out to 
> a table first:
> {code}
> 0: jdbc:drill:schema=dfs> create table x(c1) as select CONVERT_FROM(asset_id, 
> 'SMALLINT') from `impala.parquet` limit 1;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> 1 row selected (0.568 seconds)
> [Tue Oct 20 15:57:14] # ~/parquet-tools/parquet-schema 0_0_0.parquet
> message root {
>   optional int32 c1;
> }
> {code}
> *  Probably the same with Hive UDFs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
