[ https://issues.apache.org/jira/browse/DRILL-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victoria Markman updated DRILL-3961:
------------------------------------
    Priority: Critical  (was: Major)

> CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT types
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3961
>                 URL: https://issues.apache.org/jira/browse/DRILL-3961
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Victoria Markman
>            Priority: Critical
>
> We disabled the SMALLINT and TINYINT SQL types because we had bugs that we
> could not fix before 1.0.0. Some things were left in a half-implemented
> state, and we should either fix them up or re-implement the TINYINT/SMALLINT
> types.
> * We can't cast to TINYINT/SMALLINT:
> {code}
> 0: jdbc:drill:schema=dfs> select cast(1 as tinyint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: TINYINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: c86bb4f5-4ea9-47cc-856e-ce1e96463477 on atsqa4-133.qa.lab:31010]
> (state=,code=0)
>
> 0: jdbc:drill:schema=dfs> select cast(1 as smallint) from sys.options limit 1;
> Error: UNSUPPORTED_OPERATION ERROR: SMALLINT is not supported
> See Apache Drill JIRA: DRILL-1959
> [Error Id: 14cc008a-532b-4f02-a9ad-6a54cf5dc6bc on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> * We hacked our implementation to read TINYINT/SMALLINT in Hive as INTEGER
> (https://issues.apache.org/jira/browse/DRILL-3263), which works: if you
> query a Hive table with a TINYINT/SMALLINT column, you get INTEGER in Drill.
> Hive table; notice that columns c9/c10 are TINYINT/SMALLINT respectively:
> {code}
> hive> describe alltypes;
> OK
> c1                      int
> c2                      boolean
> c3                      double
> c4                      string
> c5                      array<int>
> c6                      map<int,string>
> c7                      map<string,string>
> c8                      struct<r:string,s:int,t:double>
> c9                      tinyint
> c10                     smallint
> c11                     float
> c12                     bigint
> c13                     array<array<string>>
> c14                     map<int,map<int,int>>
> c15                     struct<r:int,s:struct<a:int,b:string>>
> c16                     array<struct<m:map<string,string>,n:int>>
> c17                     timestamp
> c18                     decimal(10,0)
> c19                     binary
> c20                     date
> Time taken: 0.094 seconds, Fetched: 20 row(s)
> {code}
> The same table accessed from Drill:
> {code}
> 0: jdbc:drill:schema=dfs> use hive;
> +-------+-----------------------------------+
> |  ok   |              summary              |
> +-------+-----------------------------------+
> | true  | Default schema changed to [hive]  |
> +-------+-----------------------------------+
> 1 row selected (0.337 seconds)
> 0: jdbc:drill:schema=dfs> describe alltypes;
> +--------------+--------------------+--------------+
> | COLUMN_NAME  |     DATA_TYPE      | IS_NULLABLE  |
> +--------------+--------------------+--------------+
> | c1           | INTEGER            | YES          |
> | c2           | BOOLEAN            | YES          |
> | c3           | DOUBLE             | YES          |
> | c4           | CHARACTER VARYING  | YES          |
> | c5           | ARRAY              | NO           |
> | c6           | MAP                | NO           |
> | c7           | MAP                | NO           |
> | c8           | ROW                | NO           |
> | c9           | INTEGER            | YES          |
> | c10          | INTEGER            | YES          |
> | c11          | FLOAT              | YES          |
> | c12          | BIGINT             | YES          |
> | c13          | ARRAY              | NO           |
> | c14          | MAP                | NO           |
> | c15          | ROW                | NO           |
> | c16          | ARRAY              | NO           |
> | c17          | TIMESTAMP          | YES          |
> | c18          | DECIMAL            | YES          |
> | c19          | BINARY VARYING     | YES          |
> | c20          | DATE               | YES          |
> +--------------+--------------------+--------------+
> 20 rows selected (1.379 seconds)
> {code}
> * However, if you read a parquet file with TINYINT/SMALLINT types (generated
> by Impala, for instance), the CONVERT_FROM function still returns
> SMALLINT/TINYINT types.
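The DESCRIBE output above shows Hive's TINYINT/SMALLINT surfacing as INTEGER in Drill. Conceptually, the DRILL-3263 hack is just a widening of type names at the metadata layer; a minimal sketch of that idea (hypothetical code, not Drill's actual implementation — the class and method names are made up for illustration):

```java
// Hypothetical sketch of the DRILL-3263 approach: widen Hive's small
// integer types to Drill's INTEGER when mapping table metadata.
public class HiveTypeMapping {
    /** Map a Hive type name to the Drill SQL type it is surfaced as. */
    static String toDrillType(String hiveType) {
        switch (hiveType.toLowerCase()) {
            case "tinyint":   // 1-byte Hive type, widened to 4-byte INTEGER
            case "smallint":  // 2-byte Hive type, widened to 4-byte INTEGER
            case "int":
                return "INTEGER";
            case "bigint":
                return "BIGINT";
            case "string":
                return "CHARACTER VARYING";
            default:
                return hiveType.toUpperCase();
        }
    }

    public static void main(String[] args) {
        System.out.println(toDrillType("tinyint"));   // INTEGER
        System.out.println(toDrillType("smallint"));  // INTEGER
    }
}
```

Because both small types come back as a type Drill fully supports, downstream casts and arithmetic work without needing TINYINT/SMALLINT function implementations.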
> I can successfully read a SMALLINT field from an Impala-generated parquet file:
> {code}
> 0: jdbc:drill:schema=dfs> select asset_id from `impala.parquet` limit 1;
> +--------------+
> |   asset_id   |
> +--------------+
> | [B@22a3aec1  |
> +--------------+
> 1 row selected (0.298 seconds)
> {code}
> Using the CONVERT_FROM function:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') from `impala.parquet` limit 1;
> +---------+
> | EXPR$0  |
> +---------+
> | 19535   |
> +---------+
> 1 row selected (0.283 seconds)
> {code}
> Use of this function in an arithmetic expression results in an error:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') + 1 from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation: [castINT(SMALLINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: b5e063ee-85e0-48f6-9ca2-09dd9d6a84e6 on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> An attempt to cast to integer fails for the same reason:
> {code}
> 0: jdbc:drill:schema=dfs> select CAST(CONVERT_FROM(asset_id, 'SMALLINT') as integer) from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation: [castBIGINT(SMALLINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 70820c35-f5fe-4229-8897-cafe0db5bdef on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> Trying to 'cheat' and read the field as INT:
> {code}
> 0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'INT') from `impala.parquet` limit 1;
> Error: SYSTEM ERROR: IllegalArgumentException: Wrong length 2(2-0) in the buffer 'OL', expected 4.
> Fragment 0:0
> [Error Id: 47e0a0d7-fc40-4f3f-8d68-4bed63ac0a60 on atsqa4-133.qa.lab:31010]
> (state=,code=0)
> {code}
> The only way I can make use of this column in a query is to write it to a table:
> {code}
> 0: jdbc:drill:schema=dfs> create table x(c1) as select CONVERT_FROM(asset_id, 'SMALLINT') from `impala.parquet` limit 1;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 1                          |
> +-----------+----------------------------+
> 1 row selected (0.568 seconds)
>
> [Tue Oct 20 15:57:14] # ~/parquet-tools/parquet-schema 0_0_0.parquet
> message root {
>   optional int32 c1;
> }
> {code}
> * The same problem probably applies to Hive UDFs.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
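The errors above are consistent with each other: the "Wrong length 2 ... in the buffer 'OL', expected 4" message shows the stored value is a 2-byte buffer ('O' = 0x4F, 'L' = 0x4C, which little-endian is exactly the 19535 returned earlier), and a 4-byte INT decoder refuses it. A minimal sketch of what a CONVERT_FROM that returned INTEGER for SMALLINT input could do (hypothetical helper, not Drill source; little-endian byte order is inferred from the 'OL'/19535 pair above):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only: decode a 2-byte SMALLINT buffer and widen the
// result to int, so no castINT(SMALLINT) function is needed downstream.
public class SmallintDecode {
    static int decodeSmallintAsInt(byte[] buf) {
        if (buf.length != 2) {
            // Mirrors the length check behind the "expected 4" error above,
            // but for the 2-byte SMALLINT encoding.
            throw new IllegalArgumentException(
                "Wrong length " + buf.length + ", expected 2");
        }
        short s = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN).getShort();
        return s;  // implicit sign-extending widen: short -> int
    }

    public static void main(String[] args) {
        // 19535 = 0x4C4F; little-endian bytes are 0x4F 0x4C, i.e. "OL",
        // matching the buffer in the IllegalArgumentException above.
        byte[] buf = { (byte) 0x4F, (byte) 0x4C };
        System.out.println(decodeSmallintAsInt(buf) + 1);  // prints 19536
    }
}
```

With the value widened to a plain int at decode time, the arithmetic and CAST queries that fail above would no longer require SMALLINT cast implementations, which is what this issue requests.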