I have some Hive data loaded with sqoop that looks like this: 0: jdbc:drill:> select curr_fiscal_week, load_dts from hive.dim.`YEAR_DATE_DIM`; +-------------------+------------------------+ | curr_fiscal_week | load_dts | +-------------------+------------------------+ | 12.0 | 2014-02-07 15:11:35.0 | | 12.0 | 2014-02-07 15:11:35.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2012-11-08 00:00:00.0 | | 12.0 | 2015-01-13 00:00:00.0 | | 12.0 | 2015-01-13 00:00:00.0 | | 12.0 | null | | 12.0 | null | +-------------------+------------------------+ 26 rows selected (0.229 seconds)
When I’m trying to create a drill parquet table by using a CTAS with a CASE statement to handle the nulls I get an error: 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`, . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`, . . . . . . . > CASE . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP) . . . . . . . > END as `FISCAL_YR_BEG_DT`, . . . . . . . > CASE . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE CAST(`FISCAL_YR_END_DT` as TIMESTAMP) . . . . . . . > END as `FISCAL_YR_END_DT`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_YR` as INT) . . . . . . . > END as `CURR_FISCAL_YR`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_HALF` as INT) . . . . . . . > END as `CURR_FISCAL_HALF`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_QTR` as INT) . . . . . . . > END as `CURR_FISCAL_QTR`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_MO` as INT) . . . . . . . > END as `CURR_FISCAL_MO`, . . . . . . . > CASE . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE CAST(`CURR_FISCAL_WEEK` as INT) . . . . . . . > END as `CURR_FISCAL_WEEK`, . . . . . . . > CASE . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE CAST(`LOAD_DTS` as TIMESTAMP) . . . . . . . > END as `LOAD_DTS` . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`; Error: SYSTEM ERROR: NumberFormatException: For input string: "null" Fragment 0:0 [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010] (state=,code=0) Simplified query: Using a CASE statement only on the field containing the null values works: 0: jdbc:drill:> select cast(curr_fiscal_week as int) as curr_fiscal_week, case when load_dts = 'null' then null else cast(load_dts as timestamp) end as `load_dts` from hive.dim.`YEAR_DATE_DIM`; +-------------------+------------------------+ | curr_fiscal_week | load_dts | +-------------------+------------------------+ | 12 | 2014-02-07 15:11:35.0 | | 12 | 2014-02-07 15:11:35.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2012-11-08 00:00:00.0 | | 12 | 2015-01-13 00:00:00.0 | | 12 | 2015-01-13 00:00:00.0 | | 12 | null | | 12 | null | +-------------------+------------------------+ 26 rows selected (0.332 seconds) As soon as I use another case statement on the first field (which doesn’t contain null values) I get an error: 0: jdbc:drill:> select case when curr_fiscal_week = 'null' then null else cast(curr_fiscal_week as int) end as curr_fiscal_week, case when load_dts = 'null' then null else cast(load_dts as timestamp) end as `load_dts` from hive.dim.`YEAR_DATE_DIM`; Error: SYSTEM ERROR: NumberFormatException: For input string: "null" Fragment 0:0 [Error Id: 315ee6c6-a642-496b-b8c0-a005f7183663 on chlhad02:31010] (state=,code=0) This data was imported sqoop so I believe it’s stored as a TextInput hadoop file with ‘\0001’ field delimiters and ‘\n’ record delimiters. Chris Matta [email protected] 215-701-3146
