Jason, The first column is stored as an INT on disk, but there are no NULL values in it, the only NULL values are in the timestamp column (the last two records).
Chris Matta [email protected] 215-701-3146 On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse <[email protected]> wrote: > Does the first column have a type of int? In that case what might be > failing here is your string literal 'null' failing to be implicitly cast to > match the column. > > Jason Altekruse > Software Engineer at Dremio > Apache Drill Committer > > On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta <[email protected]> > wrote: > > > I have some Hive data loaded with sqoop that looks like this: > > > > 0: jdbc:drill:> select curr_fiscal_week, load_dts from > > hive.dim.`YEAR_DATE_DIM`; > > +-------------------+------------------------+ > > | curr_fiscal_week | load_dts | > > +-------------------+------------------------+ > > | 12.0 | 2014-02-07 15:11:35.0 | > > | 12.0 | 2014-02-07 15:11:35.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2012-11-08 00:00:00.0 | > > | 12.0 | 2015-01-13 00:00:00.0 | > > | 12.0 | 2015-01-13 00:00:00.0 | > > | 12.0 | null | > > | 12.0 | null | > > +-------------------+------------------------+ > > 26 rows selected (0.229 seconds) > > > > When I’m trying to create a drill parquet table by using a CTAS with a > CASE > > statement to handle the nulls I get an error: > > > > 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT > > . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`, > > . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE > > CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP) > > . . . . . . . > END as `FISCAL_YR_BEG_DT`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE > > CAST(`FISCAL_YR_END_DT` as TIMESTAMP) > > . . . . . . . > END as `FISCAL_YR_END_DT`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_YR` as INT) > > . . . . . . . > END as `CURR_FISCAL_YR`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_HALF` as INT) > > . . . . . . . > END as `CURR_FISCAL_HALF`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_QTR` as INT) > > . . . . . . . > END as `CURR_FISCAL_QTR`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_MO` as INT) > > . . . . . . . > END as `CURR_FISCAL_MO`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE > > CAST(`CURR_FISCAL_WEEK` as INT) > > . . . . . . . > END as `CURR_FISCAL_WEEK`, > > . . . . . . . > CASE > > . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE > > CAST(`LOAD_DTS` as TIMESTAMP) > > . . . . . . . > END as `LOAD_DTS` > > . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`; > > Error: SYSTEM ERROR: NumberFormatException: For input string: "null" > > > > Fragment 0:0 > > > > [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010] > > (state=,code=0) > > > > Simplified query: > > Using a CASE statement only on the field containing the null values > works: > > > > 0: jdbc:drill:> select cast(curr_fiscal_week as int) as > > curr_fiscal_week, case when load_dts = 'null' then null else > > cast(load_dts as timestamp) end as `load_dts` from > > hive.dim.`YEAR_DATE_DIM`; > > +-------------------+------------------------+ > > | curr_fiscal_week | load_dts | > > +-------------------+------------------------+ > > | 12 | 2014-02-07 15:11:35.0 | > > | 12 | 2014-02-07 15:11:35.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2012-11-08 00:00:00.0 | > > | 12 | 2015-01-13 00:00:00.0 | > > | 12 | 2015-01-13 00:00:00.0 | > > | 12 | null | > > | 12 | null | > > +-------------------+------------------------+ > > 26 rows selected (0.332 seconds) > > > > As soon as I use another case statement on the first field (which doesn’t > > contain null values) I get an error: > > > > 0: jdbc:drill:> select case when curr_fiscal_week = 'null' then null > > else cast(curr_fiscal_week as int) end as curr_fiscal_week, case when > > load_dts = 'null' then null else cast(load_dts as timestamp) end as > > `load_dts` from hive.dim.`YEAR_DATE_DIM`; > > Error: SYSTEM ERROR: NumberFormatException: For input string: "null" > > > > Fragment 0:0 > > > > [Error Id: 315ee6c6-a642-496b-b8c0-a005f7183663 on chlhad02:31010] > > (state=,code=0) > > > > This data was imported sqoop so I believe it’s stored as a TextInput > hadoop > > file with ‘\0001’ field delimiters and ‘\n’ record delimiters. > > > > Chris Matta > > > > [email protected] > > 215-701-3146 > > > > >
