Try using is null instead of = 'null'. --Andries
> On Mar 18, 2016, at 7:36 PM, Jason Altekruse <[email protected]> wrote: > > I get that, but you are trying to compare your integers (curr_fiscal_week) > with the string 'null' in your case statement. Comparison between a string > and an int results in casting the string as an int. The string that is > failing to parse in your literal, not data appearing in the file. > > Jason Altekruse > Software Engineer at Dremio > Apache Drill Committer > >> On Fri, Mar 18, 2016 at 7:33 PM, Christopher Matta <[email protected]> wrote: >> >> Jason, >> The first column is stored as an INT on disk, but there are no NULL values >> in it, the only NULL values are in the timestamp column (the last two >> records). >> >> Chris Matta >> [email protected] >> 215-701-3146 >> >> On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse <[email protected]> >> wrote: >> >>> Does the first column have a type of int? In that case what might be >>> failing here is your string literal 'null' failing to be implicitly cast >> to >>> match the column. >>> >>> Jason Altekruse >>> Software Engineer at Dremio >>> Apache Drill Committer >>> >>> On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta <[email protected]> >>> wrote: >>> >>>> I have some Hive data loaded with sqoop that looks like this: >>>> >>>> 0: jdbc:drill:> select curr_fiscal_week, load_dts from >>>> hive.dim.`YEAR_DATE_DIM`; >>>> +-------------------+------------------------+ >>>> | curr_fiscal_week | load_dts | >>>> +-------------------+------------------------+ >>>> | 12.0 | 2014-02-07 15:11:35.0 | >>>> | 12.0 | 2014-02-07 15:11:35.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2012-11-08 00:00:00.0 | >>>> | 12.0 | 2015-01-13 00:00:00.0 | >>>> | 12.0 | 2015-01-13 00:00:00.0 | >>>> | 12.0 | null | >>>> | 12.0 | null | >>>> +-------------------+------------------------+ >>>> 26 rows selected (0.229 seconds) >>>> >>>> When I’m trying to create a drill parquet table by using a CTAS with a >>> CASE >>>> statement to handle the nulls I get an error: >>>> >>>> 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT >>>> . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`, >>>> . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE >>>> CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP) >>>> . . . . . . . > END as `FISCAL_YR_BEG_DT`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE >>>> CAST(`FISCAL_YR_END_DT` as TIMESTAMP) >>>> . . . . . . . > END as `FISCAL_YR_END_DT`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE >>>> CAST(`CURR_FISCAL_YR` as INT) >>>> . . . . . . . > END as `CURR_FISCAL_YR`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE >>>> CAST(`CURR_FISCAL_HALF` as INT) >>>> . . . . . . . > END as `CURR_FISCAL_HALF`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE >>>> CAST(`CURR_FISCAL_QTR` as INT) >>>> . . . . . . . > END as `CURR_FISCAL_QTR`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE >>>> CAST(`CURR_FISCAL_MO` as INT) >>>> . . . . . . . > END as `CURR_FISCAL_MO`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE >>>> CAST(`CURR_FISCAL_WEEK` as INT) >>>> . . . . . . . > END as `CURR_FISCAL_WEEK`, >>>> . . . . . . . > CASE >>>> . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE >>>> CAST(`LOAD_DTS` as TIMESTAMP) >>>> . . . . . . . > END as `LOAD_DTS` >>>> . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`; >>>> Error: SYSTEM ERROR: NumberFormatException: For input string: "null" >>>> >>>> Fragment 0:0 >>>> >>>> [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010] >>>> (state=,code=0) >>>> >>>> Simplified query: >>>> Using a CASE statement only on the field containing the null values >>> works: >>>> >>>> 0: jdbc:drill:> select cast(curr_fiscal_week as int) as >>>> curr_fiscal_week, case when load_dts = 'null' then null else >>>> cast(load_dts as timestamp) end as `load_dts` from >>>> hive.dim.`YEAR_DATE_DIM`; >>>> +-------------------+------------------------+ >>>> | curr_fiscal_week | load_dts | >>>> +-------------------+------------------------+ >>>> | 12 | 2014-02-07 15:11:35.0 | >>>> | 12 | 2014-02-07 15:11:35.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2012-11-08 00:00:00.0 | >>>> | 12 | 2015-01-13 00:00:00.0 | >>>> | 12 | 2015-01-13 00:00:00.0 | >>>> | 12 | null | >>>> | 12 | null | >>>> +-------------------+------------------------+ >>>> 26 rows selected (0.332 seconds) >>>> >>>> As soon as I use another case statement on the first field (which >> doesn’t >>>> contain null values) I get an error: >>>> >>>> 0: jdbc:drill:> select case when curr_fiscal_week = 'null' then null >>>> else cast(curr_fiscal_week as int) end as curr_fiscal_week, case when >>>> load_dts = 'null' then null else cast(load_dts as timestamp) end as >>>> `load_dts` from hive.dim.`YEAR_DATE_DIM`; >>>> Error: SYSTEM ERROR: NumberFormatException: For input string: "null" >>>> >>>> Fragment 0:0 >>>> >>>> [Error Id: 315ee6c6-a642-496b-b8c0-a005f7183663 on chlhad02:31010] >>>> (state=,code=0) >>>> >>>> This data was imported sqoop so I believe it’s stored as a TextInput >>> hadoop >>>> file with ‘\0001’ field delimiters and ‘\n’ record delimiters. >>>> >>>> Chris Matta >>>> >>>> [email protected] >>>> 215-701-3146 >>
