Does the first column have a type of int? In that case what might be
failing here is your string literal 'null' failing to be implicitly cast to
match the column.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta <[email protected]> wrote:

> I have some Hive data loaded with sqoop that looks like this:
>
> 0: jdbc:drill:> select curr_fiscal_week, load_dts from
> hive.dim.`YEAR_DATE_DIM`;
> +-------------------+------------------------+
> | curr_fiscal_week  |        load_dts        |
> +-------------------+------------------------+
> | 12.0              | 2014-02-07 15:11:35.0  |
> | 12.0              | 2014-02-07 15:11:35.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2012-11-08 00:00:00.0  |
> | 12.0              | 2015-01-13 00:00:00.0  |
> | 12.0              | 2015-01-13 00:00:00.0  |
> | 12.0              | null                   |
> | 12.0              | null                   |
> +-------------------+------------------------+
> 26 rows selected (0.229 seconds)
>
> When I’m trying to create a drill parquet table by using a CTAS with a CASE
> statement to handle the nulls I get an error:
>
> 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
> . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
> . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
> CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
> . . . . . . . > END as `FISCAL_YR_BEG_DT`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
> CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
> . . . . . . . > END as `FISCAL_YR_END_DT`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
> CAST(`CURR_FISCAL_YR` as INT)
> . . . . . . . > END as `CURR_FISCAL_YR`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
> CAST(`CURR_FISCAL_HALF` as INT)
> . . . . . . . > END as `CURR_FISCAL_HALF`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
> CAST(`CURR_FISCAL_QTR` as INT)
> . . . . . . . > END as `CURR_FISCAL_QTR`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
> CAST(`CURR_FISCAL_MO` as INT)
> . . . . . . . > END as `CURR_FISCAL_MO`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
> CAST(`CURR_FISCAL_WEEK` as INT)
> . . . . . . . > END as `CURR_FISCAL_WEEK`,
> . . . . . . . > CASE
> . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
> CAST(`LOAD_DTS` as TIMESTAMP)
> . . . . . . . > END as `LOAD_DTS`
> . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
> Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
>
> Fragment 0:0
>
> [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010]
> (state=,code=0)
>
> Simplified query:
> Using a CASE statement only on the field containing the null values works:
>
> 0: jdbc:drill:> select cast(curr_fiscal_week as int) as
> curr_fiscal_week, case when load_dts = 'null' then null else
> cast(load_dts as timestamp) end as `load_dts` from
> hive.dim.`YEAR_DATE_DIM`;
> +-------------------+------------------------+
> | curr_fiscal_week  |        load_dts        |
> +-------------------+------------------------+
> | 12                | 2014-02-07 15:11:35.0  |
> | 12                | 2014-02-07 15:11:35.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2012-11-08 00:00:00.0  |
> | 12                | 2015-01-13 00:00:00.0  |
> | 12                | 2015-01-13 00:00:00.0  |
> | 12                | null                   |
> | 12                | null                   |
> +-------------------+------------------------+
> 26 rows selected (0.332 seconds)
>
> As soon as I use another case statement on the first field (which doesn’t
> contain null values) I get an error:
>
> 0: jdbc:drill:> select case when curr_fiscal_week = 'null' then null
> else cast(curr_fiscal_week as int) end as curr_fiscal_week, case when
> load_dts = 'null' then null else cast(load_dts as timestamp) end as
> `load_dts` from hive.dim.`YEAR_DATE_DIM`;
> Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
>
> Fragment 0:0
>
> [Error Id: 315ee6c6-a642-496b-b8c0-a005f7183663 on chlhad02:31010]
> (state=,code=0)
>
> This data was imported sqoop so I believe it’s stored as a TextInput hadoop
> file with ‘\0001’ field delimiters and ‘\n’ record delimiters.
>
> Chris Matta
>
> [email protected]
> 215-701-3146
> ​
>

Reply via email to