Re: Strange error with hive null data

Andries Engelbrecht Fri, 18 Mar 2016 22:07:20 -0700

Try using is null instead of = 'null'.

--Andries


> On Mar 18, 2016, at 7:36 PM, Jason Altekruse <[email protected]> wrote:
> 
> I get that, but you are trying to compare your integers (curr_fiscal_week)
> with the string 'null' in your case statement. Comparison between a string
> and an int results in casting the string as an int. The string that is
> failing to parse in your literal, not data appearing in the file.
> 
> Jason Altekruse
> Software Engineer at Dremio
> Apache Drill Committer
> 
>> On Fri, Mar 18, 2016 at 7:33 PM, Christopher Matta <[email protected]> wrote:
>> 
>> Jason,
>> The first column is stored as an INT on disk, but there are no NULL values
>> in it, the only NULL values are in the timestamp column (the last two
>> records).
>> 
>> Chris Matta
>> [email protected]
>> 215-701-3146
>> 
>> On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse <[email protected]>
>> wrote:
>> 
>>> Does the first column have a type of int? In that case what might be
>>> failing here is your string literal 'null' failing to be implicitly cast
>> to
>>> match the column.
>>> 
>>> Jason Altekruse
>>> Software Engineer at Dremio
>>> Apache Drill Committer
>>> 
>>> On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta <[email protected]>
>>> wrote:
>>> 
>>>> I have some Hive data loaded with sqoop that looks like this:
>>>> 
>>>> 0: jdbc:drill:> select curr_fiscal_week, load_dts from
>>>> hive.dim.`YEAR_DATE_DIM`;
>>>> +-------------------+------------------------+
>>>> | curr_fiscal_week  |        load_dts        |
>>>> +-------------------+------------------------+
>>>> | 12.0              | 2014-02-07 15:11:35.0  |
>>>> | 12.0              | 2014-02-07 15:11:35.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2012-11-08 00:00:00.0  |
>>>> | 12.0              | 2015-01-13 00:00:00.0  |
>>>> | 12.0              | 2015-01-13 00:00:00.0  |
>>>> | 12.0              | null                   |
>>>> | 12.0              | null                   |
>>>> +-------------------+------------------------+
>>>> 26 rows selected (0.229 seconds)
>>>> 
>>>> When I’m trying to create a drill parquet table by using a CTAS with a
>>> CASE
>>>> statement to handle the nulls I get an error:
>>>> 
>>>> 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
>>>> . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
>>>> . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
>>>> CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
>>>> . . . . . . . > END as `FISCAL_YR_BEG_DT`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
>>>> CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
>>>> . . . . . . . > END as `FISCAL_YR_END_DT`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
>>>> CAST(`CURR_FISCAL_YR` as INT)
>>>> . . . . . . . > END as `CURR_FISCAL_YR`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
>>>> CAST(`CURR_FISCAL_HALF` as INT)
>>>> . . . . . . . > END as `CURR_FISCAL_HALF`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
>>>> CAST(`CURR_FISCAL_QTR` as INT)
>>>> . . . . . . . > END as `CURR_FISCAL_QTR`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
>>>> CAST(`CURR_FISCAL_MO` as INT)
>>>> . . . . . . . > END as `CURR_FISCAL_MO`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
>>>> CAST(`CURR_FISCAL_WEEK` as INT)
>>>> . . . . . . . > END as `CURR_FISCAL_WEEK`,
>>>> . . . . . . . > CASE
>>>> . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
>>>> CAST(`LOAD_DTS` as TIMESTAMP)
>>>> . . . . . . . > END as `LOAD_DTS`
>>>> . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
>>>> Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
>>>> 
>>>> Fragment 0:0
>>>> 
>>>> [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010]
>>>> (state=,code=0)
>>>> 
>>>> Simplified query:
>>>> Using a CASE statement only on the field containing the null values
>>> works:
>>>> 
>>>> 0: jdbc:drill:> select cast(curr_fiscal_week as int) as
>>>> curr_fiscal_week, case when load_dts = 'null' then null else
>>>> cast(load_dts as timestamp) end as `load_dts` from
>>>> hive.dim.`YEAR_DATE_DIM`;
>>>> +-------------------+------------------------+
>>>> | curr_fiscal_week  |        load_dts        |
>>>> +-------------------+------------------------+
>>>> | 12                | 2014-02-07 15:11:35.0  |
>>>> | 12                | 2014-02-07 15:11:35.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2012-11-08 00:00:00.0  |
>>>> | 12                | 2015-01-13 00:00:00.0  |
>>>> | 12                | 2015-01-13 00:00:00.0  |
>>>> | 12                | null                   |
>>>> | 12                | null                   |
>>>> +-------------------+------------------------+
>>>> 26 rows selected (0.332 seconds)
>>>> 
>>>> As soon as I use another case statement on the first field (which
>> doesn’t
>>>> contain null values) I get an error:
>>>> 
>>>> 0: jdbc:drill:> select case when curr_fiscal_week = 'null' then null
>>>> else cast(curr_fiscal_week as int) end as curr_fiscal_week, case when
>>>> load_dts = 'null' then null else cast(load_dts as timestamp) end as
>>>> `load_dts` from hive.dim.`YEAR_DATE_DIM`;
>>>> Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
>>>> 
>>>> Fragment 0:0
>>>> 
>>>> [Error Id: 315ee6c6-a642-496b-b8c0-a005f7183663 on chlhad02:31010]
>>>> (state=,code=0)
>>>> 
>>>> This data was imported sqoop so I believe it’s stored as a TextInput
>>> hadoop
>>>> file with ‘\0001’ field delimiters and ‘\n’ record delimiters.
>>>> 
>>>> Chris Matta
>>>> 
>>>> [email protected]
>>>> 215-701-3146
>>

Re: Strange error with hive null data

Reply via email to