I get that, but you are trying to compare your integers (curr_fiscal_week)
with the string 'null' in your case statement. Comparison between a string
and an int results in casting the string as an int. The string that is
failing to parse in your literal, not data appearing in the file.

Jason Altekruse
Software Engineer at Dremio
Apache Drill Committer

On Fri, Mar 18, 2016 at 7:33 PM, Christopher Matta <[email protected]> wrote:

> Jason,
> The first column is stored as an INT on disk, but there are no NULL values
> in it, the only NULL values are in the timestamp column (the last two
> records).
>
> Chris Matta
> [email protected]
> 215-701-3146
>
> On Fri, Mar 18, 2016 at 10:27 PM, Jason Altekruse <[email protected]>
> wrote:
>
> > Does the first column have a type of int? In that case what might be
> > failing here is your string literal 'null' failing to be implicitly cast
> to
> > match the column.
> >
> > Jason Altekruse
> > Software Engineer at Dremio
> > Apache Drill Committer
> >
> > On Fri, Mar 18, 2016 at 6:22 PM, Christopher Matta <[email protected]>
> > wrote:
> >
> > > I have some Hive data loaded with sqoop that looks like this:
> > >
> > > 0: jdbc:drill:> select curr_fiscal_week, load_dts from
> > > hive.dim.`YEAR_DATE_DIM`;
> > > +-------------------+------------------------+
> > > | curr_fiscal_week  |        load_dts        |
> > > +-------------------+------------------------+
> > > | 12.0              | 2014-02-07 15:11:35.0  |
> > > | 12.0              | 2014-02-07 15:11:35.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2012-11-08 00:00:00.0  |
> > > | 12.0              | 2015-01-13 00:00:00.0  |
> > > | 12.0              | 2015-01-13 00:00:00.0  |
> > > | 12.0              | null                   |
> > > | 12.0              | null                   |
> > > +-------------------+------------------------+
> > > 26 rows selected (0.229 seconds)
> > >
> > > When I’m trying to create a drill parquet table by using a CTAS with a
> > CASE
> > > statement to handle the nulls I get an error:
> > >
> > > 0: jdbc:drill:> CREATE TABLE dfs.DIM.`YEAR_DATE_DIM_data` as SELECT
> > > . . . . . . . > CAST(`YR_DATE_CD` as INT) as `YR_DATE_CD`,
> > > . . . . . . . > CAST(`FISCAL_YR` as INT) as `FISCAL_YR`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `FISCAL_YR_BEG_DT` = 'null' THEN NULL ELSE
> > > CAST(`FISCAL_YR_BEG_DT` as TIMESTAMP)
> > > . . . . . . . > END as `FISCAL_YR_BEG_DT`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `FISCAL_YR_END_DT` = 'null' THEN NULL ELSE
> > > CAST(`FISCAL_YR_END_DT` as TIMESTAMP)
> > > . . . . . . . > END as `FISCAL_YR_END_DT`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `CURR_FISCAL_YR` = 'null' THEN NULL ELSE
> > > CAST(`CURR_FISCAL_YR` as INT)
> > > . . . . . . . > END as `CURR_FISCAL_YR`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `CURR_FISCAL_HALF` = 'null' THEN NULL ELSE
> > > CAST(`CURR_FISCAL_HALF` as INT)
> > > . . . . . . . > END as `CURR_FISCAL_HALF`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `CURR_FISCAL_QTR` = 'null' THEN NULL ELSE
> > > CAST(`CURR_FISCAL_QTR` as INT)
> > > . . . . . . . > END as `CURR_FISCAL_QTR`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `CURR_FISCAL_MO` = 'null' THEN NULL ELSE
> > > CAST(`CURR_FISCAL_MO` as INT)
> > > . . . . . . . > END as `CURR_FISCAL_MO`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `CURR_FISCAL_WEEK` = 'null' THEN NULL ELSE
> > > CAST(`CURR_FISCAL_WEEK` as INT)
> > > . . . . . . . > END as `CURR_FISCAL_WEEK`,
> > > . . . . . . . > CASE
> > > . . . . . . . > WHEN `LOAD_DTS` = 'null' THEN NULL ELSE
> > > CAST(`LOAD_DTS` as TIMESTAMP)
> > > . . . . . . . > END as `LOAD_DTS`
> > > . . . . . . . > FROM hive.dim.`YEAR_DATE_DIM`;
> > > Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
> > >
> > > Fragment 0:0
> > >
> > > [Error Id: 2755cf80-04b3-4fa7-be66-20158f115f27 on chlhad02:31010]
> > > (state=,code=0)
> > >
> > > Simplified query:
> > > Using a CASE statement only on the field containing the null values
> > works:
> > >
> > > 0: jdbc:drill:> select cast(curr_fiscal_week as int) as
> > > curr_fiscal_week, case when load_dts = 'null' then null else
> > > cast(load_dts as timestamp) end as `load_dts` from
> > > hive.dim.`YEAR_DATE_DIM`;
> > > +-------------------+------------------------+
> > > | curr_fiscal_week  |        load_dts        |
> > > +-------------------+------------------------+
> > > | 12                | 2014-02-07 15:11:35.0  |
> > > | 12                | 2014-02-07 15:11:35.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2012-11-08 00:00:00.0  |
> > > | 12                | 2015-01-13 00:00:00.0  |
> > > | 12                | 2015-01-13 00:00:00.0  |
> > > | 12                | null                   |
> > > | 12                | null                   |
> > > +-------------------+------------------------+
> > > 26 rows selected (0.332 seconds)
> > >
> > > As soon as I use another case statement on the first field (which
> doesn’t
> > > contain null values) I get an error:
> > >
> > > 0: jdbc:drill:> select case when curr_fiscal_week = 'null' then null
> > > else cast(curr_fiscal_week as int) end as curr_fiscal_week, case when
> > > load_dts = 'null' then null else cast(load_dts as timestamp) end as
> > > `load_dts` from hive.dim.`YEAR_DATE_DIM`;
> > > Error: SYSTEM ERROR: NumberFormatException: For input string: "null"
> > >
> > > Fragment 0:0
> > >
> > > [Error Id: 315ee6c6-a642-496b-b8c0-a005f7183663 on chlhad02:31010]
> > > (state=,code=0)
> > >
> > > This data was imported sqoop so I believe it’s stored as a TextInput
> > hadoop
> > > file with ‘\0001’ field delimiters and ‘\n’ record delimiters.
> > >
> > > Chris Matta
> > >
> > > [email protected]
> > > 215-701-3146
> > > ​
> > >
> >
>

Reply via email to