Hi guys,
I have a Parquet-backed table in Hive, 'dim_promo_date_curr_p', which contains
the following data:
scala> sqlContext.sql("select * from pz.dim_promo_date_curr_p").show(3)
15/06/18 00:53:21 INFO ParseDriver: Parsing command: select * from
pz.dim_promo_date_curr_p
15/06/18 00:53:21 INFO ParseDriver: Parse Completed
+----------+-------------+-----------+
|clndr_date|pw_start_date|pw_end_date|
+----------+-------------+-----------+
|2015-02-18|   2015-02-18| 2015-02-24|
|2015-11-13|   2015-11-11| 2015-11-17|
|2015-03-31|   2015-03-25| 2015-03-31|
|2015-07-21|   2015-07-15| 2015-07-21|
+----------+-------------+-----------+
When I run a query using date_add from the Spark 1.4 shell with the sqlContext
(Hive), it works for string literals but returns the wrong result for the value
read from the table. I've only seen this for the 31st of March, not for any
other date:
scala> sqlContext.sql("SELECT DATE_ADD(CLNDR_DATE, 7) as wrong,
DATE_ADD('2015-03-30', 7) as right30, DATE_ADD('2015-03-31', 7) as right31,
DATE_ADD('2015-04-01', 7) as right01 FROM pz.dim_promo_date_curr_p WHERE
CLNDR_DATE='2015-03-31'").show
15/06/18 00:57:32 INFO ParseDriver: Parsing command: SELECT
DATE_ADD(CLNDR_DATE, 7) as wrong, DATE_ADD('2015-03-30', 7) as right30,
DATE_ADD('2015-03-31', 7) as right31, DATE_ADD('2015-04-01', 7) as right01 FROM
pz.dim_promo_date_curr_p WHERE CLNDR_DATE='2015-03-31'
15/06/18 00:57:32 INFO ParseDriver: Parse Completed
+----------+----------+----------+----------+
|     wrong|   right30|   right31|   right01|
+----------+----------+----------+----------+
|2015-04-06|2015-04-06|2015-04-07|2015-04-08|
+----------+----------+----------+----------+
The result is one day short: DATE_ADD(CLNDR_DATE, 7) returns 2015-04-06 instead
of 2015-04-07, even though the WHERE clause matches 2015-03-31. When the date is
passed as a string literal, DATE_ADD returns the expected value. The problem
appears in Spark 1.3.1 as well.
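For reference, here is a minimal sketch of the arithmetic I would expect, done
outside Spark with plain calendar dates. It assumes Java 8's java.time is
available, and the dateAdd helper is a hypothetical name of my own, not the
Spark or Hive implementation:

```scala
import java.time.LocalDate

object DateAddCheck {
  // Hypothetical helper: adds whole calendar days to an ISO date string.
  // LocalDate carries no time-of-day or time zone, so this is pure
  // calendar arithmetic.
  def dateAdd(isoDate: String, days: Int): String =
    LocalDate.parse(isoDate).plusDays(days.toLong).toString

  def main(args: Array[String]): Unit = {
    // The same three literal dates used in the query above.
    Seq("2015-03-30", "2015-03-31", "2015-04-01").foreach { d =>
      println(s"$d + 7 days = ${dateAdd(d, 7)}")
    }
  }
}
```

For 2015-03-31 this gives 2015-04-07, which matches the string-literal columns
in the query output rather than the value computed from the table column.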
I'm not sure whether this is coming from Hive, but it looks like a bug. I've
raised a JIRA: https://issues.apache.org/jira/browse/SPARK-8421
Cheers,
Nathan