I'm having a hard time passing a date as a hive environment variable.
The setting is this: The table I'm querying is partitioned on a date column,
say, local_dt. I wish to query on last two days' worth of data. Unfortunately
there seems to be no way of getting the current date without either scanning
the entire table on all local dates (current_timestamp()) or writing a custom
UDF. So, I'm trying to pass it as a parameter within shell.
Here is my test_query.sql:
select count(*)
from myschema.mytable
where local_dt > ${hiveconf:ref_date}
;
and here is the driver shell script:
#/bin/sh
somedate=$(date -d '2 day ago' +"%Y-%m-%d")
echo $somedate
hive -hiveconf ref_date=$somedate -f test_query.sql > output.dat
The problem is that Hive is performing subtractions in the date format. echo
$somedate produces "2012-05-08" and "select ${hiveconf:ref_date} from
dummytable limit 1" produces "1999".
I noticed that there is an option to "set hive.variable.substitute=false;", but
in that case, hive throws the following error:
FAILED: Parse Error: line 3:7 cannot recognize input near '$' '{' 'hiveconf' in
select clause
Regards,
Saurabh
P.S. I'm using this as reference:
http://hive.apache.org/docs/r0.9.0/language_manual/var_substitution.html