Hi,
Synchronous datasets provide clocked datasets for map-reduce coordinator
applications. However, I am not clear on how to use them in my Hive action. I have a
Hive script that processes the past seven days' data. What I need to do is
either
[1] Empty the table, then use the following statements to load the desired
data into the Hive table, and then run my Hive script to process the data:
Day1: ALTER TABLE myTable ADD IF NOT EXISTS PARTITION (dt = 'mm-dd-yyyy')
LOCATION '/input/yyyy/mm/dd';
Day2: ... ..
...
Day7: ... ...
Or
[2] Load the daily partition into the Hive table and use the Hive partition in
my select statement.
Either way, I need to access the dynamic YEAR, MONTH, and DAY values (similar to
the way "start-instance" and "end-instance" are used) generated by my
coordinator application from within my Hive script. Any suggestions on a good
approach?
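
To make option [1] concrete, here is a rough Python sketch of the seven
statements that would have to be generated for one run. It only illustrates the
desired output: the function name partition_statements, its parameters, and the
example date are made up for this sketch, and in the real workflow the date
would have to come from the coordinator rather than being hard-coded.

```python
from datetime import date, timedelta


def partition_statements(end_day, days=7, table="myTable", root="/input"):
    """Build one ALTER TABLE statement per day, oldest first, ending at end_day.

    Hypothetical helper: the table name, root path, and date formats follow
    the Day1 statement above (dt as mm-dd-yyyy, location as yyyy/mm/dd).
    """
    statements = []
    for offset in range(days - 1, -1, -1):
        d = end_day - timedelta(days=offset)
        statements.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (dt = '{dt}') "
            "LOCATION '{root}/{path}';".format(
                t=table,
                dt=d.strftime("%m-%d-%Y"),
                root=root,
                path=d.strftime("%Y/%m/%d"),
            )
        )
    return statements


if __name__ == "__main__":
    # Arbitrary example date standing in for the coordinator-supplied value.
    for stmt in partition_statements(date(2012, 3, 7)):
        print(stmt)
```

What I am missing is the piece that supplies the end date (or the individual
YEAR, MONTH, DAY values) to the Hive action for each coordinator run.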
Thanks!
Yongcheng