Synchronous Datasets for Hive

Yongcheng Li Wed, 14 Nov 2012 16:09:28 -0800

Hi,

Synchronous datasets provide clocked datasets for map-reduce coordinator 
applications. However, I am not clear on how to use them in my Hive action. I 
have a Hive script that processes the past seven days' data. What I need to do 
is either:


[1] Empty the table, then use statements like the following to load the 
desired data into the Hive table, and then run my Hive script to process the 
data.

      Day1: ALTER TABLE myTable ADD IF NOT EXISTS PARTITION (dt = 'mm-dd-yyyy')
            LOCATION '/input/yyyy/mm/dd';
      Day2: ... ..
      ...
      Day7: ... ...
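
To make option [1] concrete, here is a minimal Python sketch of the seven 
statements I would need generated for one run. It is only an illustration: the 
function name add_partition_statements is hypothetical, the table name and 
paths are the placeholders from above, and I am assuming the window is the 
seven days ending on (and including) the run date.

```python
from datetime import date, timedelta


def add_partition_statements(end_day, days=7, table="myTable", root="/input"):
    """Build one ALTER TABLE statement per day, oldest day first.

    Assumption: the window covers `days` days ending on `end_day` inclusive.
    """
    statements = []
    for offset in range(days - 1, -1, -1):
        day = end_day - timedelta(days=offset)
        statements.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (dt = '{dt}') "
            "LOCATION '{root}/{path}';".format(
                t=table,
                dt=day.strftime("%m-%d-%Y"),    # partition value, mm-dd-yyyy
                root=root,
                path=day.strftime("%Y/%m/%d"),  # directory layout, yyyy/mm/dd
            )
        )
    return statements


if __name__ == "__main__":
    # Example run date; in practice this would come from the coordinator.
    for line in add_partition_statements(date(2012, 11, 14)):
        print(line)
```

What I am missing is how to get the run date (or the YEAR, MONTH, DAY pieces) 
from the coordinator into the Hive action so that statements like these can be 
produced for each run.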

Or

[2] Load each daily partition into the Hive table and use the Hive partition 
in my SELECT statement.

Either way, my Hive script needs access to the dynamic YEAR, MONTH, and DAY 
values generated by my coordinator application (similar to the way 
"start-instance" and "end-instance" are used). Any suggestions on a good 
approach?

Thanks!

Yongcheng
