Re: Synchronous Datasets for Hive

Mohammad Islam Wed, 14 Nov 2012 16:44:14 -0800


Hi Yongcheng,
As far as I know, there is currently no easy way of achieving this.
We are working on this feature based on HCat/Hive. Development progress can be 
tracked at:
https://issues.apache.org/jira/browse/OOZIE-561



For now, the only workaround I can think of is to write a Hive action or Java 
action that gets that range of instances from the metastore. I know it is not 
convenient.

Regards,
Mohammad



________________________________
 From: Yongcheng Li <[email protected]>
To: "[email protected]" <[email protected]> 
Sent: Wednesday, November 14, 2012 11:44 AM
Subject: Synchronous Datasets for Hive
 
Hi,

Synchronous datasets provide clocked datasets for map-reduce coordinator 
applications. However, I am not clear on how to use them in my Hive action. I 
have a Hive script that processes the past seven days' data. What I need to do 
is either

[1] Empty the table, use the following statements to load the desired data 
into the Hive table, and then run my Hive script to process the data.

      Day1: ALTER TABLE myTable ADD IF NOT EXISTS PARTITION (dt = 'mm-dd-yyyy') 
            LOCATION '/input/yyyy/mm/dd';
      Day2: ... ..
      ...
      Day7: ... ...

Or

[2] Load the daily partitions into the Hive table and use the Hive partitions 
in my select statement.
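
For option [1], the seven statements could be generated from a single run date 
rather than written out by hand. The following is only a minimal Python sketch 
under stated assumptions: the run date is already available as a plain date 
(how the coordinator would supply it is not shown), the seven-day window ends 
on the run date itself, and the table name, partition format, and path layout 
are the ones from the ALTER TABLE statement in option [1]. The function name 
partition_statements is hypothetical.

```python
from datetime import date, timedelta


def partition_statements(run_date, days=7, table="myTable"):
    """Build one ALTER TABLE statement per day, oldest first.

    Assumption: the window ends on run_date (inclusive), so days=7
    covers run_date and the six days before it.
    """
    statements = []
    for offset in range(days - 1, -1, -1):
        day = run_date - timedelta(days=offset)
        statements.append(
            "ALTER TABLE {t} ADD IF NOT EXISTS PARTITION (dt = '{dt}') "
            "LOCATION '{loc}';".format(
                t=table,
                dt=day.strftime("%m-%d-%Y"),          # 'mm-dd-yyyy'
                loc=day.strftime("/input/%Y/%m/%d"),  # '/input/yyyy/mm/dd'
            )
        )
    return statements


if __name__ == "__main__":
    # Example run date; in practice this would come from the workflow.
    for stmt in partition_statements(date(2012, 11, 14)):
        print(stmt)
```

The printed statements could then be written to a script file that the Hive 
action runs before the processing script.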

Either way, I need to access the dynamic YEAR, MONTH, and DAY values (similar 
to the way "start-instance" and "end-instance" are used) generated by my 
coordinator application in my Hive script. Any suggestions on a good approach?

Thanks!

Yongcheng
