Re: Automating the partition creation process

Mark Grover Mon, 28 Jan 2013 20:48:21 -0800

Sadananda,
See if this helps:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
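
As far as I know, that section covers MSCK REPAIR TABLE <table>, which asks Hive to add metastore entries for partition directories that already exist on HDFS. If you would rather generate the ALTER TABLE statements yourself, here is a minimal sketch in Python (it is not from the wiki). It assumes you already have the partition directories as a list of strings, for example from a recursive HDFS listing, and that non-numeric partition values should be quoted as strings. The function name add_partition_statement and its parameters are made up for illustration.

```python
import re

# Matches one Hive-style partition path segment such as "year=2013".
_SEGMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.+)$")


def add_partition_statement(table, path, if_not_exists=True):
    """Build an ALTER TABLE ... ADD PARTITION statement from a directory path.

    Hypothetical helper: only path segments of the form key=value are used;
    everything else (e.g. /user/hive/warehouse/sales) is ignored.
    """
    spec = []
    for segment in path.strip("/").split("/"):
        match = _SEGMENT.match(segment)
        if not match:
            continue
        key, value = match.groups()
        # Assumption: all-digit values are numeric columns, the rest are strings.
        if not value.isdigit():
            value = "'" + value.replace("'", "\\'") + "'"
        spec.append(f"{key}={value}")
    if not spec:
        raise ValueError(f"no key=value partition segments in {path!r}")
    guard = "IF NOT EXISTS " if if_not_exists else ""
    return f"ALTER TABLE {table} ADD {guard}PARTITION ({', '.join(spec)});"


if __name__ == "__main__":
    folders = [
        "/user/hive/warehouse/sales/year=2013/month=1/day=21",
        "/user/hive/warehouse/sales/year=2013/month=1/day=22",
        "/user/hive/warehouse/sales/year=2013/month=1/day=23",
    ]
    for folder in folders:
        print(add_partition_statement("sales", folder))
```

With if_not_exists=False this produces the three statements from your example; the default adds IF NOT EXISTS so that a rerun does not fail on partitions that are already registered. You could write the output to a file and run it with hive -f.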


On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:

> Hello,
>
> My Hive table is partitioned by year, month and day, and I have defined it
> as an external table. Scheduled M/R jobs load the HDFS files into
> <hivetable>/year=yyyy/month=mm/day=dd/ folders, and this works correctly.
> The M/R job has some business logic for determining the values of year,
> month and day, so one run might create and load files into multiple
> subfolders (multiple days). I am able to query the table after adding
> partitions using ALTER TABLE ADD PARTITION statements. But how do I automate
> the partition creation step? Basically, a script needs to identify the
> subfolders created by the M/R job and create the corresponding ALTER TABLE
> ADD PARTITION statements.
>
> For example, say the M/R job loads files into the following 3 subfolders:
>
> /user/hive/warehouse/sales/year=2013/month=1/day=21
> /user/hive/warehouse/sales/year=2013/month=1/day=22
> /user/hive/warehouse/sales/year=2013/month=1/day=23
>
> Then it should create 3 ALTER TABLE statements:
>
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>
> I thought of changing the M/R jobs to load all files into the same folder,
> then loading the files into a non-partitioned table and loading the
> partitioned table from it (using dynamic partitioning), but I
> would prefer to avoid that extra step if possible (esp. since the data is
> already in the correct subfolders).
>
> Any help would be greatly appreciated.
>
> Regards,
> Sadu
>
>
>
