This is very easy to do with a shell script, and you can have cron run it nightly, say just after midnight, so the year, month, and day calculations below return the correct values:
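For reference, a crontab entry along those lines might look like this (the script path and name are hypothetical):

```
# m h dom mon dow  command -- run daily, 5 minutes after midnight
5 0 * * * /path/to/add_daily_partition.sh
```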
#!/bin/bash
year=$(date +%Y)
month=$(date +%m)
day=$(date +%d)

hive -e "use mydb; ALTER TABLE foo ADD IF NOT EXISTS PARTITION (year='$year', month='$month', day='$day') location '/path/to/mydb.db/foo/year=$year/month=$month/day=$day';"

(The last line wrapped.)

If year, month, and day are integers, omit the single quotes, i.e., (year=$year, month=$month, day=$day).

Including IF NOT EXISTS lets you run the script multiple times per day, if you're paranoid... ;)

By the way, I used "use mydb; ALTER TABLE foo..." instead of "ALTER TABLE mydb.foo..." because Hive (v0.10.0) didn't like the latter.

dean

On Mon, Jan 28, 2013 at 10:47 PM, Mark Grover <grover.markgro...@gmail.com> wrote:

> Sadananda,
> See if this helps:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Recoverpartitions
>
> On Mon, Jan 28, 2013 at 8:05 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>
>> Hello,
>>
>> My hive table is partitioned by year, month and day. I have defined it
>> as an external table. The M/R job correctly loads the files into the
>> daily subfolders. The hdfs files will be loaded to
>> <hivetable>/year=yyyy/month=mm/day=dd/ folders by the scheduled M/R jobs.
>> The M/R job has some business logic in determining the values for year,
>> month and day, so one run might create / load files into multiple
>> sub-folders (multiple days). I am able to query the tables after adding
>> partitions using an ALTER TABLE ADD PARTITION statement. But how do I
>> automate the partition creation step? Basically, this script needs to
>> identify the subfolders created by the M/R job and create the
>> corresponding ALTER TABLE ADD PARTITION statements.
>>
>> For example, say the M/R job loads files into the following 3 sub-folders:
>>
>> /user/hive/warehouse/sales/year=2013/month=1/day=21
>> /user/hive/warehouse/sales/year=2013/month=1/day=22
>> /user/hive/warehouse/sales/year=2013/month=1/day=23
>>
>> Then it should create 3 ALTER TABLE statements:
>>
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>>
>> I thought of changing the M/R jobs to load all files into the same
>> folder, then first load the files into a non-partitioned table and then
>> load the partitioned table from the non-partitioned table (using dynamic
>> partitioning); but I would prefer to avoid that extra step if possible
>> (especially since the data is already in the correct sub-folders).
>>
>> Any help would be greatly appreciated.
>>
>> Regards,
>> Sadu

--
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330
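For the original question of generating the statements for whatever subfolders the M/R job created, a minimal sketch is below. The table name (sales), base path, and the helper name path_to_partition are taken from the thread's example or invented here; in a real script the directory list would come from something like `hadoop fs -ls -R "$base" | awk '{print $NF}' | grep 'day='` instead of the hard-coded loop:

```shell
#!/bin/bash
# Sketch: turn each partition directory path into an
# ALTER TABLE ... ADD PARTITION statement. Not a tested
# production script -- paths and table name are examples.
base=/user/hive/warehouse/sales

# Convert .../year=2013/month=1/day=21 into
# "year=2013, month=1, day=21" (strip base, slashes -> commas).
path_to_partition() {
  echo "$1" | sed "s|^$base/||" | sed 's|/|, |g'
}

# In production, replace this loop with the output of
# hadoop fs -ls -R on $base, filtered to day= directories.
for dir in "$base/year=2013/month=1/day=21" \
           "$base/year=2013/month=1/day=22" \
           "$base/year=2013/month=1/day=23"; do
  spec=$(path_to_partition "$dir")
  echo "ALTER TABLE sales ADD IF NOT EXISTS PARTITION ($spec);"
done
```

The generated statements could then be piped into `hive -e` or written to a file and run with `hive -f`, following the same pattern as the nightly script above.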