You can choose to partition by (country, date).
In this case each load writes into a new date partition within the country 
partition, so the data from earlier loads is not overwritten.
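
A minimal sketch of that layout in HiveQL follows. The table and column names 
(sales_staging, sales_partitioned, order_id, amount, load_date) are 
hypothetical and only illustrate the idea; the date literal stands in for 
whatever value identifies the current load.

```sql
-- Hypothetical target table, partitioned by country and then by load date.
CREATE TABLE IF NOT EXISTS sales_partitioned (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (country STRING, load_date STRING);

-- Allow dynamic partitioning with no static partition column.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partition columns come last in the SELECT, in the same order as in the
-- PARTITION clause. Only the (country, load_date) partitions that receive
-- rows are written, so partitions from earlier dates are left alone.
INSERT OVERWRITE TABLE sales_partitioned PARTITION (country, load_date)
SELECT order_id, amount, country, '2011-10-03' AS load_date
FROM sales_staging;
```

Re-running the same load for the same date would still replace that date's 
partitions, which is usually what you want for a rerun.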

If you choose to go this way, one thing to check is that it does not result 
in too many partitions.
A large number of partitions leads to long query startup times.
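
One rough way to keep an eye on this, assuming a hypothetical table named 
sales_partitioned, is to list its partitions and to check the limits Hive 
applies to dynamic partition inserts:

```sql
-- List every partition of the (hypothetical) table to see how many exist.
SHOW PARTITIONS sales_partitioned;

-- Print the current limits on partitions created by one dynamic partition
-- insert (in total, and per mapper/reducer node).
SET hive.exec.max.dynamic.partitions;
SET hive.exec.max.dynamic.partitions.pernode;
```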

Thanks
Vaibhav

From: Bejoy Ks [mailto:[email protected]]
Sent: Monday, October 03, 2011 7:02 AM
To: hive user group
Subject: Hive Dynamic Partions - How to avoid overwrite

Hi Experts
    I intend to use the Hive dynamic partition approach for my current business 
use case. What I have in mind for the design is as follows.
- Load my incoming data into a non-partitioned Hive table (Table 1)
- Load this data into a partitioned Hive table using Dynamic Partitions (Table 2)
- Flush the data in Table 1 (drop the table and recreate it)
With this series of steps my data would be ready for mining.
    This is going to be a periodic process that runs daily. When I searched 
around I came across a concern with this approach: 'the partitions getting 
overwritten'.
For example, say my second table is partitioned by Country and in my first 
load, data is populated in the partition with country=USA. When my Dynamic 
Partition load/insert is executed a second time and the source data again 
contains rows with country=USA, the data that is already there in the 
partition would be overwritten with the new data.
Is my understanding of this scenario right? Also, in such scenarios what would 
be the recommended approach to overcome this hurdle? Basically I want the 
existing data in the partition to be preserved while new data is added to it. 
I can't go ahead with the static partition approach because my data is huge 
and the number of partitions is also pretty large. Has someone framed an 
effective solution for such scenarios with the Dynamic Partition insert 
approach? Can someone guide me to a suitable approach with Hive for such use 
cases?

Thanks and Regards
Bejoy.K.S
