Hi all,

I need to load a month's worth of processed data into a Hive table. The table has 10 partitions. Each day has many files to load (I have ~3000 files), and each file consistently takes about two seconds. So loading 30 days' worth of data will take days to complete.
I plan to load each day's data in parallel into its respective partition so that I can finish in a short time, but I need clarification before proceeding.

Question:
1. Will loading in parallel into different partitions of the same Hive table cause data loss or corruption?

For example, assume I do the following:

Table: processedlogs
Partition: logdate

Running the commands below in parallel:

LOAD DATA INPATH '/logs/processed/2013-04-01' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-01');
LOAD DATA INPATH '/logs/processed/2013-04-02' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-02');
LOAD DATA INPATH '/logs/processed/2013-04-03' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-03');
LOAD DATA INPATH '/logs/processed/2013-04-04' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-04');
.....
LOAD DATA INPATH '/logs/processed/2013-04-30' OVERWRITE INTO TABLE processedlogs PARTITION(logdate='2013-04-30');

Thanks,
Selva
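
P.S. To make the plan concrete, here is a rough sketch of how the per-day statements could be generated and launched side by side. It is only an illustration under assumptions: it assumes the `hive` command-line client is on the PATH and that one `hive -e` process is started per day. The helper names (`build_load_statement`, `days_in_range`, `run_statement`, `load_in_parallel`) and the worker count of 4 are hypothetical. It does not answer whether running the loads concurrently is safe, which is still my question.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

TABLE = "processedlogs"        # table name from the post
BASE_PATH = "/logs/processed"  # input directory from the post


def build_load_statement(day):
    """Return the LOAD DATA statement for one day's partition."""
    d = day.isoformat()  # formats the date as YYYY-MM-DD
    return (
        f"LOAD DATA INPATH '{BASE_PATH}/{d}' "
        f"OVERWRITE INTO TABLE {TABLE} PARTITION(logdate='{d}');"
    )


def days_in_range(first, last):
    """Return every date from first to last, inclusive."""
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def run_statement(statement):
    """Run one statement through the hive CLI and return its exit code."""
    # Assumption: the `hive` client is installed and reachable on the PATH.
    return subprocess.run(["hive", "-e", statement]).returncode


def load_in_parallel(days, max_workers=4):
    """Start one hive process per day, at most max_workers at a time."""
    statements = [build_load_statement(d) for d in days]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the exit codes in the same order as `days`.
        return list(pool.map(run_statement, statements))
```

Calling `load_in_parallel(days_in_range(date(2013, 4, 1), date(2013, 4, 30)))` would start the 30 loads and return one exit code per day, so any day with a non-zero code could be retried on its own.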