Hi Daniel,
Actually, the mapreduce job itself was fine, but the process got stuck on the data 
loading step after that.
The output stopped at:
Loading data to table default.parquet_table_with_40k_partitions partition 
(yearmonth=null, prefix=null)

When I look at the size of the table's HDFS files, I can see it is growing, 
but only slowly.
For the mapreduce job, I had 400+ mappers and 100+ reducers.
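For reference, the insert looks roughly like this (the source table and select 
list are simplified placeholders; yearmonth and prefix are the real partition 
columns):

    INSERT OVERWRITE TABLE default.parquet_table_with_40k_partitions
    PARTITION (yearmonth, prefix)
    SELECT col1, col2, yearmonth, prefix
    FROM source_table;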

Thanks
Tianqi

From: Daniel Haviv [mailto:[email protected]]
Sent: Wednesday, April 15, 2015 9:23 PM
To: [email protected]
Subject: Re: Extremely Slow Data Loading with 40k+ Partitions

How many reducers are you using?
Daniel

On 16 Apr 2015, at 00:55, Tianqi Tong <[email protected]> wrote:
Hi,
I'm loading data into a Parquet table with dynamic partitions. I have 40k+ 
partitions, and I have skipped the partition stats computation step.
Somehow it's still extremely slow loading data into the partitions (800MB/h).
Do you have any hints on the possible reason and solution?
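For context, the session is configured roughly like this (the two partition 
limits shown are placeholders, not the exact values I used):

    -- allow dynamic partitioning on every partition column
    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    -- raise the limits so 40k+ partitions can be created in a single insert
    SET hive.exec.max.dynamic.partitions=50000;
    SET hive.exec.max.dynamic.partitions.pernode=10000;
    -- skip the per-partition stats computation after the load
    SET hive.stats.autogather=false;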

Thank you
Tianqi Tong
