Hi Daniel,

Actually the MapReduce job was just fine, but the process got stuck on the data loading step after that. The output stopped at:

Loading data to table default.parquet_table_with_40k_partitions partition (yearmonth=null, prefix=null)
When I look at the size of the table's HDFS files, I can see it is growing, but only slowly. The MapReduce job used 400+ mappers and 100+ reducers.

Thanks
Tianqi

From: Daniel Haviv [mailto:[email protected]]
Sent: Wednesday, April 15, 2015 9:23 PM
To: [email protected]
Subject: Re: Extremely Slow Data Loading with 40k+ Partitions

How many reducers are you using?

Daniel

On Apr 16, 2015, at 00:55, Tianqi Tong <[email protected]> wrote:

Hi,
I'm loading data into a Parquet table with dynamic partitions. I have 40k+ partitions, and I have skipped the partition stats computation step. Somehow it's still extremely slow loading data into the partitions (800MB/h). Do you have any hints on the possible reason and solution?

Thank you
Tianqi Tong
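
[For context, a minimal sketch of a dynamic-partition load like the one described above. The table name and partition columns (yearmonth, prefix) come from the log line earlier in the thread; the staging table and data columns are hypothetical placeholders. The SET options are standard Hive settings: a 40k-partition load needs the dynamic-partition limits raised above their defaults (1000 total, 100 per node), and disabling hive.stats.autogather skips the stats computation step mentioned above.]

    -- Allow dynamic partitioning on all partition columns
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    -- Raise the dynamic-partition limits for a 40k+ partition load
    SET hive.exec.max.dynamic.partitions = 50000;
    SET hive.exec.max.dynamic.partitions.pernode = 10000;
    -- Skip automatic partition stats computation during the load
    SET hive.stats.autogather = false;

    -- staging_table, col1, col2 are placeholder names
    INSERT OVERWRITE TABLE default.parquet_table_with_40k_partitions
    PARTITION (yearmonth, prefix)
    SELECT col1, col2, yearmonth, prefix
    FROM staging_table;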
