Re: TPCDS query degrade with hive-3.1.2 because of wrong estimation for reducers

2022-10-03 Thread Battula, Brahma Reddy
Hi Rajesh, Thanks for spending time on this. We tried applying these patches, but unfortunately they didn't help: HIVE-23485: Bound GroupByOperator stats using largest NDV among columns; HIVE-23684: Large underestimation in NDV stats when input and join cardinality ratio is big; HIVE-20432:

Re: TPCDS query degrade with hive-3.1.2 because of wrong estimation for reducers

2022-10-02 Thread Rajesh Balamohan
Based on the plan, the filtered output in map-1 is mis-estimated, and the groupby operators are also heavily mis-estimated. As a result, the number of reducers is estimated as "4", which is too few for this query. Due to the partition factor of tez, it ends up with 8 reducer slots at runtime for hive
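
To illustrate how an estimate of 4 reducers can turn into 8 slots at runtime, here is a rough sketch of the arithmetic. It is a simplified model, not Hive's actual code. It assumes the reducer count is derived from the estimated data size divided by hive.exec.reducers.bytes.per.reducer, capped by hive.exec.reducers.max, and then scaled by hive.tez.max.partition.factor. The default values used below (256 MB, 1009, 2.0) and the function names are assumptions for illustration; check them against your own configuration.

```python
import math

# Assumed defaults; verify against your Hive configuration.
BYTES_PER_REDUCER = 256 * 1024 * 1024   # hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 1009                     # hive.exec.reducers.max
MAX_PARTITION_FACTOR = 2.0              # hive.tez.max.partition.factor


def estimate_reducers(estimated_bytes: int) -> int:
    """Hypothetical helper: reducers implied by the planner's size estimate."""
    reducers = math.ceil(estimated_bytes / BYTES_PER_REDUCER)
    return max(1, min(MAX_REDUCERS, reducers))


def runtime_reducer_slots(estimated_reducers: int) -> int:
    """Hypothetical helper: upper bound on slots after the Tez partition factor."""
    return min(MAX_REDUCERS, math.ceil(estimated_reducers * MAX_PARTITION_FACTOR))


# If the planner believes the reduce input is about 1 GiB, it asks for 4 reducers,
# and the partition factor raises that to 8 slots.
estimated = estimate_reducers(1 * 1024**3)
slots = runtime_reducer_slots(estimated)

# If the real input were, say, 100 GiB, the same formula would ask for far more.
needed = estimate_reducers(100 * 1024**3)

print(estimated, slots, needed)
```

Under these assumptions, an underestimated input size translates directly into too few reducers, and the partition factor only doubles that small number rather than correcting it.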