Based on the plan, the filtered output in map-1 is mis-estimated, and the
groupby operators also have large misestimates.

This causes the number of reducers to be estimated as 4, which is too low
for this query. Because of the Tez partition factor, it ends up with 8
reducer slots at runtime on Hive 3.x.
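
To make the arithmetic concrete, here is a rough sketch of how an estimate
of 4 can turn into 8 slots. It is a simplification, not Hive's actual code.
It assumes the defaults hive.exec.reducers.bytes.per.reducer=256000000,
hive.exec.reducers.max=1009 and hive.tez.max.partition.factor=2.0, and the
input size is a made-up number chosen only for illustration. The function
names are hypothetical.

```python
import math


def estimate_reducers(total_input_bytes, bytes_per_reducer, max_reducers):
    """Simplified estimate: one reducer per bytes_per_reducer of estimated
    input, clamped to the range [1, max_reducers]."""
    reducers = math.ceil(total_input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, reducers))


def max_reducer_slots(estimated, max_partition_factor, max_reducers):
    """Upper bound on reducer slots when the estimate is scaled by the
    partition factor (assumed behaviour with auto reducer parallelism)."""
    return max(1, min(max_reducers, int(estimated * max_partition_factor)))


# Assumed defaults; check your own cluster settings before relying on them.
BYTES_PER_REDUCER = 256_000_000      # hive.exec.reducers.bytes.per.reducer
MAX_REDUCERS = 1009                  # hive.exec.reducers.max
MAX_PARTITION_FACTOR = 2.0           # hive.tez.max.partition.factor

# Hypothetical (under-)estimated data size reaching the reduce sink.
estimated_bytes = 1_000_000_000

estimated = estimate_reducers(estimated_bytes, BYTES_PER_REDUCER, MAX_REDUCERS)
slots = max_reducer_slots(estimated, MAX_PARTITION_FACTOR, MAX_REDUCERS)
print(estimated, slots)
```

The point is that the slot count is derived from the estimated data size, so
when the stats are underestimated, the reducer count shrinks with them.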

Here are a few tickets that may help, but note that it is *very risky* to
backport point patches for stats and CBO without complete context. Doing so
may have an adverse impact on other queries.

https://issues.apache.org/jira/browse/HIVE-23684
https://issues.apache.org/jira/browse/HIVE-20432
https://issues.apache.org/jira/browse/HIVE-23485

On Sun, Oct 2, 2022 at 1:56 PM Battula, Brahma Reddy <bbatt...@visa.com>
wrote:

> + Attaching the hs2 logs also.
>
>
>
> *From: *"Battula, Brahma Reddy" <bbatt...@visa.com>
> *Date: *Sunday, 2 October 2022 at 2:16 AM
> *To: *"u...@hive.apache.org" <u...@hive.apache.org>
> *Subject: *TPCDS query degrade with hive-3.1.2 because of wrong
> estimation for reducers
>
>
>
> Hi All,
>
>
>
> We’ve run TPCDS queries against hive-3.1.2 and trunk (a slightly older
> version). (In the attached files, the suffix “a” is trunk and “v” is 3.1.2.)
>
>
>
> The query execution time is higher in hive-3.1.2 because the estimated
> number of reducers is lower (8), compared to 46 on the trunk version.
>
>
>
> All the Hive/Tez/YARN configs are the same in both clusters. Even the h/w
> resources are the same, and the query planner is also the same.
>
>
>
> *The stats in the reduce sink phase do not look the same.*
>
>
>
> *HIVE_TRUNK_CODE* - 2022-09-26T05:58:23,786 INFO
> [07243354-f941-419d-8908-45009762e67d HiveServer2-Handler-Pool:
> Thread-168]: optimizer.ConvertJoinMapJoin (:()) - Join input#1;
> onlineDataSize:   9628; Statistics: Num rows:  359 Data size: 4308  Basic
> stats: COMPLETE Column stats: COMPLETE
>
> *HIVE_3.1.2_CODE* - 2022-09-27T03:39:45,116 INFO
> [2fd1493c-f1a0-4874-acac-58f28e9c21ea HiveServer2-Handler-Pool:
> Thread-134]: optimizer.ConvertJoinMapJoin (:()) - Join input#1;
> onlineDataSize: 325856; Statistics: Num rows: 8116 Data size: 97392 Basic
> stats: COMPLETE Column stats: COMPLETE
>
>
>
> Any idea how the reducers are getting underestimated?
