Forwarded to user@hive as I think many people are curious about the release
of Hive 4.

---------- Forwarded message ---------
From: Sungwoo Park <c...@pl.postech.ac.kr>
Date: Sat, Nov 4, 2023 at 12:42 AM
Subject: Release of Hive 4 and TPC-DS benchmark
To: <d...@hive.apache.org>


Hi everyone,

I would like to resume the discussion on the release of Hive 4 and
the result of the TPC-DS benchmark.

Currently there are four unresolved JIRAs marked 'hive-4.0.0-must' which
must be resolved before the release of Hive 4 ([1], [2], [3], [4]). The
most urgent one is perhaps HIVE-26654 [1], which reports failing queries in
the TPC-DS benchmark. (All these bugs were introduced after the release of
Hive 3.1.2, which passes all the TPC-DS tests.)

Originally we reported 7 failing cases in HIVE-26654. Since then, 3 cases
have been resolved, 2 cases have pull requests, and 2 cases don't have pull
requests yet.

1. Query 17: Resolved in HIVE-26655 [6]
2. Query 16, 69, 94: Resolved in HIVE-26659 [8]
3. Query 64: Resolved in HIVE-26968 [10]

4. Query 2: Pull request available in HIVE-27006 [5]
5. Query 71: Pull request available in HIVE-26986 [9]

6. Query 14: Reported in HIVE-24167 [7]
7. Query 97: Reported in HIVE-27269 [11]

Seonggon and I (on the MR3 team) have been working on these problems, and
so far we have submitted 4 pull requests. Two of them have been merged, but
the other two (for query 2 and query 71) have not been reviewed yet. I'd
appreciate it very much if Hive committers could review the remaining pull
requests.

The remaining problems are query 14 and query 97.

For query 14, I suggest that we take a simple workaround by setting
hive.optimize.cte.materialize.threshold to -1 by default, because nobody
seems to be working on this JIRA. If necessary, we could try to fix it
after the release of Hive 4.
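
For anyone who wants to try the query 14 workaround before any change to
the default, a minimal sketch of the equivalent session-level setting is
shown here. It assumes the property can be overridden per session in your
deployment; -1 disables CTE materialization.

```sql
-- Sketch only: disable CTE materialization for the current session.
-- Assumes session-level overrides of this property are permitted.
SET hive.optimize.cte.materialize.threshold=-1;
```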

For query 97 (which we think is the most challenging one among all the
sub-JIRAs), we have a few choices:

1) Use a quick-fix solution by ignoring hive.mapjoin.hashtable.load.threads
   when FullOuterJoin is used
2) Fix HIVE-25583 [12], which introduced this bug
3) Fix it properly

I suggest that we take the quick-fix solution and revisit the problem after
the release of Hive 4.

(We have also observed performance regression in Hive, but I guess that is
another topic to discuss after fixing the correctness issues.)

Please let us know what you think.

Thanks,

--- Sungwoo

[1] https://issues.apache.org/jira/browse/HIVE-26654
[2] https://issues.apache.org/jira/browse/HIVE-27226
[3] https://issues.apache.org/jira/browse/HIVE-26505
[4] https://issues.apache.org/jira/browse/HIVE-22636
[5] https://issues.apache.org/jira/browse/HIVE-27006
[6] https://issues.apache.org/jira/browse/HIVE-26655
[7] https://issues.apache.org/jira/browse/HIVE-24167
[8] https://issues.apache.org/jira/browse/HIVE-26659
[9] https://issues.apache.org/jira/browse/HIVE-26986
[10] https://issues.apache.org/jira/browse/HIVE-26968
[11] https://issues.apache.org/jira/browse/HIVE-27269
[12] https://issues.apache.org/jira/browse/HIVE-25583
