Hi Hive users,

I am happy to announce the release of MR3 1.7. MR3 is an execution engine
for big data processing, and its main application, Hive on MR3, is an
alternative to Hive-Tez and Hive-LLAP. I would like to summarize its main
features.

1. Hive on MR3 on Hadoop
Hive on MR3 is easy to install on Hadoop. In particular, you don't have to
upgrade Tez and Hadoop to matching versions.

2. Hive on MR3 on Kubernetes
MR3 provides native support for Kubernetes, and Hive on MR3 can run
directly on Kubernetes (without having to operate Hadoop on Kubernetes). On
public clouds like Amazon EKS, one can take advantage of autoscaling and
spot instances.

3. Hive on MR3 in standalone mode
Starting with version 1.7, MR3 supports standalone mode, which does not
require a resource manager such as Hadoop or Kubernetes. By exploiting
standalone mode, one can run Hive on MR3 in virtually any type of cluster.
Installing Hive on MR3 is now as simple as installing Trino/Presto.

4. Performance
On the 10TB TPC-DS benchmark, Hive on MR3 runs faster than Hive-LLAP
(8074s vs 8680s). It is slightly slower than Trino 418 (8074s vs 7424s),
but it returns correct results on all 99 queries, while Trino fails or
returns wrong results on some queries.

5. Java 17
As an experimental feature, MR3 supports Java 17, and Hive on MR3 can run
with Java 17. On the same TPC-DS benchmark, upgrading Java from 8 to 17
yields a speedup of about 8% (from 8074s to 7415s).
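
For readers who want to check the percentages in sections 4 and 5, here is
a small sketch that recomputes them from the total running times quoted in
this message. The helper name percent_faster and the choice to measure the
reduction in total time relative to the slower run are assumptions made for
illustration, not part of MR3 or of the benchmark itself.

```python
# Total running times in seconds, as quoted in this message.
HIVE_MR3_JAVA8 = 8074
HIVE_MR3_JAVA17 = 7415
HIVE_LLAP = 8680
TRINO_418 = 7424


def percent_faster(baseline: int, candidate: int) -> float:
    """Reduction in total running time relative to baseline, in percent.

    Hypothetical helper for this example only.
    """
    return 100.0 * (baseline - candidate) / baseline


# Java 17 vs Java 8 on Hive on MR3: prints 8.2
print(f"{percent_faster(HIVE_MR3_JAVA8, HIVE_MR3_JAVA17):.1f}")

# Hive on MR3 (Java 8) vs Hive-LLAP: prints 7.0
print(f"{percent_faster(HIVE_LLAP, HIVE_MR3_JAVA8):.1f}")

# Trino 418 vs Hive on MR3 (Java 8): prints 8.1
print(f"{percent_faster(HIVE_MR3_JAVA8, TRINO_418):.1f}")
```

Under this definition, the Java 17 result (7415s) is roughly on par with
the Trino 418 total (7424s) quoted above.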

6. Correctness
Hive on MR3 is based on Hive branch-3.1 and has backported over 700
patches. In addition to the q-tests included in the Hive source code, we
use the TPC-DS benchmark to check the correctness of query compilation.
Hive on MR3 returns correct results on all 99 queries. (Note: The current
master branch of Hive returns wrong results on some queries in TPC-DS.)

7. Misc
Other applications of MR3 include Spark on MR3 and MapReduce on MR3. For
example, you can run MapReduce jobs directly on Kubernetes!

For the full documentation (including the quick start guide and release
notes), please see:

https://mr3docs.datamonad.com/

The git repository for Hive on MR3 can be used to build Hive on Tez as well
(by ignoring the last few commits):

https://github.com/mr3project/hive-mr3

Thanks,

--- Sungwoo
