Questions on LLAP and hive.server2.enable.doAs

2018-05-17 Thread Sungwoo Park
Hello, I have a couple of questions on LLAP and hive.server2.enable.doAs. I've learned that LLAP does not support hive.server2.enable.doAs=true, but what if we disable LLAP IO? If LLAP IO is disabled and no cache is used in LLAP daemons, I guess it should be okay to allow

Re: Questions on LLAP and hive.server2.enable.doAs

2018-05-17 Thread Sungwoo Park
For question 1, if hive.server2.enable.doAs is set to true, the AppMaster fails to connect to LLAP daemons (from my experiments). --- Sungwoo On Fri, May 18, 2018 at 1:02 AM, Sungwoo Park <glap...@gmail.com> wrote: > Hello, > > I have a couple of questions on LLAP and hive.serv

Re: MERGE performances issue

2018-05-24 Thread Sungwoo Park
Hive-MR3 could be a solution for you. It supports everything that you mention in the previous post. I have written a blog article discussing the pros and cons of Hive-MR3 with respect to Hive-LLAP. https://mr3.postech.ac.kr/blog/2018/05/19/comparison-hivemr3-llap/ --- Sungwoo On Thu, May 10,

Announce: Hive-MR3 0.2

2018-05-24 Thread Sungwoo Park
Hello Hive users, I am pleased to announce the release of Hive-MR3 0.2. Hive-MR3 now supports LLAP I/O. I have published a blog article that compares the stability and performance of Hive-MR3 and Hive-LLAP: https://mr3.postech.ac.kr/blog/2018/05/19/comparison-hivemr3-llap/ From the blog

CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2018-06-12 Thread Sungwoo Park
iguration parameters (because I imported hive-site.xml from Hive 2). Any suggestion would be appreciated. Thanks a lot, --- Sungwoo Park

Re: issues with Hive 3 simple sellect from an ORC table

2018-06-12 Thread Sungwoo Park
This is a diff file that let me compile Hive 3.0 on Hadoop 2.8.0 (and also run it on Hadoop 2.7.x). diff --git a/pom.xml b/pom.xml index c57ff58..8445288 100644 --- a/pom.xml +++ b/pom.xml @@ -146,7 +146,7 @@ 19.0 2.4.11 1.3.166 -3.1.0 +2.8.0

Re: Question on accessing LLAP as data cache from external containers

2018-01-31 Thread Sungwoo Park
/hadoop-hdfs/ > CentralizedCacheManagement.html > > To answer your original question: why not implement the whole job in Hive? > Or orchestrate using oozie some parts in mr and some in Hive. > > On 30. Jan 2018, at 05:15, Sungwoo Park <glap...@gmail.com> wrote: > > Hello all, > >

Question on accessing LLAP as data cache from external containers

2018-01-29 Thread Sungwoo Park
Hello all, I wonder if an external YARN container can send requests to LLAP daemon to read data from its in-memory cache. For example, YARN containers owned by a typical MapReduce job (e.g., TeraSort) could fetch data directly from LLAP instead of contacting HDFS. In this scenario, LLAP daemon

Re: Announce: MR3 0.3, and performance comparison with Hive-LLAP, Presto, Spark, Hive on Tez

2018-08-16 Thread Sungwoo Park
The article can be found at: https://mr3.postech.ac.kr/blog/2018/08/15/comparison-llap-presto-spark-mr3/ -- Sungwoo Park On Thu, Aug 16, 2018 at 10:53 PM, Sungwoo Park wrote: > Hello Hive users, > > I am pleased to announce the release of MR3 0.3. A new feature of MR3 0.3 > is

Announce: MR3 0.3, and performance comparison with Hive-LLAP, Presto, Spark, Hive on Tez

2018-08-16 Thread Sungwoo Park
0.203e 3) Spark 2.2.0 included in HDP 2.6.4 4) Hive 3.0.0 on Tez 5) Hive 3.0.0 on MR3 6) Hive 2.3.3 on MR3 You can download MR3 0.3 at: https://mr3.postech.ac.kr/download/home/ Thank you for your interest! --- Sungwoo Park

Fwd: Hive generating different DAGs from the same query

2018-07-19 Thread Sungwoo Park
not affect DAG generation. This issue is not related to query reexecution, as even with query reexecution disabled (hive.query.reexecution.enabled set to false), I still see this problem occurring. --- Sungwoo Park On Fri, Jul 13, 2018 at 4:48 PM, Zoltan Haindrich wrote: > Hello Sungwoo! >

Re: Does Hive 3.0 only works with hadoop3.x.y?

2018-07-19 Thread Sungwoo Park
i Sungwoo, > > Just want to confirm, does that mean I just need to update the hive > version, without updating the hadoop version? > > Thanks! > > Best, > Zhefu Peng > > > ------ Original Message -- > *From:* "Sungwoo Park"; > *Sent:* 201

Re: Does Hive 3.0 only works with hadoop3.x.y?

2018-07-19 Thread Sungwoo Park
previously posted a diff file that lets us compile Hive 3.x on Hadoop 2.8+. http://mail-archives.apache.org/mod_mbox/hive-user/201806.mbox/%3CCAKHFPXDDFn52buKetHzSXTtjzX3UMHf%3DQvxm9QNNkv9r5xBs-Q%40mail.gmail.com%3E --- Sungwoo Park On Thu, Jul 19, 2018 at 8:21 PM, 彭鱼宴 <461292...@qq.com>

Re: Hive generating different DAGs from the same query

2018-09-11 Thread Sungwoo Park
Hello Gopal, I have been looking further into this issue, and have found that the non-deterministic behavior of Hive in generating DAGs is actually due to the logic in AggregateStatsCache.findBestMatch() called from AggregateStatsCache.get(), as well as the disproportionate distribution of Nulls

Re: Announce: MR3 0.3, and performance comparison with Hive-LLAP, Presto, Spark, Hive on Tez

2018-09-11 Thread Sungwoo Park
at 10:55:19PM +0900, Sungwoo Park wrote: > > The article compare the following six systems: > > Great article, as usual. Would have been great to also compare > concurrent queries. In particular, I guess presto on that point perform > the best. That metric is major since su

Hive generating different DAGs from the same query

2018-07-11 Thread Sungwoo Park
ome internal configuration key in HiveConf that enables/disables some optimization depending on the accumulated statistics in HiveServer2? (I haven't tested it yet, but I can also test with Hive 2.x.) Thank you in advance, --- Sungwoo Park

[Announce] Hive-MR3: Hive running on top of MR3

2018-04-04 Thread Sungwoo Park
of ApplicationMaster in MR3. In particular, it makes a better utilization of computing resources and thus yields a higher throughput for concurrent queries. --- Sungwoo Park

Re: [Announce] Hive-MR3: Hive running on top of MR3

2018-04-05 Thread Sungwoo Park
rs between the AMs. > > On Wed, Apr 4, 2018 at 10:06 AM Sungwoo Park <glap...@gmail.com> wrote: > >> Hello Hive users, >> >> I am pleased to announce MR3 and Hive-MR3. Please visit the following >> webpage for everything on MR3 and Hive-MR3: >> >> https://m

Re: Ways to reduce launching time of query in Hive 2.2.1

2018-04-16 Thread Sungwoo Park
there is no launch cost. Containers are also shared by all queries and thus run like daemons. https://mr3.postech.ac.kr/hivemr3/features/hiveserver2/ Hive-MR3 0.1 does not support LLAP IO yet, but Hive-MR3 0.2, which will be released by the end of this month, will support LLAP IO. --- Sungwoo Park On Mon

Re: FW: NPE in hive 2.3.x during window operator

2018-03-25 Thread Sungwoo Park
this NPE. --- Sungwoo Park On Wed, Mar 21, 2018 at 9:24 AM, Anuj Lal <a...@lendingclub.com> wrote: > > > > We are also facing the issue as described in > > https://issues.apache.org/jira/browse/HIVE-18786?page= > com.atlassian.jira.plugin.system.issuetabpanels%3Aal

Announce: MR3 0.4 released

2018-11-01 Thread Sungwoo Park
/31/performance-evaluation-0.4/ You can download MR3 0.4 at: https://mr3.postech.ac.kr/download/home/ --- Sungwoo Park

Announce: MR3 0.6 released

2019-03-23 Thread Sungwoo Park
I am pleased to announce the release of MR3 0.6. New key features are: - In Hive on Kubernetes, DAGAppMaster can run in its own Pod. - MR3-UI requires only Timeline Server. - Hive on MR3 is much more stable because it supports memory monitoring when loading hash tables for Map-side join. You can

Announce: MR3 0.5 released (with Hive on Kubernetes)

2019-02-20 Thread Sungwoo Park
also supports Hive 3.1.1 and Hive 2.3.4. You can download MR3 0.5 at: https://mr3.postech.ac.kr/download/home/ --- Sungwoo Park

Re: Hive on Tez vs Impala

2019-04-15 Thread Sungwoo Park
/performance-evaluation-0.4/ --- Sungwoo Park On Mon, Apr 15, 2019 at 8:44 PM Artur Sukhenko wrote: > Hi, > We are using CDH 5, with Impala 2.7.0-cdh5.9.1 and Hive 1.1 (MapReduce) > I can't find the info regarding Hive on Tez performance compared to Impala. > Does someone know or compared it

Announce: MR3 0.7 released

2019-04-27 Thread Sungwoo Park
an FAQ page: https://mr3.postech.ac.kr/faq/home/ You can download MR3 0.7 at: https://mr3.postech.ac.kr/download/home/ --- Sungwoo Park

Re: Announce: MR3 0.8 released

2019-06-27 Thread Sungwoo Park
at 7:56 PM Sungwoo Park wrote: > I am pleased to announce the release of MR3 0.8. New features are: > > -- Hive on MR3 on Yarn fully supports recovery: > https://mr3.postech.ac.kr/hivemr3/features/recovery/ > > -- Hive on MR3 on Yarn supports high availability in which mult

Fwd: Announce: MR3 0.8 released

2019-06-26 Thread Sungwoo Park
ger. Hive on Kubernetes supports Timeline Server. You can download MR3 0.8 at: https://mr3.postech.ac.kr/download/home/ --- Sungwoo Park

Fwd: Article on the correctness of Hive on MR3, Presto, and Impala

2019-06-26 Thread Sungwoo Park
I have published a new article on the correctness of Hive on MR3, Presto, and Impala: https://mr3.postech.ac.kr/blog/2019/06/26/correctness-hivemr3-presto-impala/ Hope you enjoy reading the article. --- Sungwoo

Re: Article on the correctness of Hive on MR3, Presto, and Impala

2019-06-26 Thread Sungwoo Park
e columns in the result set and conduct a large diff on them? > > On Wednesday, June 26, 2019, Sungwoo Park wrote: > >> I have published a new article on the correctness of Hive on MR3, Presto, >> and Impala: >> >> >> https://mr3.postech.ac.kr/blog/2019/06/26/c

Re: Filters with IN clause are getting omitted

2019-04-23 Thread Sungwoo Park
Not a solution to the problem on HDP 2.6.5, but I have tested the first script in Hive 2.3.4 and Hive 3.1.1. On Hive 2.3.4, it returns 1 row, and on Hive 3.1.1, it returns no row. So, I guess the bug is still in HDP 2.6.5. --- Sungwoo On Tue, Apr 23, 2019 at 7:40 PM Rajat Khandelwal wrote: > Hi

Presto 317 vs Hive on MR3 0.10 (snapshot)

2019-08-22 Thread Sungwoo Park
Hello Hive users, I have published a new article that compares Presto 317 and Hive 3.1.1 on MR3 0.10 (snapshot). https://mr3.postech.ac.kr/blog/2019/08/22/comparison-presto317-0.10/ I haven't tested myself, but I guess Hive-LLAP also runs much faster than Presto. --- Sungwoo

Re: Apache Hive 2.3.4 - Issue with combination of Like operator & newline (\n) character in data

2019-07-29 Thread Sungwoo Park
Not a solution, but one can use \n in the search string, e.g.: select * from default.withdraw where id like '%withdraw\ncash'; select * from default.withdraw where id like '%withdraw%\ncash'; select * from default.withdraw where id like '%withdraw%\n%cash'; --- Sungwoo On Tue, Jul 30, 2019 at
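The workaround queries above suggest that '%' fails to span a newline in the affected release. A minimal Python sketch of one plausible mechanism follows. It rests on an assumption that the thread does not confirm: that LIKE is evaluated by translating the pattern into a regular expression whose '.' does not match newline. The helpers like_to_regex and like_match are hypothetical names used only for illustration and are not Hive code.

```python
import re


def like_to_regex(pattern: str) -> str:
    """Translate a SQL LIKE pattern into a regex (illustrative, not Hive's code)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # '%' matches any run of characters
        elif ch == "_":
            parts.append(".")    # '_' matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return "".join(parts)


def like_match(value: str, pattern: str, dotall: bool = False) -> bool:
    """Return True if value matches the LIKE pattern in full.

    dotall=False models the assumed buggy behavior, where '.' stops at newlines.
    """
    flags = re.DOTALL if dotall else 0
    return re.fullmatch(like_to_regex(pattern), value, flags) is not None


value = "atm withdraw\ncash"
print(like_match(value, "%withdraw%cash"))               # False: '%' cannot cross '\n'
print(like_match(value, "%withdraw%cash", dotall=True))  # True once '.' matches '\n'
print(like_match(value, "%withdraw\ncash"))              # True: explicit '\n' in the pattern
```

Under this assumption, writing the newline explicitly in the search string works because the literal character is matched directly rather than through '%'.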

Announce: MR3 0.9 released

2019-07-25 Thread Sungwoo Park
a serverless environment.) https://mr3.postech.ac.kr/hivek8s/guide/multiple-metastores/ * UDFs work okay on Kubernetes. You can download MR3 0.9 at: https://mr3.postech.ac.kr/download/home/ --- Sungwoo Park

Re: Announce: MR3 0.8 released

2019-06-28 Thread Sungwoo Park
https://youtu.be/1NB7GtI8NXM I have uploaded a video demonstrating Hive on Kubernetes using MR3. --- Sungwoo On Fri, Jun 28, 2019 at 4:44 AM Sungwoo Park wrote: > I have created a quick start guide showing how to run Hive-MR3 on > Kubernetes using Minikube on a single machine.

Hive-LLAP in HDP 3.1.4 vs Hive 3/4 on MR3 0.10

2019-11-03 Thread Sungwoo Park
I have published a new article that compares: Hive-LLAP in HDP 3.1.4, Hive 3.1.2 on MR3 0.10, and Hive 4.0.0-SNAPSHOT on MR3 0.10. You can find the result at: https://mr3.postech.ac.kr/blog/2019/11/03/hive-performance-0.10/ Cheers, --- Sungwoo

Re: Hive Not Returning YARN Application Results Correctly Nor Inserting Into Local Tables

2019-11-06 Thread Sungwoo Park
For the problem of not returning the result to the console, I think it occurs because the default file system is set to local file system, not to HDFS. Perhaps hive.exec.scratchdir is already set to /tmp/hive, but if the default file system is local, FileSinkOperator writes the final result to the

Announce: MR3 0.11 released

2019-12-04 Thread Sungwoo Park
-in/out. So if you would like to try autoscaling with Hive on MR3, we suggest EKS instead of EMR. https://mr3.postech.ac.kr/quickstart/aws/run-eks-autoscaling/ You can download MR3 0.11 at: https://mr3.postech.ac.kr/download/home/ Cheers, --- Sungwoo Park

Re: Hive 1.1.0 support on hive metastore 2.3.0

2019-12-09 Thread Sungwoo Park
ore db. > > Thanks. > > > > > On Mon, Dec 9, 2019 at 12:46 PM Sungwoo Park wrote: > >> Not a definitive answer, but my test result might help. I tested with >> HiveServer2 1.2.2 and Metastore 2.3.6. Queries in the TPC-DS benchmark >> (which only read data an

Re: Hive 1.1.0 support on hive metastore 2.3.0

2019-12-08 Thread Sungwoo Park
Not a definitive answer, but my test result might help. I tested with HiveServer2 1.2.2 and Metastore 2.3.6. Queries in the TPC-DS benchmark (which only read data and never update) run okay. Creating new tables and loading data to tables also work okay. So, I guess for basic uses of Hive, running

Announce: MR3 0.10 released

2019-10-18 Thread Sungwoo Park
/download/home/ Cheers, --- Sungwoo Park

All-in-One Docker image for running Hive on MR3 + Ranger + Timeline Server

2019-10-18 Thread Sungwoo Park
://hub.docker.com/u/glaparkdocker Cheers, --- Sungwoo Park

Re: How to decide Hive Cluster capacity

2019-12-21 Thread Sungwoo Park
I think this problem of choosing a cluster capacity is really challenging because the desired cluster capacity depends not only on the size of the dataset but also on the complexity of queries. For example, the execution time of the TPC-DS queries on the same dataset can range from sub-10 seconds

Re: rename output error during hive query on AWSs3-external table

2020-02-04 Thread Sungwoo Park
Not a solution, but looking at the source code of S3AFileSystem.java (Hadoop 2.8.5), I think the Exception raised inside S3AFileSystem.rename() is swallowed and only a new HiveException is reported. So, in order to find out the root cause, I guess you might need to set Log level to DEBUG and see

Announce: MR3 1.0 released

2020-02-19 Thread Sungwoo Park
be useful to those interested in trying MR3 in production. https://www.datamonad.com/post/2020-02-19-testing-mr3/ Cheers, --- Sungwoo Park

Re: Issues with aggregating on map values

2020-02-21 Thread Sungwoo Park
I tested the example on Hive 2.3.6, and it returned correct results. Hive 3.1.2 and 4.0.0-SNAPSHOT also returned correct results. So, I guess, if this is a bug, it was introduced somewhere around Hive 3.0 and fixed in 3.1.2. On Hive 2.3.6, I used these commands instead: create table dummy(a

Re: UDF get_splits()

2020-04-05 Thread Sungwoo Park
nworks/spark/sql/hive/llap/HiveWarehouseDataSourceReader.java > > That being said, I'm not sure if this UDF is technically supported as a > public API by the Hive community, so you may want to check about that. > > Eric > > On Sun, Apr 5, 2020 at 11:52 AM Sungwoo Park wrot

UDF get_splits()

2020-04-05 Thread Sungwoo Park
Hello, I would like to learn the use of UDF get_splits(). I tried such queries as: select get_splits("select * from web_returns", 1) ; select get_splits("select count(*) from web_returns", 1); These queries just return InputSplit objects, and I would like to see an example that uses the result

Re: Count bug in Hive 3.0.0.3.1

2020-04-28 Thread Sungwoo Park
I have tested the script with Hive 2.3.6, Hive 3.1.2, and Hive 4.0.0-SNAPSHOT (all with minor modifications), and have not found any problem. So, I guess all the master branches are fine. If Hive 3.0.0.3.1 is the release included in HDP 3.0.0 or HDP 3.0.1, I remember that this Hive-LLAP/Tez

Question on metadata before and after compaction

2020-10-08 Thread Sungwoo Park
Hi, I have a question on the consistency between data (e.g., on HDFS) and metadata kept by Metastore before and after compaction. Here is a scenario: 1. We back up the database for Metastore (before performing compaction). 2. We perform compaction. 3. After performing compaction, we lose the

Video demo of fault tolerance in Hive on MR3 on Kubernetes

2020-07-29 Thread Sungwoo Park
Hi everyone, We created a video demo of fault tolerance in Hive on MR3 on Kubernetes, using Hive 3.1.2 and MR3 1.1. Hope you enjoy it! https://youtu.be/uoZGsMUlhew Cheers, --- Sungwoo

Re: Hive metastore

2020-07-14 Thread Sungwoo Park
Hello, We use just TCP readiness/liveness probes checking the Metastore listener port (specified by hive.metastore.port or metastore.thrift.port). I don't know if an HTTP endpoint is available for Metastore.

    readinessProbe:
      tcpSocket:
        port: 9083
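A TCP probe of this kind only checks that the port accepts a connection; it does not speak Thrift or verify that Metastore can serve requests. The following Python sketch shows the equivalent check done by hand. The function name tcp_probe and the one-second timeout are assumptions for illustration, and 9083 is the port from the message above.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        # create_connection performs the TCP handshake; the socket is closed
        # immediately because only reachability matters for the probe.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

For example, tcp_probe("localhost", 9083) would report whether something is listening on the Metastore port of the local machine.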

MR3 1.1 released

2020-07-19 Thread Sungwoo Park
We are pleased to announce the release of MR3 1.1. Three main improvements in MR3 1.1 are: 1. Hive on MR3 on Kubernetes now runs almost as fast as Hive on MR3 on Hadoop. For experimental results, please see a new blog article "Why you should run Hive on Kubernetes, even in a Hadoop cluster".

MR3 1.2 released

2020-10-29 Thread Sungwoo Park
Hello Hive users, MR3 1.2 has been released. A few improvements in this release are: 1. MR3 can publish Prometheus metrics. 2. On Kubernetes, the user can change the total resources for workers dynamically (e.g., by using Prometheus metrics). This feature can be combined with autoscaling in

Re: Maintaining Hive 2 and 3 branches,

2021-03-18 Thread Sungwoo Park
Hi Peter, - Are these patches you mention below bugfixes, or new features on Hive > 3.1.3? (This might be a typo as I think the last Hive release is 3.1.2) > They are a collection of bug-fixes and improvements picked up from master/branch-3 branches. The list is mostly based on the additional

Maintaining Hive 2 and 3 branches,

2021-03-18 Thread Sungwoo Park
. (You can ignore the last commit which is internal to our work.) https://github.com/mr3project/hive-mr3/commits/master3 Thanks, --- Sungwoo Park

MR3 1.3 released

2021-08-18 Thread Sungwoo Park
We are pleased to announce the release of MR3 1.3. Highlights in this release are: 1) MR3 On both Hadoop and Kubernetes, there is no limit on the aggregate memory of ContainerWorkers, so MR3 can run in a cluster of any size. 2) Hive on MR3 We have backported about 350 patches to Apache Hive

Re: Future release of hive

2021-09-16 Thread Sungwoo Park
Hello, Hive PMC members or committers could share insider knowledge about the status of the Hive project, but here is my impression on Hive 3.1.2 as an outsider. Hive 3.1.2 is widely used in production, but not maintained seriously. (You could just check out the # of commits in branch-3.1 for

Patches to Hive 3.1.2,

2021-08-12 Thread Sungwoo Park
Hello Hive users, We have updated the repository that backports patches to Hive 3.1.2. Now it backports about 350 patches from the master branch to branch-3.1 of November 2020. You can ignore the last two commits which add MR3 backend and remove Hive on Spark.

Re: Hive servers restarting every few hours

2021-10-13 Thread Sungwoo Park
Hi, For 1, Hive 3.1.2 has a bug which leaks Metastore connections. This was reported in HIVE-20600: https://issues.apache.org/jira/browse/HIVE-20600 You might reproduce the bug by inserting values into a table and checking the number of connections, e.g.: 0: jdbc:hive2://blue0:9852/> CREATE

Re: Future release of hive

2021-09-21 Thread Sungwoo Park
Actually we can run Hive 3.1.2 with Ranger! To run Hive 3.1.2 with Ranger 2.0.0, you could set: hive.security.authorization.enabled=true hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator

Re: Future release of hive

2021-09-21 Thread Sungwoo Park
Sorry, I missed one thing -- you need to backport: HIVE-20344: PrivilegeSynchronizer for SBA might hit AccessControlException (Daniel Dai, reviewed by Vaibhav Gumashta) --- Sungwoo On Wed, Sep 22, 2021 at 12:24 AM Sungwoo Park wrote: > Actually we can run Hive 3.1.2 with Ranger! > >

Re: Hive-3 with hadoop-2.x.

2022-03-15 Thread Sungwoo Park
Up to MR3 version 1.2, Hive-MR3 supported Hive 3.1.2 on Hadoop 2.7+. From MR3 version 1.3 on, we did not release distributions for Hadoop 2.7+ because all use cases in production were using Hadoop 3+. (However it's still easy for us to build a distribution for Hadoop 2.7+.) When we were

Re: Too many S3 API calls for simple queries like select and create external table

2022-02-20 Thread Sungwoo Park
My understanding is that additional calls to the S3 API are the price to pay for using the Hadoop library, which only emulates FileSystem on top of S3. S3 is not a distributed file system like HDFS, so some of the API calls cannot be optimized in an ideal way. For (i), a more serious problem is the
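To illustrate why emulating a file system on an object store multiplies API calls, here is a small Python sketch that models a directory rename as list, copy, and delete operations on an in-memory store and counts each call. FakeObjectStore and rename_dir are hypothetical names invented for this example; this is not the S3A code, and a real connector may batch or parallelize these requests.

```python
class FakeObjectStore:
    """In-memory stand-in for an object store that counts API-style calls."""

    def __init__(self):
        self.objects = {}
        self.calls = {"PUT": 0, "LIST": 0, "COPY": 0, "DELETE": 0}

    def put(self, key, data):
        self.calls["PUT"] += 1
        self.objects[key] = data

    def list(self, prefix):
        self.calls["LIST"] += 1
        return sorted(k for k in self.objects if k.startswith(prefix))

    def copy(self, src, dst):
        self.calls["COPY"] += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key):
        self.calls["DELETE"] += 1
        del self.objects[key]


def rename_dir(store, src_prefix, dst_prefix):
    """Emulate a directory rename: there is no atomic rename, so every object
    under the prefix is copied to its new key and the old key is deleted."""
    for key in store.list(src_prefix):
        new_key = dst_prefix + key[len(src_prefix):]
        store.copy(key, new_key)
        store.delete(key)


store = FakeObjectStore()
for i in range(3):
    store.put(f"tmp/job/part-{i}", b"rows")
rename_dir(store, "tmp/job/", "warehouse/t/")
print(store.calls)  # one LIST plus one COPY and one DELETE per object
```

On HDFS the same rename is a single metadata operation, which is the contrast the message draws.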

MR3 1.4 released,

2022-02-20 Thread Sungwoo Park
We are pleased to announce MR3 1.4 and MR3 App. 1. We have backported over 600 patches to Apache Hive 3.1. https://github.com/mr3project/hive-mr3 This repository is maintained as part of developing Hive on MR3, but can also be used for building Apache Hive (by ignoring the last two commits).

Performance evaluation of Spark 2, Spark 3, Hive-LLAP, MR3 1.4

2022-04-07 Thread Sungwoo Park
Hi Hive users, Here is our latest article on the performance of Spark 2, Spark 3, and Hive 3. Hope you find it interesting. https://www.datamonad.com/post/2022-04-01-spark-hive-performance-1.4/ Spark 3 is catching up with Hive very fast, at least when executing sequential queries. For

Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
On Thu, Nov 2, 2023 at 1:43 PM Sungwoo Park wrote: > Have you done comparison between uniffle and celeborn..? >> > > We did not compare the performance of Uniffle and Celeborn (because > Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete > yet). Much of

Re: Announce: Hive-MR3 with Celeborn,

2023-11-01 Thread Sungwoo Park
> > Have you done comparison between uniffle and celeborn..? > We did not compare the performance of Uniffle and Celeborn (because Hive-MR3-Celeborn has been released but Hive-MR3-Uniffle is not complete yet). Much of the code in Hive-MR3-Celeborn is currently reused in Hive-MR3-Uniffle, so we

Re: Announce: Hive-MR3 with Celeborn,

2023-11-02 Thread Sungwoo Park
future) to help compute > engines > > better use disaggregated architecture, as well as become more efficient > and > > stable for huge shuffle sized jobs. > > > > > > Currently Celeborn supports Hive on MR, and I think integrating with MR3 > > provides a good example

Fwd: Release of Hive 4 and TPC-DS benchmark

2023-11-03 Thread Sungwoo Park
Forwarded to user@hive as I think many people are curious about the release of Hive 4. -- Forwarded message - From: Sungwoo Park Date: Sat, Nov 4, 2023 at 12:42 AM Subject: Release of Hive 4 and TPC-DS benchmark To: Hi everyone, I would like to resume the discussion

Announce: Hive-MR3 with Celeborn,

2023-10-24 Thread Sungwoo Park
Hi Hive users, Before the impending release of MR3 1.8, we would like to announce the release of Hive-MR3 with Celeborn (Hive 3.1.3 on MR3 1.8 with Celeborn 0.3.1). Apache Celeborn [1] is a remote shuffle service, similar to Magnet [2] and Apache Uniffle [3] (which was discussed in this Hive

Re: Specifying YARN Node (Label) for LLAP AM

2023-08-19 Thread Sungwoo Park
Hello, For more recent benchmark results, please see [1] where we compare Trino 418, Spark 3.4.0, and Hive 3.1.3 (on MR3 1.7) using TPC-DS 10TB. Spark takes about 19600 seconds to complete all the queries, whereas Trino and Hive take about 7400 seconds only. The experiment does not use Hive-LLAP,

Web-based interface for running Hive on Amazon EKS and Kubernetes

2022-06-06 Thread Sungwoo Park
Hi Hive users, We created MR3 Cloud, a web-based interface for executing Hive on Amazon EKS and Kubernetes. After specifying parameters in an interactive way, the user can download YAML files for creating an EKS cluster and Kubernetes objects. The user can create all the following components at

Re: [DISCUSS] End of life for Hive 1.x, 2.x, 3.x

2022-05-10 Thread Sungwoo Park
we were brainstorming about the future of the Hive 3 branch with > Zoltan Haindrich, he mentioned this letter: > https://lists.apache.org/thread/by9ppc2z8oqdzpqotzv5bs34yrxrd84l > > I think Sungwoo Park and his team makes a huge effort to maintain this > branch, and maybe it would be b

Re: External table replication in Hive

2022-08-25 Thread Sungwoo Park
For 1, cherry-picking it to Hive 3 does not work. I tried to backport HIVE-20911 to Hive 3, but it did not work because of so many dependencies :-( --- Sungwoo On Thu, Aug 25, 2022 at 2:15 AM Bharathkrishna G M wrote: > Hi, > > I want to replicate the Hive metastore to create a separate

MR3 1.5 released

2022-08-05 Thread Sungwoo Park
Hello Hive users, MR3 1.5 has been released. Hive 3.1.3 (with more than 600 additional patches backported) and Spark 3.2.2 are supported. Hive/Spark on MR3 is a quick and ready solution for you if: 1. You want to migrate from Hadoop to Kubernetes, but continue to use Hive. 2. You want to run

Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread Sungwoo Park
In fact, Hive 3 has been much faster than Spark for a long time. For complex queries, Hive 3 is much faster than Presto (or Trino) as well. The reality is different from common beliefs on Hive, Spark, and Presto. If interested, see the result of performance comparison using the TPC-DS benchmark.

Re: Hive 3 has big performance improvement from my test

2023-01-07 Thread Sungwoo Park
> > > [image: image.png] > > from your posting, the result is amazing. glad to know hive on mr3 has > that nice performance. > Hive on MR3 is similar to Hive-LLAP in performance, so we can interpret the above result as Hive being much faster than SparkSQL. For executing concurrent queries, the

Re: Hive 3 has big performance improvement from my test

2023-01-08 Thread Sungwoo Park
> from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Sun, 8 Jan 2023 at 05:21, Sungwoo Park wrote: > >> >>

Re: Specifying YARN Node (Label) for LLAP AM

2023-03-22 Thread Sungwoo Park
Hello, A similar issue was discussed in the Tez mailing list a long time ago: https://lists.apache.org/thread/0vjor12lpcncg43rn6vddw8yc1k62c81 Tez still does not support specifying node labels for AMs, but as explained in the response, this is quite easy to implement if you can re-compile Tez.

Running Hive on Kubernetes,

2023-02-23 Thread Sungwoo Park
Hello, If you are interested in running Hive on Kubernetes (without requiring Hadoop), we have updated the quick start guide on running Hive on MR3 on Kubernetes. The quick start guide shows step-by-step instructions for running Metastore, HiveServer2, Ranger, MR3-UI, Grafana, with/without

Hive on MR3 1.7 released

2023-05-29 Thread Sungwoo Park
Hi Hive users, I am happy to announce the release of MR3 1.7. MR3 is an execution engine for big data processing, and its main application Hive on MR3 is an alternative to Hive-Tez and Hive-LLAP. I would like to summarize its main features. 1. Hive on MR3 on Hadoop Hive on MR3 is easy to install

Performance Evaluation of Trino, Spark, and Hive on MR3

2023-05-31 Thread Sungwoo Park
Hello Hive users, With the release of Hive on MR3 1.7, we published an article that compares Trino, Spark, and Hive on MR3. https://www.datamonad.com/post/2023-05-31-trino-spark-hive-performance-1.7/ Omitted in the article is the result of running Hive-LLAP included in HDP 3.1.4. In our

hive.query.reexecution.stats.persist.scope

2023-05-24 Thread Sungwoo Park
Hi Hive users, Hive can persist runtime statistics by setting hive.query.reexecution.stats.persist.scope to 'hiveserver' or 'metastore' (instead of the default value 'query'). If you have experience using this configuration key in production, could you share it here? (Like the stability of

Re: hive.query.reexecution.stats.persist.scope

2023-05-25 Thread Sungwoo Park
sues.apache.org/jira/browse/HIVE-26978 > > On Wed, 24 May 2023 at 19:53, Sungwoo Park wrote: > >> Hi Hive users, >> >> Hive can persist runtime statistics by setting >> hive.query.reexecution.stats.persist.scope to 'hiveserver' or 'metastore' >> (ins

Blog article 'Performance Tuning for Single-table Queries'

2023-12-23 Thread Sungwoo Park
Hello Hive users, I have published a new blog article 'Performance Tuning for Single-table Queries'. It shows how to change configuration parameters of Hive and Tez in order to make simple queries run faster than Spark. Although it uses Hive on MR3, the technique equally applies to Hive on Tez

Re: MR3 1.8 released

2023-12-15 Thread Sungwoo Park
For Chinese users, MR3 1.8 is now shipped in HiDataPlus (along with Celeborn). https://mp.weixin.qq.com/s/65bgrnFpXtORlb4FjlPMWA --- Sungwoo On Sat, Dec 9, 2023 at 9:08 PM Sungwoo Park wrote: > MR3 1.8 released > > On behalf of the MR3 team, I am pleased to announce the release o

MR3 1.8 released

2023-12-09 Thread Sungwoo Park
MR3 1.8 released On behalf of the MR3 team, I am pleased to announce the release of MR3 1.8. MR3 is an execution engine similar in spirit to MapReduce and Tez which has been under development since 2015. Its main application is Hive on MR3. You can run Hive on MR3 on Hadoop, on Kubernetes, in

Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sungwoo Park
Hello, I don't have an answer to your problem, but if your goal is to quickly test Hive 3 using Docker, there is an alternative way which uses Hive on MR3. https://mr3docs.datamonad.com/docs/quick/docker/ You can also run Hive on MR3 on Kubernetes. Thanks, --- Sungwoo On Wed, Jan 10, 2024

Re: Docker Hive using tez without hdfs

2024-01-09 Thread Sungwoo Park
Tez also > works in standalone mode ? > > On Tue, Jan 9, 2024 at 11:08 PM Sungwoo Park wrote: > > > > Hello, > > > > I don't have an answer to your problem, but if your goal is to quickly > test Hive 3 using Docker, there is an alternative way

MR3 1.9 and performance evaluation of Trino 435 and Hive-MR3 1.9 using TPC-DS

2024-01-08 Thread Sungwoo Park
Hello Hive users, MR3 1.9 has been released. For changes, please see the release notes: https://mr3docs.datamonad.com/docs/release/ https://mr3docs.datamonad.com/docs/release/#patches-backported-in-mr3-19 We evaluated the performance of Trino 435 and Hive on MR3 1.9 using the TPC-DS benchmark.

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-29 Thread Sungwoo Park
we reverted HIVE-14187 and set > connectionPoolingType=HikariCP (see No.7). Even with connectionPoolingType > set to None, the environment where we reverted HIVE-14187 still performed > reasonably well (see No.6). > > Please note our investigation is still ongoing and we haven't yet come to >

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Sungwoo Park
metastore.stats.fetch.bitvector=true can also help generate more efficient query plans. --- Sungwoo On Wed, Feb 28, 2024 at 1:40 PM Takanobu Asanuma wrote: > Hi Sungwoo Park, > > I'm sorry for the late reply to this old email. > We are attempting to upgrade Hive MetaStore from Hive1 to Hive3, a

Re: CachedStore for hive.metastore.rawstore.impl in Hive 3.0

2024-02-28 Thread Sungwoo Park
resolved in your fork of Hive 3.1.3? > Thank you for sharing the issue with CachedStore and the JIRA tickets. > I will also try out metastore.stats.fetch.bitvector=true. > > Regards, > - Takanobu > > Wed, Feb 28, 2024 at 18:49 Sungwoo Park : > >> Hello Takanobu, >> >>

Hive-MR3 1.10 released

2024-03-19 Thread Sungwoo Park
Hello Hive users, We have released Hive on MR3 1.10. MR3 is an execution engine similar to MapReduce and Tez, and it supports Hadoop, Kubernetes, and standalone mode. Hive-MR3 uses MR3 for its execution backend in Hive 3.1.3. If you are interested, please give it a try. In MR3 1.10, we have

Re: [ANNOUNCE] Apache Hive 4.0.0 Released

2024-04-04 Thread Sungwoo Park
Congratulations and huge thanks to Apache Hive team and contributors for releasing Hive 4. We have been watching the development of Hive 4 since the release of Hive 3.1, and it's truly satisfying to witness the resolution of all the critical issues at last after 5 years. Hive 4 comes with a lot of