Re: count(*) with count(distinct) gives different results between Hive 2.3.2 and Hive 3.1.2

2020-07-29 Thread Eugene Chung
I found that hive.optimize.countdistinct=true is the cause of the problem. It looks like https://issues.apache.org/jira/browse/HIVE-16654 introduced this side effect. Best regards, Eugene Chung (Korean: 정의근)
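A minimal sketch of the workaround this finding implies, assuming it is acceptable to turn the optimization off for one session: disable hive.optimize.countdistinct before re-running the query from the original report. The thread identifies the setting as the cause; it does not confirm that disabling it restores the Hive 2.3.2 results, so treat this as something to verify on your own data.

```sql
-- Assumption: turning off the count-distinct rewrite (linked to HIVE-16654 above)
-- avoids the discrepancy; compare the output against Hive 2.3.2 before relying on it.
SET hive.optimize.countdistinct=false;

SELECT count(*), count(DISTINCT mid)
FROM db1.table1
WHERE log_date BETWEEN '2020-07-20' AND '2020-07-26';
```

SET only affects the current session, so other queries keep the default (optimized) behavior.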

Video demo of fault tolerance in Hive on MR3 on Kubernetes

2020-07-29 Thread Sungwoo Park
Hi everyone, We created a video demo of fault tolerance in Hive on MR3 on Kubernetes, using Hive 3.1.2 and MR3 1.1. Hope you enjoy it! https://youtu.be/uoZGsMUlhew Cheers, --- Sungwoo

count(*) with count(distinct) gives different results between Hive 2.3.2 and Hive 3.1.2

2020-07-29 Thread Eugene Chung
Hi, for the same query, for example select count(*), count(distinct mid) from db1.table1 where log_date between '2020-07-20' and '2020-07-26'; Hive 2.3.2 and Hive 3.1.2 return different results on the same input. Note that db1.table1 is an ORC table partitioned by log_date
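One way to check which version's numbers are right is to compute both values without count(DISTINCT ...), so the result does not depend on how the optimizer rewrites that aggregate (an assumption about which rewrite is involved). The table, column, and date range are taken from the query above; the alias t is hypothetical. Because count(DISTINCT mid) ignores NULL values, the inner query filters them out so the two counts are comparable.

```sql
-- Total rows in the same partition range.
SELECT count(*)
FROM db1.table1
WHERE log_date BETWEEN '2020-07-20' AND '2020-07-26';

-- Distinct non-NULL mid values, computed with GROUP BY instead of count(DISTINCT).
SELECT count(*)
FROM (
  SELECT mid
  FROM db1.table1
  WHERE log_date BETWEEN '2020-07-20' AND '2020-07-26'
    AND mid IS NOT NULL
  GROUP BY mid
) t;
```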

Hive Warehouse Connector Dependency issue

2020-07-29 Thread Daniel.Schmitt
Hi, we are currently trying to use the Hive Warehouse Connector (HWC) to read transactional tables in Hive (3.0.0.3.1) from Spark (2.3.0). It seems there is no other way to do this when the Hive tables are transactional. Our application (spring-boot and Spark) runs fine without the HWC