[jira] [Commented] (HIVE-22165) Synchronisation introduced by HIVE-14296 on SessionManager.closeSession causes high latency in a busy hive server

2019-09-03 Thread Gopal V (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921966#comment-16921966
 ] 

Gopal V commented on HIVE-22165:


The patch doesn't build, because the variable "session" got pulled out of the 
scope.

> Synchronisation introduced by HIVE-14296 on SessionManager.closeSession 
> causes high latency in a busy hive server
> -
>
> Key: HIVE-22165
> URL: https://issues.apache.org/jira/browse/HIVE-22165
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 2.1.0, 2.3.2
>Reporter: Amruth S
>Assignee: Amruth S
>Priority: Major
> Attachments: HIVE-22165.patch
>
>
> HIVE-14296 introduces this 
> [commit|https://github.com/apache/hive/commit/477a47d3b4b9e3da3c22465217c2024588f7f000]
>  which adds synchronization to SessionManager.closeSession.
> And it looks like it is used only for logging purposes.
> In a busy hive server where 5-10 sessions are created and closed every second, an 
> increase in latency of any downstream service (Zk, HDFS) causes a 
> queueing effect (a lot of threads getting blocked on 
> SessionManager.closeSession), creating an induced latency of 3-5 minutes at 
> times just for closing the session. 
> Since the gauge (MetricsConstant.HS2_OPEN_SESSIONS) is already tracking the 
> open session counts, the synchronization (along with the additional logging) 
> can be removed without any loss of functionality.
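
The idea in the description can be sketched roughly as follows. This is not the attached patch: SessionRegistry, open, close and openCount are hypothetical names standing in for SessionManager, the AtomicInteger only plays the role of the HS2_OPEN_SESSIONS gauge, and a Runnable stands in for whatever cleanup a real session performs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for SessionManager; not Hive's actual class.
class SessionRegistry {
  // Maps a session handle to its cleanup action (assumption: a Runnable).
  private final ConcurrentMap<String, Runnable> sessions = new ConcurrentHashMap<>();
  // Plays the role of the open-sessions gauge.
  private final AtomicInteger openSessions = new AtomicInteger();

  void open(String handle, Runnable cleanup) {
    if (sessions.putIfAbsent(handle, cleanup) == null) {
      openSessions.incrementAndGet();
    }
  }

  // No method-level lock: a slow cleanup (for example one waiting on ZK or
  // HDFS) delays only the calling thread, not every other closing thread.
  void close(String handle) {
    Runnable cleanup = sessions.remove(handle);
    if (cleanup == null) {
      throw new IllegalStateException("Session does not exist: " + handle);
    }
    try {
      cleanup.run();
    } finally {
      openSessions.decrementAndGet();
    }
  }

  int openCount() {
    return openSessions.get();
  }

  public static void main(String[] args) {
    SessionRegistry registry = new SessionRegistry();
    registry.open("s1", () -> { });
    registry.open("s2", () -> { });
    registry.close("s1");
    System.out.println(registry.openCount());
  }
}
```

The concurrent map removes each handle atomically, so two threads cannot both close the same session, and the counter stays consistent without a shared monitor.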



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class

2019-09-03 Thread Gopal V (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-22161:
---
Labels: concurrency performance  (was: performance)

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType 
> class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.2, 4.0.0, 3.1.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>  Labels: concurrency, performance
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful 
> and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
>   // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
>   public static <T extends Annotation> T getAnnotation(Class<?> clazz,
>       Class<T> annotationClass) {
>     synchronized (annotationClass) {
>       return clazz.getAnnotation(annotationClass);
>     }
>   }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them 
> during compilation) & also being locked across threads for each instance of 
> GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+
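
A minimal sketch of what the lookup could look like once the lock is dropped, assuming the JVM in use already contains the JDK-7122142 fix. The Deterministic annotation and the RandomLike and Plain classes are hypothetical stand-ins for UDFType and real UDF classes, and treating an unannotated class as deterministic is an assumption of this sketch.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class AnnotationLookup {
  // Hypothetical annotation standing in for UDFType.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Deterministic {
    boolean value() default true;
  }

  @Deterministic(false)
  static class RandomLike { }

  static class Plain { }

  // Lock-free lookup: threads no longer serialize on the annotation class.
  static <T extends Annotation> T getAnnotation(Class<?> clazz, Class<T> annotationClass) {
    return clazz.getAnnotation(annotationClass);
  }

  // Assumption of this sketch: a class without the annotation is deterministic.
  static boolean isDeterministic(Class<?> clazz) {
    Deterministic d = getAnnotation(clazz, Deterministic.class);
    return d == null || d.value();
  }

  public static void main(String[] args) {
    System.out.println(isDeterministic(RandomLike.class));
    System.out.println(isDeterministic(Plain.class));
  }
}
```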





[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class

2019-09-03 Thread Gopal V (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-22161:
---
Labels: performance  (was: )

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType 
> class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.2, 4.0.0, 3.1.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>  Labels: performance
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful 
> and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
>   // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
>   public static <T extends Annotation> T getAnnotation(Class<?> clazz,
>       Class<T> annotationClass) {
>     synchronized (annotationClass) {
>       return clazz.getAnnotation(annotationClass);
>     }
>   }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them 
> during compilation) & also being locked across threads for each instance of 
> GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+





[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class

2019-09-03 Thread Gopal V (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-22161:
---
Affects Version/s: 4.0.0
   1.2.2
   3.1.2

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType 
> class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.2, 4.0.0, 3.1.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful 
> and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
>   // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
>   public static <T extends Annotation> T getAnnotation(Class<?> clazz,
>       Class<T> annotationClass) {
>     synchronized (annotationClass) {
>       return clazz.getAnnotation(annotationClass);
>     }
>   }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them 
> during compilation) & also being locked across threads for each instance of 
> GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+





[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class

2019-09-03 Thread Gopal V (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-22161:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType 
> class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.2.2, 4.0.0, 3.1.2
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
>  Labels: concurrency, performance
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful 
> and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
>   // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
>   public static <T extends Annotation> T getAnnotation(Class<?> clazz,
>       Class<T> annotationClass) {
>     synchronized (annotationClass) {
>       return clazz.getAnnotation(annotationClass);
>     }
>   }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them 
> during compilation) & also being locked across threads for each instance of 
> GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+





[jira] [Updated] (HIVE-22161) UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class

2019-09-03 Thread Gopal V (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-22161:
---
Fix Version/s: 4.0.0

> UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType 
> class
> -
>
> Key: HIVE-22161
> URL: https://issues.apache.org/jira/browse/HIVE-22161
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22161.1.patch
>
>
> There's a hidden synchronization across threads when looking up isStateful 
> and isDeterministic.
> https://github.com/apache/hive/blob/master/common/src/java/org/apache/hive/common/util/AnnotationUtils.java#L27
> {code}
>   // to avoid https://bugs.openjdk.java.net/browse/JDK-7122142
>   public static <T extends Annotation> T getAnnotation(Class<?> clazz,
>       Class<T> annotationClass) {
>     synchronized (annotationClass) {
>       return clazz.getAnnotation(annotationClass);
>     }
>   }
> {code}
> This is serializing multiple threads initializing UDFs (or checking them 
> during compilation) & also being locked across threads for each instance of 
> GenericUDFOpEqual in the specific scenario.
> https://bugs.openjdk.java.net/browse/JDK-7122142 is fixed in jdk8+





[jira] [Updated] (HIVE-22107) Correlated subquery producing wrong schema

2019-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-22107:
--
Labels: pull-request-available  (was: )

> Correlated subquery producing wrong schema
> --
>
> Key: HIVE-22107
> URL: https://issues.apache.org/jira/browse/HIVE-22107
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22107.1.patch, HIVE-22107.2.patch, 
> HIVE-22107.3.patch, HIVE-22107.4.patch, HIVE-22107.5.patch
>
>
> *Repro*
> {code:sql}
> create table test(id int, name string,dept string);
> insert into test values(1,'a','it'),(2,'b','eee'),(NULL, 'c', 'cse');
> select distinct 'empno' as eid, a.id from test a where NOT EXISTS (select 
> c.id from test c where a.id=c.id);
> {code}
> {code}
> +-------+--------+
> |  eid  |  a.id  |
> +-------+--------+
> | NULL  | empno  |
> +-------+--------+
> {code}





[jira] [Work logged] (HIVE-22107) Correlated subquery producing wrong schema

2019-09-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22107?focusedWorklogId=305873=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-305873
 ]

ASF GitHub Bot logged work on HIVE-22107:
-

Author: ASF GitHub Bot
Created on: 03/Sep/19 20:46
Start Date: 03/Sep/19 20:46
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on pull request #755: HIVE-22107 
Correlated subquery producing wrong schema
URL: https://github.com/apache/hive/pull/755#discussion_r320470567
 
 

 ##
 File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java
 ##
 @@ -199,7 +199,7 @@ private RexNode rewriteScalar(RelMetadataQuery mq, 
RexSubQuery e, Set

> Correlated subquery producing wrong schema
> --
>
> Key: HIVE-22107
> URL: https://issues.apache.org/jira/browse/HIVE-22107
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 4.0.0
>Reporter: Vineet Garg
>Assignee: Vineet Garg
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-22107.1.patch, HIVE-22107.2.patch, 
> HIVE-22107.3.patch, HIVE-22107.4.patch, HIVE-22107.5.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Repro*
> {code:sql}
> create table test(id int, name string,dept string);
> insert into test values(1,'a','it'),(2,'b','eee'),(NULL, 'c', 'cse');
> select distinct 'empno' as eid, a.id from test a where NOT EXISTS (select 
> c.id from test c where a.id=c.id);
> {code}
> {code}
> +-------+--------+
> |  eid  |  a.id  |
> +-------+--------+
> | NULL  | empno  |
> +-------+--------+
> {code}





[jira] [Updated] (HIVE-22162) MVs are not using ACID tables by default

2019-09-03 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22162:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~kkasa]!

> MVs are not using ACID tables by default
> 
>
> Key: HIVE-22162
> URL: https://issues.apache.org/jira/browse/HIVE-22162
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 3.1.2
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, 
> HIVE-22162.3.patch, HIVE-22162.4.patch
>
>
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> SET metastore.strict.managed.tables=true;
> SET hive.default.fileformat=textfile;
> SET hive.default.fileformat.managed=orc;
> SET metastore.create.as.acid=true;
> CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));
> INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 
> 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);
> CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
> AS SELECT a, b, c FROM cmv_basetable_n4;
> DESCRIBE FORMATTED cmv_mat_view_n4;
> {code}
> {code}
> POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
> ...
> Table Type:   MATERIALIZED_VIEW
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
>   bucketing_version       2
>   numFiles                1
>   numRows                 5
>   rawDataSize             1025
>   totalSize               509
> {code}
> Missing table parameter
> {code}
> transactional = true
> {code}
> cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez]





[jira] [Commented] (HIVE-22162) MVs are not using ACID tables by default

2019-09-03 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921573#comment-16921573
 ] 

Jesus Camacho Rodriguez commented on HIVE-22162:


+1

> MVs are not using ACID tables by default
> 
>
> Key: HIVE-22162
> URL: https://issues.apache.org/jira/browse/HIVE-22162
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 3.1.2
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, 
> HIVE-22162.3.patch, HIVE-22162.4.patch
>
>
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> SET metastore.strict.managed.tables=true;
> SET hive.default.fileformat=textfile;
> SET hive.default.fileformat.managed=orc;
> SET metastore.create.as.acid=true;
> CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));
> INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 
> 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);
> CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
> AS SELECT a, b, c FROM cmv_basetable_n4;
> DESCRIBE FORMATTED cmv_mat_view_n4;
> {code}
> {code}
> POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
> ...
> Table Type:   MATERIALIZED_VIEW
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
>   bucketing_version       2
>   numFiles                1
>   numRows                 5
>   rawDataSize             1025
>   totalSize               509
> {code}
> Missing table parameter
> {code}
> transactional = true
> {code}
> cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez]





[jira] [Updated] (HIVE-22162) MVs are not using ACID tables by default

2019-09-03 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-22162:
---
Summary: MVs are not using ACID tables by default  (was: MVs are not using 
ACID tables.)

> MVs are not using ACID tables by default
> 
>
> Key: HIVE-22162
> URL: https://issues.apache.org/jira/browse/HIVE-22162
> Project: Hive
>  Issue Type: Bug
>  Components: Materialized views
>Affects Versions: 3.1.2
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, 
> HIVE-22162.3.patch, HIVE-22162.4.patch
>
>
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> SET metastore.strict.managed.tables=true;
> SET hive.default.fileformat=textfile;
> SET hive.default.fileformat.managed=orc;
> SET metastore.create.as.acid=true;
> CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));
> INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 
> 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);
> CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
> AS SELECT a, b, c FROM cmv_basetable_n4;
> DESCRIBE FORMATTED cmv_mat_view_n4;
> {code}
> {code}
> POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
> ...
> Table Type:   MATERIALIZED_VIEW
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
>   bucketing_version       2
>   numFiles                1
>   numRows                 5
>   rawDataSize             1025
>   totalSize               509
> {code}
> Missing table parameter
> {code}
> transactional = true
> {code}
> cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez]





[jira] [Updated] (HIVE-22164) Vectorized Limit operator returns wrong number of results with offset

2019-09-03 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan updated HIVE-22164:

Attachment: HIVE-22164.4.patch
Status: Patch Available  (was: Open)

> Vectorized Limit operator returns wrong number of results with offset
> -
>
> Key: HIVE-22164
> URL: https://issues.apache.org/jira/browse/HIVE-22164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap, Vectorization
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: HIVE-22164.1.patch, HIVE-22164.2.patch, 
> HIVE-22164.3.patch, HIVE-22164.4.patch
>
>
> Vectorized Limit operator returns wrong number of results with offset





[jira] [Updated] (HIVE-22164) Vectorized Limit operator returns wrong number of results with offset

2019-09-03 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan updated HIVE-22164:

Status: Open  (was: Patch Available)

> Vectorized Limit operator returns wrong number of results with offset
> -
>
> Key: HIVE-22164
> URL: https://issues.apache.org/jira/browse/HIVE-22164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, llap, Vectorization
>Affects Versions: 4.0.0
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
> Attachments: HIVE-22164.1.patch, HIVE-22164.2.patch, 
> HIVE-22164.3.patch
>
>
> Vectorized Limit operator returns wrong number of results with offset





[jira] [Commented] (HIVE-22099) Several date related UDFs can't handle Julian dates properly since HIVE-20007

2019-09-03 Thread Jesus Camacho Rodriguez (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921542#comment-16921542
 ] 

Jesus Camacho Rodriguez commented on HIVE-22099:


[~szita], a few minor comments on the patch. Consider using 
{{DateTimeMath.retrieveProlepticGregorianCalendarUTC}} instead of importing the 
static method. Also {{retrieveProlepticGregorianCalendarUTC}} -> 
{{getProlepticGregorianCalendarUTC}}.

> Several date related UDFs can't handle Julian dates properly since HIVE-20007
> -
>
> Key: HIVE-22099
> URL: https://issues.apache.org/jira/browse/HIVE-22099
> Project: Hive
>  Issue Type: Bug
>Reporter: Adam Szita
>Assignee: Adam Szita
>Priority: Major
> Attachments: HIVE-22099.0.patch, HIVE-22099.1.patch, 
> HIVE-22099.2.patch, HIVE-22099.3.patch, HIVE-22099.4.patch, HIVE-22099.5.patch
>
>
> Currently, dates that belong to the Julian calendar (before Oct 15, 1582) are 
> handled improperly by date/timestamp UDFs.
> E.g. the DateFormat UDF:
> Although the dates are in the Julian calendar, the formatter insists on 
> printing them according to the Gregorian calendar, causing a difference of 
> multiple days in some cases:
>  
> {code:java}
> beeline> select date_format('1001-01-05','dd---MM--');
> +----------------+
> |      _c0       |
> +----------------+
> | 30---12--1000  |
> +----------------+{code}
>  I've observed similar problems in the following UDFs:
>  * add_months
>  * date_format
>  * day
>  * month
>  * months_between
>  * weekofyear
>  * year
>  
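
One way a shift like the one above can arise with plain JDK classes is sketched below. It assumes the date is held as a proleptic Gregorian value (as java.time does) and is then formatted by a SimpleDateFormat whose default calendar switches to the Julian calendar before the 1582 cutover. Whether this is the exact path inside Hive is an assumption, the class and method names are hypothetical, and the yyyy in the pattern is assumed because the pattern in the report appears truncated.

```java
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class JulianFormatDemo {
  static final TimeZone UTC = TimeZone.getTimeZone("UTC");

  // Epoch millis of midnight UTC for an ISO (proleptic Gregorian) date.
  static long epochMillis(String isoDate) {
    return LocalDate.parse(isoDate).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
  }

  // Default SimpleDateFormat calendar: Julian before Oct 15, 1582.
  static String formatHybrid(String isoDate, String pattern) {
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
    format.setTimeZone(UTC);
    return format.format(new Date(epochMillis(isoDate)));
  }

  // Calendar forced to pure (proleptic) Gregorian by moving the cutover
  // to the earliest representable instant.
  static String formatProleptic(String isoDate, String pattern) {
    GregorianCalendar calendar = new GregorianCalendar(UTC, Locale.ROOT);
    calendar.setGregorianChange(new Date(Long.MIN_VALUE));
    SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
    format.setCalendar(calendar);
    return format.format(new Date(epochMillis(isoDate)));
  }

  public static void main(String[] args) {
    System.out.println(formatHybrid("1001-01-05", "dd---MM--yyyy"));
    System.out.println(formatProleptic("1001-01-05", "dd---MM--yyyy"));
  }
}
```

With these inputs the hybrid formatter prints 30---12--1000, the same value as in the report, and the proleptic one prints 05---01--1001. The two calendars are six days apart in the year 1001.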





[jira] [Commented] (HIVE-21737) Upgrade Avro to version 1.9.1

2019-09-03 Thread Nandor Kollar (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921487#comment-16921487
 ] 

Nandor Kollar commented on HIVE-21737:
--

[~Fokko] changes on {{RelTreeSignature.java}} look unrelated, would you mind 
reverting those?

In addition, I'm afraid that 
[this|https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java#L238]
 part of TypeInfoToSchema no longer sets default to null as it used to do: this 
call landed 
[here|https://github.com/apache/avro/blob/branch-1.8/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L394]
 before, but now 
[this|https://github.com/apache/avro/blob/master/lang/java/avro/src/main/java/org/apache/avro/Schema.java#L557]
 constructor is getting called, which - if I'm not mistaken - will end with a 
{{org.apache.avro.AvroRuntimeException: Unknown datum class: class 
com.fasterxml.jackson.databind.node.NullNode}}. Is my assumption correct? 
Unfortunately I'm not too familiar with Hive, so I don't know which test case 
would fail.
I think we should simply get rid of Jackson classes here, and just pass null in 
the Schema.Field constructor.

> Upgrade Avro to version 1.9.1
> -
>
> Key: HIVE-21737
> URL: https://issues.apache.org/jira/browse/HIVE-21737
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Ismaël Mejía
>Assignee: Fokko Driesprong
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 0001-HIVE-21737-Bump-Apache-Avro-to-1.9.1.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Avro 1.9.0 was released recently. It brings a lot of fixes including a leaner 
> version of Avro without Jackson in the public API. Worth the update.





[jira] [Commented] (HIVE-22150) HS2 allows setting system properties

2019-09-03 Thread Hui An (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921402#comment-16921402
 ] 

Hui An commented on HIVE-22150:
---

[~alangates] [~pxiong] Could you please review this patch?

> HS2 allows setting system properties
> 
>
> Key: HIVE-22150
> URL: https://issues.apache.org/jira/browse/HIVE-22150
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 3.1.1
>Reporter: Craig Condit
>Assignee: Hui An
>Priority: Major
> Attachments: HIVE-22150.patch.1, HIVE-22150.patch.2
>
>
> HiveServer2 currently allows setting system properties, which is a problem 
> when used in a multi-user environment.
> Connecting via beeline and executing the following demonstrates the issue:
> {noformat}
> 0: jdbc:hive2://serv1000.example.com:2181,serv> SET system:java.io.tmpdir;
> +-----------------------------+
> |             set             |
> +-----------------------------+
> | system:java.io.tmpdir=/tmp  |
> +-----------------------------+
> 1 row selected (0.018 seconds)
> 0: jdbc:hive2://serv1000.example.com:2181,serv> SET system:java.io.tmpdir=/tmp/attacker-dir;
> No rows affected (0.013 seconds)
> 0: jdbc:hive2://serv1000.example.com:2181,serv> SET system:java.io.tmpdir;
> +------------------------------------------+
> |                   set                    |
> +------------------------------------------+
> | system:java.io.tmpdir=/tmp/attacker-dir  |
> +------------------------------------------+
> 1 row selected (0.019 seconds)
> {noformat}
> Any changes persist until HS2 is restarted, and affect all connected users. 
> At the very least, this is a denial-of-service vector (verified by setting 
> line.separator to a random string).
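
The changes leak across users because JVM system properties are process-wide. A guard along the following lines would keep SET from writing to them. This is only a sketch under assumptions and not the attached patch: SetCommandGuard and its methods are hypothetical names, and the per-session map stands in for whatever configuration object a real session holds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler for "SET key=value"; not Hive's actual implementation.
class SetCommandGuard {
  static final String SYSTEM_PREFIX = "system:";

  // Per-session settings: changes here do not affect other sessions.
  private final Map<String, String> sessionConf = new HashMap<>();

  void set(String key, String value) {
    if (key.startsWith(SYSTEM_PREFIX)) {
      // System properties are shared by every session in the JVM,
      // so writes are rejected instead of calling System.setProperty.
      throw new IllegalArgumentException("Cannot modify " + key + " at runtime");
    }
    sessionConf.put(key, value);
  }

  // Reading system properties stays allowed.
  String get(String key) {
    if (key.startsWith(SYSTEM_PREFIX)) {
      return System.getProperty(key.substring(SYSTEM_PREFIX.length()));
    }
    return sessionConf.get(key);
  }

  public static void main(String[] args) {
    SetCommandGuard guard = new SetCommandGuard();
    guard.set("hive.exec.parallel", "true");
    System.out.println(guard.get("hive.exec.parallel"));
    try {
      guard.set("system:java.io.tmpdir", "/tmp/attacker-dir");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```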





[jira] [Comment Edited] (HIVE-22030) Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)

2019-09-03 Thread Robert Schaft (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921364#comment-16921364
 ] 

Robert Schaft edited comment on HIVE-22030 at 9/3/19 11:51 AM:
---

Even jackson 2.9.9.1 has vulnerabilities: 
[CVE-2019-14439|https://www.cvedetails.com/cve/CVE-2019-14439/] and 
[CVE-2019-14379|https://www.cvedetails.com/cve/CVE-2019-14379/]

You need to bump to at least version 2.9.9.2. The newest is 2.9.9.3.


was (Author: robert.schaft):
Even jackson 2.9.9.1 has vulnerabilities: 
[CVE-2019-14439|https://www.cvedetails.com/cve/CVE-2019-14439/] and 
[CVE-2019-14379|https://www.cvedetails.com/cve/CVE-2019-14379/]

You need to bump to at least version 2.9.9.2. The newest is 2.9.9.3.

> Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)
> ---
>
> Key: HIVE-22030
> URL: https://issues.apache.org/jira/browse/HIVE-22030
> Project: Hive
>  Issue Type: Task
>Reporter: Dombi Akos
>Assignee: Dombi Akos
>Priority: Major
> Fix For: 4.0.0
>
>
> Bump the following jackson versions:
>  - jackson version to 2.9.9
>  - jackson-databind version to 2.9.9.1





[jira] [Commented] (HIVE-22030) Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)

2019-09-03 Thread Robert Schaft (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921364#comment-16921364
 ] 

Robert Schaft commented on HIVE-22030:
--

Even jackson 2.9.9.1 has vulnerabilities: 
[CVE-2019-14439|https://www.cvedetails.com/cve/CVE-2019-14439/] and 
[CVE-2019-14379|https://www.cvedetails.com/cve/CVE-2019-14379/]

You need to bump to at least version 2.9.9.2. The newest is 2.9.9.3.

> Bumping jackson version to 2.9.9 and 2.9.9.1 (jackson-databind)
> ---
>
> Key: HIVE-22030
> URL: https://issues.apache.org/jira/browse/HIVE-22030
> Project: Hive
>  Issue Type: Task
>Reporter: Dombi Akos
>Assignee: Dombi Akos
>Priority: Major
> Fix For: 4.0.0
>
>
> Bump the following jackson versions:
>  - jackson version to 2.9.9
>  - jackson-databind version to 2.9.9.1





[jira] [Comment Edited] (HIVE-21002) TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x incorrectly

2019-09-03 Thread Piotr Findeisen (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-21002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919389#comment-16919389
 ] 

Piotr Findeisen edited comment on HIVE-21002 at 9/3/19 10:59 AM:
-

[~klcopp] [~zi] this issue explicitly talks about Avro and Parquet, whereas the 
same problem applies also to "RCBinary" ({{ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS 
RCFILE;}}).
-Has this been addressed too, or should I create a new issue?- created 
HIVE-22167


was (Author: findepi):
[~klcopp] [~zi]  this issue explicitly talks about Avro and Parquet, whereas 
the same problem applies also to "RCBinary" ({{ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS 
RCFILE;}}).
Has this been addressed too, or should I create a new issue?

> TIMESTAMP - Backwards incompatible change: Hive 3.1 reads back Avro and 
> Parquet timestamps written by Hive 2.x incorrectly
> --
>
> Key: HIVE-21002
> URL: https://issues.apache.org/jira/browse/HIVE-21002
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.1.0, 3.1.1
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> Hive 3.1 reads back Avro and Parquet timestamps written by Hive 2.x 
> incorrectly. As an example session to demonstrate this problem, create a 
> dataset using Hive version 2.x in America/Los_Angeles:
> {code:sql}
> hive> create table ts_‹format› (ts timestamp) stored as ‹format›;
> hive> insert into ts_‹format› values (*‘2018-01-01 00:00:00.000’*);
> {code}
> Querying this table by issuing
> {code:sql}
> hive> select * from ts_‹format›;
> {code}
> from different time zones using different versions of Hive and different 
> storage formats gives the following results:
> |‹format›|Writer time zone (in Hive 2.x)|Reader time zone|Result in Hive 2.x reader|Result in Hive 3.1 reader|
> |Avro and Parquet|America/Los_Angeles|America/Los_Angeles|2018-01-01 *00*:00:00.0|2018-01-01 *08*:00:00.0|
> |Avro and Parquet|America/Los_Angeles|Europe/Paris|2018-01-01 *09*:00:00.0|2018-01-01 *08*:00:00.0|
> |Textfile and ORC|America/Los_Angeles|America/Los_Angeles|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> |Textfile and ORC|America/Los_Angeles|Europe/Paris|2018-01-01 00:00:00.0|2018-01-01 00:00:00.0|
> *Hive 3.1 clearly gives different results than Hive 2.x for timestamps stored 
> in Avro and Parquet formats.* Apache ORC behaviour has not changed because it 
> was modified to adjust timestamps to retain backwards compatibility. Textfile 
> behaviour has not changed because its processing involves parsing and 
> formatting instead of proper serializing and deserializing, so it 
> inherently had LocalDateTime semantics even in Hive 2.x.
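
The Avro and Parquet rows of the table can be modelled with java.time as a sketch. It assumes the Hive 2.x writer normalizes the wall-clock value to a UTC instant, the Hive 2.x reader converts that instant back into the reader's zone, and the Hive 3.1 reader shows the stored UTC value unchanged. All names are hypothetical and this is not Hive code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

class TimestampSemanticsDemo {
  // Assumed Hive 2.x write path: wall-clock time in the writer's zone -> UTC instant.
  static Instant write(LocalDateTime wallClock, ZoneId writerZone) {
    return wallClock.atZone(writerZone).toInstant();
  }

  // Instant semantics (assumed Hive 2.x read path): render in the reader's zone.
  static LocalDateTime readAsInstant(Instant stored, ZoneId readerZone) {
    return LocalDateTime.ofInstant(stored, readerZone);
  }

  // LocalDateTime semantics (assumed Hive 3.1 read path): the reader's zone is ignored.
  static LocalDateTime readAsLocalDateTime(Instant stored) {
    return LocalDateTime.ofInstant(stored, ZoneOffset.UTC);
  }

  public static void main(String[] args) {
    ZoneId losAngeles = ZoneId.of("America/Los_Angeles");
    ZoneId paris = ZoneId.of("Europe/Paris");
    Instant stored = write(LocalDateTime.of(2018, 1, 1, 0, 0), losAngeles);

    System.out.println(readAsInstant(stored, losAngeles));
    System.out.println(readAsInstant(stored, paris));
    System.out.println(readAsLocalDateTime(stored));
  }
}
```

Under these assumptions the three reads give 00:00, 09:00 and 08:00 on 2018-01-01, which lines up with the Avro and Parquet rows above.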





[jira] [Comment Edited] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-09-03 Thread Hui An (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921275#comment-16921275
 ] 

Hui An edited comment on HIVE-22077 at 9/3/19 9:21 AM:
---

[~kgyrtkirk] Could you please review this patch?


was (Author: bone an):
[~kgyrtkirk]Could you please review this patch?

> Inserting overwrite partitions clause does not clean directories while 
> partitions' info is not stored in metadata
> -
>
> Key: HIVE-22077
> URL: https://issues.apache.org/jira/browse/HIVE-22077
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.1, 4.0.0, 2.3.4
>Reporter: Hui An
>Assignee: Hui An
>Priority: Major
> Attachments: HIVE-22077.patch.1
>
>
> INSERT OVERWRITE into a static partition may not clean the related HDFS 
> location if the partition's info is not stored in metadata.
> Steps to reproduce this issue: 
> 
> 1. Create a managed table :
> 
> {code:sql}
> CREATE TABLE `test`(
>   `id` string)
> PARTITIONED BY (
>   `dayno` string)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
> LOCATION
>   'hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1564731656');
> {code}
> 
> 2. Create partition's directory and put some data in it
> 
> {code:bash}
> hdfs dfs -mkdir hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> hdfs dfs -put test.data hdfs://test-dev-hdfs/user/hive/warehouse/test.db/test/dayno=20190802
> {code}
> 
> 3. Insert overwrite partition dayno=20190802
> 
> {code:sql}
> INSERT OVERWRITE TABLE test PARTITION(dayno='20190802')
> SELECT "some value";
> {code}
> 
> 4. The test.data file under the partition directory has not been deleted.
> 
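
The precondition created in step 2, a partition directory that exists on HDFS but has no entry in the metastore, can be sketched in Python. This is an illustration only: `unregistered_partitions` is a hypothetical helper, and the directory listing and the metastore partition list are passed in as plain lists rather than fetched from HDFS or Hive.

```python
def parse_partition_dir(name):
    """Split a directory name such as 'dayno=20190802' into (key, value).
    Returns None for names that are not in key=value form."""
    key, sep, value = name.partition("=")
    if not sep or not key or not value:
        return None
    return key, value


def unregistered_partitions(dir_names, metastore_partitions):
    """Return partition directories that the metastore does not know about."""
    known = set(metastore_partitions)
    return [d for d in dir_names
            if parse_partition_dir(d) is not None and d not in known]


# Hypothetical listing of .../test.db/test after step 2 of the report.
dirs_on_hdfs = ["dayno=20190801", "dayno=20190802", "_tmp"]
in_metastore = ["dayno=20190801"]

orphans = unregistered_partitions(dirs_on_hdfs, in_metastore)
print(orphans)
```

Directories returned by such a check are the ones for which, according to the report, INSERT OVERWRITE leaves old files behind.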





[jira] [Commented] (HIVE-22077) Inserting overwrite partitions clause does not clean directories while partitions' info is not stored in metadata

2019-09-03 Thread Hui An (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921275#comment-16921275
 ] 

Hui An commented on HIVE-22077:
---

[~kgyrtkirk]Could you please review this patch?






[jira] [Updated] (HIVE-22166) Configure Kerberos for Hive Ranger Client via HS2 configuration

2019-09-03 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko updated HIVE-22166:
--
Attachment: HIVE-22166.1.patch

> Configure Kerberos for Hive Ranger Client via HS2 configuration
> ---
>
> Key: HIVE-22166
> URL: https://issues.apache.org/jira/browse/HIVE-22166
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
> Attachments: HIVE-22166.1.patch
>
>
> In Hive we would like to be able to enable Kerberos partially (i.e. only for 
> Ranger, Atlas and HMS).
> However, since Hadoop security is a global flag, there are many places that 
> need to be commented out to avoid the cluster-wide UGI configuration.





[jira] [Assigned] (HIVE-22166) Configure Kerberos for Hive Ranger Client via HS2 configuration

2019-09-03 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-22166:
-

Assignee: Denys Kuzmenko

> Configure Kerberos for Hive Ranger Client via HS2 configuration
> ---
>
> Key: HIVE-22166
> URL: https://issues.apache.org/jira/browse/HIVE-22166
> Project: Hive
> Issue Type: Improvement
> Reporter: Denys Kuzmenko
> Assignee: Denys Kuzmenko
> Priority: Major
>
> In Hive we would like to be able to enable Kerberos partially (i.e. only for 
> Ranger, Atlas and HMS).
> However, since Hadoop security is a global flag, there are many places that 
> need to be commented out to avoid the cluster-wide UGI configuration.





[jira] [Commented] (HIVE-22149) Metastore: Unify codahale metrics.log json structure between hiveserver2 and metastore services

2019-09-03 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921225#comment-16921225
 ] 

Zoltan Haindrich commented on HIVE-22149:
-

+1 pending tests
I'm not aware of anything that builds on these values; [~abstractdog], 
do you know of any?

> Metastore: Unify codahale metrics.log json structure between hiveserver2 and 
> metastore services
> ---
>
> Key: HIVE-22149
> URL: https://issues.apache.org/jira/browse/HIVE-22149
> Project: Hive
> Issue Type: Bug
> Reporter: Laszlo Bodor
> Assignee: Laszlo Bodor
> Priority: Major
> Attachments: HIVE-22149.01.patch, metrics_hiveserver2.log, 
> metrics_metastore.log
>
>
> While fixing HIVE-22140 I found some really annoying differences between the 
> codahale metrics file structures of hiveserver2 and metastore, e.g.
> * open_connections: can be found in "counters" for hs2, but in "gauges" for ms
> * threads count: it's a proper "threads.count" for hs2, but a really ambiguous "count" for ms
> I also realized that the "memory." and "threads." prefixes are completely absent 
> from the ms metrics file, which is misleading.
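
One way to spot such differences is to compare the two parsed JSON files section by section. The Python sketch below is only an illustration: the helper names are hypothetical, and the two dictionaries are small hand-written stand-ins for the attached metrics_hiveserver2.log and metrics_metastore.log, not their real contents.

```python
def metric_names(metrics):
    """Collect every metric name from the dict-valued top-level sections."""
    return {name
            for entries in metrics.values() if isinstance(entries, dict)
            for name in entries}


def sections_of(metrics, name):
    """Return the sorted sections (e.g. 'counters', 'gauges') holding name."""
    return sorted(section for section, entries in metrics.items()
                  if isinstance(entries, dict) and name in entries)


def mismatched_sections(left, right):
    """For metrics present in both files, report the ones that live in
    different sections, as {name: (left_sections, right_sections)}."""
    result = {}
    for name in sorted(metric_names(left) & metric_names(right)):
        a, b = sections_of(left, name), sections_of(right, name)
        if a != b:
            result[name] = (a, b)
    return result


# Hypothetical, heavily trimmed stand-ins for the two metrics files.
hs2 = {"counters": {"open_connections": {"count": 3}},
       "gauges": {"threads.count": {"value": 120}}}
ms = {"counters": {},
      "gauges": {"open_connections": {"value": 3}, "count": {"value": 80}}}

diff = mismatched_sections(hs2, ms)
only_in_hs2 = sorted(metric_names(hs2) - metric_names(ms))
print(diff)
print(only_in_hs2)
```

In this toy input, `diff` flags open_connections as living in different sections, and `only_in_hs2` surfaces the prefixed name that has no counterpart on the metastore side.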





[jira] [Commented] (HIVE-22162) MVs are not using ACID tables.

2019-09-03 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921221#comment-16921221
 ] 

Hive QA commented on HIVE-22162:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12979186/HIVE-22162.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:green}SUCCESS:{color} +1 due to 16746 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/18436/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/18436/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-18436/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12979186 - PreCommit-HIVE-Build

> MVs are not using ACID tables.
> --
>
> Key: HIVE-22162
> URL: https://issues.apache.org/jira/browse/HIVE-22162
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Affects Versions: 3.1.2
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, 
> HIVE-22162.3.patch, HIVE-22162.4.patch
>
>
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> SET metastore.strict.managed.tables=true;
> SET hive.default.fileformat=textfile;
> SET hive.default.fileformat.managed=orc;
> SET metastore.create.as.acid=true;
> CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));
> INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 
> 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);
> CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
> AS SELECT a, b, c FROM cmv_basetable_n4;
> DESCRIBE FORMATTED cmv_mat_view_n4;
> {code}
> {code}
> POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
> ...
> Table Type:   MATERIALIZED_VIEW
> Table Parameters:
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
>   bucketing_version       2
>   numFiles                1
>   numRows                 5
>   rawDataSize             1025
>   totalSize               509
> {code}
> The following table parameter is missing:
> {code}
> transactional = true
> {code}
> cc.: [~ashutoshc], [~gopalv], [~jcamachorodriguez]





[jira] [Updated] (HIVE-22163) CBO: Enabling CBO turns on stats estimation, even when the estimation is disabled

2019-09-03 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22163:
--
Status: Patch Available  (was: In Progress)

> CBO: Enabling CBO turns on stats estimation, even when the estimation is 
> disabled
> -
>
> Key: HIVE-22163
> URL: https://issues.apache.org/jira/browse/HIVE-22163
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Reporter: Gopal V
> Assignee: Krisztian Kasa
> Priority: Major
> Attachments: HIVE-22163.1.patch
>
>
> {code}
> create table claims(claim_rec_id bigint, claim_invoice_num string, typ_c int);
> alter table claims update statistics set ('numRows'='1154941534','rawDataSize'='1135307527922');
> set hive.stats.estimate=false;
> explain extended select count(1) from claims where typ_c=3;
> set hive.stats.ndv.estimate.percent=5e-7;
> explain extended select count(1) from claims where typ_c=3;
> {code}
> Expected the standard /2 row estimate for the single filter, but the plan instead estimates 5 rows.
> {code}
> 'Map Operator Tree:'
> 'TableScan'
> '  alias: claims'
> '  filterExpr: (typ_c = 3) (type: boolean)'
> '  Statistics: Num rows: 1154941534 Data size: 4388777832 Basic stats: COMPLETE Column stats: NONE'
> '  GatherStats: false'
> '  Filter Operator'
> '    isSamplingPred: false'
> '    predicate: (typ_c = 3) (type: boolean)'
> '    Statistics: Num rows: 5 Data size: 19 Basic stats: COMPLETE Column stats: NONE'
> {code}
> The estimation is still in effect, because changing hive.stats.ndv.estimate.percent changes the result:
> {code}
> '  filterExpr: (typ_c = 3) (type: boolean)'
> '  Statistics: Num rows: 1154941534 Data size: 4388777832 Basic stats: COMPLETE Column stats: NONE'
> '  GatherStats: false'
> '  Filter Operator'
> '    isSamplingPred: false'
> '    predicate: (typ_c = 3) (type: boolean)'
> '    Statistics: Num rows: 230988307 Data size: 877755567 Basic stats: COMPLETE Column stats: NONE'
> {code}
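
For reference, the gap between the expected and the reported estimates can be worked out from the figures quoted above. The Python sketch below is plain arithmetic on those numbers; `selectivity` is a hypothetical helper, and nothing here reproduces the optimizer's actual code path.

```python
NUM_ROWS = 1154941534  # numRows set via ALTER TABLE in the report


def selectivity(estimated_rows, total_rows=NUM_ROWS):
    """Fraction of the table that the Filter Operator is estimated to keep."""
    return estimated_rows / total_rows


expected_rows = NUM_ROWS // 2   # the "/2" estimate the reporter expected
first_plan_rows = 5             # Filter Operator estimate in the first plan
second_plan_rows = 230988307    # Filter Operator estimate in the second plan

print(expected_rows)
print(selectivity(first_plan_rows))
print(round(selectivity(second_plan_rows), 6))
```

Neither plan matches the halving that was expected: the first estimate keeps only a few parts per billion of the rows, and the second keeps roughly a fifth.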





[jira] [Updated] (HIVE-22163) CBO: Enabling CBO turns on stats estimation, even when the estimation is disabled

2019-09-03 Thread Krisztian Kasa (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Kasa updated HIVE-22163:
--
Attachment: HIVE-22163.1.patch






[jira] [Commented] (HIVE-22162) MVs are not using ACID tables.

2019-09-03 Thread Hive QA (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921199#comment-16921199
 ] 

Hive QA commented on HIVE-22162:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 40s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  3m 39s{color} | {color:blue} ql in master has 2248 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 13s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 22m 36s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus_PreCommit-HIVE-Build-18436/dev-support/hive-personality.sh |
| git revision | master / 04397e5 |
| Default Java | 1.8.0_111 |
| findbugs | v3.0.0 |
| modules | C: ql U: ql |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-18436/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> MVs are not using ACID tables.
> --
>
> Key: HIVE-22162
> URL: https://issues.apache.org/jira/browse/HIVE-22162
> Project: Hive
> Issue Type: Bug
> Components: Materialized views
> Affects Versions: 3.1.2
> Reporter: Krisztian Kasa
> Assignee: Krisztian Kasa
> Priority: Major
> Fix For: 4.0.0
>
> Attachments: HIVE-22162.1.patch, HIVE-22162.2.patch, 
> HIVE-22162.3.patch, HIVE-22162.4.patch
>
>
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> SET metastore.strict.managed.tables=true;
> SET hive.default.fileformat=textfile;
> SET hive.default.fileformat.managed=orc;
> SET metastore.create.as.acid=true;
> CREATE TABLE cmv_basetable_n4 (a int, b varchar(256), c decimal(10,2));
> INSERT INTO cmv_basetable_n4 VALUES (1, 'alfred', 10.30),(2, 'bob', 3.14),(2, 
> 'bonnie', 172342.2),(3, 'calvin', 978.76),(3, 'charlie', 9.8);
> CREATE MATERIALIZED VIEW cmv_mat_view_n4 disable rewrite
> AS SELECT a, b, c FROM cmv_basetable_n4;
> DESCRIBE FORMATTED cmv_mat_view_n4;
> {code}
> {code}
> POSTHOOK: query: DESCRIBE FORMATTED cmv_mat_view_n4
> ...
> Table Type:   MATERIALIZED_VIEW
> Table Parameters:  
>   COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"a\":\"true\",\"b\":\"true\",\"c\":\"true\"}}
>   bucketing_version       2
>   numFiles                1
>   numRows                 5
>   rawDataSize             1025
>   totalSize               509