[jira] [Work logged] (HIVE-23387) Flip the Warehouse.getDefaultTablePath() to return path from ext warehouse

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23387?focusedWorklogId=481227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-481227
 ]

ASF GitHub Bot logged work on HIVE-23387:
-

Author: ASF GitHub Bot
Created on: 10/Sep/20 04:56
Start Date: 10/Sep/20 04:56
Worklog Time Spent: 10m 
  Work Description: nrg4878 closed pull request #1473:
URL: https://github.com/apache/hive/pull/1473


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 481227)
Time Spent: 0.5h  (was: 20m)

> Flip the Warehouse.getDefaultTablePath() to return path from ext warehouse
> --
>
> Key: HIVE-23387
> URL: https://issues.apache.org/jira/browse/HIVE-23387
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23387.patch, HIVE-23387.patch, HIVE-23387.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For backward compatibility, the initial fix returned the path that was set on 
> the database, which could have come from either the managed or the external 
> warehouse, depending on what was set. Some tests relied on specific paths 
> being returned; this fix addresses those tests.
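The path-selection behavior described above can be sketched as follows. This is a hypothetical illustration, not Hive's actual `Warehouse` code: the class, constants, and method below are stand-ins showing how a default table path resolves under either the managed or the external warehouse root.

```java
// Hypothetical sketch (not Hive's actual Warehouse code): resolve a default
// table path from either the managed or the external warehouse root.
public class WarehousePathSketch {
    static final String MANAGED_ROOT = "/warehouse/tablespace/managed/hive";
    static final String EXTERNAL_ROOT = "/warehouse/tablespace/external/hive";

    // External tables resolve under the external warehouse root,
    // managed tables under the managed root.
    static String getDefaultTablePath(String dbName, String tableName, boolean isExternal) {
        String root = isExternal ? EXTERNAL_ROOT : MANAGED_ROOT;
        return root + "/" + dbName + ".db/" + tableName;
    }

    public static void main(String[] args) {
        System.out.println(getDefaultTablePath("db1", "t1", true));
        System.out.println(getDefaultTablePath("db1", "t1", false));
    }
}
```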



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24133) Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check

2020-09-09 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193355#comment-17193355
 ] 

zhishui commented on HIVE-24133:


I traced the source and found that there may be an incorrect usage of the HBase 
scan, or the bug may be in HBase itself.

> Hive query with Hbase storagehandler can give back incorrect results when 
> predicate contains null check
> ---
>
> Key: HIVE-24133
> URL: https://issues.apache.org/jira/browse/HIVE-24133
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: zhishui
>Priority: Major
>
> It has been observed that when using the HBase storage handler on a table 
> that contains null values, Hive can give back wrong query results, depending 
> on which columns we select and whether the where clause predicate contains 
> any null checks.
> For example:
> create 'default:hive_test', 'cf'
> put 'default:hive_test', '1', 'cf:col1', 'val1'
> put 'default:hive_test', '1', 'cf:col2', 'val2'
> put 'default:hive_test', '2', 'cf:col1', 'val1_2'
> put 'default:hive_test', '2', 'cf:col2', 'val2_2'
> put 'default:hive_test', '3', 'cf:col1', 'val1_3'
> put 'default:hive_test', '3', 'cf:col2', 'val2_3'
> put 'default:hive_test', '3', 'cf:col3', 'val3_3'
> put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"
> put 'default:hive_test', '4', 'cf:col1', 'val1_4'
> put 'default:hive_test', '4', 'cf:col2', 'val2_4'
> scan 'default:hive_test'
> = HIVE
> CREATE EXTERNAL TABLE hbase_hive_test (
> rowkey string,
> col1 string,
> col2 string,
> col3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
> )
> TBLPROPERTIES("hbase.table.name" = "default:hive_test");
> query: select * from hbase_hive_test where col3 is null;
> result:
> Total MapReduce CPU Time Spent: 10 seconds 980 msec
> OK
> 1 val1 val2 NULL
> 2 val1_2 val2_2 NULL
> 4 val1_4 val2_4 NULL
> query: select rowkey from hbase_hive_test where col3 is null;
> This does not produce any records.
> However, select rowkey, col2 from hbase_hive_test where col3 is null;
> This gives back the correct results again.
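One plausible cause, consistent with the repro above (this is an assumption for illustration, not a confirmed diagnosis of the Hive/HBase code path): when only the row key is selected, the scan's column projection may be narrowed to just the null-checked column, so rows with no `cf:col3` cell return no cells at all and never reach the `IS NULL` check. A plain-Java simulation of that projection effect:

```java
import java.util.*;

// Plain-Java simulation (not the actual HBase code): a scan restricted to
// only the null-checked column returns nothing for rows lacking that column,
// so those rows never surface for the IS NULL predicate.
public class ScanProjectionSim {
    // row key -> (column -> value)
    static Map<String, Map<String, String>> table = new LinkedHashMap<>();
    static {
        table.put("1", new HashMap<>(Map.of("col1", "val1", "col2", "val2")));
        table.put("3", new HashMap<>(Map.of("col1", "val1_3", "col3", "val3_3")));
    }

    // A row is visible to the scan only if it has at least one cell
    // inside the requested column projection.
    static List<String> scan(Set<String> projection) {
        List<String> visible = new ArrayList<>();
        for (var e : table.entrySet()) {
            boolean hasCell = e.getValue().keySet().stream().anyMatch(projection::contains);
            if (hasCell) visible.add(e.getKey());
        }
        return visible;
    }

    public static void main(String[] args) {
        // select * : projection covers col1/col2/col3, so row "1" is visible
        System.out.println(scan(Set.of("col1", "col2", "col3")));
        // select rowkey ... where col3 is null : projection narrowed to col3,
        // so row "1" (which has no col3) vanishes entirely
        System.out.println(scan(Set.of("col3")));
    }
}
```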





[jira] [Work logged] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?focusedWorklogId=481197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-481197
 ]

ASF GitHub Bot logged work on HIVE-24036:
-

Author: ASF GitHub Bot
Created on: 10/Sep/20 02:06
Start Date: 10/Sep/20 02:06
Worklog Time Spent: 10m 
  Work Description: jcamachor merged pull request #1399:
URL: https://github.com/apache/hive/pull/1399


   





Issue Time Tracking
---

Worklog Id: (was: 481197)
Time Spent: 0.5h  (was: 20m)

> Kryo Exception while serializing plan for getSplits UDF call
> 
>
> Key: HIVE-24036
> URL: https://issues.apache.org/jira/browse/HIVE-24036
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.llap.LlapOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
> (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)    
>at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
>   
>at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
>   
>at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
>  
> {code}
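One general workaround pattern for this class of Kryo failure (an assumption for illustration only; the actually committed fix may differ) is to avoid serializing the `Class` object itself and instead serialize the class *name* as a `String`, resolving it back lazily on deserialization:

```java
// General-pattern sketch (not necessarily the committed fix): when a
// serializer cannot handle a Class-valued field such as
// outputFileFormatClass, store only the class name and resolve it
// back with Class.forName on read.
public class ClassNameRoundTrip {
    static String write(Class<?> clazz) {
        return clazz.getName();          // store only the name
    }

    static Class<?> read(String name) throws ClassNotFoundException {
        return Class.forName(name);      // resolve lazily on deserialization
    }

    public static void main(String[] args) throws Exception {
        String stored = write(java.util.ArrayList.class);
        System.out.println(read(stored) == java.util.ArrayList.class); // true
    }
}
```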





[jira] [Updated] (HIVE-24036) Kryo Exception while serializing plan for getSplits UDF call

2020-09-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-24036:
---
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks [~nareshpr]!

> Kryo Exception while serializing plan for getSplits UDF call
> 
>
> Key: HIVE-24036
> URL: https://issues.apache.org/jira/browse/HIVE-24036
> Project: Hive
>  Issue Type: Bug
>Reporter: Naresh P R
>Assignee: Naresh P R
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code:java}
> Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.lang.IllegalArgumentException: Unable to create serializer 
> "org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer" for 
> class: org.apache.hadoop.hive.llap.LlapOutputFormat
> Serialization trace:
> outputFileFormatClass (org.apache.hadoop.hive.ql.plan.TableDesc)
> tableInfo (org.apache.hadoop.hive.ql.plan.FileSinkDesc)
> conf (org.apache.hadoop.hive.ql.exec.FileSinkOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.UnionOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)childOperators 
> (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.PTFOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)    
>at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializeObjectByKryo(SerializationUtilities.java:700)
>   
>at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:571)
>   
>at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.serializePlan(SerializationUtilities.java:560)
>  
> {code}





[jira] [Commented] (HIVE-24132) Metastore client doesn't close connection properly

2020-09-09 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193276#comment-17193276
 ] 

zhishui commented on HIVE-24132:


Could you give more details about how to reproduce this situation?

> Metastore client doesn't close connection properly
> --
>
> Key: HIVE-24132
> URL: https://issues.apache.org/jira/browse/HIVE-24132
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 1.2.1
>Reporter: xiepengjie
>Priority: Major
>
> While closing the metastore client connection, a warning is sometimes logged 
> with the following trace.
> {code:java}
> 2020-09-09 10:56:14,408 WARN org.apache.thrift.transport.TIOStreamTransport: 
> Error closing output stream.
> java.net.SocketException: Socket closed
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:116)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:158)
> at 
> org.apache.thrift.transport.TIOStreamTransport.close(TIOStreamTransport.java:110)
> at org.apache.thrift.transport.TSocket.close(TSocket.java:235)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:506)
> at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
> at com.sun.proxy.$Proxy6.close(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:1992)
> at com.sun.proxy.$Proxy6.close(Unknown Source)
> at org.apache.hadoop.hive.ql.metadata.Hive.close(Hive.java:320)
> at org.apache.hadoop.hive.ql.metadata.Hive.access$000(Hive.java:143)
> at org.apache.hadoop.hive.ql.metadata.Hive$1.remove(Hive.java:167)
> at org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent(Hive.java:288)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.close(HiveSessionImpl.java:616)
> at 
> org.apache.hive.service.cli.session.HiveSessionImplwithUGI.close(HiveSessionImplwithUGI.java:93)
> at sun.reflect.GeneratedMethodAccessor117.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
> at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
> at com.sun.proxy.$Proxy19.close(Unknown Source)
> at 
> org.apache.hive.service.cli.session.SessionManager.closeSession(SessionManager.java:300)
> at 
> org.apache.hive.service.cli.CLIService.closeSession(CLIService.java:237)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.CloseSession(ThriftCLIService.java:464)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1273)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession.getResult(TCLIService.java:1258)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:57)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> 
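The "Socket closed" warning in the trace above can be reproduced in miniature. This sketch assumes the underlying transport is closed before the buffered stream wrapping it (as happens when `TSocket` and `TIOStreamTransport` both try to close the same streams): flushing a `BufferedOutputStream` whose sink is already closed throws the `IOException` that surfaces as "Error closing output stream".

```java
import java.io.*;

// Minimal reproduction of the warning's cause: a buffered stream whose
// underlying "socket" stream was closed first throws on its own close(),
// because close() flushes the remaining buffered bytes.
public class DoubleCloseDemo {
    // An output stream that behaves like a closed socket stream.
    static class ClosableSink extends OutputStream {
        boolean closed = false;
        @Override public void write(int b) throws IOException {
            if (closed) throw new IOException("Socket closed");
        }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        ClosableSink sink = new ClosableSink();
        BufferedOutputStream out = new BufferedOutputStream(sink);
        try {
            out.write(42); // byte sits in the buffer
            sink.close();  // underlying transport closed first
            out.close();   // close() flushes the buffer -> IOException
            System.out.println("no exception");
        } catch (IOException e) {
            System.out.println(e.getMessage()); // "Socket closed"
        }
    }
}
```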

[jira] [Work logged] (HIVE-24035) Add Jenkinsfile for branch-2.3

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24035?focusedWorklogId=481074&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-481074
 ]

ASF GitHub Bot logged work on HIVE-24035:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 21:42
Start Date: 09/Sep/20 21:42
Worklog Time Spent: 10m 
  Work Description: sunchao commented on pull request #1398:
URL: https://github.com/apache/hive/pull/1398#issuecomment-689837266


   @kgyrtkirk can you review this PR? I think this is ready now - I've tried a 
few times already and it can successfully run most test cases, with a few 
hundred failures, which is similar to branch-2. 





Issue Time Tracking
---

Worklog Id: (was: 481074)
Time Spent: 1h 50m  (was: 1h 40m)

> Add Jenkinsfile for branch-2.3
> --
>
> Key: HIVE-24035
> URL: https://issues.apache.org/jira/browse/HIVE-24035
> Project: Hive
>  Issue Type: Test
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> To enable precommit tests for GitHub PRs, we need to have a Jenkinsfile in 
> the repo. This is already done for master and branch-2; this adds the same 
> for branch-2.3.





[jira] [Updated] (HIVE-23884) SemanticAnalyzer exception when addressing field with table name in group by

2020-09-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-23884:
---
Summary: SemanticAnalyzer exception when addressing field with table name 
in group by  (was: SemanticAnalyze exception when addressing field with table 
name in group by)

> SemanticAnalyzer exception when addressing field with table name in group by
> 
>
> Key: HIVE-23884
> URL: https://issues.apache.org/jira/browse/HIVE-23884
> Project: Hive
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Major
>
> {noformat}
> explain cbo 
> select  `item`.`i_item_id`,
> `store`.`s_state`, grouping(s_state) `g_state` from  
> `tpcds_bin_partitioned_orc_1`.`store`, 
> `tpcds_bin_partitioned_orc_1`.`item`
> where `store`.`s_state` in ('AL','IN', 'SC', 'NY', 'OH', 'FL')
> group by rollup (`item`.`i_item_id`, `s_state`)
> CBO PLAN:
> HiveProject(i_item_id=[$0], s_state=[$1], g_state=[grouping($2, 0:BIGINT)])
>   HiveAggregate(group=[{0, 1}], groups=[[{0, 1}, {0}, {}]], 
> GROUPING__ID=[GROUPING__ID()])
> HiveJoin(condition=[true], joinType=[inner], algorithm=[none], cost=[not 
> available])
>   HiveProject(i_item_id=[$1])
> HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, item]], 
> table:alias=[item])
>   HiveProject(s_state=[$24])
> HiveFilter(condition=[IN($24, _UTF-16LE'AL', _UTF-16LE'IN', 
> _UTF-16LE'SC', _UTF-16LE'NY', _UTF-16LE'OH', _UTF-16LE'FL')])
>   HiveTableScan(table=[[tpcds_bin_partitioned_orc_1, store]], 
> table:alias=[store])
> {noformat}
>  
> However, adding the fully qualified field name *`store`.`s_state`* in the 
> second rollup throws a SemanticAnalyzer exception:
>  
> {noformat}
> explain cbo 
> select  `item`.`i_item_id`,
> `store`.`s_state`, grouping(s_state) `g_state` from  
> `tpcds_bin_partitioned_orc_1`.`store`, 
> `tpcds_bin_partitioned_orc_1`.`item`
> where `store`.`s_state` in ('AL','IN', 'SC', 'NY', 'OH', 'FL')
> group by rollup (`item`.`i_item_id`, `store`.`s_state`)
> Error: Error while compiling statement: FAILED: RuntimeException [Error 
> 10409]: Expression in GROUPING function not present in GROUP BY 
> (state=42000,code=10409)
> {noformat}
> The exception below is based on 3.x, but it should mostly occur in master as well.
> Related ticket: https://issues.apache.org/jira/browse/HIVE-15996
> {noformat}
> Caused by: java.lang.RuntimeException: Expression in GROUPING function not 
> present in GROUP BY
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer$2.post(SemanticAnalyzer.java:3296)
>  ~[hive-exec-3.1xyz]
>   at org.antlr.runtime.tree.TreeVisitor.visit(TreeVisitor.java:66) 
> ~[antlr-runtime-3.5.2.jar:3.5.2]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.rewriteGroupingFunctionAST(SemanticAnalyzer.java:3305)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4616)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4392)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11026)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10965)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11894)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11764)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:12568)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:707)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12669)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:426)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:170)
>  ~[hive-exec-3.1xyz]
>   at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
>  ~[hive-exec-3.1xyz]
>   at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:221) 
> ~[hive-exec-3.1xyz]
>   at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:104) 
> ~[hive-exec-3.1xyz]
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:188) 
> 

[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=481062&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-481062
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 21:22
Start Date: 09/Sep/20 21:22
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on a change in pull request 
#1471:
URL: https://github.com/apache/hive/pull/1471#discussion_r485929257



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveMaterializedViewUtils.java
##
@@ -347,6 +355,38 @@ public static RelNode copyNodeNewCluster(RelOptCluster 
optCluster, RelNode node)
 }
   }
 
+  /**
+   * Validate if given materialized view has SELECT privileges for current user
+   * @param cachedMVTable
+   * @return false if user does not have privilege otherwise true
+   * @throws HiveException
+   */
+  public static boolean checkPrivilegeForMV(List cachedMVTableList) 
throws HiveException {

Review comment:
   Done

##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##
@@ -2280,6 +2280,15 @@ private RelNode 
applyMaterializedViewRewriting(RelOptPlanner planner, RelNode ba
 return calcitePreMVRewritingPlan;
   }
 
+  try {
+if 
(!HiveMaterializedViewUtils.checkPrivilegeForMV(materializedViewsUsedAfterRewrite))
 {
+  // if materialized views do not have appropriate privilges, we 
shouldn't be using them

Review comment:
   Fixed
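The privilege check discussed in the review above can be sketched as below. This is a simplified stand-in (the real `checkPrivilegeForMV` calls Hive's authorization API over `Table` objects; the string-keyed grant map here is a hypothetical stub): the rewrite is disqualified as soon as any materialized view in the list lacks SELECT privilege.

```java
import java.util.*;

// Simplified sketch of the MV privilege check from the review: return false
// as soon as any materialized view lacks SELECT, so the planner can fall
// back to the pre-rewrite plan.
public class MVPrivilegeSketch {
    // Hypothetical stand-in for the authorizer: MV name -> user may SELECT.
    static Map<String, Boolean> grants = new HashMap<>();

    static boolean checkPrivilegeForMV(List<String> mvNames) {
        for (String mv : mvNames) {
            if (!grants.getOrDefault(mv, false)) {
                return false; // one unauthorized MV disqualifies the rewrite
            }
        }
        return true;
    }

    public static void main(String[] args) {
        grants.put("db2.testmv", false); // end user lacks SELECT on the MV
        System.out.println(checkPrivilegeForMV(List.of("db2.testmv"))); // false
    }
}
```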







Issue Time Tracking
---

Worklog Id: (was: 481062)
Time Spent: 1h 10m  (was: 1h)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> A query against a table fails with HiveAccessControlException when there is a 
> materialized view pointing to that table which the end user does not have 
> access to, even though the user has all privileges on the actual table.
> From the HiveServer2 logs, it looks like, as part of optimization, Hive uses 
> the materialized view instead of the table to query the data, and since the 
> end user does not have access to the MV, we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as follows.
> 1. Create a table using hive user and insert some data
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. Create a materialized view, partitioned and with a where clause, on top of 
> the above table as the hive user.
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (Select to be minimum) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run select on base table db1.testmvtable as 'chiran' with where clause 
> having partition value >=2018, it runs into HiveAccessControlException on 
> db2.testmv
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. This works when partition column is not in MV
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 

[jira] [Work logged] (HIVE-23454) Querying hive table which has Materialized view fails with HiveAccessControlException

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23454?focusedWorklogId=481063&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-481063
 ]

ASF GitHub Bot logged work on HIVE-23454:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 21:22
Start Date: 09/Sep/20 21:22
Worklog Time Spent: 10m 
  Work Description: vineetgarg02 commented on pull request #1471:
URL: https://github.com/apache/hive/pull/1471#issuecomment-689828973


   > We should probably create a follow-up JIRA to check authorization before 
triggering the rewriting algorithm. If the compilation overhead to check every 
MV that is applicable to the given query is unacceptable, permissions could 
possibly be kept in the HS2 registry and refreshed periodically in the 
background, then verified after rewriting, which would at least decrease the 
number of authorization failures. @vineetgarg02 , can you create a JIRA for 
this?
   
   @jcamachor  I have created a jira at 
https://issues.apache.org/jira/browse/HIVE-24140





Issue Time Tracking
---

Worklog Id: (was: 481063)
Time Spent: 1h 20m  (was: 1h 10m)

> Querying hive table which has Materialized view fails with 
> HiveAccessControlException
> -
>
> Key: HIVE-23454
> URL: https://issues.apache.org/jira/browse/HIVE-23454
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, HiveServer2
>Affects Versions: 3.0.0, 3.2.0
>Reporter: Chiran Ravani
>Assignee: Vineet Garg
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> A query against a table fails with HiveAccessControlException when there is a 
> materialized view pointing to that table which the end user does not have 
> access to, even though the user has all privileges on the actual table.
> From the HiveServer2 logs, it looks like, as part of optimization, Hive uses 
> the materialized view instead of the table to query the data, and since the 
> end user does not have access to the MV, we receive HiveAccessControlException.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/cost/HiveVolcanoPlanner.java#L99
> The simplest reproducer for this issue is as follows.
> 1. Create a table using hive user and insert some data
> {code:java}
> create table db1.testmvtable(id int, name string) partitioned by(year int);
> insert into db1.testmvtable partition(year=2020) values(1,'Name1');
> insert into db1.testmvtable partition(year=2020) values(1,'Name2');
> insert into db1.testmvtable partition(year=2016) values(1,'Name1');
> insert into db1.testmvtable partition(year=2016) values(1,'Name2');
> {code}
> 2. Create a materialized view, partitioned and with a where clause, on top of 
> the above table as the hive user.
> {code:java}
> CREATE MATERIALIZED VIEW db2.testmv PARTITIONED ON(year) as select * from 
> db1.testmvtable tmv where year >= 2018;
> {code}
> 3. Grant all (Select to be minimum) access to user 'chiran' via Ranger on 
> database db1.
> 4. Run select on base table db1.testmvtable as 'chiran' with where clause 
> having partition value >=2018, it runs into HiveAccessControlException on 
> db2.testmv
> {code:java}
> eg:- (select * from db1.testmvtable where year=2020;)
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2020;
> Error: Error while compiling statement: FAILED: HiveAccessControlException 
> Permission denied: user [chiran] does not have [SELECT] privilege on 
> [db2/testmv/*] (state=42000,code=4)
> {code}
> 5. This works when partition column is not in MV
> {code:java}
> 0: jdbc:hive2://node2> select * from db1.testmvtable where year=2016;
> DEBUG : Acquired the compile lock.
> INFO  : Compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a): 
> select * from db1.testmvtable where year=2016
> DEBUG : Encoding valid txns info 897:9223372036854775807::893,895,896 
> txnid:897
> INFO  : Semantic Analysis Completed (retrial = false)
> INFO  : Returning Hive schema: 
> Schema(fieldSchemas:[FieldSchema(name:testmvtable.id, type:int, 
> comment:null), FieldSchema(name:testmvtable.name, type:string, comment:null), 
> FieldSchema(name:testmvtable.year, type:int, comment:null)], properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20200507130248_841458fe-7048-4727-8816-3f9472d2a67a); 
> Time taken: 0.222 seconds
> DEBUG : Encoding valid txn write ids info 
> 

[jira] [Resolved] (HIVE-22257) Commutativity of operations is not taken into account, e.g., '+'

2020-09-09 Thread Jesus Camacho Rodriguez (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez resolved HIVE-22257.

Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed by CALCITE-3914. Thanks [~vgarg]!

> Commutativity of operations is not taken into account, e.g., '+'
> 
>
> Key: HIVE-22257
> URL: https://issues.apache.org/jira/browse/HIVE-22257
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO, Materialized views
>Affects Versions: 3.1.2
>Reporter: Steve Carlin
>Assignee: Vineet Garg
>Priority: Major
> Fix For: 4.0.0
>
> Attachments: expr9.sql
>
>
> ...as stated in the subject. A script to reproduce is attached.
> Query and materialized view are as follows:
> create materialized view view5 stored as orc as (select prod_id, cust_id, 
> store_id, sale_date, qty, amt, descr from sales where cust_id + prod_id > 1 + 
> 2);
> explain extended select  prod_id, cust_id  from sales where prod_id + cust_id 
> > 1 + 2;
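The matching problem above can be illustrated with a small sketch. This is an illustration of why commutativity matters for MV rewriting, not Calcite's actual CALCITE-3914 fix: `cust_id + prod_id` and `prod_id + cust_id` only unify if commutative operands are normalized into a canonical order, e.g. by sorting them.

```java
// Illustration (not Calcite's actual fix): normalize operand order for
// commutative operators so equivalent expressions compare equal.
public class CommutativeNormalize {
    static String canonical(String op, String left, String right) {
        // For commutative operators, order operands deterministically.
        if ((op.equals("+") || op.equals("*")) && left.compareTo(right) > 0) {
            String tmp = left; left = right; right = tmp;
        }
        return left + " " + op + " " + right;
    }

    public static void main(String[] args) {
        String mvPred    = canonical("+", "cust_id", "prod_id"); // from the MV
        String queryPred = canonical("+", "prod_id", "cust_id"); // from the query
        System.out.println(mvPred.equals(queryPred)); // true after normalization
    }
}
```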





[jira] [Commented] (HIVE-24135) Drop database doesn't delete directory in managed location

2020-09-09 Thread Naveen Gangam (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193083#comment-17193083
 ] 

Naveen Gangam commented on HIVE-24135:
--

[~klcopp] [~zhishui] I already had a fix for this in HIVE-23387 (PR 1435, 
PR 1473) that has just been committed; it had been pending for a couple of 
months. Could you please rebase and retry? Thanks

> Drop database doesn't delete directory in managed location
> --
>
> Key: HIVE-24135
> URL: https://issues.apache.org/jira/browse/HIVE-24135
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Naveen Gangam
>Priority: Major
>
> Repro:
>  say the default managed location is managed/hive and the default external 
> location is external/hive.
> {code:java}
> create database db1; -- creates: external/hive/db1.db
> create table db1.table1 (i int); -- creates: managed/hive/db1.db and  
> managed/hive/db1.db/table1
> drop database db1 cascade; -- removes : external/hive/db1.db and 
> managed/hive/db1.db/table1
> {code}
> Problem: Directory managed/hive/db1.db remains.
> Since HIVE-22995, dbs have a managed (managedLocationUri) and an external 
> location (locationUri). I think the issue is that 
> HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in 
> the external location.
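The fix direction described above can be sketched as follows. This is a hypothetical illustration, not the actual `HMSHandler.drop_database_core` code: on cascade drop, the directories at *both* the external location and the managed location must be deleted, since HIVE-22995 gave databases both a `locationUri` and a `managedLocationUri`.

```java
import java.io.IOException;
import java.nio.file.*;

// Hypothetical sketch of the fix direction: DROP DATABASE ... CASCADE should
// remove the db directory at the external location AND the one at the
// managed location (the latter is what was being left behind).
public class DropDbLocations {
    static void dropDatabaseDirs(Path externalLoc, Path managedLoc) throws IOException {
        Files.deleteIfExists(externalLoc); // e.g. external/hive/db1.db
        Files.deleteIfExists(managedLoc);  // e.g. managed/hive/db1.db
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hive");
        Path ext = Files.createDirectory(base.resolve("db1.db.ext"));
        Path managed = Files.createDirectory(base.resolve("db1.db.managed"));
        dropDatabaseDirs(ext, managed);
        System.out.println(Files.exists(ext) || Files.exists(managed)); // false
    }
}
```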





[jira] [Commented] (HIVE-24139) VectorGroupByOperator is not flushing hash table entries as needed

2020-09-09 Thread Mustafa Iman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193041#comment-17193041
 ] 

Mustafa Iman commented on HIVE-24139:
-

[~rajesh.balamohan] This patch is very slightly different from the last one we 
discussed. I'd be grateful if you could verify this on cloud.

> VectorGroupByOperator is not flushing hash table entries as needed
> --
>
> Key: HIVE-24139
> URL: https://issues.apache.org/jira/browse/HIVE-24139
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/HIVE-23975 introduced a bug where 
> copyKey mutates some key wrappers while copying. This Jira fixes it.
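The class of bug described above (a key copy that still shares mutable state with its source, so entries already stored in the hash table get corrupted when the buffer is reused) can be illustrated in isolation. KeyWrapper below only loosely mirrors the Hive class of the same name; the code is an illustrative sketch, not the actual VectorGroupByOperator/copyKey implementation:

```java
import java.util.Arrays;

// Illustrative only: a "copy" that aliases the source's backing array is
// silently mutated when the operator reuses the buffer for the next row,
// while a deep copy stays stable -- the failure mode of this issue.
public class KeyCopyDemo {
    static class KeyWrapper {
        byte[] bytes;
        KeyWrapper(byte[] bytes) { this.bytes = bytes; }

        // Broken: shares the same array as the original key.
        KeyWrapper shallowCopy() {
            return new KeyWrapper(this.bytes);
        }

        // Correct: deep-copy the backing array so the stored key is stable.
        KeyWrapper deepCopy() {
            return new KeyWrapper(Arrays.copyOf(bytes, bytes.length));
        }
    }

    public static void main(String[] args) {
        KeyWrapper current = new KeyWrapper(new byte[]{1, 2, 3});
        KeyWrapper shallow = current.shallowCopy();
        KeyWrapper deep = current.deepCopy();

        current.bytes[0] = 9; // buffer reused for the next incoming row

        System.out.println(shallow.bytes[0]); // 9 -- stored key corrupted
        System.out.println(deep.bytes[0]);    // 1 -- stored key intact
    }
}
```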





[jira] [Work logged] (HIVE-24139) VectorGroupByOperator is not flushing hash table entries as needed

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24139?focusedWorklogId=480924=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480924
 ]

ASF GitHub Bot logged work on HIVE-24139:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 16:59
Start Date: 09/Sep/20 16:59
Worklog Time Spent: 10m 
  Work Description: mustafaiman opened a new pull request #1481:
URL: https://github.com/apache/hive/pull/1481


   …as needed
   
   Change-Id: Idf25882ac6bef0db63ce08f67e8abcbc9fc60712
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480924)
Remaining Estimate: 0h
Time Spent: 10m

> VectorGroupByOperator is not flushing hash table entries as needed
> --
>
> Key: HIVE-24139
> URL: https://issues.apache.org/jira/browse/HIVE-24139
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/HIVE-23975 introduced a bug where 
> copyKey mutates some key wrappers while copying. This Jira fixes it.





[jira] [Updated] (HIVE-23387) Flip the Warehouse.getDefaultTablePath() to return path from ext warehouse

2020-09-09 Thread Naveen Gangam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-23387:
-
Fix Version/s: 4.0.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Fix has been committed to master.

> Flip the Warehouse.getDefaultTablePath() to return path from ext warehouse
> --
>
> Key: HIVE-23387
> URL: https://issues.apache.org/jira/browse/HIVE-23387
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 4.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23387.patch, HIVE-23387.patch, HIVE-23387.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For backward compatibility, the initial fix returned the path that was set on 
> the db. It could have come from either the managed or the external warehouse, 
> depending on what was set. There were tests relying on certain paths being 
> returned; this fix addresses those tests.





[jira] [Updated] (HIVE-24139) VectorGroupByOperator is not flushing hash table entries as needed

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24139:
--
Labels: pull-request-available  (was: )

> VectorGroupByOperator is not flushing hash table entries as needed
> --
>
> Key: HIVE-24139
> URL: https://issues.apache.org/jira/browse/HIVE-24139
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/HIVE-23975 introduced a bug where 
> copyKey mutates some key wrappers while copying. This Jira fixes it.





[jira] [Work started] (HIVE-24139) VectorGroupByOperator is not flushing hash table entries as needed

2020-09-09 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24139 started by Mustafa Iman.
---
> VectorGroupByOperator is not flushing hash table entries as needed
> --
>
> Key: HIVE-24139
> URL: https://issues.apache.org/jira/browse/HIVE-24139
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/HIVE-23975 introduced a bug where 
> copyKey mutates some key wrappers while copying. This Jira fixes it.





[jira] [Updated] (HIVE-24139) VectorGroupByOperator is not flushing hash table entries as needed

2020-09-09 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman updated HIVE-24139:

Status: Patch Available  (was: In Progress)

> VectorGroupByOperator is not flushing hash table entries as needed
> --
>
> Key: HIVE-24139
> URL: https://issues.apache.org/jira/browse/HIVE-24139
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/HIVE-23975 introduced a bug where 
> copyKey mutates some key wrappers while copying. This Jira fixes it.





[jira] [Assigned] (HIVE-24139) VectorGroupByOperator is not flushing hash table entries as needed

2020-09-09 Thread Mustafa Iman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mustafa Iman reassigned HIVE-24139:
---


> VectorGroupByOperator is not flushing hash table entries as needed
> --
>
> Key: HIVE-24139
> URL: https://issues.apache.org/jira/browse/HIVE-24139
> Project: Hive
>  Issue Type: Bug
>Reporter: Mustafa Iman
>Assignee: Mustafa Iman
>Priority: Major
>
> https://issues.apache.org/jira/browse/HIVE-23975 introduced a bug where 
> copyKey mutates some key wrappers while copying. This Jira fixes it.





[jira] [Updated] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24134:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24134.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code





[jira] [Commented] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread Aasha Medhi (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193024#comment-17193024
 ] 

Aasha Medhi commented on HIVE-24134:


Committed to master

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24134.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code





[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480864=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480864
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 14:42
Start Date: 09/Sep/20 14:42
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on pull request #1271:
URL: https://github.com/apache/hive/pull/1271#issuecomment-689608550


   @kgyrtkirk I have changed the approach. Please take a look at the new 
implementation.





Issue Time Tracking
---

Worklog Id: (was: 480864)
Time Spent: 4h  (was: 3h 50m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
>  ), while when dropping partitions we serialize the drop-partition filter 
> expression as ( 
> https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
>  ), which is incompatible with the deserialization happening in 
> PartitionExpressionForMetastore ( 
> 
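The incompatibility described above boils down to the writer and the reader of the serialized filter expression using different mechanisms. A minimal sketch of that failure shape follows, with two toy "codecs" standing in for the incompatible serialization paths; this is not Hive's actual Kryo-based code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: when the component that serializes an expression and
// the component that deserializes it disagree on the encoding, the reader
// blows up -- the shape of the MSCK drop-partition failure above.
public class ExprCodecDemo {
    // "Metastore-side" reader: expects Base64-encoded bytes.
    static String deserialize(byte[] payload) {
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    // Writer that matches the reader: round-trips cleanly.
    static byte[] serializeCompatible(String expr) {
        return Base64.getEncoder().encode(expr.getBytes(StandardCharsets.UTF_8));
    }

    // Writer using a different mechanism: raw bytes the reader cannot parse.
    static byte[] serializeIncompatible(String expr) {
        return expr.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String expr = "year > 2019";
        // Matching codecs round-trip the expression:
        System.out.println(deserialize(serializeCompatible(expr)));
        try {
            // Mismatched codecs fail at deserialization time:
            deserialize(serializeIncompatible(expr));
        } catch (IllegalArgumentException e) {
            System.out.println("deserialization failed: " + e.getMessage());
        }
    }
}
```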

[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480863=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480863
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 14:41
Start Date: 09/Sep/20 14:41
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r456500229



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/exec/TestPartitionManagement.java
##
@@ -15,7 +15,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.hadoop.hive.metastore;
+package org.apache.hadoop.hive.ql.exec;

Review comment:
   Moved TestPartitionManagement.java to the ql module due to its dependency on 
PartitionExpressionForMetastore and some other ql classes for serializing the 
partition expression while dropping partitions.







Issue Time Tracking
---

Worklog Id: (was: 480863)
Time Spent: 3h 50m  (was: 3h 40m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) 
> ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775)
>  ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52)
>  [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
>  at 
> org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80)
>  [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_192]
> {code}
> *Cause:*
> In the case of msck repair with partition filtering, we expect the expression 
> proxy class to be set to PartitionExpressionForMetastore ( 
> 

[jira] [Work logged] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?focusedWorklogId=480839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480839
 ]

ASF GitHub Bot logged work on HIVE-24134:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 14:03
Start Date: 09/Sep/20 14:03
Worklog Time Spent: 10m 
  Work Description: szlta merged pull request #1480:
URL: https://github.com/apache/hive/pull/1480


   





Issue Time Tracking
---

Worklog Id: (was: 480839)
Time Spent: 20m  (was: 10m)

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24134.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code





[jira] [Issue Comment Deleted] (HIVE-24133) Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check

2020-09-09 Thread zhishui (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhishui updated HIVE-24133:
---
Comment: was deleted

(was: I tried as you described and got the same result. I also tried using 
explain but got the same result; I think this may be caused by HBase.)

> Hive query with Hbase storagehandler can give back incorrect results when 
> predicate contains null check
> ---
>
> Key: HIVE-24133
> URL: https://issues.apache.org/jira/browse/HIVE-24133
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Priority: Major
>
> It has been observed that when using the HBase storage handler on a table 
> that contains null values, Hive can give back wrong query results, depending 
> on which columns we select and whether the where-clause predicate contains 
> any null checks.
> For example:
> create 'default:hive_test', 'cf'
> put 'default:hive_test', '1', 'cf:col1', 'val1'
> put 'default:hive_test', '1', 'cf:col2', 'val2'
> put 'default:hive_test', '2', 'cf:col1', 'val1_2'
> put 'default:hive_test', '2', 'cf:col2', 'val2_2'
> put 'default:hive_test', '3', 'cf:col1', 'val1_3'
> put 'default:hive_test', '3', 'cf:col2', 'val2_3'
> put 'default:hive_test', '3', 'cf:col3', 'val3_3'
> put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"
> put 'default:hive_test', '4', 'cf:col1', 'val1_4'
> put 'default:hive_test', '4', 'cf:col2', 'val2_4'
> scan 'default:hive_test'
> = HIVE
> CREATE EXTERNAL TABLE hbase_hive_test (
> rowkey string,
> col1 string,
> col2 string,
> col3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
> )
> TBLPROPERTIES("hbase.table.name" = "default:hive_test");
> query: select * from hbase_hive_test where col3 is null;
> result:
> Total MapReduce CPU Time Spent: 10 seconds 980 msec
> OK
> 1 val1 val2 NULL
> 2 val1_2 val2_2 NULL
> 4 val1_4 val2_4 NULL
> query: select rowkey from hbase_hive_test where col3 is null;
> This does not produce any records.
> However, select rowkey, col2 from hbase_hive_test where col3 is null;
> This gives back the correct results again.





[jira] [Assigned] (HIVE-24133) Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check

2020-09-09 Thread zhishui (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhishui reassigned HIVE-24133:
--

Assignee: zhishui

> Hive query with Hbase storagehandler can give back incorrect results when 
> predicate contains null check
> ---
>
> Key: HIVE-24133
> URL: https://issues.apache.org/jira/browse/HIVE-24133
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: zhishui
>Priority: Major
>
> It has been observed that when using the HBase storage handler on a table 
> that contains null values, Hive can give back wrong query results, depending 
> on which columns we select and whether the where-clause predicate contains 
> any null checks.
> For example:
> create 'default:hive_test', 'cf'
> put 'default:hive_test', '1', 'cf:col1', 'val1'
> put 'default:hive_test', '1', 'cf:col2', 'val2'
> put 'default:hive_test', '2', 'cf:col1', 'val1_2'
> put 'default:hive_test', '2', 'cf:col2', 'val2_2'
> put 'default:hive_test', '3', 'cf:col1', 'val1_3'
> put 'default:hive_test', '3', 'cf:col2', 'val2_3'
> put 'default:hive_test', '3', 'cf:col3', 'val3_3'
> put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"
> put 'default:hive_test', '4', 'cf:col1', 'val1_4'
> put 'default:hive_test', '4', 'cf:col2', 'val2_4'
> scan 'default:hive_test'
> = HIVE
> CREATE EXTERNAL TABLE hbase_hive_test (
> rowkey string,
> col1 string,
> col2 string,
> col3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
> )
> TBLPROPERTIES("hbase.table.name" = "default:hive_test");
> query: select * from hbase_hive_test where col3 is null;
> result:
> Total MapReduce CPU Time Spent: 10 seconds 980 msec
> OK
> 1 val1 val2 NULL
> 2 val1_2 val2_2 NULL
> 4 val1_4 val2_4 NULL
> query: select rowkey from hbase_hive_test where col3 is null;
> This does not produce any records.
> However, select rowkey, col2 from hbase_hive_test where col3 is null;
> This gives back the correct results again.





[jira] [Updated] (HIVE-24138) Llap external client flow is broken due to netty shading

2020-09-09 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-24138:
-
Description: 
We shaded netty in hive-exec in - 
https://issues.apache.org/jira/browse/HIVE-23073

This breaks LLAP external client flow on LLAP daemon side - 

LLAP daemon stacktrace - 
{code}
2020-09-09T18:22:13,413  INFO [TezTR-222977_4_0_0_0_0 
(497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning 
writer for: attempt_497418324441977_0004_0_00_00_0
2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 
(497418324441977_0004_0_00_00_0)] tez.MapRecordSource: 
java.lang.NoSuchMethodError: 
org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;
at 
org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96)
at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74)
at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57)
at 
org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89)
at 
org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88)
at 
org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130)
at 
org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102)
at 
org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85)
at 
org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46)
at 
org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137)
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at 
org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
at 
org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
at 
org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at 
org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The Arrow method signature mismatch happens mainly because Arrow itself 
contains classes packaged under {{io.netty.buffer.*}} - 
{code}
io.netty.buffer.ArrowBuf
io.netty.buffer.ExpandableByteBuf
io.netty.buffer.LargeBuffer
io.netty.buffer.MutableWrappedByteBuf
io.netty.buffer.PooledByteBufAllocatorL
io.netty.buffer.UnsafeDirectLittleEndian
{code}

Since we have relocated netty, these classes have also been relocated to 
{{org.apache.hive.io.netty.buffer.*}}, causing the {{NoSuchMethodError}}.

cc [~anishek] [~thejas] [~abstractdog] [~irashid] [~bruce.robbins]

  was:
We shaded netty in hive-exec in - 
https://issues.apache.org/jira/browse/HIVE-23073

This breaks LLAP external client flow on LLAP daemon side - 
{code}
2020-09-09T18:22:13,413  INFO [TezTR-222977_4_0_0_0_0 
(497418324441977_0004_0_00_00_0)] 

[jira] [Updated] (HIVE-24138) Llap external client flow is broken due to netty shading

2020-09-09 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-24138:
-
Description: 
We shaded netty in hive-exec in - 
https://issues.apache.org/jira/browse/HIVE-23073

This breaks LLAP external client flow on LLAP daemon side - 
{code}
2020-09-09T18:22:13,413  INFO [TezTR-222977_4_0_0_0_0 
(497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning 
writer for: attempt_497418324441977_0004_0_00_00_0
2020-09-09T18:22:13,419 ERROR [TezTR-222977_4_0_0_0_0 
(497418324441977_0004_0_00_00_0)] tez.MapRecordSource: 
java.lang.NoSuchMethodError: org.apache.arrow.memory.BufferAllocator.buffer(I)Lorg/apache/hive/io/netty/buffer/ArrowBuf;
at org.apache.hadoop.hive.llap.WritableByteChannelAdapter.write(WritableByteChannelAdapter.java:96)
at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:74)
at org.apache.arrow.vector.ipc.WriteChannel.write(WriteChannel.java:57)
at org.apache.arrow.vector.ipc.WriteChannel.writeIntLittleEndian(WriteChannel.java:89)
at org.apache.arrow.vector.ipc.message.MessageSerializer.serialize(MessageSerializer.java:88)
at org.apache.arrow.vector.ipc.ArrowWriter.ensureStarted(ArrowWriter.java:130)
at org.apache.arrow.vector.ipc.ArrowWriter.writeBatch(ArrowWriter.java:102)
at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:85)
at org.apache.hadoop.hive.llap.LlapArrowRecordWriter.write(LlapArrowRecordWriter.java:46)
at org.apache.hadoop.hive.ql.exec.vector.filesink.VectorFileSinkArrowOperator.process(VectorFileSinkArrowOperator.java:137)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:158)
at org.apache.hadoop.hive.ql.exec.Operator.vectorForward(Operator.java:969)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:172)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.deliverVectorizedRowBatch(VectorMapOperator.java:809)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:842)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:426)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The Arrow method signature mismatch happens mainly because Arrow contains some 
classes that are packaged under {{io.netty.buffer.*}} - 
{code}
io.netty.buffer.ArrowBuf
io.netty.buffer.ExpandableByteBuf
io.netty.buffer.LargeBuffer
io.netty.buffer.MutableWrappedByteBuf
io.netty.buffer.PooledByteBufAllocatorL
io.netty.buffer.UnsafeDirectLittleEndian
{code}

Since we have relocated netty, these classes have also been relocated to 
{{org.apache.hive.io.netty.buffer.*}}, causing the {{NoSuchMethodError}}.
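Shaded-jar relocation problems like this are usually easiest to confirm by listing what actually ships in the jar. Below is a minimal sketch (not part of the Hive build; the in-memory "jar" only stands in for a real hive-exec jar on disk):

```python
import io
import zipfile

def find_relocated_netty_classes(jar_bytes):
    """List jar entries under the relocated netty buffer package."""
    prefix = "org/apache/hive/io/netty/buffer/"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return [n for n in jar.namelist() if n.startswith(prefix)]

# Build a tiny stand-in jar in memory to demonstrate; the entry names
# mirror the relocation described above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("org/apache/hive/io/netty/buffer/ArrowBuf.class", b"")
    jar.writestr("org/apache/arrow/memory/BufferAllocator.class", b"")

print(find_relocated_netty_classes(buf.getvalue()))
# ['org/apache/hive/io/netty/buffer/ArrowBuf.class']
```

On a real cluster one would read the hive-exec jar from disk instead of building one in memory; if Arrow's own classes end up under the relocated prefix while Arrow's public API still references the original one, the signature mismatch above follows.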

  was:
We shaded netty in hive-exec in - 
https://issues.apache.org/jira/browse/HIVE-23073

This breaks LLAP external client flow on LLAP daemon side - 
{code}
2020-09-09T18:22:13,413  INFO [TezTR-222977_4_0_0_0_0 
(497418324441977_0004_0_00_00_0)] llap.LlapOutputFormat: Returning 
writer for: attempt_497418324441977_0004_0_00_00_0

[jira] [Commented] (HIVE-24133) Hive query with Hbase storagehandler can give back incorrect results when predicate contains null check

2020-09-09 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192854#comment-17192854
 ] 

zhishui commented on HIVE-24133:


I tried as you described and got the same result. I also tried EXPLAIN but got 
the same result; I think this may be caused by HBase.

> Hive query with Hbase storagehandler can give back incorrect results when 
> predicate contains null check
> ---
>
> Key: HIVE-24133
> URL: https://issues.apache.org/jira/browse/HIVE-24133
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Priority: Major
>
> It has been observed that when using Hbase storage handler and the table 
> contains null values, Hive can give back wrong query results, depending on 
> what columns we select for and whether the where clause predicate contains 
> any null checks.
> For example:
> create 'default:hive_test', 'cf'
> put 'default:hive_test', '1', 'cf:col1', 'val1'
> put 'default:hive_test', '1', 'cf:col2', 'val2'
> put 'default:hive_test', '2', 'cf:col1', 'val1_2'
> put 'default:hive_test', '2', 'cf:col2', 'val2_2'
> put 'default:hive_test', '3', 'cf:col1', 'val1_3'
> put 'default:hive_test', '3', 'cf:col2', 'val2_3'
> put 'default:hive_test', '3', 'cf:col3', 'val3_3'
> put 'default:hive_test', '3', 'cf:col4', "\x00\x00\x00\x00\x00\x02\xC2"
> put 'default:hive_test', '4', 'cf:col1', 'val1_4'
> put 'default:hive_test', '4', 'cf:col2', 'val2_4'
> scan 'default:hive_test'
> = HIVE
> CREATE EXTERNAL TABLE hbase_hive_test (
> rowkey string,
> col1 string,
> col2 string,
> col3 string
> )
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping" = ":key,cf:col1,cf:col2,cf:col3"
> )
> TBLPROPERTIES("hbase.table.name" = "default:hive_test");
> query: select * from hbase_hive_test where col3 is null;
> result:
> Total MapReduce CPU Time Spent: 10 seconds 980 msec
> OK
> 1 val1 val2 NULL
> 2 val1_2 val2_2 NULL
> 4 val1_4 val2_4 NULL
> query: select rowkey from hbase_hive_test where col3 is null;
> This does not produce any records.
> However, select rowkey, col2 from hbase_hive_test where col3 is null;
> This gives back the correct results again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24135) Drop database doesn't delete directory in managed location

2020-09-09 Thread Karen Coppage (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192836#comment-17192836
 ] 

Karen Coppage commented on HIVE-24135:
--

metastore.warehouse.dir needs to have a different value from 
metastore.warehouse.external.dir
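
For reference, a minimal hive-site.xml sketch with the two settings pointing at distinct roots (the paths are purely illustrative):

{code:xml}
<!-- Illustrative values only; use your cluster's real warehouse roots. -->
<property>
  <name>metastore.warehouse.dir</name>
  <value>/warehouse/tablespace/managed/hive</value>
</property>
<property>
  <name>metastore.warehouse.external.dir</name>
  <value>/warehouse/tablespace/external/hive</value>
</property>
{code}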

> Drop database doesn't delete directory in managed location
> --
>
> Key: HIVE-24135
> URL: https://issues.apache.org/jira/browse/HIVE-24135
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Naveen Gangam
>Priority: Major
>
> Repro:
>  say the default managed location is managed/hive and the default external 
> location is external/hive.
> {code:java}
> create database db1; -- creates: external/hive/db1.db
> create table db1.table1 (i int); -- creates: managed/hive/db1.db and  
> managed/hive/db1.db/table1
> drop database db1 cascade; -- removes : external/hive/db1.db and 
> managed/hive/db1.db/table1
> {code}
> Problem: Directory managed/hive/db1.db remains.
> Since HIVE-22995, dbs have a managed (managedLocationUri) and an external 
> location (locationUri). I think the issue is that 
> HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in 
> the external location.
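
The fix idea, deleting both locations when they differ, can be sketched as follows. This is illustrative Python against the local filesystem, not the actual HMSHandler code (which goes through the Warehouse/FileSystem APIs):

```python
import shutil
import tempfile
from pathlib import Path

def drop_database_dirs(location_uri, managed_location_uri=None):
    """Remove a db's external dir and, if set and distinct, its managed dir.

    Sketch of the idea that drop_database_core should clean up both
    locationUri and managedLocationUri, not only the external one.
    """
    removed = []
    for uri in (location_uri, managed_location_uri):
        if uri and uri not in removed and Path(uri).exists():
            shutil.rmtree(uri)
            removed.append(uri)
    return removed

# Demo with throwaway directories standing in for the two warehouse roots.
root = Path(tempfile.mkdtemp())
ext = root / "external" / "hive" / "db1.db"
mgd = root / "managed" / "hive" / "db1.db"
ext.mkdir(parents=True)
mgd.mkdir(parents=True)
dropped = drop_database_dirs(str(ext), str(mgd))
print(len(dropped))  # 2
```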



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24136) create table table_name as: the job succeeds but the table is not created

2020-09-09 Thread paul (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

paul updated HIVE-24136:

Description: 
Hive version 3.1.2. A table is created with CTAS; the job finishes successfully, 
but the table is not created. Checking the logs, there are no entries related to 
metastore.HiveMetaStore: 22556: create_table: 
Table(tableName:t_nagent_trade_water_day_temp1061  or to  exec.Task: Moving data 
to directory. 

I checked the MySQL binlog; no CREATE TABLE statement was executed against the 
metastore database.

 

This problem does not happen every time; the scheduled daily job hits it once in 
a while. It also sometimes happens when the YARN cluster restarts.

 

 

 

 

create table t_nagent_trade_water_day_temp1061 as

select 
a.trade_water_id,mrch_no,FIRST_REPORT_SUC_TIME,trade_amt,trade_date,create_time,trans_type,a.agent_code,mrch_type,level_four,level_four_name,level_three,level_three_name,

level_two,level_two_name,level_one,level_one_name from 

(select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,t2.level_four,t2.level_four_name,

t2.level_three,t2.level_three_name,t2.level_two,t2.level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join agent_belong_temp1061 t2 on 
t1.agent_code=t2.level_four

where t2.level_four!='' and t1.trade_status='1'

union all

select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,''
 level_four,'' level_four_name,

t2.level_three,t2.level_three_name,t2.level_two,t2.level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join 

(select 
level_three,level_three_name,level_two,level_two_name,level_one,level_one_name 
from agent_belong_temp1061

group by 
level_three,level_three_name,level_two,level_two_name,level_one,level_one_name) 

t2 on t1.agent_code=t2.level_three
where t2.level_three!='' and t1.trade_status='1'

union all

select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,''
 level_four,'' level_four_name,

'' level_three,'' level_three_name,t2.level_two,t2.level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join 

(select level_two,level_two_name,level_one,level_one_name from 
agent_belong_temp1061
group by level_two,level_two_name,level_one,level_one_name) 

t2 on t1.agent_code=t2.level_two
where t2.level_two!='' and t1.trade_status='1'

union all

select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,''
 level_four,'' level_four_name,

'' level_three,'' level_three_name,'' level_two,'' 
level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join 

(select level_one,level_one_name from agent_belong_temp1061
group by level_one,level_one_name) 

t2 on t1.agent_code=t2.level_one

where t2.level_one!='' and t1.trade_status='1'

) a left join t_nagent_merchant_incoming b on a.mrch_no=b.merc_no where 
platform_code='05' and 
to_date(create_time)=from_unixtime(unix_timestamp("20200829",'MMdd'),'-MM-dd')

  was:
Hive version 3.1.2. A table is created with CTAS; the job finishes successfully, 
but the table is not created. Checking the logs, there are no entries related to 
metastore.HiveMetaStore: 22556: create_table: 
Table(tableName:t_nagent_trade_water_day_temp1061  or to  exec.Task: Moving data 
to directory. 

I checked the MySQL binlog; no CREATE TABLE statement was executed against the 
metastore database.

 

 

 

 

create table t_nagent_trade_water_day_temp1061 as

select 
a.trade_water_id,mrch_no,FIRST_REPORT_SUC_TIME,trade_amt,trade_date,create_time,trans_type,a.agent_code,mrch_type,level_four,level_four_name,level_three,level_three_name,

level_two,level_two_name,level_one,level_one_name from 

(select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,t2.level_four,t2.level_four_name,

t2.level_three,t2.level_three_name,t2.level_two,t2.level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join agent_belong_temp1061 t2 on 
t1.agent_code=t2.level_four

where t2.level_four!='' and t1.trade_status='1'

union all

select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,''
 level_four,'' level_four_name,

t2.level_three,t2.level_three_name,t2.level_two,t2.level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join 

(select 
level_three,level_three_name,level_two,level_two_name,level_one,level_one_name 
from agent_belong_temp1061

group by 
level_three,level_three_name,level_two,level_two_name,level_one,level_one_name) 

t2 on t1.agent_code=t2.level_three
where t2.level_three!='' and t1.trade_status='1'

union all

select 
t1.trade_water_id,t1.mrch_no,t1.trade_amt,t1.trade_date,t1.create_time,t1.trans_type,t1.agent_code,''
 level_four,'' level_four_name,

'' level_three,'' level_three_name,t2.level_two,t2.level_two_name,t2.level_one,t2.level_one_name

from t_nagent_trade_water t1 left join 

(select level_two,level_two_name,level_one,level_one_name from 
agent_belong_temp1061
group by level_two,level_two_name,level_one,level_one_name) 

t2 on t1.agent_code=t2.level_two
where t2.level_two!='' and 

[jira] [Commented] (HIVE-24135) Drop database doesn't delete directory in managed location

2020-09-09 Thread zhishui (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192817#comment-17192817
 ] 

zhishui commented on HIVE-24135:


I ran the SQL statements you provided but could not reproduce your problem. 
Could you give more details about this issue?

> Drop database doesn't delete directory in managed location
> --
>
> Key: HIVE-24135
> URL: https://issues.apache.org/jira/browse/HIVE-24135
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Naveen Gangam
>Priority: Major
>
> Repro:
>  say the default managed location is managed/hive and the default external 
> location is external/hive.
> {code:java}
> create database db1; -- creates: external/hive/db1.db
> create table db1.table1 (i int); -- creates: managed/hive/db1.db and  
> managed/hive/db1.db/table1
> drop database db1 cascade; -- removes : external/hive/db1.db and 
> managed/hive/db1.db/table1
> {code}
> Problem: Directory managed/hive/db1.db remains.
> Since HIVE-22995, dbs have a managed (managedLocationUri) and an external 
> location (locationUri). I think the issue is that 
> HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in 
> the external location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24060) When the CBO is false, NPE is thrown by an EXCEPT or INTERSECT execution

2020-09-09 Thread Zhihua Deng (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192808#comment-17192808
 ] 

Zhihua Deng commented on HIVE-24060:


INTERSECT and EXCEPT are currently only available through CBO. The issue may be 
due to the method genPlan(QB, QBExpr) returning null when the Opcode is INTERSECT.

> When the CBO is false, NPE is thrown by an EXCEPT or INTERSECT execution
> 
>
> Key: HIVE-24060
> URL: https://issues.apache.org/jira/browse/HIVE-24060
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Hive
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
>
> {code:java}
> set hive.cbo.enable=false;
> create table testtable(idx string, namex string) stored as orc;
> insert into testtable values('123', 'aaa'), ('234', 'bbb');
> explain select a.idx from (select idx,namex from testtable intersect select 
> idx,namex from testtable) a
> {code}
>  The execution throws a NullPointerException:
> {code:java}
> 2020-08-24 15:12:24,261 | WARN  | HiveServer2-Handler-Pool: Thread-345 | 
> Error executing statement:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1155)
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:341)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:215)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:316)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:253) 
> ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:684)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:670)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:342)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1144)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:1280)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-service-rpc-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-service-rpc-3.1.0.jar:3.1.0]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[libthrift-0.9.3.jar:0.9.3]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:648)
>  ~[hive-standalone-metastore-3.1.0.jar:3.1.0]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_201]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_201]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4367)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4346)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10576)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10515)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11434)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11291)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11318)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> 

[jira] [Assigned] (HIVE-24060) When the CBO is false, NPE is thrown by an EXCEPT or INTERSECT execution

2020-09-09 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-24060:
--

Assignee: (was: Zhihua Deng)

> When the CBO is false, NPE is thrown by an EXCEPT or INTERSECT execution
> 
>
> Key: HIVE-24060
> URL: https://issues.apache.org/jira/browse/HIVE-24060
> Project: Hive
>  Issue Type: Bug
>  Components: CBO, Hive
>Affects Versions: 3.1.0, 3.1.2
>Reporter: GuangMing Lu
>Priority: Major
>
> {code:java}
> set hive.cbo.enable=false;
> create table testtable(idx string, namex string) stored as orc;
> insert into testtable values('123', 'aaa'), ('234', 'bbb');
> explain select a.idx from (select idx,namex from testtable intersect select 
> idx,namex from testtable) a
> {code}
>  The execution throws a NullPointerException:
> {code:java}
> 2020-08-24 15:12:24,261 | WARN  | HiveServer2-Handler-Pool: Thread-345 | 
> Error executing statement:  | 
> org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1155)
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: NullPointerException null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:341)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:215)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:316)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:253) 
> ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:684)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:670)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:342)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.executeNewStatement(ThriftCLIService.java:1144)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:1280)
>  ~[hive-service-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
>  ~[hive-service-rpc-3.1.0.jar:3.1.0]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
>  ~[hive-service-rpc-3.1.0.jar:3.1.0]
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[libthrift-0.9.3.jar:0.9.3]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[libthrift-0.9.3.jar:0.9.3]
> at 
> org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:648)
>  ~[hive-standalone-metastore-3.1.0.jar:3.1.0]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[libthrift-0.9.3.jar:0.9.3]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_201]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_201]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4367)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4346)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:10576)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10515)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11434)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11291)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11318)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11304)
>  ~[hive-exec-3.1.0.jar:3.1.0]
> at 
> 

[jira] [Resolved] (HIVE-24125) Incorrect transaction snapshot invalidation with unnecessary writeset check for exclusive operations

2020-09-09 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-24125.
---
Resolution: Fixed

> Incorrect transaction snapshot invalidation with unnecessary writeset check 
> for exclusive operations
> 
>
> Key: HIVE-24125
> URL: https://issues.apache.org/jira/browse/HIVE-24125
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Fixes [HIVE-23725|https://issues.apache.org/jira/browse/HIVE-23725] and 
> addresses issue with concurrent exclusive writes (shouldn't fail on writeset 
> check).
> https://docs.google.com/document/d/1NVfk479_SxVIWPLXYmZkU8MYQE5nhcHbKMrf3bO_qwI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-09 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192760#comment-17192760
 ] 

Denys Kuzmenko commented on HIVE-23725:
---

reverted commit e2a02f1: https://github.com/apache/hive/pull/1474
[~kgyrtkirk], [~pvargacl], thank you for the review!

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should be not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that inserts data to the target table
> 3. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 4. run and commit transaction 1
> 5. run and commit transaction 2
> 6. run transaction 3; since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.
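
The partial-read hazard in the steps above can be illustrated with a toy snapshot model (purely illustrative; this is not Hive's actual ValidTxnManager):

```python
# Toy model of transaction snapshots: a row is visible to a query only if
# the transaction that wrote it is in the query's snapshot.
class Table:
    def __init__(self):
        self.rows = []  # list of (writer_txn_id, value)

    def insert(self, txn_id, value):
        self.rows.append((txn_id, value))

    def read(self, snapshot):
        return [v for t, v in self.rows if t in snapshot]

src = Table()
src.insert(1, "old-partition-row")
snapshot_at_compile = {1}           # txn 2 had not committed at compile time
src.insert(2, "new-partition-row")  # txn 2 commits after compilation

# If the snapshot is regenerated without recompiling, txn 2's rows become
# visible even though the compiled plan never accounted for them:
regenerated = {1, 2}
print(src.read(snapshot_at_compile))  # ['old-partition-row']
print(src.read(regenerated))          # ['old-partition-row', 'new-partition-row']
```

Recompiling and re-acquiring locks after snapshot regeneration, as the description proposes, is what keeps the plan and the visible data consistent.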



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24125) Incorrect transaction snapshot invalidation with unnecessary writeset check for exclusive operations

2020-09-09 Thread Denys Kuzmenko (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko reassigned HIVE-24125:
-

Assignee: Denys Kuzmenko

> Incorrect transaction snapshot invalidation with unnecessary writeset check 
> for exclusive operations
> 
>
> Key: HIVE-24125
> URL: https://issues.apache.org/jira/browse/HIVE-24125
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Assignee: Denys Kuzmenko
>Priority: Major
>
> Fixes [HIVE-23725|https://issues.apache.org/jira/browse/HIVE-23725] and 
> addresses issue with concurrent exclusive writes (shouldn't fail on writeset 
> check).
> https://docs.google.com/document/d/1NVfk479_SxVIWPLXYmZkU8MYQE5nhcHbKMrf3bO_qwI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-24125) Incorrect transaction snapshot invalidation with unnecessary writeset check for exclusive operations

2020-09-09 Thread Denys Kuzmenko (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-24125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192757#comment-17192757
 ] 

Denys Kuzmenko commented on HIVE-24125:
---

Pushed to master.
[~pvarga], [~pvary], thank you for the review!


> Incorrect transaction snapshot invalidation with unnecessary writeset check 
> for exclusive operations
> 
>
> Key: HIVE-24125
> URL: https://issues.apache.org/jira/browse/HIVE-24125
> Project: Hive
>  Issue Type: Bug
>Reporter: Denys Kuzmenko
>Priority: Major
>
> Fixes [HIVE-23725|https://issues.apache.org/jira/browse/HIVE-23725] and 
> addresses issue with concurrent exclusive writes (shouldn't fail on writeset 
> check).
> https://docs.google.com/document/d/1NVfk479_SxVIWPLXYmZkU8MYQE5nhcHbKMrf3bO_qwI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=480712=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480712
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 10:02
Start Date: 09/Sep/20 10:02
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1474:
URL: https://github.com/apache/hive/pull/1474


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480712)
Time Spent: 7h 40m  (was: 7.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should be not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition to the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement, that will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1, the snapshot will be regenerated, and it will read 
> partial data from transaction 2 breaking the ACID properties.
> Different setup.
> Switch the transaction order:
> 1. compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. compile transaction 2 that inserts data to the target table
> 3. compile transaction 3 that merge inserts data from the source table to the 
> target table
> 4. run and commit transaction 1
> 5. run and commit transaction 2
> 6. run transaction 3; since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24135) Drop database doesn't delete directory in managed location

2020-09-09 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage reassigned HIVE-24135:



> Drop database doesn't delete directory in managed location
> --
>
> Key: HIVE-24135
> URL: https://issues.apache.org/jira/browse/HIVE-24135
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Karen Coppage
>Assignee: Naveen Gangam
>Priority: Major
>
> Repro:
>  say the default managed location is managed/hive and the default external 
> location is external/hive.
> {code:java}
> create database db1; -- creates: external/hive/db1.db
> create table db1.table1 (i int); -- creates: managed/hive/db1.db and  
> managed/hive/db1.db/table1
> drop database db1 cascade; -- removes : external/hive/db1.db and 
> managed/hive/db1.db/table1
> {code}
> Problem: Directory managed/hive/db1.db remains.
> Since HIVE-22995, dbs have a managed (managedLocationUri) and an external 
> location (locationUri). I think the issue is that 
> HiveMetaStore.HMSHandler#drop_database_core deletes only the db directory in 
> the external location.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=480705=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480705
 ]

ASF GitHub Bot logged work on HIVE-23413:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 09:43
Start Date: 09/Sep/20 09:43
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #1220:
URL: https://github.com/apache/hive/pull/1220#issuecomment-689452296


   > Checked, however I'm not sure if it is a good idea to give the end user 
the option to completely disable the locking. They could end up shooting 
themselves in the foot.
   
An administrator could add it to the restricted parameters, which cannot be set 
by the user, so a customer can turn it off if they feel the locking is 
problematic.
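For context, Hive already supports restricting configuration properties from end users via `hive.conf.restricted.list`. Assuming the new skip-locks config were named `hive.txn.skip.locks` (a hypothetical name for illustration only), an administrator could lock it down in hive-site.xml like this:

```xml
<!-- hive-site.xml: adding a property to hive.conf.restricted.list prevents
     users from overriding it at the session level. The config name
     "hive.txn.skip.locks" below is hypothetical. -->
<property>
  <name>hive.conf.restricted.list</name>
  <value>hive.security.authenticator.manager,hive.txn.skip.locks</value>
</property>
```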



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480705)
Time Spent: 1h 10m  (was: 1h)

> Create a new config to skip all locks
> -
>
> Key: HIVE-23413
> URL: https://issues.apache.org/jira/browse/HIVE-23413
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> From time to time a query is blocked on locks when it should not be.
> To have a quick workaround for this, we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can provide 
> it immediately and then later investigate and fix the root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-24134:
--
Labels: pull-request-available  (was: )

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24134.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?focusedWorklogId=480695=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480695
 ]

ASF GitHub Bot logged work on HIVE-24134:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 09:13
Start Date: 09/Sep/20 09:13
Worklog Time Spent: 10m 
  Work Description: aasha opened a new pull request #1480:
URL: https://github.com/apache/hive/pull/1480


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480695)
Remaining Estimate: 0h
Time Spent: 10m

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24134.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24134:
---
Attachment: HIVE-24134.01.patch
Status: Patch Available  (was: In Progress)

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24134.01.patch
>
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-24134 started by Aasha Medhi.
--
> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Attachments: HIVE-24134.01.patch
>
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-09 Thread Zoltan Haindrich (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich resolved HIVE-24072.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

pushed to master. Thank you Jesus for reviewing the changes!

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?focusedWorklogId=480684=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480684
 ]

ASF GitHub Bot logged work on HIVE-24072:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 08:37
Start Date: 09/Sep/20 08:37
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk merged pull request #1432:
URL: https://github.com/apache/hive/pull/1432


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480684)
Time Spent: 1.5h  (was: 1h 20m)

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24072) HiveAggregateJoinTransposeRule may try to create an invalid transformation

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24072?focusedWorklogId=480683=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480683
 ]

ASF GitHub Bot logged work on HIVE-24072:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 08:35
Start Date: 09/Sep/20 08:35
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk commented on pull request #1432:
URL: https://github.com/apache/hive/pull/1432#issuecomment-689416076


   > I assume the test that would hit this problem is part of 
[HIVE-24084](https://issues.apache.org/jira/browse/HIVE-24084)? Otherwise, it 
would be useful to add a test case here.
   
   yes; that test hit a testing-related issue (tpch database removal), so 
I've not included it here



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480683)
Time Spent: 1h 20m  (was: 1h 10m)

> HiveAggregateJoinTransposeRule may try to create an invalid transformation
> --
>
> Key: HIVE-24072
> URL: https://issues.apache.org/jira/browse/HIVE-24072
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> java.lang.AssertionError: 
> Cannot add expression of different type to set:
> set type is RecordType(INTEGER NOT NULL o_orderkey, DECIMAL(10, 0) 
> o_totalprice, DATE o_orderdate, INTEGER NOT NULL c_custkey, VARCHAR(25) 
> CHARACTER SET "UTF-16LE" c_name, DOUBLE $f5) NOT NULL
> expression type is RecordType(INTEGER NOT NULL o_orderkey, INTEGER NOT NULL 
> o_custkey, DECIMAL(10, 0) o_totalprice, DATE o_orderdate, INTEGER NOT NULL 
> c_custkey, DOUBLE $f1) NOT NULL
> set is rel#567:HiveAggregate.HIVE.[].any(input=HepRelVertex#490,group={2, 4, 
> 5, 6, 7},agg#0=sum($1))
> expression is HiveProject(o_orderkey=[$2], o_custkey=[$3], o_totalprice=[$4], 
> o_orderdate=[$5], c_custkey=[$6], $f1=[$1])
>   HiveJoin(condition=[=($2, $0)], joinType=[inner], algorithm=[none], 
> cost=[{2284.5 rows, 0.0 cpu, 0.0 io}])
> HiveAggregate(group=[{0}], agg#0=[sum($1)])
>   HiveProject(l_orderkey=[$0], l_quantity=[$4])
> HiveTableScan(table=[[tpch_0_001, lineitem]], table:alias=[l])
> HiveJoin(condition=[=($0, $6)], joinType=[inner], algorithm=[none], 
> cost=[{1.9115E15 rows, 0.0 cpu, 0.0 io}])
>   HiveJoin(condition=[=($4, $1)], joinType=[inner], algorithm=[none], 
> cost=[{1650.0 rows, 0.0 cpu, 0.0 io}])
> HiveProject(o_orderkey=[$0], o_custkey=[$1], o_totalprice=[$3], 
> o_orderdate=[$4])
>   HiveTableScan(table=[[tpch_0_001, orders]], table:alias=[orders])
> HiveProject(c_custkey=[$0], c_name=[$1])
>   HiveTableScan(table=[[tpch_0_001, customer]], 
> table:alias=[customer])
>   HiveProject($f0=[$0])
> HiveFilter(condition=[>($1, 3E2)])
>   HiveAggregate(group=[{0}], agg#0=[sum($4)])
> HiveTableScan(table=[[tpch_0_001, lineitem]], 
> table:alias=[lineitem])
>   at 
> org.apache.calcite.plan.RelOptUtil.verifyTypeEquivalence(RelOptUtil.java:383)
>   at 
> org.apache.calcite.plan.hep.HepRuleCall.transformTo(HepRuleCall.java:57)
>   at 
> org.apache.calcite.plan.RelOptRuleCall.transformTo(RelOptRuleCall.java:236)
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.rules.HiveAggregateJoinTransposeRule.onMatch(HiveAggregateJoinTransposeRule.java:300)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?focusedWorklogId=480670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480670
 ]

ASF GitHub Bot logged work on HIVE-24131:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 08:23
Start Date: 09/Sep/20 08:23
Worklog Time Spent: 10m 
  Work Description: aasha commented on a change in pull request #1477:
URL: https://github.com/apache/hive/pull/1477#discussion_r485347419



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/repl/CopyUtils.java
##
@@ -73,21 +73,16 @@ public CopyUtils(String distCpDoAsUser, HiveConf hiveConf, 
FileSystem destinatio
   // Used by replication, copy files from source to destination. It is 
possible source file is
   // changed/removed during copy, so double check the checksum after copy,
   // if not match, copy again from cm
-  public void copyAndVerify(Path destRoot, List 
srcFiles, Path origSrcPath,
-boolean overwrite)
-  throws IOException, LoginException, HiveFatalException {
+  public void copyAndVerify(Path destRoot, List 
srcFiles, boolean readSrcAsFilesList,
+boolean overwrite) throws IOException, 
LoginException, HiveFatalException {
 UserGroupInformation proxyUser = getProxyUser();
 if (CollectionUtils.isEmpty(srcFiles)) {
   throw new IOException(ErrorMsg.REPL_INVALID_ARGUMENTS.format("SrcFiles 
can not be empty during copy operation."));
 }
 FileSystem sourceFs = srcFiles.get(0).getSrcFs();
 boolean useRegularCopy = regularCopy(sourceFs, srcFiles);
 try {
-  if (!useRegularCopy) {
-srcFiles.clear();
-srcFiles.add(new ReplChangeManager.FileInfo(sourceFs, origSrcPath, 
null));
-doCopyRetry(sourceFs, srcFiles, destRoot, proxyUser, useRegularCopy, 
overwrite);
-  } else {
+  if (useRegularCopy || readSrcAsFilesList) {

Review comment:
   useRegularCopy is not really used in this method: in both the if and the 
else branch it just passes the value to doCopyRetry, and only doCopyRetry uses 
its value. It would be better to compute it in doCopyRetry directly instead of 
passing the value around.
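The refactor the reviewer suggests could be sketched roughly like this (simplified, hypothetical signatures and heuristic, not the real CopyUtils API): the regular-vs-distcp decision moves into doCopyRetry, where the value is actually consumed.

```java
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical sketch of the refactor suggested in the review:
// copyAndVerify no longer computes or forwards useRegularCopy; doCopyRetry
// decides locally whether a regular filesystem copy or distcp is appropriate.
public class CopyUtilsSketch {
    static class FileInfo {
        final String path;
        FileInfo(String path) { this.path = path; }
    }

    void copyAndVerify(String destRoot, List<FileInfo> srcFiles, boolean overwrite) {
        // the regular-vs-distcp decision is deferred to doCopyRetry
        doCopyRetry(srcFiles, destRoot, overwrite);
    }

    void doCopyRetry(List<FileInfo> srcFiles, String destRoot, boolean overwrite) {
        boolean useRegularCopy = regularCopy(srcFiles); // decided where it is used
        if (useRegularCopy) {
            // plain FileSystem copy path (small file sets)
        } else {
            // distcp path (large copies)
        }
    }

    // Placeholder heuristic: small batches use a regular copy.
    boolean regularCopy(List<FileInfo> srcFiles) {
        return srcFiles.size() <= 32;
    }

    public static void main(String[] args) {
        CopyUtilsSketch c = new CopyUtilsSketch();
        // prints: true (a single file qualifies for a regular copy)
        System.out.println(c.regularCopy(Collections.singletonList(new FileInfo("part-00000"))));
    }
}
```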





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480670)
Time Spent: 20m  (was: 10m)

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24131.01.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-24134) Revert deletion of HiveStrictManagedMigration

2020-09-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi reassigned HIVE-24134:
--


> Revert deletion of HiveStrictManagedMigration 
> --
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24134) Revert removal of HiveStrictManagedMigration code

2020-09-09 Thread Aasha Medhi (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aasha Medhi updated HIVE-24134:
---
Summary: Revert removal of HiveStrictManagedMigration code  (was: Revert 
deletion of HiveStrictManagedMigration )

> Revert removal of HiveStrictManagedMigration code
> -
>
> Key: HIVE-24134
> URL: https://issues.apache.org/jira/browse/HIVE-24134
> Project: Hive
>  Issue Type: Task
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>
> Partial revert of https://issues.apache.org/jira/browse/HIVE-23995 to keep 
> the migration code



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-09 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24131:

Attachment: HIVE-24131.01.patch

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-24131.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-24131) Use original src location always when data copy runs on target

2020-09-09 Thread Pravin Sinha (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-24131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pravin Sinha updated HIVE-24131:

Attachment: (was: HIVE-24131.01.patch)

> Use original src location always when data copy runs on target 
> ---
>
> Key: HIVE-24131
> URL: https://issues.apache.org/jira/browse/HIVE-24131
> Project: Hive
>  Issue Type: Bug
>Reporter: Pravin Sinha
>Assignee: Pravin Sinha
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23413) Create a new config to skip all locks

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23413?focusedWorklogId=480657=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480657
 ]

ASF GitHub Bot logged work on HIVE-23413:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 07:42
Start Date: 09/Sep/20 07:42
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on pull request #1220:
URL: https://github.com/apache/hive/pull/1220#issuecomment-689372016


   Checked; however, I'm not sure it is a good idea to give the end user the 
option to completely disable the locking. They could end up shooting themselves 
in the foot. 
   
   From a code perspective, the change looks good.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480657)
Time Spent: 1h  (was: 50m)

> Create a new config to skip all locks
> -
>
> Key: HIVE-23413
> URL: https://issues.apache.org/jira/browse/HIVE-23413
> Project: Hive
>  Issue Type: Improvement
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23413.1.patch, HIVE-23413.2.patch
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> From time to time a query is blocked on locks when it should not be.
> To have a quick workaround for this, we should have a config which the user 
> can set in the session to disable acquiring/checking locks, so we can provide 
> it immediately and then later investigate and fix the root cause.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=480653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480653
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 07:35
Start Date: 09/Sep/20 07:35
Worklog Time Spent: 10m 
  Work Description: pvargacl commented on pull request #1474:
URL: https://github.com/apache/hive/pull/1474#issuecomment-689367884


   LGTM +1



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480653)
Time Spent: 7.5h  (was: 7h 20m)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial-read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should be to not only fix the snapshot but also to recompile the 
> query and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run, and commit transaction 2 that inserts data to an old and a new 
> partition of the source table.
> 3. Open, run, and commit transaction 3 that inserts data to the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> A different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data to the target table.
> 3. Compile transaction 3 that merge inserts data from the source table to 
> the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3: since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480648
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 07:24
Start Date: 09/Sep/20 07:24
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r485393860



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
##
@@ -1,114 +0,0 @@
-package org.apache.hadoop.hive.metastore;
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.FileMetadataExprType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.utils.FileUtils;
-import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
-import org.apache.hadoop.util.StringUtils;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-// This is added as part of moving MSCK code from ql to standalone-metastore. 
There is a metastore API to drop
-// partitions by name but we cannot use it because msck typically will contain 
partition value (year=2014). We almost
-// never drop partition by name (year). So we need to construct expression 
filters, the current
-// PartitionExpressionProxy implementations (PartitionExpressionForMetastore 
and HCatClientHMSImpl.ExpressionBuilder)
-// all depend on ql code to build ExprNodeDesc for the partition expressions. 
It also depends on kryo for serializing
-// the expression objects to byte[]. For MSCK drop partition, we don't need 
complex expression generator. For now,
-// all we do is split the partition spec (year=2014/month=24) into filter 
expression year='2014' and month='24' and
-// rely on metastore database to deal with type conversions. Ideally, 
PartitionExpressionProxy default implementation
-// should use SearchArgument (storage-api) to construct the filter expression 
and not depend on ql, but the usecase
-// for msck is pretty simple and this specific implementation should suffice.

Review comment:
   @kgyrtkirk Any thoughts on above two approaches?
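The transformation the removed comment describes — splitting a partition spec like `year=2014/month=24` into the filter expression `year='2014' and month='24'`, leaving type conversion to the metastore database — can be sketched as a standalone helper (hypothetical class, not the removed proxy itself):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the simple filter construction described above:
// split "year=2014/month=24" into "year='2014' and month='24'". The
// metastore database is relied upon for type conversions.
public class PartSpecToFilter {
    public static String toFilter(String partSpec) {
        List<String> clauses = new ArrayList<>();
        for (String component : partSpec.split("/")) {
            int eq = component.indexOf('=');
            String key = component.substring(0, eq);
            String value = component.substring(eq + 1);
            // quote the raw value; no expression tree or kryo needed
            clauses.add(key + "='" + value + "'");
        }
        return String.join(" and ", clauses);
    }

    public static void main(String[] args) {
        // prints: year='2014' and month='24'
        System.out.println(toFilter("year=2014/month=24"));
    }
}
```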





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480648)
Time Spent: 3h 40m  (was: 3.5h)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
>  at 
> 

[jira] [Work logged] (HIVE-23851) MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions

2020-09-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23851?focusedWorklogId=480646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-480646
 ]

ASF GitHub Bot logged work on HIVE-23851:
-

Author: ASF GitHub Bot
Created on: 09/Sep/20 07:23
Start Date: 09/Sep/20 07:23
Worklog Time Spent: 10m 
  Work Description: shameersss1 commented on a change in pull request #1271:
URL: https://github.com/apache/hive/pull/1271#discussion_r485393494



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MsckPartitionExpressionProxy.java
##
@@ -1,114 +0,0 @@
-package org.apache.hadoop.hive.metastore;
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- * http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-import java.nio.charset.StandardCharsets;
-import java.util.ArrayList;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Set;
-
-import org.apache.hadoop.hive.metastore.api.FieldSchema;
-import org.apache.hadoop.hive.metastore.api.FileMetadataExprType;
-import org.apache.hadoop.hive.metastore.api.MetaException;
-import org.apache.hadoop.hive.metastore.utils.FileUtils;
-import org.apache.hadoop.hive.ql.io.sarg.SearchArgument;
-import org.apache.hadoop.util.StringUtils;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-// This is added as part of moving MSCK code from ql to standalone-metastore. There is a metastore API to drop
-// partitions by name but we cannot use it because msck typically will contain partition value (year=2014). We almost
-// never drop partition by name (year). So we need to construct expression filters, the current
-// PartitionExpressionProxy implementations (PartitionExpressionForMetastore and HCatClientHMSImpl.ExpressionBuilder)
-// all depend on ql code to build ExprNodeDesc for the partition expressions. It also depends on kryo for serializing
-// the expression objects to byte[]. For MSCK drop partition, we don't need complex expression generator. For now,
-// all we do is split the partition spec (year=2014/month=24) into filter expression year='2014' and month='24' and
-// rely on metastore database to deal with type conversions. Ideally, PartitionExpressionProxy default implementation
-// should use SearchArgument (storage-api) to construct the filter expression and not depend on ql, but the usecase
-// for msck is pretty simple and this specific implementation should suffice.

Review comment:
   Or another approach could be: when we fail to deserialize the expr bytes with Kryo, on exception simply try converting the byte array to a string.
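   A minimal sketch of the fallback this suggests (hypothetical names; `deserializeWithKryo` stands in for the existing Kryo path in PartitionExpressionForMetastore, which is not shown here):

```java
import java.nio.charset.StandardCharsets;

public class ExprDeserializer {
    /**
     * Try the Kryo path first; if it throws, assume MSCK serialized the
     * expression as a plain filter string (e.g. "year='2014' and month='24'")
     * and decode the bytes as UTF-8 instead.
     */
    public static String deserializeOrFallback(byte[] exprBytes) {
        try {
            return deserializeWithKryo(exprBytes);
        } catch (RuntimeException e) {
            // Not a Kryo payload: fall back to treating the bytes as a string.
            return new String(exprBytes, StandardCharsets.UTF_8);
        }
    }

    // Placeholder for the real Kryo-based deserialization (always fails here
    // so the sketch exercises the fallback branch).
    private static String deserializeWithKryo(byte[] bytes) {
        throw new RuntimeException("not a Kryo payload");
    }

    public static void main(String[] args) {
        byte[] raw = "year='2014' and month='24'".getBytes(StandardCharsets.UTF_8);
        System.out.println(deserializeOrFallback(raw));
    }
}
```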





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 480646)
Time Spent: 3.5h  (was: 3h 20m)

> MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
> 
>
> Key: HIVE-23851
> URL: https://issues.apache.org/jira/browse/HIVE-23851
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 4.0.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> *Steps to reproduce:*
> # Create external table
> # Run msck command to sync all the partitions with metastore
> # Remove one of the partition path
> # Run msck repair with partition filtering
> *Stack Trace:*
> {code:java}
>  2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] 
> ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
>  java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
>  at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
>  at 

[jira] [Commented] (HIVE-23976) Enable vectorization for multi-col semi join reducers

2020-09-09 Thread Jira


[ 
https://issues.apache.org/jira/browse/HIVE-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192691#comment-17192691
 ] 

László Bodor commented on HIVE-23976:
-

[~zabetak]: thanks for taking a look. I think we should go the latter way, as we don't have n-ary vectorized expressions. It used to be a performance decision, and now the descriptors don't let us match an expression on an "arbitrary" number of input arguments.

> Enable vectorization for multi-col semi join reducers
> -
>
> Key: HIVE-23976
> URL: https://issues.apache.org/jira/browse/HIVE-23976
> Project: Hive
>  Issue Type: Improvement
>Reporter: Stamatis Zampetakis
>Assignee: László Bodor
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> HIVE-21196 introduces multi-column semi-join reducers in the query engine. 
> However, the implementation relies on GenericUDFMurmurHash which is not 
> vectorized thus the respective operators cannot be executed in vectorized 
> mode. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)