[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=658737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658737
 ]

ASF GitHub Bot logged work on HIVE-25522:
-

Author: ASF GitHub Bot
Created on: 01/Oct/21 05:28
Start Date: 01/Oct/21 05:28
Worklog Time Spent: 10m 
  Work Description: szehon-ho edited a comment on pull request #2647:
URL: https://github.com/apache/hive/pull/2647#issuecomment-931909748


   Hi guys, sorry for the delay; I took a look at it.  It seems the tests fail 
because in the test environment TxnHandler is initialized more than once but is 
not torn down on shutdown (the JVM appears to be recycled).  So each run opens 
more database connections, and the pool eventually gets exhausted.  Maybe I 
should just call TxnUtils.getTxnStore() instead.
   
   Actually, I found something else: in the current version of Hive this is now 
on the critical path of startup, since AcidMetricService was added to the list 
of services in https://issues.apache.org/jira/browse/HIVE-24824:
   
   ```
   setConf:369, TxnHandler (org.apache.hadoop.hive.metastore.txn)
   getTxnStore:124, TxnUtils (org.apache.hadoop.hive.metastore.txn)
   setConf:314, AcidMetricService (org.apache.hadoop.hive.metastore.metrics)
   startAlwaysTaskThreads:525, HMSHandler (org.apache.hadoop.hive.metastore)
   init:493, HMSHandler (org.apache.hadoop.hive.metastore)
   invoke0:-1, NativeMethodAccessorImpl (sun.reflect)
   invoke:62, NativeMethodAccessorImpl (sun.reflect)
   invoke:43, DelegatingMethodAccessorImpl (sun.reflect)
   invoke:498, Method (java.lang.reflect)
   invokeInternal:147, RetryingHMSHandler (org.apache.hadoop.hive.metastore)
   invoke:108, RetryingHMSHandler (org.apache.hadoop.hive.metastore)
   <init>:80, RetryingHMSHandler (org.apache.hadoop.hive.metastore)
   getProxy:93, RetryingHMSHandler (org.apache.hadoop.hive.metastore)
   newRetryingHMSHandler:133, HiveMetaStore (org.apache.hadoop.hive.metastore)
   invoke0:-1, NativeMethodAccessorImpl (sun.reflect)
   invoke:62, NativeMethodAccessorImpl (sun.reflect)
   invoke:43, DelegatingMethodAccessorImpl (sun.reflect)
   invoke:498, Method (java.lang.reflect)
   callEmbeddedMetastore:291, HiveMetaStoreClient (org.apache.hadoop.hive.metastore)
   <init>:205, HiveMetaStoreClient (org.apache.hadoop.hive.metastore)
   <init>:113, HiveMetaStoreClientWithLocalCache (org.apache.hadoop.hive.metastore)
   <init>:155, SessionHiveMetaStoreClient (org.apache.hadoop.hive.ql.metadata)
   newInstance0:-1, NativeConstructorAccessorImpl (sun.reflect)
   newInstance:62, NativeConstructorAccessorImpl (sun.reflect)
   newInstance:45, DelegatingConstructorAccessorImpl (sun.reflect)
   newInstance:423, Constructor (java.lang.reflect)
   newInstance:84, JavaUtils (org.apache.hadoop.hive.metastore.utils)
   <init>:101, RetryingMetaStoreClient (org.apache.hadoop.hive.metastore)
   getProxy:154, RetryingMetaStoreClient (org.apache.hadoop.hive.metastore)
   getProxy:125, RetryingMetaStoreClient (org.apache.hadoop.hive.metastore)
   createMetaStoreClient:5444, Hive (org.apache.hadoop.hive.ql.metadata)
   getMSC:5522, Hive (org.apache.hadoop.hive.ql.metadata)
   getMSC:5502, Hive (org.apache.hadoop.hive.ql.metadata)
   getAllFunctions:5810, Hive (org.apache.hadoop.hive.ql.metadata)
   reloadFunctions:337, Hive (org.apache.hadoop.hive.ql.metadata)
   registerAllFunctionsOnce:316, Hive (org.apache.hadoop.hive.ql.metadata)
   <init>:542, Hive (org.apache.hadoop.hive.ql.metadata)
   create:434, Hive (org.apache.hadoop.hive.ql.metadata)
   getInternal:421, Hive (org.apache.hadoop.hive.ql.metadata)
   get:377, Hive (org.apache.hadoop.hive.ql.metadata)
   createHiveDB:291, BaseSemanticAnalyzer (org.apache.hadoop.hive.ql.parse)
   <init>:269, BaseSemanticAnalyzer (org.apache.hadoop.hive.ql.parse)
   <init>:473, SemanticAnalyzer (org.apache.hadoop.hive.ql.parse)
   postInit:630, QTestUtil (org.apache.hadoop.hive.ql)
   evaluate:88, CliAdapter$1$1 (org.apache.hadoop.hive.cli.control)
   evaluate:20, RunRules (org.junit.rules)
   evaluate:306, ParentRunner$3 (org.junit.runners)
   run:413, ParentRunner (org.junit.runners)
   execute:365, JUnit4Provider (org.apache.maven.surefire.junit4)
   executeWithRerun:273, JUnit4Provider (org.apache.maven.surefire.junit4)
   executeTestSet:238, JUnit4Provider (org.apache.maven.surefire.junit4)
   invoke:159, JUnit4Provider (org.apache.maven.surefire.junit4)
   runSuitesInProcess:377, ForkedBooter (org.apache.maven.surefire.booter)
   execute:138, ForkedBooter (org.apache.maven.surefire.booter)
   run:465, ForkedBooter (org.apache.maven.surefire.booter)
   main:451, ForkedBooter (org.apache.maven.surefire.booter)
   ```
   
   So it seems this is a Hive 3-only bug?  FYI @sunchao 
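   The connection-exhaustion pattern described above can be sketched generically. This is a minimal illustration only; ConnectionPool, TxnStore, and TxnStoreFactory are hypothetical stand-ins, not Hive's real classes. Constructing a fresh store on every initialization opens a new pool each time, while a cached factory (the idea behind calling TxnUtils.getTxnStore()) reuses one store across initializations:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration (not Hive's actual classes).
class ConnectionPool {
    static final AtomicInteger OPEN_POOLS = new AtomicInteger();
    ConnectionPool() { OPEN_POOLS.incrementAndGet(); }  // never torn down in tests
}

class TxnStore {
    // Each new store opens its own pool, as a directly constructed TxnHandler would.
    final ConnectionPool pool = new ConnectionPool();
}

class TxnStoreFactory {
    private static TxnStore cached;

    // Hand out one shared store instead of building a new one per caller.
    static synchronized TxnStore getTxnStore() {
        if (cached == null) {
            cached = new TxnStore();
        }
        return cached;
    }
}
```

   Repeated getTxnStore() calls keep OPEN_POOLS at 1, whereas calling new TxnStore() on every re-initialization grows it without bound, which matches the exhaustion seen when the recycled JVM re-initializes the handler.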


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Assigned] (HIVE-25584) [llap-ext-client] Load data from a Text file for Map dataType is giving errors

2021-09-30 Thread Sruthi Mooriyathvariam (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sruthi Mooriyathvariam reassigned HIVE-25584:
-


> [llap-ext-client] Load data from a Text file for Map dataType is giving errors
> --
>
> Key: HIVE-25584
> URL: https://issues.apache.org/jira/browse/HIVE-25584
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Reporter: Sruthi Mooriyathvariam
>Assignee: Sruthi Mooriyathvariam
>Priority: Minor
> Fix For: 4.0.0
>
>
> Currently, there is no support for non-ORC writes (text and parquet) for the 
> llap-ext-client. Thus loading data from a text file leads to errors for Map 
> data type. This has to be fixed while adding support for non-ORC writes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25522) NullPointerException in TxnHandler

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25522?focusedWorklogId=658733&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658733
 ]

ASF GitHub Bot logged work on HIVE-25522:
-

Author: ASF GitHub Bot
Created on: 01/Oct/21 05:25
Start Date: 01/Oct/21 05:25
Worklog Time Spent: 10m 
  Work Description: szehon-ho commented on pull request #2647:
URL: https://github.com/apache/hive/pull/2647#issuecomment-931909748


   Hi guys, sorry for the delay; I took a look at it.  It seems the tests fail 
because in the test environment TxnHandler is initialized more than once but is 
not torn down on shutdown (the JVM appears to be recycled).
   
   Actually, I found something else: in the current version of Hive this is now 
on the critical path of startup, since AcidMetricService was added to the list 
of services in https://issues.apache.org/jira/browse/HIVE-24824.  
   
   So it seems this is a Hive 3-only bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658733)
Time Spent: 5h 10m  (was: 5h)

> NullPointerException in TxnHandler
> --
>
> Key: HIVE-25522
> URL: https://issues.apache.org/jira/browse/HIVE-25522
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Affects Versions: 3.1.2, 4.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Environment: Using Iceberg on a Hive 3.1.2 standalone metastore.  Iceberg 
> issues a lot of lock() calls for commits.
> We randomly hit a strange NPE that fails Iceberg commits.
> {noformat}
> 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] 
> metastore.RetryingHMSHandler: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903)
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217)
>   at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
>   at com.sun.proxy.$Proxy27.lock(Unknown Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18111)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:18095)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> 2021-08-21T11:08:05,665 ERROR [pool-6-thread-195] server.TThreadPoolServer: 
> Error occurred during processing of message.
> java.lang.NullPointerException: null
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1903)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:1827) 
> ~[hive-exec-3.1.2.jar:3.1.2]
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:7217)
>  ~[hive-exec-3.1.2.jar:3.1.2]
>   at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown 
> Source) 

[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=658730&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658730
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 01/Oct/21 05:16
Start Date: 01/Oct/21 05:16
Worklog Time Spent: 10m 
  Work Description: pvary commented on pull request #2344:
URL: https://github.com/apache/hive/pull/2344#issuecomment-931905835


   > Hi @pvary, could you help merge this? thank you! :)
   
   Ahh... Sorry for forgetting about this review. Could you please ping 
@belugabehr to check if he is OK with the current solution?
   
   Also, we will need to rebase and rerun the tests. I see that theoretically it 
can be merged without a conflict, but I have had headaches several times merging 
changes with outdated tests, which caused problems for the next contributor.
   
   Thanks for keeping this PR alive @dengzhhu653! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658730)
Time Spent: 14h 20m  (was: 14h 10m)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> After patching [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895], 
> the metastore has still seen a memory leak on db resources: many 
> StatementImpls are left unclosed.
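The leak pattern described above can be sketched with a hypothetical AutoCloseable query object (FakeQuery is an illustration, not the real JDO Query class): try-with-resources guarantees the query is closed even when result processing throws, which is what keeps statement handles from accumulating.

```java
// FakeQuery is a hypothetical stand-in for a JDO query; the point is the
// try-with-resources pattern, which closes the query on every code path.
class FakeQuery implements AutoCloseable {
    static int openCount = 0;

    FakeQuery() { openCount++; }

    Object execute() { return "row"; }

    @Override
    public void close() { openCount--; }  // releases the underlying statement
}

class QueryUser {
    static Object runQuery(boolean fail) {
        try (FakeQuery q = new FakeQuery()) {  // closed even if we throw below
            Object result = q.execute();
            if (fail) {
                throw new IllegalStateException("processing failed");
            }
            return result;
        }
    }
}
```

Without the try-with-resources (or an equivalent finally block), the failing path would leave openCount above zero; unclosed StatementImpls pile up the same way.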



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (HIVE-25583) Support parallel load for HastTables - Interfaces

2021-09-30 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan updated HIVE-25583:

Parent: HIVE-24037
Issue Type: Sub-task  (was: Task)

> Support parallel load for HastTables - Interfaces
> -
>
> Key: HIVE-25583
> URL: https://issues.apache.org/jira/browse/HIVE-25583
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25583) Support parallel load for HastTables - Interfaces

2021-09-30 Thread Ramesh Kumar Thangarajan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramesh Kumar Thangarajan reassigned HIVE-25583:
---


> Support parallel load for HastTables - Interfaces
> -
>
> Key: HIVE-25583
> URL: https://issues.apache.org/jira/browse/HIVE-25583
> Project: Hive
>  Issue Type: Task
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25528) Avoid recalculating types after CBO on second AST pass

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25528?focusedWorklogId=658698&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658698
 ]

ASF GitHub Bot logged work on HIVE-25528:
-

Author: ASF GitHub Bot
Created on: 01/Oct/21 01:29
Start Date: 01/Oct/21 01:29
Worklog Time Spent: 10m 
  Work Description: scarlin-cloudera closed pull request #2653:
URL: https://github.com/apache/hive/pull/2653


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658698)
Time Spent: 1h 10m  (was: 1h)

> Avoid recalculating types after CBO on second AST pass
> --
>
> Key: HIVE-25528
> URL: https://issues.apache.org/jira/browse/HIVE-25528
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Steve Carlin
>Assignee: Steve Carlin
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> It should be possible to avoid recalculating and reevaluating types on the 
> second pass after going through CBO.  CBO is making the effort to change the 
> types, so reassessing them is a waste of time.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25582) Empty result when using offset limit with MR

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25582:
--
Labels: pull-request-available  (was: )

> Empty result when using offset limit with MR
> 
>
> Key: HIVE-25582
> URL: https://issues.apache.org/jira/browse/HIVE-25582
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The _mr.ObjectCache_ caches nothing: every time the limit operator is 
> [retrieving the global counter from the 
> cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161],
>  a new AtomicInteger is returned. This makes _offset <= 
> currentCountForAllTasksInt_ always evaluate to false; since _offset > 0_, 
> the operator skips all rows.
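As an illustrative sketch (not Hive's actual classes; `retrieveBroken` and `countEmitted` are invented names), the failure mode described above can be reproduced with a cache stand-in that never stores its values:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch only: class and method names are invented, not Hive's.
public class LimitCounterSketch {

    // Stand-in for a broken ObjectCache#retrieve that caches nothing:
    // every call manufactures a fresh counter instead of reusing one.
    static AtomicInteger retrieveBroken(Supplier<AtomicInteger> creator) {
        return creator.get();
    }

    // Mimics the LimitOperator offset check: emit a row only once the
    // shared row count has moved past the offset.
    static int countEmitted(int offset, int rows) {
        int emitted = 0;
        for (int row = 0; row < rows; row++) {
            AtomicInteger currentCountForAllTasks =
                    retrieveBroken(() -> new AtomicInteger(0));
            int seen = currentCountForAllTasks.incrementAndGet();
            // With a fresh counter, seen is always 1, so for any offset > 0
            // this condition never holds and every row is skipped.
            if (seen > offset) {
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(countEmitted(2, 5)); // 0 -- every row skipped
    }
}
```

A correctly shared counter would make `seen` climb past the offset after `offset` rows, which is why the fix hinges on the cache actually retaining the AtomicInteger.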



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25582) Empty result when using offset limit with MR

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25582?focusedWorklogId=658693&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658693
 ]

ASF GitHub Bot logged work on HIVE-25582:
-

Author: ASF GitHub Bot
Created on: 01/Oct/21 00:57
Start Date: 01/Oct/21 00:57
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 opened a new pull request #2693:
URL: https://github.com/apache/hive/pull/2693


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658693)
Remaining Estimate: 0h
Time Spent: 10m

> Empty result when using offset limit with MR
> 
>
> Key: HIVE-25582
> URL: https://issues.apache.org/jira/browse/HIVE-25582
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The _mr.ObjectCache_ caches nothing: every time the limit operator is 
> [retrieving the global counter from the 
> cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161],
>  a new AtomicInteger is returned. This makes _offset <= 
> currentCountForAllTasksInt_ always evaluate to false; since _offset > 0_, 
> the operator skips all rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25582) Empty result when using offset limit with MR

2021-09-30 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng reassigned HIVE-25582:
--


> Empty result when using offset limit with MR
> 
>
> Key: HIVE-25582
> URL: https://issues.apache.org/jira/browse/HIVE-25582
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>
> The _mr.ObjectCache_ caches nothing: every time the limit operator is 
> [retrieving the global counter from the 
> cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161],
>  a new AtomicInteger is returned. This makes _offset <= 
> currentCountForAllTasksInt_ always evaluate to false; since _offset > 0_, 
> the operator skips all rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25582) Empty result when using offset limit with MR

2021-09-30 Thread Zhihua Deng (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihua Deng updated HIVE-25582:
---
Description: The _mr.ObjectCache_ caches nothing: every time the limit 
operator is [retrieving the global counter from the 
cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161],
 a new AtomicInteger is returned. This makes _offset <= 
currentCountForAllTasksInt_ always evaluate to false; since _offset > 0_, the 
operator skips all rows.  (was: The _mr.ObjectCache_ caches nothing, every 
time when the limit [retrieving global counter from the 
cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161],
 a new AtomicInteger will be returned. This make offset _<= 
currentCountForAllTasksInt_ always __ be __ evaluated to false_,_ as _offset > 
0_, the operator will skip all rows.)

> Empty result when using offset limit with MR
> 
>
> Key: HIVE-25582
> URL: https://issues.apache.org/jira/browse/HIVE-25582
> Project: Hive
>  Issue Type: Bug
>  Components: Operators
>Affects Versions: 4.0.0
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>
> The _mr.ObjectCache_ caches nothing: every time the limit operator is 
> [retrieving the global counter from the 
> cache|https://github.com/apache/hive/blob/7b3ecf617a6d46f48a3b6f77e0339fd4ad95a420/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L150-L161],
>  a new AtomicInteger is returned. This makes _offset <= 
> currentCountForAllTasksInt_ always evaluate to false; since _offset > 0_, 
> the operator skips all rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25580?focusedWorklogId=658569&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658569
 ]

ASF GitHub Bot logged work on HIVE-25580:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 19:19
Start Date: 30/Sep/21 19:19
Worklog Time Spent: 10m 
  Work Description: belugabehr commented on pull request #2692:
URL: https://github.com/apache/hive/pull/2692#issuecomment-931597115


   LGTM (pending tests)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658569)
Time Spent: 20m  (was: 10m)

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.
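As a rough sketch of the proposed narrowing (a fragment, not runnable on its own; the field names catName/dbName/tableName are assumptions about MPartitionColumnStatistics, and the real change belongs in ObjectStore.java):

{code:java}
// Instead of SELECT DISTINCT engine over the whole table, restrict the
// scan to a single cat/db/table so the index can be used.
query = pm.newQuery(MPartitionColumnStatistics.class);
query.setFilter("catName == c && dbName == d && tableName == t");
query.declareParameters("java.lang.String c, java.lang.String d, java.lang.String t");
query.setResult("DISTINCT engine");
Collection names = (Collection) query.execute(catName, dbName, tableName);
{code}

With the filter in place, DataNucleus should emit a WHERE clause on CAT_NAME/DB_NAME/TABLE_NAME rather than a full table scan of PART_COL_STATS.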



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422984#comment-17422984
 ] 

David Mollitor commented on HIVE-25580:
---

Sorry, was looking at an older version of the schema.  4.0 is:

{code:sql}
CREATE INDEX TAB_COL_STATS_IDX ON TAB_COL_STATS (CAT_NAME, DB_NAME, TABLE_NAME, 
COLUMN_NAME) USING BTREE;
{code}

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25580?focusedWorklogId=658548&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658548
 ]

ASF GitHub Bot logged work on HIVE-25580:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 18:57
Start Date: 30/Sep/21 18:57
Worklog Time Spent: 10m 
  Work Description: pvary opened a new pull request #2692:
URL: https://github.com/apache/hive/pull/2692


   ### What changes were proposed in this pull request?
   Limit the query range based on the cat/db/table
   
   ### Why are the changes needed?
   To gain performance
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Added a test to run a quick check, but existing test should already cover 
this path
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658548)
Remaining Estimate: 0h
Time Spent: 10m

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25580:
--
Labels: pull-request-available  (was: )

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25577) unix_timestamp() is ignoring the time zone value

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25577?focusedWorklogId=658475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658475
 ]

ASF GitHub Bot logged work on HIVE-25577:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 16:25
Start Date: 30/Sep/21 16:25
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #2686:
URL: https://github.com/apache/hive/pull/2686#discussion_r719570755



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFToUnixTimestamp.java
##
@@ -167,7 +167,15 @@ public void testStringArg2() throws HiveException {
 runAndVerify(udf2,
 new Text("1400-02-01 00:00:00 ICT"),
new Text("yyyy-MM-dd HH:mm:ss z"),
-new LongWritable(TimestampTZUtil.parse("1400-02-01 00:00:00", 
ZoneId.systemDefault()).getEpochSecond()));
+new LongWritable(TimestampTZUtil.parse("1400-01-31 09:00:22", 
ZoneId.systemDefault()).getEpochSecond()));

Review comment:
   Yes. Since the input is in ICT, which is the local mean time of Vietnam, 
i.e. UTC +07:06:40, and PDT is UTC -07:52:58, the total comes to "1400-01-31 
09:00:22".




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658475)
Time Spent: 50m  (was: 40m)

> unix_timestamp() is ignoring the time zone value
> 
>
> Key: HIVE-25577
> URL: https://issues.apache.org/jira/browse/HIVE-25577
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> set hive.local.time.zone=Asia/Bangkok;
> Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2000-01-07 00:00:00 
> GMT','yyyy-MM-dd HH:mm:ss z'));
> Result - 2000-01-07 00:00:00 ICT
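A minimal java.time sketch of the expected behaviour (an illustration, not Hive's own implementation): the zone carried in the input string should determine the epoch second, independent of hive.local.time.zone; from_unixtime() then renders that epoch in the session zone.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Sketch: parse a timestamp string that carries its own zone name and
// honour that zone when computing the epoch second.
public class ZoneParseSketch {

    static long toEpochSecond(String timestamp) {
        DateTimeFormatter fmt =
                DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss z", Locale.US);
        return ZonedDateTime.parse(timestamp, fmt).toEpochSecond();
    }

    public static void main(String[] args) {
        // Midnight UTC on 2000-01-07; rendering this epoch in Asia/Bangkok
        // (UTC+7) would give 2000-01-07 07:00:00, not 00:00:00.
        System.out.println(toEpochSecond("2000-01-07 00:00:00 GMT"));
    }
}
```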



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25577) unix_timestamp() is ignoring the time zone value

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25577?focusedWorklogId=658474&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658474
 ]

ASF GitHub Bot logged work on HIVE-25577:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 16:22
Start Date: 30/Sep/21 16:22
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #2686:
URL: https://github.com/apache/hive/pull/2686#discussion_r719568775



##
File path: ql/src/test/results/clientpositive/llap/udf5.q.out
##
@@ -342,3 +342,219 @@ POSTHOOK: type: QUERY
 POSTHOOK: Input: _dummy_database@_dummy_table
  A masked pattern was here 
 NULL
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2021-01-02 03:04:05 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2021-01-02 03:04:05 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+2021-01-02 10:04:05
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1400-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1400-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+1400-01-01 06:42:04
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+1800-01-01 06:42:04
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1900-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1900-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+1900-01-01 06:42:04
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2000-01-07 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2000-01-07 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+2000-01-07 07:00:00
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-00-00 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-00-00 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+NULL
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-99-99 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-99-99 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+NULL
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-12-31 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-12-31 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+-12-31 07:00:00
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2021-01-02 03:04:05 
ICT','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2021-01-02 03:04:05 
ICT','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+2021-01-02 03:04:05
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1400-01-01 00:00:00 
ICT','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT 

[jira] [Work logged] (HIVE-25577) unix_timestamp() is ignoring the time zone value

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25577?focusedWorklogId=658455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658455
 ]

ASF GitHub Bot logged work on HIVE-25577:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:52
Start Date: 30/Sep/21 15:52
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma commented on a change in pull 
request #2686:
URL: https://github.com/apache/hive/pull/2686#discussion_r719543736



##
File path: ql/src/test/results/clientpositive/llap/udf5.q.out
##
@@ -260,7 +260,7 @@ POSTHOOK: query: select 
from_unixtime(unix_timestamp('1400-11-08 08:00:00 ICT',
 POSTHOOK: type: QUERY
 POSTHOOK: Input: _dummy_database@_dummy_table
  A masked pattern was here 
-1400-11-08 08:00:00
+1400-11-08 07:35:24

Review comment:
   Because the input is ICT (+07:00) and the output is Asia/Bangkok 
(+06:42:04), we are seeing this difference.

##
File path: ql/src/test/results/clientpositive/llap/udf5.q.out
##
@@ -260,7 +260,7 @@ POSTHOOK: query: select 
from_unixtime(unix_timestamp('1400-11-08 08:00:00 ICT',
 POSTHOOK: type: QUERY
 POSTHOOK: Input: _dummy_database@_dummy_table
  A masked pattern was here 
-1400-11-08 08:00:00
+1400-11-08 07:35:24

Review comment:
   Because the input is ICT (+07:00) and the output is Asia/Bangkok 
(+06:42:04), we are seeing this difference.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658455)
Time Spent: 0.5h  (was: 20m)

> unix_timestamp() is ignoring the time zone value
> 
>
> Key: HIVE-25577
> URL: https://issues.apache.org/jira/browse/HIVE-25577
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> set hive.local.time.zone=Asia/Bangkok;
> Query - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2000-01-07 00:00:00 
> GMT','yyyy-MM-dd HH:mm:ss z'));
> Result - 2000-01-07 00:00:00 ICT



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25514) Alter table with partitions should honor {OWNER} policies from Apache Ranger in the HMS

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25514?focusedWorklogId=658451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658451
 ]

ASF GitHub Bot logged work on HIVE-25514:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:40
Start Date: 30/Sep/21 15:40
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2634:
URL: https://github.com/apache/hive/pull/2634#discussion_r719533659



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5461,6 +5462,37 @@ public GetPartitionResponse 
get_partition_req(GetPartitionRequest req)
*/
   private void fireReadTablePreEvent(String catName, String dbName, String 
tblName)
   throws MetaException, NoSuchObjectException {
+if (catName == null) {
+  throw new NullPointerException("catName is null");
+}
+
+if (isBlank(catName)) {
+  throw new NoSuchObjectException("catName is not valid");
+}
+
+if (dbName == null) {
+  throw new NullPointerException("dbName is null");
+}
+
+if (isBlank(dbName)) {
+  throw new NoSuchObjectException("dbName is not valid");
+}
+
+List filteredDb = 
FilterUtils.filterDbNamesIfEnabled(isServerFilterEnabled, filterHook,
+Collections.singletonList(dbName));
+
+if (filteredDb.isEmpty()) {
+  throw new NoSuchObjectException("Database " + dbName + " does not 
exist");
+}
+
+if (tblName == null) {
+  throw new NullPointerException("tblName is null");
+}
+
+if (isBlank(tblName)) {
+  throw new NoSuchObjectException("tblName is not valid");
+}
+

Review comment:
   So how is the table authorized, as we used to do via 
"authorizeTableForPartitionMetadata"? We are only authorizing DBs above. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658451)
Time Spent: 1h  (was: 50m)

> Alter table with partitions should honor {OWNER} policies from Apache Ranger 
> in the HMS
> ---
>
> Key: HIVE-25514
> URL: https://issues.apache.org/jira/browse/HIVE-25514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The following commands should honor \{OWNER} policies from Apache Ranger in 
> the HMS.
> {code:java}
> Show partitions table_name;
> alter table foo.table_name partition (country='us') rename to partition 
> (country='canada');
> alter table foo.table_name drop partition (id='canada');{code}
> The examples above are tables with partitions, so the partition APIs in HMS 
> should be modified to honor \{OWNER} policies from Apache Ranger.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-25581) Iceberg storage handler should set common projection pruning config

2021-09-30 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25581:
-


> Iceberg storage handler should set common projection pruning config
> ---
>
> Key: HIVE-25581
> URL: https://issues.apache.org/jira/browse/HIVE-25581
> Project: Hive
>  Issue Type: Improvement
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> Currently the value for the config "tez.mrreader.config.update.properties" is 
> not set for Iceberg jobs, when in fact it needs to be part of the jobConf for 
> all Iceberg queries. This change should ensure it's set by the Iceberg 
> storage handler by default.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25514) Alter table with partitions should honor {OWNER} policies from Apache Ranger in the HMS

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25514?focusedWorklogId=658449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658449
 ]

ASF GitHub Bot logged work on HIVE-25514:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:31
Start Date: 30/Sep/21 15:31
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2634:
URL: https://github.com/apache/hive/pull/2634#discussion_r719525592



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -6652,15 +6683,17 @@ public GetPartitionsPsWithAuthResponse 
get_partitions_ps_with_auth_req(GetPartit
 String[] parsedDbName = parseDbName(db_name, conf);
 startPartitionFunction("get_partitions_names_ps", parsedDbName[CAT_NAME],
 parsedDbName[DB_NAME], tbl_name, part_vals);
-fireReadTablePreEvent(parsedDbName[CAT_NAME], parsedDbName[DB_NAME], 
tbl_name);
 List ret = null;
 Exception ex = null;
 try {
-  authorizeTableForPartitionMetadata(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME], tbl_name);
+  fireReadTablePreEvent(parsedDbName[CAT_NAME], parsedDbName[DB_NAME], 
tbl_name);
   ret = getMS().listPartitionNamesPs(parsedDbName[CAT_NAME], 
parsedDbName[DB_NAME], tbl_name,
   part_vals, max_parts);
   ret = FilterUtils.filterPartitionNamesIfEnabled(isServerFilterEnabled,
   filterHook, parsedDbName[CAT_NAME], parsedDbName[DB_NAME], tbl_name, 
ret);
+} catch (NullPointerException e) {

Review comment:
   We shouldn't be catching NullPointerException here.
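A hedged sketch of the alternative the reviewer implies: validate the inputs up front and raise a meaningful checked exception instead of throwing and then catching NullPointerException. MetaException and the helper names below are stand-ins for the example, not the actual metastore classes.

```java
// Sketch only: MetaException is stubbed locally; in the real codebase it
// comes from the metastore API and fireReadTablePreEvent lives in HMSHandler.
public class PreconditionSketch {

    static class MetaException extends Exception {
        MetaException(String message) { super(message); }
    }

    static void fireReadTablePreEvent(String catName, String dbName, String tblName)
            throws MetaException {
        // Reject null and blank names with one meaningful exception type,
        // so callers never need to catch NullPointerException.
        requireNonBlank(catName, "catName");
        requireNonBlank(dbName, "dbName");
        requireNonBlank(tblName, "tblName");
        // ... filter the db name and fire the pre-read event here ...
    }

    static void requireNonBlank(String value, String name) throws MetaException {
        if (value == null || value.trim().isEmpty()) {
            throw new MetaException(name + " is not valid");
        }
    }

    // Small helper for demonstration: returns "ok" or the error message.
    static String check(String catName, String dbName, String tblName) {
        try {
            fireReadTablePreEvent(catName, dbName, tblName);
            return "ok";
        } catch (MetaException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(check(null, "db", "tbl")); // catName is not valid
    }
}
```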




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658449)
Time Spent: 50m  (was: 40m)

> Alter table with partitions should honor {OWNER} policies from Apache Ranger 
> in the HMS
> ---
>
> Key: HIVE-25514
> URL: https://issues.apache.org/jira/browse/HIVE-25514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The following commands should honor \{OWNER} policies from Apache Ranger in 
> the HMS.
> {code:java}
> Show partitions table_name;
> alter table foo.table_name partition (country='us') rename to partition 
> (country='canada');
> alter table foo.table_name drop partition (id='canada');{code}
> The examples above are tables with partitions, so the partition APIs in HMS 
> should be modified to honor \{OWNER} policies from Apache Ranger. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25514) Alter table with partitions should honor {OWNER} policies from Apache Ranger in the HMS

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25514?focusedWorklogId=658447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658447
 ]

ASF GitHub Bot logged work on HIVE-25514:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:29
Start Date: 30/Sep/21 15:29
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2634:
URL: https://github.com/apache/hive/pull/2634#discussion_r719523916



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5461,6 +5462,37 @@ public GetPartitionResponse 
get_partition_req(GetPartitionRequest req)
*/
   private void fireReadTablePreEvent(String catName, String dbName, String 
tblName)
   throws MetaException, NoSuchObjectException {
+if (catName == null) {
+  throw new NullPointerException("catName is null");
+}
+
+if (isBlank(catName)) {
+  throw new NoSuchObjectException("catName is not valid");
+}
+
+if (dbName == null) {
+  throw new NullPointerException("dbName is null");
+}
+
+if (isBlank(dbName)) {
+  throw new NoSuchObjectException("dbName is not valid");
+}
+
+List<String> filteredDb = 
FilterUtils.filterDbNamesIfEnabled(isServerFilterEnabled, filterHook,
+Collections.singletonList(dbName));
+
+if (filteredDb.isEmpty()) {
+  throw new NoSuchObjectException("Database " + dbName + " does not 
exist");
+}
+
+if (tblName == null) {

Review comment:
   These checks can be moved to the top and merged with the catName and dbName checks?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658447)
Time Spent: 40m  (was: 0.5h)

> Alter table with partitions should honor {OWNER} policies from Apache Ranger 
> in the HMS
> ---
>
> Key: HIVE-25514
> URL: https://issues.apache.org/jira/browse/HIVE-25514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The following commands should honor \{OWNER} policies from Apache Ranger in 
> the HMS.
> {code:java}
> Show partitions table_name;
> alter table foo.table_name partition (country='us') rename to partition 
> (country='canada');
> alter table foo.table_name drop partition (id='canada');{code}
> The examples above are tables with partitions, so the partition APIs in HMS 
> should be modified to honor \{OWNER} policies from Apache Ranger. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25514) Alter table with partitions should honor {OWNER} policies from Apache Ranger in the HMS

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25514?focusedWorklogId=658443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658443
 ]

ASF GitHub Bot logged work on HIVE-25514:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:27
Start Date: 30/Sep/21 15:27
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2634:
URL: https://github.com/apache/hive/pull/2634#discussion_r719521711



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5461,6 +5462,37 @@ public GetPartitionResponse 
get_partition_req(GetPartitionRequest req)
*/
   private void fireReadTablePreEvent(String catName, String dbName, String 
tblName)
   throws MetaException, NoSuchObjectException {
+if (catName == null) {
+  throw new NullPointerException("catName is null");
+}
+
+if (isBlank(catName)) {
+  throw new NoSuchObjectException("catName is not valid");
+}
+
+if (dbName == null) {
+  throw new NullPointerException("dbName is null");
+}
+
+if (isBlank(dbName)) {

Review comment:
   Same here. Converge them into a single condition, or even merge with the 
catalog condition:
   if catName == null || isBlank(catName) || dbName == null || isBlank(dbName)
   throw NoSuchObjectException




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658443)
Time Spent: 0.5h  (was: 20m)

> Alter table with partitions should honor {OWNER} policies from Apache Ranger 
> in the HMS
> ---
>
> Key: HIVE-25514
> URL: https://issues.apache.org/jira/browse/HIVE-25514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The following commands should honor \{OWNER} policies from Apache Ranger in 
> the HMS.
> {code:java}
> Show partitions table_name;
> alter table foo.table_name partition (country='us') rename to partition 
> (country='canada');
> alter table foo.table_name drop partition (id='canada');{code}
> The examples above are tables with partitions, so the partition APIs in HMS 
> should be modified to honor \{OWNER} policies from Apache Ranger. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25514) Alter table with partitions should honor {OWNER} policies from Apache Ranger in the HMS

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25514?focusedWorklogId=658442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658442
 ]

ASF GitHub Bot logged work on HIVE-25514:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:25
Start Date: 30/Sep/21 15:25
Worklog Time Spent: 10m 
  Work Description: nrg4878 commented on a change in pull request #2634:
URL: https://github.com/apache/hive/pull/2634#discussion_r719519946



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
##
@@ -5461,6 +5462,37 @@ public GetPartitionResponse 
get_partition_req(GetPartitionRequest req)
*/
   private void fireReadTablePreEvent(String catName, String dbName, String 
tblName)
   throws MetaException, NoSuchObjectException {
+if (catName == null) {
+  throw new NullPointerException("catName is null");
+}
+
+if (isBlank(catName)) {

Review comment:
   I think we can converge the null check and blank check into a single 
condition and throw a NoSuchObjectException in that case. We shouldn't be 
throwing a NullPointerException.
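A minimal sketch of the converged check suggested above (class and method names are illustrative, and the nested exception class stands in for org.apache.hadoop.hive.metastore.api.NoSuchObjectException):

```java
public class PreEventValidation {

    // Stand-in for org.apache.hadoop.hive.metastore.api.NoSuchObjectException.
    static class NoSuchObjectException extends Exception {
        NoSuchObjectException(String msg) { super(msg); }
    }

    // One condition covers both the null check and the blank check, and only
    // NoSuchObjectException is thrown -- never NullPointerException.
    static void requireValid(String kind, String value) throws NoSuchObjectException {
        if (value == null || value.trim().isEmpty()) {
            throw new NoSuchObjectException(kind + " is not valid");
        }
    }

    // Mirrors the validation at the top of fireReadTablePreEvent.
    static void validate(String catName, String dbName, String tblName)
            throws NoSuchObjectException {
        requireValid("catName", catName);
        requireValid("dbName", dbName);
        requireValid("tblName", tblName);
    }

    public static void main(String[] args) {
        try {
            validate("hive", null, "t1");
        } catch (NoSuchObjectException e) {
            System.out.println(e.getMessage()); // prints "dbName is not valid"
        }
    }
}
```

This keeps the caller-facing contract to a single checked exception type, which is what the metastore Thrift API can actually propagate.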




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658442)
Time Spent: 20m  (was: 10m)

> Alter table with partitions should honor {OWNER} policies from Apache Ranger 
> in the HMS
> ---
>
> Key: HIVE-25514
> URL: https://issues.apache.org/jira/browse/HIVE-25514
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Standalone Metastore
>Affects Versions: 4.0.0
>Reporter: Sai Hemanth Gantasala
>Assignee: Sai Hemanth Gantasala
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The following commands should honor \{OWNER} policies from Apache Ranger in 
> the HMS.
> {code:java}
> Show partitions table_name;
> alter table foo.table_name partition (country='us') rename to partition 
> (country='canada);
> alter table foo.table_name drop partition (id='canada');{code}
> The examples above are tables with partitions. So the partition APIs in HMS 
> should be modifed to honor \{owner} policies from Apache ranger. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=658413&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658413
 ]

ASF GitHub Bot logged work on HIVE-25553:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:01
Start Date: 30/Sep/21 15:01
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #2689:
URL: https://github.com/apache/hive/pull/2689#discussion_r719480605



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##
@@ -122,6 +122,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
   case SHORT:
 return Field.nullable(name, MinorType.SMALLINT.getType());
   case INT:
+new Field(name, new FieldType(false, new ArrowType.Int(32, true), 
null), null);

Review comment:
   This statement is no-op. Should be removed.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##
@@ -160,7 +161,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
 final ListTypeInfo listTypeInfo = (ListTypeInfo) typeInfo;
 final TypeInfo elementTypeInfo = listTypeInfo.getListElementTypeInfo();
 return new Field(name, FieldType.nullable(MinorType.LIST.getType()),
-Lists.newArrayList(toField(DEFAULT_ARROW_FIELD_NAME, 
elementTypeInfo)));
+Lists.newArrayList(toField(name, elementTypeInfo)));

Review comment:
   Why is the default name (DEFAULT_ARROW_FIELD_NAME) changed in LIST but kept 
in UNION?

##
File path: 
itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcWithMiniLlapVectorArrow.java
##
@@ -64,8 +65,8 @@ public static void beforeTest() throws Exception {
 return new LlapArrowRowInputFormat(Long.MAX_VALUE);
   }
 
-  // Currently MAP type is not supported. Add it back when Arrow 1.0 is 
released.
-  // See: SPARK-21187
+  // Currently, loading from a text file gives errors with Map dataType.
+  // This needs to be fixed when adding support for non-ORC writes (text and 
parquet) for the llap-ext-client.

Review comment:
   Remove this statement and create a follow-up JIRA to fix this issue and 
link with this one.

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##
@@ -185,7 +186,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
 final TypeInfo keyTypeInfo = mapTypeInfo.getMapKeyTypeInfo();
 final TypeInfo valueTypeInfo = mapTypeInfo.getMapValueTypeInfo();
 final StructTypeInfo mapStructTypeInfo = new StructTypeInfo();
-mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("keys", 
"values"));
+mapStructTypeInfo.setAllStructFieldNames(Lists.newArrayList("key", 
"value"));

Review comment:
   Why is this naming change significant?

##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/io/arrow/ArrowColumnarBatchSerDe.java
##
@@ -170,7 +171,7 @@ private static Field toField(String name, TypeInfo 
typeInfo) {
 for (int i = 0; i < structSize; i++) {
   structFields.add(toField(fieldNames.get(i), fieldTypeInfos.get(i)));
 }
-return new Field(name, FieldType.nullable(MinorType.STRUCT.getType()), 
structFields);
+return new Field(name, new FieldType(false, new ArrowType.Struct(), 
null), structFields);

Review comment:
   Why is the struct value non-nullable?

##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/arrow/Serializer.java
##
@@ -226,7 +226,7 @@ public ArrowWrapperWritable 
serializeBatch(VectorizedRowBatch vectorizedRowBatch
   }
 
   private static FieldType toFieldType(TypeInfo typeInfo) {
-return new FieldType(true, toArrowType(typeInfo), null);
+return new FieldType(false, toArrowType(typeInfo), null);

Review comment:
   Why is it changed to nullable=false?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658413)
Time Spent: 20m  (was: 10m)

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 

[jira] [Work logged] (HIVE-25576) Raise exception instead of silent change for new DateTimeformatter

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25576?focusedWorklogId=658412&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658412
 ]

ASF GitHub Bot logged work on HIVE-25576:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 15:01
Start Date: 30/Sep/21 15:01
Worklog Time Spent: 10m 
  Work Description: ashish-kumar-sharma opened a new pull request #2690:
URL: https://github.com/apache/hive/pull/2690


   …formatter
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658412)
Remaining Estimate: 0h
Time Spent: 10m

> Raise exception instead of silent change for new DateTimeformatter
> --
>
> Key: HIVE-25576
> URL: https://issues.apache.org/jira/browse/HIVE-25576
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *History*
> *Hive 1.2* - 
> VM time zone set to Asia/Bangkok
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
> UTC','yyyy-MM-dd HH:mm:ss z'));
> *Result* - 1800-01-01 07:00:00
> *Implementation details* - 
> SimpleDateFormat formatter = new SimpleDateFormat(pattern);
> Long unixtime = formatter.parse(textval).getTime() / 1000;
> Date date = new Date(unixtime * 1000L);
> https://docs.oracle.com/javase/8/docs/api/java/util/Date.html . The official 
> documentation notes that "Unfortunately, the API for these functions was not 
> amenable to internationalization" and that the corresponding methods in Date 
> are deprecated. This is why the legacy path produces a wrong result.
> *Master branch* - 
> set hive.local.time.zone=Asia/Bangkok;
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
> UTC','yyyy-MM-dd HH:mm:ss z'));
> *Result* - 1800-01-01 06:42:04
> *Implementation details* - 
> DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
> .parseCaseInsensitive()
> .appendPattern(pattern)
> .toFormatter();
> ZonedDateTime zonedDateTime = 
> ZonedDateTime.parse(textval,dtformatter).withZoneSameInstant(ZoneId.of(timezone));
> Long dttime = zonedDateTime.toInstant().getEpochSecond();
> *Problem* - 
> Now *SimpleDateFormat* has been replaced with *DateTimeFormatter*, which 
> gives the correct result but is not backward compatible. This causes issues 
> when migrating to the new version, because data written using Hive 1.x or 
> 2.x is not compatible with *DateTimeFormatter*.
> *Solution*
> Introduce a config "hive.legacy.timeParserPolicy" with the following values -
> EXCEPTION - compare the values from both *SimpleDateFormat* & 
> *DateTimeFormatter* and raise an exception if they don't match 
> LEGACY - use *SimpleDateFormat* 
> CORRECTED - use *DateTimeFormatter*
> This will help Hive users in the following ways - 
> 1. Migrate to the new version using *LEGACY*
> 2. Find values which are not compatible with the new version - *EXCEPTION*
> 3. Use the latest date APIs - *CORRECTED*
> Note: Apache Spark also faces the same issue 
> https://issues.apache.org/jira/browse/SPARK-30668
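The 07:00:00 vs 06:42:04 divergence described above can be reproduced with plain JDK APIs; a small sketch (class name is illustrative, expected values are the ones reported in this issue):

```java
import java.text.SimpleDateFormat;
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.TimeZone;

public class TimeParseDemo {

    static final Instant UTC_1800 = Instant.parse("1800-01-01T00:00:00Z");
    static final String ZONE = "Asia/Bangkok";

    // Legacy path (Hive 1.x/2.x): java.util.Date + SimpleDateFormat, which
    // applies the modern +07:00 offset even to 1800-era instants.
    static String legacyRender() {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        fmt.setTimeZone(TimeZone.getTimeZone(ZONE));
        return fmt.format(new Date(UTC_1800.toEpochMilli()));
    }

    // New path (master): java.time, which applies Bangkok's historical
    // local-mean-time offset of +6:42:04 for dates before standardization.
    static String modernRender() {
        return UTC_1800.atZone(ZoneId.of(ZONE))
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
    }

    public static void main(String[] args) {
        System.out.println("legacy: " + legacyRender()); // 1800-01-01 07:00:00 reported for Hive 1.2
        System.out.println("modern: " + modernRender()); // 1800-01-01 06:42:04 reported for master
    }
}
```

Under the proposed hive.legacy.timeParserPolicy, LEGACY would keep the first rendering, CORRECTED the second, and EXCEPTION would flag rows where the two disagree.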



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25576) Raise exception instead of silent change for new DateTimeformatter

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25576:
--
Labels: pull-request-available  (was: )

> Raise exception instead of silent change for new DateTimeformatter
> --
>
> Key: HIVE-25576
> URL: https://issues.apache.org/jira/browse/HIVE-25576
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 4.0.0
>Reporter: Ashish Sharma
>Assignee: Ashish Sharma
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *History*
> *Hive 1.2* - 
> VM time zone set to Asia/Bangkok
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
> UTC','yyyy-MM-dd HH:mm:ss z'));
> *Result* - 1800-01-01 07:00:00
> *Implementation details* - 
> SimpleDateFormat formatter = new SimpleDateFormat(pattern);
> Long unixtime = formatter.parse(textval).getTime() / 1000;
> Date date = new Date(unixtime * 1000L);
> https://docs.oracle.com/javase/8/docs/api/java/util/Date.html . The official 
> documentation notes that "Unfortunately, the API for these functions was not 
> amenable to internationalization" and that the corresponding methods in Date 
> are deprecated. This is why the legacy path produces a wrong result.
> *Master branch* - 
> set hive.local.time.zone=Asia/Bangkok;
> *Query* - SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
> UTC','yyyy-MM-dd HH:mm:ss z'));
> *Result* - 1800-01-01 06:42:04
> *Implementation details* - 
> DateTimeFormatter dtformatter = new DateTimeFormatterBuilder()
> .parseCaseInsensitive()
> .appendPattern(pattern)
> .toFormatter();
> ZonedDateTime zonedDateTime = 
> ZonedDateTime.parse(textval,dtformatter).withZoneSameInstant(ZoneId.of(timezone));
> Long dttime = zonedDateTime.toInstant().getEpochSecond();
> *Problem* - 
> Now *SimpleDateFormat* has been replaced with *DateTimeFormatter*, which 
> gives the correct result but is not backward compatible. This causes issues 
> when migrating to the new version, because data written using Hive 1.x or 
> 2.x is not compatible with *DateTimeFormatter*.
> *Solution*
> Introduce a config "hive.legacy.timeParserPolicy" with the following values -
> EXCEPTION - compare the values from both *SimpleDateFormat* & 
> *DateTimeFormatter* and raise an exception if they don't match 
> LEGACY - use *SimpleDateFormat* 
> CORRECTED - use *DateTimeFormatter*
> This will help Hive users in the following ways - 
> 1. Migrate to the new version using *LEGACY*
> 2. Find values which are not compatible with the new version - *EXCEPTION*
> 3. Use the latest date APIs - *CORRECTED*
> Note: Apache Spark also faces the same issue 
> https://issues.apache.org/jira/browse/SPARK-30668



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25556) Remove com.vlkan.flatbuffers dependency from serde

2021-09-30 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-25556:

Fix Version/s: 4.0.0

> Remove com.vlkan.flatbuffers dependency from serde
> --
>
> Key: HIVE-25556
> URL: https://issues.apache.org/jira/browse/HIVE-25556
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>
> This dependency was added initially as google flatbuffers were not getting 
> published to maven. 
>  
> Since this is not the case now 
> ([https://mvnrepository.com/artifact/com.google.flatbuffers/flatbuffers-java]),
>  this should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25556) Remove com.vlkan.flatbuffers dependency from serde

2021-09-30 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-25556:

Component/s: Serializers/Deserializers

> Remove com.vlkan.flatbuffers dependency from serde
> --
>
> Key: HIVE-25556
> URL: https://issues.apache.org/jira/browse/HIVE-25556
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>
> This dependency was added initially as google flatbuffers were not getting 
> published to maven. 
>  
> Since this is not the case now 
> ([https://mvnrepository.com/artifact/com.google.flatbuffers/flatbuffers-java]),
>  this should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-25556) Remove com.vlkan.flatbuffers dependency from serde

2021-09-30 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-25556.
-
Resolution: Invalid

Already fixed by [HIVE-22827|https://issues.apache.org/jira/browse/HIVE-22827]

> Remove com.vlkan.flatbuffers dependency from serde
> --
>
> Key: HIVE-25556
> URL: https://issues.apache.org/jira/browse/HIVE-25556
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>
> This dependency was added initially as google flatbuffers were not getting 
> published to maven. 
>  
> Since this is not the case now 
> ([https://mvnrepository.com/artifact/com.google.flatbuffers/flatbuffers-java]),
>  this should be removed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658350
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 13:37
Start Date: 30/Sep/21 13:37
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719415132



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   Sure. Added these details to the javadocs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658350)
Time Spent: 3h  (was: 2h 50m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread David Mollitor (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422788#comment-17422788
 ] 

David Mollitor commented on HIVE-25580:
---

Yes.

{code:sql}
CREATE INDEX PCS_STATS_IDX ON PART_COL_STATS 
(DB_NAME,TABLE_NAME,COLUMN_NAME,PARTITION_NAME) USING BTREE;
{code}

DB/TBL should hit the index and be much faster.

> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is the full table scan for the jdbc query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get a better performance if we limit the query range based on the 
> cat/db/table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25553) Support Map data-type natively in Arrow format

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25553?focusedWorklogId=658338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658338
 ]

ASF GitHub Bot logged work on HIVE-25553:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 13:24
Start Date: 30/Sep/21 13:24
Worklog Time Spent: 10m 
  Work Description: warriersruthi opened a new pull request #2689:
URL: https://github.com/apache/hive/pull/2689


   This covers the following sub-tasks as well:
   HIVE-25554: Upgrade arrow version to 0.15
   HIVE-2: ArrowColumnarBatchSerDe should store map natively instead of 
converting to list
   HIVE-25556: Remove com.vlkan.flatbuffers dependency from serde
   
   **What changes were proposed in this pull request?**
   a. Upgrading arrow version to version 0.15.0 (where map data-type is 
supported)
   b. Modifying ArrowColumnarBatchSerDe and corresponding 
Serializer/Deserializer to not use list as a workaround for map and use the 
arrow map data-type instead
   c. Taking care of creating non-nullable struct and non-nullable key type for 
the map data-type in ArrowColumnarBatchSerDe
   
   **Why are the changes needed?**
   Currently, ArrowColumnarBatchSerDe converts the map datatype to a list of 
structs data-type (where the struct contains the key-value pair of the map).
   This causes issues when reading Map datatype using llap-ext-client as it 
reads a list of structs instead.
   HiveWarehouseConnector which uses the llap-ext-client throws exception when 
the schema (containing Map data type) is different from actual data (list of 
structs).
   This change includes the fix for this issue.
   
   **Does this PR introduce any user-facing change?**
   No
   
   **How was this patch tested?**
   Enabled back the Arrow specific tests in Hive code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658338)
Remaining Estimate: 0h
Time Spent: 10m

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts the map data-type to a list of 
> structs data-type (where the struct contains the key-value pair of the map). 
> This causes issues when reading the Map data-type using the llap-ext-client, 
> as it reads a list of structs instead. 
> HiveWarehouseConnector, which uses the llap-ext-client, throws an exception 
> when the schema (containing the Map data-type) differs from the actual data 
> (a list of structs).
>  
> Fixing this issue requires upgrading the Arrow version (to one where the map 
> data-type is supported) and modifying ArrowColumnarBatchSerDe and the 
> corresponding Serializer/Deserializer to use the Arrow map data-type instead 
> of the list workaround. 
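The list-of-structs workaround described above can be illustrated with a minimal, hypothetical Java sketch (plain JDK collections, not the Arrow API; `MapAsListOfStructs` is an invented name): the same data carries two different schemas, a native `Map` versus a `List` of key/value structs, which is exactly the mismatch that confused readers expecting a Map schema.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapAsListOfStructs {
    public static void main(String[] args) {
        // Native map representation (what Arrow >= 0.15 can store directly).
        Map<String, Integer> nativeMap = new LinkedHashMap<>();
        nativeMap.put("a", 1);
        nativeMap.put("b", 2);

        // Workaround representation: a list of {key, value} structs, which is
        // what the old serde emitted in place of a native map.
        List<Map.Entry<String, Integer>> listOfStructs = new ArrayList<>();
        for (Map.Entry<String, Integer> e : nativeMap.entrySet()) {
            listOfStructs.add(new SimpleEntry<>(e.getKey(), e.getValue()));
        }

        // Same data, different schema: Map<String, Integer> vs List<Struct>.
        System.out.println(nativeMap);      // {a=1, b=2}
        System.out.println(listOfStructs);  // [a=1, b=2]
    }
}
```

A reader that expects the map schema but receives the list schema fails in the same way HiveWarehouseConnector does.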



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25553) Support Map data-type natively in Arrow format

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25553:
--
Labels: pull-request-available  (was: )

> Support Map data-type natively in Arrow format
> --
>
> Key: HIVE-25553
> URL: https://issues.apache.org/jira/browse/HIVE-25553
> Project: Hive
>  Issue Type: Improvement
>  Components: llap, Serializers/Deserializers
>Reporter: Adesh Kumar Rao
>Assignee: Adesh Kumar Rao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently ArrowColumnarBatchSerDe converts the map data-type to a list of 
> structs data-type (where the struct contains the key-value pair of the map). 
> This causes issues when reading the Map data-type using the llap-ext-client, 
> as it reads a list of structs instead. 
> HiveWarehouseConnector, which uses the llap-ext-client, throws an exception 
> when the schema (containing the Map data-type) differs from the actual data 
> (a list of structs).
>  
> Fixing this issue requires upgrading the Arrow version (to one where the map 
> data-type is supported) and modifying ArrowColumnarBatchSerDe and the 
> corresponding Serializer/Deserializer to use the Arrow map data-type instead 
> of the list workaround. 





[jira] [Work logged] (HIVE-25577) unix_timestamp() is ignoring the time zone value

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25577?focusedWorklogId=658324=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658324
 ]

ASF GitHub Bot logged work on HIVE-25577:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 13:10
Start Date: 30/Sep/21 13:10
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #2686:
URL: https://github.com/apache/hive/pull/2686#discussion_r719373104



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFToUnixTimestamp.java
##
@@ -167,7 +167,15 @@ public void testStringArg2() throws HiveException {
 runAndVerify(udf2,
 new Text("1400-02-01 00:00:00 ICT"),
 new Text("-MM-dd HH:mm:ss z"),
-new LongWritable(TimestampTZUtil.parse("1400-02-01 00:00:00", 
ZoneId.systemDefault()).getEpochSecond()));
+new LongWritable(TimestampTZUtil.parse("1400-01-31 09:00:22", 
ZoneId.systemDefault()).getEpochSecond()));

Review comment:
   Isn't PDT 13:42:04 behind ICT (PDT = UTC-7)? That would mean the value 
should be 1400-01-31 10:17:56.
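The semantics under discussion can be checked with plain `java.time`, independent of Hive's implementation (a hedged sketch; `UnixTimestampZoneCheck` is an invented name, and the session zone `Asia/Bangkok` is inferred from the q.out values below, whose pre-1920 dates show Bangkok's +6:42:04 LMT offset): parsing the string together with its zone fixes the epoch second, and rendering that epoch in the session zone gives the displayed timestamp.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class UnixTimestampZoneCheck {
    public static void main(String[] args) {
        DateTimeFormatter in =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss z", Locale.US);

        // UNIX_TIMESTAMP('2021-01-02 03:04:05 UTC', ...) must honor the 'UTC'
        // in the input rather than silently using the session zone.
        long epoch = ZonedDateTime.parse("2021-01-02 03:04:05 UTC", in)
                                  .toEpochSecond();
        System.out.println(epoch); // 1609556645

        // FROM_UNIXTIME then renders the epoch in the session zone; rendering
        // in Asia/Bangkok (UTC+7) reproduces the 10:04:05 value in the q.out.
        DateTimeFormatter out =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
        String local = Instant.ofEpochSecond(epoch)
                              .atZone(ZoneId.of("Asia/Bangkok"))
                              .format(out);
        System.out.println(local); // 2021-01-02 10:04:05
    }
}
```

The same round trip applied to the pre-1920 inputs picks up Bangkok's local mean time, which is where the 06:42:04 values in the expected output come from.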

##
File path: ql/src/test/results/clientpositive/llap/udf5.q.out
##
@@ -342,3 +342,219 @@ POSTHOOK: type: QUERY
 POSTHOOK: Input: _dummy_database@_dummy_table
  A masked pattern was here 
 NULL
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2021-01-02 03:04:05 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2021-01-02 03:04:05 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+2021-01-02 10:04:05
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1400-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1400-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+1400-01-01 06:42:04
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1800-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+1800-01-01 06:42:04
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1900-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('1900-01-01 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+1900-01-01 06:42:04
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2000-01-07 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('2000-01-07 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+2000-01-07 07:00:00
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-00-00 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-00-00 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+NULL
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-99-99 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-99-99 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+NULL
+PREHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-12-31 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+PREHOOK: type: QUERY
+PREHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+POSTHOOK: query: SELECT FROM_UNIXTIME(UNIX_TIMESTAMP('-12-31 00:00:00 
UTC','-MM-dd HH:mm:ss z'))
+POSTHOOK: type: QUERY
+POSTHOOK: Input: _dummy_database@_dummy_table
+ A masked pattern was here 
+-12-31 07:00:00
+PREHOOK: query: SELECT 

[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658318
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 13:06
Start Date: 30/Sep/21 13:06
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719385811



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   Please add a small comment in the class javadoc, because I am sure my 
future self seeing this will not remember the discussion and may try to remove 
the rule or put it in another place :).






Issue Time Tracking
---

Worklog Id: (was: 658318)
Time Spent: 2h 50m  (was: 2h 40m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query the write_id.
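The incremental-rebuild idea in the description, identifying newly inserted rows by comparing their write id against a high-water mark, can be sketched with plain Java collections (a conceptual illustration only; `Row` and `lastRebuildWriteId` are invented names, not Hive APIs):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class IncrementalRebuildSketch {
    // Hypothetical stand-in for a row of t1 plus its ACID write id.
    static final class Row {
        final long writeId;
        final int a;
        Row(long writeId, int a) { this.writeId = writeId; this.a = a; }
    }

    public static void main(String[] args) {
        List<Row> t1 = Arrays.asList(
            new Row(1, 5), new Row(1, 20), new Row(2, 30), new Row(3, 40));
        long lastRebuildWriteId = 1; // high-water mark from the last rebuild

        // Only rows written after the last rebuild (and matching the MV
        // predicate a > 10) need to be merged into the materialized view.
        List<Row> delta = t1.stream()
            .filter(r -> r.writeId > lastRebuildWriteId && r.a > 10)
            .collect(Collectors.toList());

        delta.forEach(r -> System.out.println(r.writeId + ":" + r.a));
        // 2:30
        // 3:40
    }
}
```

Without a write id on the source rows (the insert-only case before this change), there is no such high-water mark to filter on, so the view must be rebuilt from scratch.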





[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658314=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658314
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 13:03
Start Date: 30/Sep/21 13:03
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719383292



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   OK, I finally got it :) Thanks for explaining.






Issue Time Tracking
---

Worklog Id: (was: 658314)
Time Spent: 2h 40m  (was: 2.5h)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query the write_id.





[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658302=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658302
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 12:50
Start Date: 30/Sep/21 12:50
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719371779



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   I don't want to enable fetching the writeId automatically for every non-MV 
insert-only table.
   If someone executes a query like 
   ```
   SELECT t1.ROW__ID, t1.ROW__ID.writeId, a, b FROM t1;
   ```
   where `t1` is an insert-only table, the result will contain `0` for 
ROW__ID.rowId for all records, but valid values for bucketId and writeId only 
for records that have not been compacted. This is confusing for users.
   
   But this feature can still be used for incremental MV rebuild, since 
compaction rules out incremental rebuild anyway.
   
   So the goal was to enable the writeId fetch only when necessary. To control 
it at the AST level, the `insertonly.fetch.bucketid` property is used; it is a 
similar property to `acid.fetch.deleted.rows`.
   See #2549
   






Issue Time Tracking
---

Worklog Id: (was: 658302)
Time Spent: 2.5h  (was: 2h 20m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> 

[jira] [Assigned] (HIVE-25580) Increase the performance of getTableColumnStatistics and getPartitionColumnStatistics

2021-09-30 Thread Peter Vary (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-25580:
-


> Increase the performance of getTableColumnStatistics and 
> getPartitionColumnStatistics
> -
>
> Key: HIVE-25580
> URL: https://issues.apache.org/jira/browse/HIVE-25580
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Major
>
> When the PART_COL_STATS table contains a high number of rows, the 
> getTableColumnStatistics and getPartitionColumnStatistics response times 
> increase.
> The root cause is the full table scan for the JDBC query below:
> {code:java}
> 2021-09-27 13:22:44,218 DEBUG DataNucleus.Datastore.Native: 
> [pool-6-thread-199]: SELECT DISTINCT "A0"."ENGINE" FROM "PART_COL_STATS" "A0"
> 2021-09-27 13:22:50,569 DEBUG DataNucleus.Datastore.Retrieve: 
> [pool-6-thread-199]: Execution Time = 6351 ms {code}
> The time spent in 
> [here|https://github.com/apache/hive/blob/ed1882ef569f8d00317597c269cfae35ace5a5fa/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L9965]:
> {code:java}
>   query = pm.newQuery(MPartitionColumnStatistics.class);
>   query.setResult("DISTINCT engine");
>   Collection names = (Collection) query.execute();
> {code}
> We might get better performance if we limit the query range based on the 
> cat/db/table.





[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658272=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658272
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 11:43
Start Date: 30/Sep/21 11:43
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719322625



##
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
 alter materialized view mv1_parquet_n2 rebuild;
 
 explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;

Review comment:
   Thanks for clarifying






Issue Time Tracking
---

Worklog Id: (was: 658272)
Time Spent: 2h 20m  (was: 2h 10m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query the write_id.





[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658268=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658268
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 11:30
Start Date: 30/Sep/21 11:30
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719314095



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   I assume that at some point (later in the compilation phase) you check 
the `HiveTableScan` operator to obtain/use the `FetchInsertOnlyBucketIds` 
trait. Unless I am missing something, you already have all the information 
inside the `HiveTableScan` (`!tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD)`) to derive it on the fly when needed, so 
I don't understand very well why you need a rule to set it explicitly. 
   
   Apologies if my reasoning is off but this is my happy break from escalations 
:)






Issue Time Tracking
---

Worklog Id: (was: 658268)
Time Spent: 2h 10m  (was: 2h)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id which is required to 

[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658259=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658259
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:55
Start Date: 30/Sep/21 10:55
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719291208



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
int) stored as orc TBLPROPERTIES ('transactional'='true', 
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
   Yes. Originally there was no `EXPLAIN CBO` in the other tests. Since I added it, the plain `EXPLAIN` is no longer necessary.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658259)
Time Spent: 1h 50m  (was: 1h 40m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because
> it has an insert-only source table (t1). Such tables do not have
> ROW_ID.write_id, which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658260&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658260
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:55
Start Date: 30/Sep/21 10:55
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719291208



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
int) stored as orc TBLPROPERTIES ('transactional'='true', 
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
   Yes. Originally there was no `EXPLAIN CBO` in the other tests. Since I added it here, the plain `EXPLAIN` is no longer necessary.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658260)
Time Spent: 2h  (was: 1h 50m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because
> it has an insert-only source table (t1). Such tables do not have
> ROW_ID.write_id, which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658258
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:54
Start Date: 30/Sep/21 10:54
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719290528



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;

Review comment:
   Removed these.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658258)
Time Spent: 1h 40m  (was: 1.5h)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because
> it has an insert-only source table (t1). Such tables do not have
> ROW_ID.write_id, which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658257
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:54
Start Date: 30/Sep/21 10:54
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719290288



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_partitioned_create_rewrite_agg_3.q
##
@@ -0,0 +1,46 @@
+-- Test partition bases MV rebuild when source table is insert only

Review comment:
   fixed




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658257)
Time Spent: 1.5h  (was: 1h 20m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because
> it has an insert-only source table (t1). Such tables do not have
> ROW_ID.write_id, which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25578) Tests are failing because operators can't be closed

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25578?focusedWorklogId=658255&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658255
 ]

ASF GitHub Bot logged work on HIVE-25578:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:50
Start Date: 30/Sep/21 10:50
Worklog Time Spent: 10m 
  Work Description: kgyrtkirk opened a new pull request #2688:
URL: https://github.com/apache/hive/pull/2688


   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658255)
Remaining Estimate: 0h
Time Spent: 10m

> Tests are failing because operators can't be closed
> ---
>
> Key: HIVE-25578
> URL: https://issues.apache.org/jira/browse/HIVE-25578
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following qtests are failing consistently 
> ([example|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2667/6/tests/])
>  on the master branch:
>  * TestMiniLlapCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/420/])
>  ** newline
>  ** groupby_bigdata
>  ** input20
>  ** input33
>  ** rcfile_bigdata
>  ** remote_script
>  * TestContribCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/421/])
>  ** serde_typedbytes5
> The failure reason for all seems to be that operators can't be closed. Not 
> 100% sure that TestContribCliDriver#serde_typedbytes5 failure is related to 
> the others – the issue seems to be the same, the error message is a bit 
> different.
> I'm about to disable these as they are blocking all work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-25578) Tests are failing because operators can't be closed

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25578:
--
Labels: pull-request-available  (was: )

> Tests are failing because operators can't be closed
> ---
>
> Key: HIVE-25578
> URL: https://issues.apache.org/jira/browse/HIVE-25578
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The following qtests are failing consistently 
> ([example|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2667/6/tests/])
>  on the master branch:
>  * TestMiniLlapCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/420/])
>  ** newline
>  ** groupby_bigdata
>  ** input20
>  ** input33
>  ** rcfile_bigdata
>  ** remote_script
>  * TestContribCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/421/])
>  ** serde_typedbytes5
> The failure reason for all seems to be that operators can't be closed. Not 
> 100% sure that TestContribCliDriver#serde_typedbytes5 failure is related to 
> the others – the issue seems to be the same, the error message is a bit 
> different.
> I'm about to disable these as they are blocking all work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658253
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:44
Start Date: 30/Sep/21 10:44
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719283577



##
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
 alter materialized view mv1_parquet_n2 rebuild;
 
 explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;

Review comment:
   The original test was actually hiding that projecting `ROW__ID.writeId` for insert-only tables always returns `NULL`. Incremental MV rebuild was used, and because of the `NULL` writeIds it did not insert any records, so the view data remained outdated while the system marked the view up to date.
   Adding the aggregation `sum(empid)` and the extra query after dropping the view made the difference in the results show up.
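
   The failure mode described above can be sketched in a few lines (a toy model in Python, not Hive code; the row data and watermark are invented for illustration):

```python
# Toy model of the bug: an incremental rebuild picks source rows whose writeId
# exceeds the last rebuild's watermark. If the insert-only scan projects NULL
# (None here) for every writeId, no row ever qualifies, so the rebuild inserts
# nothing while the view is still marked up to date.
rows = [
    {"empid": 1, "writeId": None},  # writeId projected as NULL
    {"empid": 2, "writeId": None},
]
last_rebuild_write_id = 5  # hypothetical watermark from the previous rebuild

def newly_inserted(rows, watermark):
    # mirrors SQL semantics: a NULL writeId never satisfies the > comparison
    return [r for r in rows if r["writeId"] is not None and r["writeId"] > watermark]

delta = newly_inserted(rows, last_rebuild_write_id)
print(len(delta))  # 0 -> the rebuild silently adds no records
```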




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658253)
Time Spent: 1h 20m  (was: 1h 10m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because
> it has an insert-only source table (t1). Such tables do not have
> ROW_ID.write_id, which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25578) Tests are failing because operators can't be closed

2021-09-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422692#comment-17422692
 ] 

Zoltan Haindrich commented on HIVE-25578:
-

with the newer programs in the container we now default to python3, and the error is the most common python2/python3 incompatibility:
{code}
dev@master2:~/hive$ python ./data/scripts/newline.py
  File "./data/scripts/newline.py", line 22
print "1\\n2"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean 
print("1\\n2")?
{code}

it's a shame that we are communicating this issue as "[Error 20003]: An error 
occurred when trying to close the Operator running your custom script."
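
For what it's worth, the fix on the script side is the usual python2-to-python3 print conversion (a sketch only; the real `newline.py` may need further porting, e.g. around stdin handling):

```python
# python3-compatible form of the failing statement from data/scripts/newline.py:
# the python2 statement `print "1\\n2"` becomes a function call.
line = "1\\n2"   # a literal backslash-n, as in the original script
print(line)      # prints: 1\n2
```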

> Tests are failing because operators can't be closed
> ---
>
> Key: HIVE-25578
> URL: https://issues.apache.org/jira/browse/HIVE-25578
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Priority: Critical
>
> The following qtests are failing consistently 
> ([example|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2667/6/tests/])
>  on the master branch:
>  * TestMiniLlapCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/420/])
>  ** newline
>  ** groupby_bigdata
>  ** input20
>  ** input33
>  ** rcfile_bigdata
>  ** remote_script
>  * TestContribCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/421/])
>  ** serde_typedbytes5
> The failure reason for all seems to be that operators can't be closed. Not 
> 100% sure that TestContribCliDriver#serde_typedbytes5 failure is related to 
> the others – the issue seems to be the same, the error message is a bit 
> different.
> I'm about to disable these as they are blocking all work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-25578) Tests are failing because operators can't be closed

2021-09-30 Thread Zoltan Haindrich (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17422690#comment-17422690
 ] 

Zoltan Haindrich commented on HIVE-25578:
-

hmm so python is the common denominator? what broke it?
I was just able to reproduce it locally - I'll take a look at that script/etc

> Tests are failing because operators can't be closed
> ---
>
> Key: HIVE-25578
> URL: https://issues.apache.org/jira/browse/HIVE-25578
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Priority: Critical
>
> The following qtests are failing consistently 
> ([example|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2667/6/tests/])
>  on the master branch:
>  * TestMiniLlapCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/420/])
>  ** newline
>  ** groupby_bigdata
>  ** input20
>  ** input33
>  ** rcfile_bigdata
>  ** remote_script
>  * TestContribCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/421/])
>  ** serde_typedbytes5
> The failure reason for all seems to be that operators can't be closed. Not 
> 100% sure that TestContribCliDriver#serde_typedbytes5 failure is related to 
> the others – the issue seems to be the same, the error message is a bit 
> different.
> I'm about to disable these as they are blocking all work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658246&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658246
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:38
Start Date: 30/Sep/21 10:38
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719279447



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
int) stored as orc TBLPROPERTIES ('transactional'='true', 
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- NOW IT CAN BE USED AGAIN
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- NOW AN UPDATE
+UPDATE cmv_basetable_n6 SET a=2 WHERE a=1;
+
+-- INCREMENTAL REBUILD CAN BE TRIGGERED
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;
+
+-- MV CAN BE USED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+drop materialized view cmv_mat_view_n6;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;

Review comment:
   The intention was to check that we get the same results whether or not the view is used.
   Personally I find this really useful. :slightly_smiling_face:




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658246)
Time Spent: 1h 10m  (was: 1h)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
>  

[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658240
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:33
Start Date: 30/Sep/21 10:33
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719275880



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
int) stored as orc TBLPROPERTIES ('transactional'='true', 
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;

Review comment:
   The incremental MV rebuild plan is created in two steps:
   1. CBO rewrites the query part of the plan.
   2. The AST generated from the CBO plan is transformed from a union-based `insert-overwrite` plan into an `insert` plan.
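
   A toy sketch of step 2 (illustrative Python, not Hive's actual AST classes; the node and branch names are made up): the union-based `insert-overwrite` plan recomputes the existing MV rows plus the delta, while the incremental form keeps only the delta branch and appends it.

```python
# Hypothetical plan shapes: step 2 rewrites the full-rebuild AST into an
# incremental one by dropping the branch that re-reads the existing MV rows
# and switching INSERT OVERWRITE to plain INSERT.
full_rebuild = {
    "op": "insert_overwrite",
    "source": {"op": "union", "branches": ["existing_mv_rows", "delta_rows"]},
}

def to_incremental(plan):
    delta_only = [b for b in plan["source"]["branches"] if b != "existing_mv_rows"]
    return {"op": "insert", "source": {"op": "union", "branches": delta_only}}

print(to_incremental(full_rebuild))
# {'op': 'insert', 'source': {'op': 'union', 'branches': ['delta_rows']}}
```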




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658240)
Time Spent: 1h  (was: 50m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because
> it has an insert-only source table (t1). Such tables do not have
> ROW_ID.write_id, which is required to identify newly inserted records since
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658232
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 10:23
Start Date: 30/Sep/21 10:23
Worklog Time Spent: 10m 
  Work Description: kasakrisz commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719269143



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   The writeId is populated into the `ROW_ID` struct, which contains three 
fields: `writeId, bucketId, rowId`. In the case of insert-only tables `rowId` is 
not available, so `FetchInsertOnlyBucketIds` is an internal feature.
   The initial logical plan for the MV rebuild is a full rebuild, which does not 
require writeIds.
   Later, once all the checks needed for an incremental rebuild have passed, the 
plan is transformed into an incremental rebuild plan, which does require 
`writeId` filtering.
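
   As a hedged sketch of the above (HiveQL exposes the ACID row identifier as 
the virtual `ROW__ID` column; lowercase field names and the behavior on 
insert-only tables are assumptions based on this thread, not verified here):

```sql
-- Sketch: ROW__ID is a struct of (writeid, bucketid, rowid) on ACID tables.
-- On an insert-only table, per the comment above, rowid is not tracked,
-- so only the write id and bucket id can be fetched.
SELECT ROW__ID.writeid, ROW__ID.bucketid, a, b, c
FROM t1;   -- t1 is the insert-only table from the issue description
```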




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 658232)
Time Spent: 50m  (was: 40m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.

[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658226=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658226
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 09:52
Start Date: 30/Sep/21 09:52
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719238961



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;

Review comment:
   Why do the properties below matter for the incremental rebuild?
   ```
   hive.vectorized.execution.enabled=false
   hive.strict.checks.cartesian.product=false
   ```

##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
int) stored as orc TBLPROPERTIES ('transactional'='true', 
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+  WHERE cmv_basetable_2_n3.c > 10.0;
+
+insert into cmv_basetable_2_n3 values
+ (3, 'charlie', 15.8, 1);
+
+-- CANNOT USE THE VIEW, IT IS OUTDATED
+EXPLAIN CBO
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+EXPLAIN
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 join cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+SELECT cmv_basetable_n6.a
+FROM cmv_basetable_n6 JOIN cmv_basetable_2_n3 ON (cmv_basetable_n6.a = 
cmv_basetable_2_n3.a)
+WHERE cmv_basetable_2_n3.c > 10.10;
+
+-- REBUILD
+EXPLAIN
+ALTER MATERIALIZED VIEW cmv_mat_view_n6 REBUILD;

Review comment:
   Is it necessary?

##
File path: ql/src/test/queries/clientpositive/materialized_view_parquet.q
##
@@ -189,8 +189,10 @@ alter materialized view mv1_parquet_n2 rebuild;
 alter materialized view mv1_parquet_n2 rebuild;
 
 explain cbo
-select name from emps_parquet_n3 group by name;
+select name, sum(empid) from emps_parquet_n3 group by name;

Review comment:
   Why are we changing existing queries? If the old query is useless then it's 
fine; but if we want to test additional stuff, we should add new queries instead 
of modifying existing ones.

##
File path: 
ql/src/test/queries/clientpositive/materialized_view_create_rewrite_8.q
##
@@ -0,0 +1,97 @@
+-- Test Incremental rebuild of materialized view without aggregate when a 
source table is insert only.
+
+SET hive.vectorized.execution.enabled=false;
+set hive.support.concurrency=true;
+set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
+set hive.strict.checks.cartesian.product=false;
+set hive.materializedview.rewriting=true;
+
+create table cmv_basetable_n6 (a int, b varchar(256), c decimal(10,2), d int) 
stored as orc TBLPROPERTIES ('transactional'='true');
+
+insert into cmv_basetable_n6 values
+ (1, 'alfred', 10.30, 2),
+ (2, 'bob', 3.14, 3),
+ (2, 'bonnie', 172342.2, 3),
+ (3, 'calvin', 978.76, 3),
+ (3, 'charlie', 9.8, 1);
+
+create table cmv_basetable_2_n3 (a int, b varchar(256), c decimal(10,2), d 
int) stored as orc TBLPROPERTIES ('transactional'='true', 
'transactional_properties'='insert_only');
+
+insert into cmv_basetable_2_n3 values
+ (1, 'alfred', 10.30, 2),
+ (3, 'calvin', 978.76, 3);
+
+CREATE MATERIALIZED VIEW cmv_mat_view_n6
+  TBLPROPERTIES ('transactional'='true') AS
+  SELECT cmv_basetable_n6.a, cmv_basetable_2_n3.c
+  FROM cmv_basetable_n6 

[jira] [Commented] (HIVE-25578) Tests are failing because operators can't be closed

2021-09-30 Thread Krisztian Kasa (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17422663#comment-17422663
 ] 

Krisztian Kasa commented on HIVE-25578:
---

The issue seems to be related to this command:
{code}
add file ../../data/scripts/newline.py;
{code} 
I got the following exception:
{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An 
error occurred when trying to close the Operator running your custom script.
at 
org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:557)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:708)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:708)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:708)
at 
org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:459)
... 15 more
{code}
This happens on a cluster when the file is not accessible at the specified 
path and I run this query:
{code}
insert overwrite table tmp_tmp_n0 SELECT TRANSFORM(key, value) USING 'python 
newline.py' AS key, value FROM src limit 6;
{code}

> Tests are failing because operators can't be closed
> ---
>
> Key: HIVE-25578
> URL: https://issues.apache.org/jira/browse/HIVE-25578
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Priority: Critical
>
> The following qtests are failing consistently 
> ([example|http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2667/6/tests/])
>  on the master branch:
>  * TestMiniLlapCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/420/])
>  ** newline
>  ** groupby_bigdata
>  ** input20
>  ** input33
>  ** rcfile_bigdata
>  ** remote_script
>  * TestContribCliDriver 
> ([http://ci.hive.apache.org/job/hive-flaky-check/421/])
>  ** serde_typedbytes5
> The failure reason for all seems to be that operators can't be closed. Not 
> 100% sure that the TestContribCliDriver#serde_typedbytes5 failure is related to 
> the others – the issue seems to be the same, though the error message is a bit 
> different.
> I'm about to disable these as they are blocking all work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658222=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658222
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 09:40
Start Date: 30/Sep/21 09:40
Worklog Time Spent: 10m 
  Work Description: zabetak commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719236721



##
File path: 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/views/HiveInsertOnlyScanWriteIdRule.java
##
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hive.ql.optimizer.calcite.rules.views;
+
+import org.apache.calcite.plan.RelOptRule;
+import org.apache.calcite.plan.RelOptRuleCall;
+import org.apache.calcite.rel.RelNode;
+import org.apache.hadoop.hive.ql.io.AcidUtils;
+import org.apache.hadoop.hive.ql.metadata.Table;
+import org.apache.hadoop.hive.ql.optimizer.calcite.RelOptHiveTable;
+import org.apache.hadoop.hive.ql.optimizer.calcite.reloperators.HiveTableScan;
+
+import static 
org.apache.hadoop.hive.conf.Constants.INSERT_ONLY_FETCH_BUCKET_ID;
+
+/**
+ * This rule turns on populating writeId of insert only table scans.
+ */
+public class HiveInsertOnlyScanWriteIdRule extends RelOptRule {
+
+  public static final HiveInsertOnlyScanWriteIdRule INSTANCE = new 
HiveInsertOnlyScanWriteIdRule();
+
+  private HiveInsertOnlyScanWriteIdRule() {
+super(operand(HiveTableScan.class, none()));
+  }
+
+  @Override
+  public boolean matches(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+Table tableMD = ((RelOptHiveTable) tableScan.getTable()).getHiveTableMD();
+return !tableMD.isMaterializedView() && 
AcidUtils.isInsertOnlyTable(tableMD);
+  }
+
+  @Override
+  public void onMatch(RelOptRuleCall call) {
+HiveTableScan tableScan = call.rel(0);
+RelNode newTableScan = call.builder()
+
.push(tableScan.setTableScanTrait(HiveTableScan.HiveTableScanTrait.FetchInsertOnlyBucketIds))

Review comment:
   I suppose we have the information that a table is insert-only before 
constructing the initial logical plan. Why do we need a rule to set this instead 
of doing it when constructing the initial `HiveTableScan`?






Issue Time Tracking
---

Worklog Id: (was: 658222)
Time Spent: 0.5h  (was: 20m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.
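
A hedged sketch of what incremental-rebuild filtering by write id could look 
like once the capability from HIVE-25406 is available (the snapshot write id 
`5` and the exact predicate shape are illustrative assumptions, not taken from 
the patch):

```sql
-- Hypothetical incremental-rebuild source query: pick up only rows of t1
-- written after the materialization's last snapshot write id.
SELECT a, b, c
FROM t1
WHERE ROW__ID.writeid > 5   -- assumed high-water mark from the last rebuild
  AND a > 10;               -- the MV's own filter from its definition
```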





[jira] [Updated] (HIVE-25579) LOAD overwrite appends rather than overwriting

2021-09-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-25579:

Summary: LOAD overwrite appends rather than overwriting  (was: LOAD 
overwrite appends rather than ovewriting)

> LOAD overwrite appends rather than overwriting
> --
>
> Key: HIVE-25579
> URL: https://issues.apache.org/jira/browse/HIVE-25579
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>
> The overwrite query gets converted to append.
> {noformat}
> 7b6-4b43-8452-52c44e8a2f71): LOAD DATA INPATH 
> 'hdfs://ayushsaxena-1.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/test_ext/00_0'
>  OVERWRITE  INTO TABLE test_spark
> 2021-09-30 03:30:23,033 INFO  org.apache.hadoop.hive.ql.lockmgr.DbTxnManager: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Opened txnid:15
> 2021-09-30 03:30:23,035 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Starting caching scope for: 
> hive_20210930033023_bb1f6dc4-d7b6-4b43-8452-52c44e8a2f71
> 2021-09-30 03:30:23,042 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Load data triggered a Tez job instead of usual file operation
> 2021-09-30 03:30:23,042 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Going to reparse  'hdfs://ayushsaxena-1.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/test_ext/00_0'
>  OVERWRITE  INTO TABLE test_spark> as
>  test_spark__temp_table_for_load_data__>
> {noformat}





[jira] [Work logged] (HIVE-23633) Metastore some JDO query objects do not close properly

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23633?focusedWorklogId=658213=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658213
 ]

ASF GitHub Bot logged work on HIVE-23633:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 09:05
Start Date: 30/Sep/21 09:05
Worklog Time Spent: 10m 
  Work Description: dengzhhu653 commented on pull request #2344:
URL: https://github.com/apache/hive/pull/2344#issuecomment-931094725


   Hi @pvary, could you help merge this? Thank you! :)




Issue Time Tracking
---

Worklog Id: (was: 658213)
Time Spent: 14h 10m  (was: 14h)

> Metastore some JDO query objects do not close properly
> --
>
> Key: HIVE-23633
> URL: https://issues.apache.org/jira/browse/HIVE-23633
> Project: Hive
>  Issue Type: Bug
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: HIVE-23633.01.patch
>
>  Time Spent: 14h 10m
>  Remaining Estimate: 0h
>
> After patched [HIVE-10895|https://issues.apache.org/jira/browse/HIVE-10895],  
> The metastore still has seen a memory leak on db resources: many 
> StatementImpls left unclosed.





[jira] [Resolved] (HIVE-25549) Wrong results for window function with expression in PARTITION BY or ORDER BY clause

2021-09-30 Thread Karen Coppage (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karen Coppage resolved HIVE-25549.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Committed to master branch. Thanks for the feedback [~abstractdog] and for the 
review [~szita]!

> Wrong results for window function with expression in PARTITION BY or ORDER BY 
> clause
> 
>
> Key: HIVE-25549
> URL: https://issues.apache.org/jira/browse/HIVE-25549
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes the partition in a vectorized PTF needs some sort of 
> transformation. For this to work, the partition expression may need some 
> transient variables to be initialized.
> Example with row_number:
> {code:java}
> create table test_rownumber (a string, b string) stored as orc;
> insert into test_rownumber values
> ('1', 'a'),
> ('2', 'b'),
> ('3', 'c'),
> ('4', 'd'),
> ('5', 'e');
> CREATE VIEW `test_rownumber_vue` AS SELECT `test_rownumber`.`a` AS 
> `a`,CAST(`test_rownumber`.`a` as INT) AS `a_int`,
> `test_rownumber`.`b` as `b` from `default`.`test_rownumber`;
> set hive.vectorized.execution.enabled=true;
> select *, row_number() over(partition by a_int order by b) from 
> test_rownumber_vue;
> {code}
> Output is:
> {code:java}
> +-----------------------+---------------------------+-----------------------+---------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0 |
> +-----------------------+---------------------------+-----------------------+---------------------+
> | 1                     | 1                         | a                     | 1                   |
> | 2                     | 2                         | b                     | 2                   |
> | 3                     | 3                         | c                     | 3                   |
> | 4                     | 4                         | d                     | 4                   |
> | 5                     | 5                         | e                     | 5                   |
> +-----------------------+---------------------------+-----------------------+---------------------+
> {code}
> But it should be this, because we restart the row numbering for each 
> partition:
> {code:java}
> +-----------------------+---------------------------+-----------------------+---------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0 |
> +-----------------------+---------------------------+-----------------------+---------------------+
> | 1                     | 1                         | a                     | 1                   |
> | 2                     | 2                         | b                     | 1                   |
> | 3                     | 3                         | c                     | 1                   |
> | 4                     | 4                         | d                     | 1                   |
> | 5                     | 5                         | e                     | 1                   |
> +-----------------------+---------------------------+-----------------------+---------------------+
> {code}
> Explanation:
> CastStringToLong has to be executed on the partition column (a_int). Because 
> CastStringToLong.integerPrimitiveCategory is not initialized, all output of 
> CastStringToLong is null - so a_int is interpreted as containing null values 
> only and partitioning is ignored.





[jira] [Work logged] (HIVE-25549) Wrong results for window function with expression in PARTITION BY or ORDER BY clause

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25549?focusedWorklogId=658191=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658191
 ]

ASF GitHub Bot logged work on HIVE-25549:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 08:47
Start Date: 30/Sep/21 08:47
Worklog Time Spent: 10m 
  Work Description: klcopp merged pull request #2667:
URL: https://github.com/apache/hive/pull/2667


   




Issue Time Tracking
---

Worklog Id: (was: 658191)
Time Spent: 1.5h  (was: 1h 20m)

> Wrong results for window function with expression in PARTITION BY or ORDER BY 
> clause
> 
>
> Key: HIVE-25549
> URL: https://issues.apache.org/jira/browse/HIVE-25549
> Project: Hive
>  Issue Type: Bug
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes the partition in a vectorized PTF needs some sort of 
> transformation. For this to work, the partition expression may need some 
> transient variables to be initialized.
> Example with row_number:
> {code:java}
> create table test_rownumber (a string, b string) stored as orc;
> insert into test_rownumber values
> ('1', 'a'),
> ('2', 'b'),
> ('3', 'c'),
> ('4', 'd'),
> ('5', 'e');
> CREATE VIEW `test_rownumber_vue` AS SELECT `test_rownumber`.`a` AS 
> `a`,CAST(`test_rownumber`.`a` as INT) AS `a_int`,
> `test_rownumber`.`b` as `b` from `default`.`test_rownumber`;
> set hive.vectorized.execution.enabled=true;
> select *, row_number() over(partition by a_int order by b) from 
> test_rownumber_vue;
> {code}
> Output is:
> {code:java}
> +-----------------------+---------------------------+-----------------------+---------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0 |
> +-----------------------+---------------------------+-----------------------+---------------------+
> | 1                     | 1                         | a                     | 1                   |
> | 2                     | 2                         | b                     | 2                   |
> | 3                     | 3                         | c                     | 3                   |
> | 4                     | 4                         | d                     | 4                   |
> | 5                     | 5                         | e                     | 5                   |
> +-----------------------+---------------------------+-----------------------+---------------------+
> {code}
> But it should be this, because we restart the row numbering for each 
> partition:
> {code:java}
> +-----------------------+---------------------------+-----------------------+---------------------+
> | test_rownumber_vue.a  | test_rownumber_vue.a_int  | test_rownumber_vue.b  | row_number_window_0 |
> +-----------------------+---------------------------+-----------------------+---------------------+
> | 1                     | 1                         | a                     | 1                   |
> | 2                     | 2                         | b                     | 1                   |
> | 3                     | 3                         | c                     | 1                   |
> | 4                     | 4                         | d                     | 1                   |
> | 5                     | 5                         | e                     | 1                   |
> +-----------------------+---------------------------+-----------------------+---------------------+
> {code}
> Explanation:
> CastStringToLong has to be executed on the partition column (a_int). Because 
> CastStringToLong.integerPrimitiveCategory is not initialized, all output of 
> CastStringToLong is null - so a_int is interpreted as containing null values 
> only and partitioning is ignored.





[jira] [Updated] (HIVE-25579) LOAD overwrite appends rather than ovewriting

2021-09-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HIVE-25579:

Labels: pull-request-available  (was: )

> LOAD overwrite appends rather than ovewriting
> -
>
> Key: HIVE-25579
> URL: https://issues.apache.org/jira/browse/HIVE-25579
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>  Labels: pull-request-available
>
> The overwrite query gets converted to append.
> {noformat}
> 7b6-4b43-8452-52c44e8a2f71): LOAD DATA INPATH 
> 'hdfs://ayushsaxena-1.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/test_ext/00_0'
>  OVERWRITE  INTO TABLE test_spark
> 2021-09-30 03:30:23,033 INFO  org.apache.hadoop.hive.ql.lockmgr.DbTxnManager: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Opened txnid:15
> 2021-09-30 03:30:23,035 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Starting caching scope for: 
> hive_20210930033023_bb1f6dc4-d7b6-4b43-8452-52c44e8a2f71
> 2021-09-30 03:30:23,042 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Load data triggered a Tez job instead of usual file operation
> 2021-09-30 03:30:23,042 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Going to reparse  'hdfs://ayushsaxena-1.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/test_ext/00_0'
>  OVERWRITE  INTO TABLE test_spark> as
>  test_spark__temp_table_for_load_data__>
> {noformat}





[jira] [Work logged] (HIVE-25546) Enable incremental rebuild of Materialized views with insert only source tables

2021-09-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25546?focusedWorklogId=658154=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-658154
 ]

ASF GitHub Bot logged work on HIVE-25546:
-

Author: ASF GitHub Bot
Created on: 30/Sep/21 08:05
Start Date: 30/Sep/21 08:05
Worklog Time Spent: 10m 
  Work Description: asolimando commented on a change in pull request #2663:
URL: https://github.com/apache/hive/pull/2663#discussion_r719160162



##
File path: 
ql/src/test/queries/clientpositive/materialized_view_partitioned_create_rewrite_agg_3.q
##
@@ -0,0 +1,46 @@
+-- Test partition bases MV rebuild when source table is insert only

Review comment:
   Typo: partition bases -> partition based






Issue Time Tracking
---

Worklog Id: (was: 658154)
Time Spent: 20m  (was: 10m)

> Enable incremental rebuild of Materialized views with insert only source 
> tables
> ---
>
> Key: HIVE-25546
> URL: https://issues.apache.org/jira/browse/HIVE-25546
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO, Materialized views
>Reporter: Krisztian Kasa
>Assignee: Krisztian Kasa
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code}
> create table t1(a int, b int, c int) stored as parquet TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='insert_only');
> create materialized view mat1 stored as orc TBLPROPERTIES 
> ('transactional'='true') as
> select a, b, c from t1 where a > 10;
> {code}
> Currently materialized view *mat1* cannot be rebuilt incrementally because 
> it has an insert-only source table (t1). Such tables do not have 
> ROW_ID.write_id, which is required to identify newly inserted records since 
> the last rebuild.
> HIVE-25406 adds the ability to query write_id.





[jira] [Assigned] (HIVE-25579) LOAD overwrite appends rather than ovewriting

2021-09-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HIVE-25579:
---


> LOAD overwrite appends rather than ovewriting
> -
>
> Key: HIVE-25579
> URL: https://issues.apache.org/jira/browse/HIVE-25579
> Project: Hive
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Critical
>
> The overwrite query gets converted to append.
> {noformat}
> 7b6-4b43-8452-52c44e8a2f71): LOAD DATA INPATH 
> 'hdfs://ayushsaxena-1.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/test_ext/00_0'
>  OVERWRITE  INTO TABLE test_spark
> 2021-09-30 03:30:23,033 INFO  org.apache.hadoop.hive.ql.lockmgr.DbTxnManager: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Opened txnid:15
> 2021-09-30 03:30:23,035 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Starting caching scope for: 
> hive_20210930033023_bb1f6dc4-d7b6-4b43-8452-52c44e8a2f71
> 2021-09-30 03:30:23,042 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Load data triggered a Tez job instead of usual file operation
> 2021-09-30 03:30:23,042 INFO  
> org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer: 
> [db2ab9c9-bf54-4304-bc06-e3bef76f2e79 HiveServer2-Handler-Pool: Thread-2600]: 
> Going to reparse  'hdfs://ayushsaxena-1.ayushsaxena.root.hwx.site:8020/warehouse/tablespace/external/hive/test_ext/00_0'
>  OVERWRITE  INTO TABLE test_spark> as
>  test_spark__temp_table_for_load_data__>
> {noformat}


