[jira] [Comment Edited] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-26 Thread Panagiotis Garefalakis (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146131#comment-17146131 ]

Panagiotis Garefalakis edited comment on HIVE-23638 at 6/26/20, 8:51 AM:
--------------------------------------------------------------------------

Hey [~belugabehr] -- can you please take a look at this PR?
Want to use the FBWarning interface of hive-common in the rest of the packages.


was (Author: pgaref):
Hey [~belugabehr] -- can you please take a look at this PR?

> Fix FindBug issues in hive-common
> ---------------------------------
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO -pl :hive-common test-compile com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check
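The check above fails the build on open SpotBugs findings; individual false positives are usually silenced with an annotation. A minimal, dependency-free sketch of the pattern — the nested annotation here is a stand-in for SpotBugs' `edu.umd.cs.findbugs.annotations.SuppressFBWarnings`, and the class name is hypothetical:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class SuppressionSketch {

  // Stand-in for edu.umd.cs.findbugs.annotations.SuppressFBWarnings.
  // SpotBugs only needs CLASS retention, so the annotation does not have
  // to be on the runtime classpath.
  @Retention(RetentionPolicy.CLASS)
  @interface SuppressFBWarnings {
    String[] value();          // bug pattern codes to suppress
    String justification() default "";
  }

  // EI_EXPOSE_REP is a common SpotBugs finding: returning a mutable
  // internal array exposes internal representation.
  private final int[] buffer = {1, 2, 3};

  @SuppressFBWarnings(value = "EI_EXPOSE_REP",
      justification = "callers are trusted not to mutate the buffer")
  public int[] getBuffer() {
    return buffer;
  }

  public static void main(String[] args) {
    System.out.println(new SuppressionSketch().getBuffer().length);   // prints 3
  }
}
```

Suppressions carry a justification string, which keeps the SpotBugs report clean without hiding why a finding was dismissed.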



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-26 Thread Panagiotis Garefalakis (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146131#comment-17146131 ]

Panagiotis Garefalakis edited comment on HIVE-23638 at 6/26/20, 8:51 AM:
--------------------------------------------------------------------------

Hey [~belugabehr] -- can you please take a look at this PR?
Planning to use the FBWarning interface of hive-common in the rest of the packages.


was (Author: pgaref):
Hey [~belugabehr] -- can you please take a look at this PR?
Want to use the FBWarning interface of hive-common in the rest of the packages.

> Fix FindBug issues in hive-common
> ---------------------------------
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO -pl :hive-common test-compile com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Comment Edited] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-26 Thread Panagiotis Garefalakis (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146131#comment-17146131 ]

Panagiotis Garefalakis edited comment on HIVE-23638 at 6/26/20, 8:51 AM:
--------------------------------------------------------------------------

Hey [~belugabehr] [~kgyrtkirk] -- can you please take a look at this PR?
Planning to use the FBWarning interface of hive-common in the rest of the packages.


was (Author: pgaref):
Hey [~belugabehr] -- can you please take a look at this PR?
Planning to use the FBWarning interface of hive-common in the rest of the packages.

> Fix FindBug issues in hive-common
> ---------------------------------
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO -pl :hive-common test-compile com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Commented] (HIVE-23638) Fix FindBug issues in hive-common

2020-06-26 Thread Panagiotis Garefalakis (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146131#comment-17146131 ]

Panagiotis Garefalakis commented on HIVE-23638:
-----------------------------------------------

Hey [~belugabehr] -- can you please take a look at this PR?

> Fix FindBug issues in hive-common
> ---------------------------------
>
> Key: HIVE-23638
> URL: https://issues.apache.org/jira/browse/HIVE-23638
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Panagiotis Garefalakis
>Assignee: Panagiotis Garefalakis
>Priority: Major
>  Labels: pull-request-available
> Attachments: spotbugsXml.xml
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> mvn -Pspotbugs -Dorg.slf4j.simpleLogger.log.org.apache.maven.plugin.surefire.SurefirePlugin=INFO -pl :hive-common test-compile com.github.spotbugs:spotbugs-maven-plugin:4.0.0:check





[jira] [Work logged] (HIVE-23759) Refactor CommitTxnRequest field order

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23759?focusedWorklogId=451425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451425 ]

ASF GitHub Bot logged work on HIVE-23759:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 07:10
Start Date: 26/Jun/20 07:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1176:
URL: https://github.com/apache/hive/pull/1176


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451425)
Time Spent: 20m  (was: 10m)

> Refactor CommitTxnRequest field order
> -------------------------------------
>
> Key: HIVE-23759
> URL: https://issues.apache.org/jira/browse/HIVE-23759
> Project: Hive
>  Issue Type: Task
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Refactor CommitTxnRequest field order (keyValue and replLastIdInfo). This
> should be a safe change, as neither of these fields has been part of any
> official Hive release.
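Thrift matches struct fields by numeric field ID, not by declaration position, so reordering declarations with stable IDs is always wire-safe; renumbering the IDs themselves is only safe when, as here, the fields never shipped in a release. A toy Java illustration of ID-keyed field matching (not Thrift's actual binary protocol; field names taken from the description above):

```java
import java.util.Map;
import java.util.TreeMap;

public class FieldIdSketch {

  // Thrift-style encoding keyed by numeric field ID: the wire format is a
  // set of (fieldId -> value) pairs, so the order in which a struct
  // declares or writes its fields is irrelevant to readers.
  static Map<Integer, String> encode(Map<Integer, String> fieldsInAnyOrder) {
    return new TreeMap<>(fieldsInAnyOrder);   // reader matches on IDs
  }

  public static void main(String[] args) {
    Map<Integer, String> declOrderA = Map.of(3, "replLastIdInfo", 4, "keyValue");
    Map<Integer, String> declOrderB = Map.of(4, "keyValue", 3, "replLastIdInfo");
    // Same IDs -> same decoded struct, regardless of declaration order.
    System.out.println(encode(declOrderA).equals(encode(declOrderB)));   // prints "true"
  }
}
```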





[jira] [Work logged] (HIVE-23741) Store CacheTags in the file cache level

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23741?focusedWorklogId=451479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451479 ]

ASF GitHub Bot logged work on HIVE-23741:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 10:05
Start Date: 26/Jun/20 10:05
Worklog Time Spent: 10m 
  Work Description: szlta commented on pull request #1159:
URL: https://github.com/apache/hive/pull/1159#issuecomment-650099134


   +1 pending tests





Issue Time Tracking
---

Worklog Id: (was: 451479)
Time Spent: 40m  (was: 0.5h)

> Store CacheTags in the file cache level
> ---------------------------------------
>
> Key: HIVE-23741
> URL: https://issues.apache.org/jira/browse/HIVE-23741
> Project: Hive
>  Issue Type: Improvement
>Reporter: Antal Sinkovits
>Assignee: Antal Sinkovits
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> CacheTags are currently stored for every data buffer. The strings are
> interned, but the number of cache tag objects can be reduced further by moving
> them to the file cache level and back-referencing them.
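The saving described above comes from keeping one tag object per file and letting every buffer reach it through a back reference, instead of each buffer holding its own copy. A language-level sketch of the idea (class and field names are hypothetical, not LLAP's actual types):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class CacheTagSketch {

  // Hypothetical stand-in for a cache tag: identifies the table/partition
  // a cached buffer belongs to.
  record CacheTag(String tableName, String partitionDesc) {}

  // File-level cache entry owns the single CacheTag instance...
  static class FileCacheEntry {
    final CacheTag tag;
    final List<DataBuffer> buffers = new ArrayList<>();
    FileCacheEntry(CacheTag tag) { this.tag = Objects.requireNonNull(tag); }
    DataBuffer addBuffer() {
      DataBuffer b = new DataBuffer(this);   // buffer back-references the entry
      buffers.add(b);
      return b;
    }
  }

  // ...and each buffer reaches the tag through its owning entry instead of
  // storing a per-buffer copy.
  static class DataBuffer {
    final FileCacheEntry owner;
    DataBuffer(FileCacheEntry owner) { this.owner = owner; }
    CacheTag tag() { return owner.tag; }
  }

  public static void main(String[] args) {
    FileCacheEntry entry = new FileCacheEntry(new CacheTag("db.tbl", "p=1"));
    DataBuffer b1 = entry.addBuffer();
    DataBuffer b2 = entry.addBuffer();
    // Both buffers share the identical tag object: no per-buffer copies.
    System.out.println(b1.tag() == b2.tag());   // prints "true"
  }
}
```

With N buffers per file this turns N tag objects into one, at the cost of one extra pointer dereference on the read path.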





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451528&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451528 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 12:10
Start Date: 26/Jun/20 12:10
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446134506



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java
##########
@@ -74,21 +76,21 @@
   @Before
   public void setUp() throws Exception {
     hive = Hive.get();
-    hive.getConf().setIntVar(HiveConf.ConfVars.METASTORE_FS_HANDLER_THREADS_COUNT, 15);
-    hive.getConf().set(HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION.varname, "throw");
+    hive.getConf().set(MetastoreConf.ConfVars.FS_HANDLER_THREADS_COUNT.getVarname(), "15");

Review comment:
   why not setIntVar?







Issue Time Tracking
---

Worklog Id: (was: 451528)
Time Spent: 1h 50m  (was: 1h 40m)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  
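Seeding a table with writeIds read from the directory structure relies on the ACID delta naming convention, `delta_<minWriteId>_<maxWriteId>` with an optional `_v<visibilityTxnId>` suffix from the compactor: the repair can take the maximum over all delta directories. A simplified sketch of that parsing, not Hive's actual AcidUtils code:

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaDirSketch {

  // delta_<min>_<max> with an optional _v<txnId> suffix written by the compactor.
  private static final Pattern DELTA =
      Pattern.compile("delta_(\\d+)_(\\d+)(?:_v(\\d+))?");

  // Highest writeId seen across all delta directories of a partition; this
  // is the value a repair could seed the table's writeId counter with.
  static long maxWriteId(List<String> dirNames) {
    long max = 0;
    for (String name : dirNames) {
      Matcher m = DELTA.matcher(name);
      if (m.matches()) {
        max = Math.max(max, Long.parseLong(m.group(2)));
      }
    }
    return max;
  }

  public static void main(String[] args) {
    // Mirrors the shape of the directories exercised in the PR's tests.
    List<String> dirs = List.of(
        "delta_0000001_0000001",
        "delta_0000010_0000015_v0000067",
        "delta_0000101_0000120_v0000087");
    System.out.println(maxWriteId(dirs));   // prints 120
  }
}
```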





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451503&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451503 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:20
Start Date: 26/Jun/20 11:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446123907



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf,
     long writeId = -1;
     ValidWriteIdList validWriteIdList = null;

-    HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
-    String fullTableName = getFullTableName(dbName, tblName);
-    if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
-      validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
-      if (isStatsUpdater) {
-        writeId = SessionState.get().getTxnMgr() != null ?
-            SessionState.get().getTxnMgr().getAllocatedTableWriteId(
-              dbName, tblName) : -1;
-        if (writeId < 1) {
-          // TODO: this is not ideal... stats updater that doesn't have write ID is currently
-          //       "create table"; writeId would be 0/-1 here. No need to call this w/true.
-          LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})",
-              dbName, tblName, writeId);
+    if (SessionState.get() != null) {
+      HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
+      String fullTableName = getFullTableName(dbName, tblName);
+      if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
+        validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
+        if (isStatsUpdater) {
+          writeId = sessionTxnMgr != null ? sessionTxnMgr.getAllocatedTableWriteId(dbName, tblName) : -1;

Review comment:
   redundant check, see if condition above
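The reviewer's point: inside a branch guarded by `sessionTxnMgr != null && ...`, the ternary's null test can never be false, so it collapses to a direct use. A contrived sketch of the same shape (names are illustrative, not Hive's):

```java
public class RedundantCheckSketch {

  static Long txnMgr = 42L;   // stand-in for SessionState.get().getTxnMgr()

  static long before() {
    if (txnMgr != null && txnMgr > 0) {
      // Inside this branch txnMgr is already known non-null, so the
      // ternary's null test is dead code...
      return txnMgr != null ? txnMgr : -1;
    }
    return -1;
  }

  static long after() {
    if (txnMgr != null && txnMgr > 0) {
      return txnMgr;   // ...and collapses to a direct use.
    }
    return -1;
  }

  public static void main(String[] args) {
    System.out.println(before() == after());   // prints "true"
  }
}
```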







Issue Time Tracking
---

Worklog Id: (was: 451503)
Time Spent: 20m  (was: 10m)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451504&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451504 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:20
Start Date: 26/Jun/20 11:20
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446123907



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf,
     long writeId = -1;
     ValidWriteIdList validWriteIdList = null;

-    HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
-    String fullTableName = getFullTableName(dbName, tblName);
-    if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
-      validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
-      if (isStatsUpdater) {
-        writeId = SessionState.get().getTxnMgr() != null ?
-            SessionState.get().getTxnMgr().getAllocatedTableWriteId(
-              dbName, tblName) : -1;
-        if (writeId < 1) {
-          // TODO: this is not ideal... stats updater that doesn't have write ID is currently
-          //       "create table"; writeId would be 0/-1 here. No need to call this w/true.
-          LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})",
-              dbName, tblName, writeId);
+    if (SessionState.get() != null) {
+      HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
+      String fullTableName = getFullTableName(dbName, tblName);
+      if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
+        validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
+        if (isStatsUpdater) {
+          writeId = sessionTxnMgr != null ? sessionTxnMgr.getAllocatedTableWriteId(dbName, tblName) : -1;

Review comment:
   redundant check (sessionTxnMgr != null), see if condition above







Issue Time Tracking
---

Worklog Id: (was: 451504)
Time Spent: 0.5h  (was: 20m)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451508 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:27
Start Date: 26/Jun/20 11:27
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446126904



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##########
@@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf,
     long writeId = -1;
     ValidWriteIdList validWriteIdList = null;

-    HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
-    String fullTableName = getFullTableName(dbName, tblName);
-    if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
-      validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
-      if (isStatsUpdater) {
-        writeId = SessionState.get().getTxnMgr() != null ?
-            SessionState.get().getTxnMgr().getAllocatedTableWriteId(
-              dbName, tblName) : -1;
-        if (writeId < 1) {
-          // TODO: this is not ideal... stats updater that doesn't have write ID is currently
-          //       "create table"; writeId would be 0/-1 here. No need to call this w/true.
-          LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})",
-              dbName, tblName, writeId);
+    if (SessionState.get() != null) {
+      HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
+      String fullTableName = getFullTableName(dbName, tblName);
+      if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
+        validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
+        if (isStatsUpdater) {
+          writeId = sessionTxnMgr != null ? sessionTxnMgr.getAllocatedTableWriteId(dbName, tblName) : -1;
+          if (writeId < 1) {

Review comment:
   is it ever a valid condition?







Issue Time Tracking
---

Worklog Id: (was: 451508)
Time Spent: 50m  (was: 40m)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451514&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451514 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:46
Start Date: 26/Jun/20 11:46
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446134506



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java
##########
@@ -74,21 +76,21 @@
   @Before
   public void setUp() throws Exception {
     hive = Hive.get();
-    hive.getConf().setIntVar(HiveConf.ConfVars.METASTORE_FS_HANDLER_THREADS_COUNT, 15);
-    hive.getConf().set(HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION.varname, "throw");
+    hive.getConf().set(MetastoreConf.ConfVars.FS_HANDLER_THREADS_COUNT.getVarname(), "15");

Review comment:
   setIntVar







Issue Time Tracking
---

Worklog Id: (was: 451514)
Time Spent: 1.5h  (was: 1h 20m)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451511 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:42
Start Date: 26/Jun/20 11:42
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446131988



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
##########
@@ -2209,20 +2209,7 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
    * sorts rows in dictionary order
    */
   static List<String> stringifyValues(int[][] rowsIn) {
-    assert rowsIn.length > 0;
-    int[][] rows = rowsIn.clone();
-    Arrays.sort(rows, new RowComp());
-    List<String> rs = new ArrayList<String>();
-    for (int[] row : rows) {
-      assert row.length > 0;
-      StringBuilder sb = new StringBuilder();
-      for (int value : row) {
-        sb.append(value).append("\t");
-      }
-      sb.setLength(sb.length() - 1);
-      rs.add(sb.toString());
-    }
-    return rs;
+    return TxnCommandsBaseForTests.stringifyValues(rowsIn);

Review comment:
   could you extend TestTxnCommands2 from TxnCommandsBaseForTests and remove the static?







Issue Time Tracking
---

Worklog Id: (was: 451511)
Time Spent: 1h 10m  (was: 1h)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451512 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:44
Start Date: 26/Jun/20 11:44
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446133519



##########
File path: ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java
##########
@@ -162,9 +163,23 @@ protected String getWarehouseDir() {
    * takes raw data and turns it into a string as if from Driver.getResults()
    * sorts rows in dictionary order
    */
-  List<String> stringifyValues(int[][] rowsIn) {
-    return TestTxnCommands2.stringifyValues(rowsIn);
+  public static List<String> stringifyValues(int[][] rowsIn) {
+    assert rowsIn.length > 0;
+    int[][] rows = rowsIn.clone();
+    Arrays.sort(rows, new TestTxnCommands2.RowComp());

Review comment:
   move RowComp from TestTxnCommands2 to TxnCommandsBaseForTests







Issue Time Tracking
---

Worklog Id: (was: 451512)
Time Spent: 1h 20m  (was: 1h 10m)

> MSCK repair should handle transactional tables in certain usecases
> ------------------------------------------------------------------
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables very well. It can
> find and add new partitions the same way as for non-transactional tables, but
> since the writeId differences are not handled, the data cannot be read back
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the underlying
> data are not conflicting. If the HMS does not contain allocated writeIds for
> the table, we can seed the table with the writeIds read from the directory
> structure.
> Real-life use cases could be:
>  * Copy data files from one cluster to another with a different HMS, create the
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  





[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451527&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451527 ]

ASF GitHub Bot logged work on HIVE-23671:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 26/Jun/20 12:07
Start Date: 26/Jun/20 12:07
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446143608



##
File path: 
ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java
##
@@ -252,37 +241,165 @@ public void testInvalidPartitionKeyName()
   @Test
   public void testSkipInvalidPartitionKeyName()
 throws HiveException, AlreadyExistsException, IOException, 
MetastoreException {
-hive.getConf().set(HiveConf.ConfVars.HIVE_MSCK_PATH_VALIDATION.varname, 
"skip");
+
hive.getConf().set(MetastoreConf.ConfVars.MSCK_PATH_VALIDATION.getVarname(), 
"skip");
 checker = new HiveMetaStoreChecker(msc, hive.getConf());
-Table table = createTestTable();
+Table table = createTestTable(false);
 List partitions = hive.getPartitions(table);
 assertEquals(2, partitions.size());
 // add a fake partition dir on fs
 fs = partitions.get(0).getDataLocation().getFileSystem(hive.getConf());
-Path fakePart =
-new Path(table.getDataLocation().toString(), 
"fakedate=2009-01-01/fakecity=sanjose");
-fs.mkdirs(fakePart);
-fs.deleteOnExit(fakePart);
+addFolderToPath(fs, 
table.getDataLocation().toString(),"fakedate=2009-01-01/fakecity=sanjose");
 createPartitionsDirectoriesOnFS(table, 2);
-CheckResult result = new CheckResult();
-checker.checkMetastore(catName, dbName, tableName, null, null, result);
+CheckResult result = checker.checkMetastore(catName, dbName, tableName, 
null, null);
 assertEquals(Collections. emptySet(), result.getTablesNotInMs());
 assertEquals(Collections. emptySet(), result.getTablesNotOnFs());
 assertEquals(Collections. emptySet(), 
result.getPartitionsNotOnFs());
 // only 2 valid partitions should be added
 assertEquals(2, result.getPartitionsNotInMs().size());
   }
 
-  private Table createTestTable() throws HiveException, AlreadyExistsException 
{
+  /*
+   * Tests the case when we have normal delta_dirs in the partition folder
+   * does not throw HiveException
+   */
+  @Test
+  public void testAddPartitionNormalDeltas() throws Exception {
+Table table = createTestTable(true);
+List partitions = hive.getPartitions(table);
+assertEquals(2, partitions.size());
+// add a partition dir on fs
+fs = partitions.get(0).getDataLocation().getFileSystem(hive.getConf());
+Path newPart = addFolderToPath(fs, table.getDataLocation().toString(),
+partDateName + "=2017-01-01/" + partCityName + "=paloalto");
+
+// Add a few deltas
+addFolderToPath(fs, newPart.toString(), "delta_001_001_");
+addFolderToPath(fs, newPart.toString(), "delta_010_010_");
+addFolderToPath(fs, newPart.toString(), "delta_101_101_");
+CheckResult result = checker.checkMetastore(catName, dbName, tableName, null, null);
+assertEquals(Collections.<PartitionResult> emptySet(), result.getPartitionsNotOnFs());
+assertEquals(1, result.getPartitionsNotInMs().size());
+// Found the highest writeId
+assertEquals(101, result.getPartitionsNotInMs().iterator().next().getMaxWriteId());
+assertEquals(0, result.getPartitionsNotInMs().iterator().next().getMaxTxnId());
+  }
+  /*
+   * Tests that compacted delta dirs in the partition folder
+   * do not cause a HiveException
+   */
+  @Test
+  public void testAddPartitionCompactedDeltas() throws Exception {
+Table table = createTestTable(true);
+List<Partition> partitions = hive.getPartitions(table);
+assertEquals(2, partitions.size());
+// add a partition dir on fs
+fs = partitions.get(0).getDataLocation().getFileSystem(hive.getConf());
+Path newPart = addFolderToPath(fs, table.getDataLocation().toString(),
+partDateName + "=2017-01-01/" + partCityName + "=paloalto");
+
+// Add a few deltas
+addFolderToPath(fs, newPart.toString(), "delta_001_001_");
+addFolderToPath(fs, newPart.toString(), "delta_010_015_v067");
+addFolderToPath(fs, newPart.toString(), "delta_101_120_v087");
+CheckResult result = checker.checkMetastore(catName, dbName, tableName, null, null);
+assertEquals(Collections.<PartitionResult> emptySet(), result.getPartitionsNotOnFs());
+assertEquals(1, result.getPartitionsNotInMs().size());
+// Found the highest writeId
+assertEquals(120, result.getPartitionsNotInMs().iterator().next().getMaxWriteId());
+assertEquals(87, result.getPartitionsNotInMs().iterator().next().getMaxTxnId());
+  }
+  @Test
+  public void testAddPartitionCompactedBase() throws 

[jira] [Work logged] (HIVE-22015) [CachedStore] Cache table constraints in CachedStore

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22015?focusedWorklogId=451534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451534
 ]

ASF GitHub Bot logged work on HIVE-22015:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 12:18
Start Date: 26/Jun/20 12:18
Worklog Time Spent: 10m 
  Work Description: sankarh commented on a change in pull request #1109:
URL: https://github.com/apache/hive/pull/1109#discussion_r445577541



##
File path: 
standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
##
@@ -402,6 +385,32 @@ private static void updateStatsForAlterTable(RawStore 
rawStore, Table tblBefore,
 sharedCache.removePartitionColStatsFromCache(catalogName, dbName, 
tableName, msgPart.getPartValues(),
 msgPart.getColName());
 break;
+  case MessageBuilder.ADD_PRIMARYKEY_EVENT:
+  AddPrimaryKeyMessage addPrimaryKeyMessage = deserializer.getAddPrimaryKeyMessage(message);

Review comment:
   Should be 2-space indentation. Check other places too.

##
File path: 
itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStoreUpdateUsingEvents.java
##
@@ -295,6 +295,178 @@ public void testTableOpsForUpdateUsingEvents() throws Exception {
 sharedCache.getSdCache().clear();
   }
 
+  @Test
+  public void testConstraintsForUpdateUsingEvents() throws Exception {
+long lastEventId = -1;
+RawStore rawStore = hmsHandler.getMS();
+
+// Prewarm CachedStore
+CachedStore.setCachePrewarmedState(false);
+CachedStore.prewarm(rawStore);
+
+// Add a db via rawStore
+String dbName = "test_table_ops";
+String dbOwner = "user1";
+Database db = createTestDb(dbName, dbOwner);
+hmsHandler.create_database(db);
+db = rawStore.getDatabase(DEFAULT_CATALOG_NAME, dbName);
+
+String foreignDbName = "test_table_ops_foreign";
+Database foreignDb = createTestDb(foreignDbName, dbOwner);
+hmsHandler.create_database(foreignDb);
+foreignDb = rawStore.getDatabase(DEFAULT_CATALOG_NAME, foreignDbName);
+// Add a table via rawStore
+String tblName = "tbl";
+String tblOwner = "user1";
+FieldSchema col1 = new FieldSchema("col1", "int", "integer column");
+FieldSchema col2 = new FieldSchema("col2", "string", "string column");
+List<FieldSchema> cols = new ArrayList<>();
+cols.add(col1);
+cols.add(col2);
+List<FieldSchema> ptnCols = new ArrayList<>();
+Table tbl = createTestTbl(dbName, tblName, tblOwner, cols, ptnCols);
+String foreignTblName = "ftbl";
+Table foreignTbl = createTestTbl(foreignDbName, foreignTblName, tblOwner, cols, ptnCols);
+
+SQLPrimaryKey key = new SQLPrimaryKey(dbName, tblName, col1.getName(), 1, "pk1",
+false, false, false);
+SQLUniqueConstraint uC = new SQLUniqueConstraint(DEFAULT_CATALOG_NAME, dbName, tblName,
+col1.getName(), 2, "uc1", false, false, false);
+SQLNotNullConstraint nN = new SQLNotNullConstraint(DEFAULT_CATALOG_NAME, dbName, tblName,
+col1.getName(), "nn1", false, false, false);
+SQLForeignKey foreignKey = new SQLForeignKey(key.getTable_db(), key.getTable_name(), key.getColumn_name(),
+foreignDbName, foreignTblName, key.getColumn_name(), 2, 1, 2,
+"fk1", key.getPk_name(), false, false, false);
+
+hmsHandler.create_table_with_constraints(tbl,
+Arrays.asList(key), null, Arrays.asList(uC), Arrays.asList(nN), null, null);
+hmsHandler.create_table_with_constraints(foreignTbl, null, Arrays.asList(foreignKey),
+null, null, null, null);
+
+tbl = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
+foreignTbl = rawStore.getTable(DEFAULT_CATALOG_NAME, foreignDbName, foreignTblName);
+
+// Read database, table via CachedStore
+Database dbRead = sharedCache.getDatabaseFromCache(DEFAULT_CATALOG_NAME, dbName);
+Assert.assertEquals(db, dbRead);
+Table tblRead = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME, dbName, tblName);
+compareTables(tblRead, tbl);
+
+Table foreignTblRead = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME, foreignDbName, foreignTblName);
+compareTables(foreignTblRead, foreignTbl);
+
+List<SQLPrimaryKey> keys = rawStore.getPrimaryKeys(DEFAULT_CATALOG_NAME, dbName, tblName);
+List<SQLPrimaryKey> keysRead = sharedCache.listCachedPrimaryKeys(DEFAULT_CATALOG_NAME, dbName, tblName);
+assertsForPrimarkaryKey(keysRead, 1, 0, keys.get(0));
+
+List<SQLNotNullConstraint> nNs = rawStore.getNotNullConstraints(DEFAULT_CATALOG_NAME, dbName, tblName);
+List<SQLNotNullConstraint> nNsRead = sharedCache.listCachedNotNullConstraints(DEFAULT_CATALOG_NAME, dbName, tblName);
+assertsForNotNullConstraints(nNsRead, 1, 0, nNs.get(0));
+
+List<SQLUniqueConstraint> uns = rawStore.getUniqueConstraints(DEFAULT_CATALOG_NAME, dbName, tblName);
+List<SQLUniqueConstraint> unsRead = 

[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451507
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:24
Start Date: 26/Jun/20 11:24
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446125639



##
File path: ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
##
@@ -2392,33 +2392,29 @@ public static TableSnapshot getTableSnapshot(Configuration conf,
 long writeId = -1;
 ValidWriteIdList validWriteIdList = null;
 
-HiveTxnManager sessionTxnMgr = SessionState.get().getTxnMgr();
-String fullTableName = getFullTableName(dbName, tblName);
-if (sessionTxnMgr != null && sessionTxnMgr.getCurrentTxnId() > 0) {
-  validWriteIdList = getTableValidWriteIdList(conf, fullTableName);
-  if (isStatsUpdater) {
-writeId = SessionState.get().getTxnMgr() != null ?
-SessionState.get().getTxnMgr().getAllocatedTableWriteId(
-  dbName, tblName) : -1;
-if (writeId < 1) {
-  // TODO: this is not ideal... stats updater that doesn't have write ID is currently
-  //   "create table"; writeId would be 0/-1 here. No need to call this w/true.
-  LOG.debug("Stats updater for {}.{} doesn't have a write ID ({})",
-  dbName, tblName, writeId);
+if (SessionState.get() != null) {

Review comment:
   I think it can't be null; it returns a ThreadLocal variable.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451507)
Time Spent: 40m  (was: 0.5h)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the 
> underlying data are not conflicting. If the HMS does not contain allocated 
> writeIds for the table, we can seed the table with the writeIds read from the 
> directory structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23671) MSCK repair should handle transactional tables in certain usecases

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23671?focusedWorklogId=451510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451510
 ]

ASF GitHub Bot logged work on HIVE-23671:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:40
Start Date: 26/Jun/20 11:40
Worklog Time Spent: 10m 
  Work Description: deniskuzZ commented on a change in pull request #1087:
URL: https://github.com/apache/hive/pull/1087#discussion_r446131988



##
File path: ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
##
@@ -2209,20 +2209,7 @@ public void testAcidOrcWritePreservesFieldNames() throws Exception {
* sorts rows in dictionary order
*/
   static List<String> stringifyValues(int[][] rowsIn) {
-assert rowsIn.length > 0;
-int[][] rows = rowsIn.clone();
-Arrays.sort(rows, new RowComp());
-List<String> rs = new ArrayList<>();
-for(int[] row : rows) {
-  assert row.length > 0;
-  StringBuilder sb = new StringBuilder();
-  for(int value : row) {
-sb.append(value).append("\t");
-  }
-  sb.setLength(sb.length() - 1);
-  rs.add(sb.toString());
-}
-return rs;
+return TxnCommandsBaseForTests.stringifyValues(rowsIn);

Review comment:
   remove it and use static call





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451510)
Time Spent: 1h  (was: 50m)

> MSCK repair should handle transactional tables in certain usecases
> --
>
> Key: HIVE-23671
> URL: https://issues.apache.org/jira/browse/HIVE-23671
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The MSCK REPAIR tool does not handle transactional tables well. It can 
> find and add new partitions the same way as for non-transactional tables, but 
> since the writeId differences are not handled, the data cannot be read back 
> from the new partitions.
> We could handle some use cases where the writeIds in the HMS and the 
> underlying data are not conflicting. If the HMS does not contain allocated 
> writeIds for the table, we can seed the table with the writeIds read from the 
> directory structure.
> Real life use cases could be:
>  * Copy data files from one cluster to another with different HMS, create the 
> table and call MSCK REPAIR
>  * If the HMS db is lost, recreate the table and call MSCK REPAIR
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge insert

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23725?focusedWorklogId=451520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451520
 ]

ASF GitHub Bot logged work on HIVE-23725:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 11:55
Start Date: 26/Jun/20 11:55
Worklog Time Spent: 10m 
  Work Description: deniskuzZ merged pull request #1151:
URL: https://github.com/apache/hive/pull/1151


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451520)
Time Spent: 5h 40m  (was: 5.5h)

> ValidTxnManager snapshot outdating causing partial reads in merge insert
> 
>
> Key: HIVE-23725
> URL: https://issues.apache.org/jira/browse/HIVE-23725
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Varga
>Assignee: Peter Varga
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> When the ValidTxnManager invalidates the snapshot during merge insert and 
> starts to read committed transactions that were not committed when the query 
> compilation happened, it can cause partial read problems if the committed 
> transaction created a new partition in the source or target table.
> The solution should not only fix the snapshot but also recompile the query 
> and acquire the locks again.
> You could construct an example like this:
> 1. Open and compile transaction 1 that merge inserts data from a partitioned 
> source table that has a few partitions.
> 2. Open, run and commit transaction 2 that inserts data to an old and a new 
> partition of the source table.
> 3. Open, run and commit transaction 3 that inserts data to the target table 
> of the merge statement; this will retrigger a snapshot generation in 
> transaction 1.
> 4. Run transaction 1: the snapshot will be regenerated, and it will read 
> partial data from transaction 2, breaking the ACID properties.
> Different setup, switching the transaction order:
> 1. Compile transaction 1 that inserts data to an old and a new partition of 
> the source table.
> 2. Compile transaction 2 that inserts data to the target table.
> 3. Compile transaction 3 that merge inserts data from the source table to 
> the target table.
> 4. Run and commit transaction 1.
> 5. Run and commit transaction 2.
> 6. Run transaction 3; since it contains 1 and 2 in its snapshot, the 
> isValidTxnListState check will be triggered and we do a partial read of 
> transaction 1 for the same reasons.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-22878) Add caching of table constraints, foreignKeys in CachedStore

2020-06-26 Thread Sankar Hariappan (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan resolved HIVE-22878.
-
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/HIVE-22015

> Add caching of table constraints, foreignKeys in CachedStore
> 
>
> Key: HIVE-22878
> URL: https://issues.apache.org/jira/browse/HIVE-22878
> Project: Hive
>  Issue Type: Improvement
>  Components: Hive
>Reporter: Rajesh Balamohan
>Assignee: Adesh Kumar Rao
>Priority: Major
> Attachments: Screenshot 2020-02-12 at 9.24.27 AM.jpg, Screenshot 
> 2020-02-12 at 9.25.33 AM.jpg
>
>
> All pink bars are misses from cachedstore.
> !Screenshot 2020-02-12 at 9.24.27 AM.jpg|width=428,height=314!
>  
> !Screenshot 2020-02-12 at 9.25.33 AM.jpg|width=648,height=470!
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-9028) Enhance the hive parser to accept tuples in where in clause filter

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-9028?focusedWorklogId=451807&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451807
 ]

ASF GitHub Bot logged work on HIVE-9028:


Author: ASF GitHub Bot
Created on: 27/Jun/20 00:25
Start Date: 27/Jun/20 00:25
Worklog Time Spent: 10m 
  Work Description: github-actions[bot] closed pull request #25:
URL: https://github.com/apache/hive/pull/25


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451807)
Time Spent: 20m  (was: 10m)

> Enhance the hive parser to accept tuples in where in clause filter
> --
>
> Key: HIVE-9028
> URL: https://issues.apache.org/jira/browse/HIVE-9028
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 0.13.1
>Reporter: Yash Datta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the Hive parser only accepts a list of values in the WHERE IN 
> clause, and the filter is applied to a single column. Enhanced it to accept 
> filters on multiple columns.
> So current support is for queries like:
> Select * from table where c1 in (value1, value2, ... value n);
> Added support in the parser for queries like:
> Select * from table where (c1, c2, ... cn) in ((value1, value2, ... value n), 
> (value1', value2', ... value n'))



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23699) Cleanup HIVEQUERYRESULTFILEFORMAT handling

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23699?focusedWorklogId=451785&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451785
 ]

ASF GitHub Bot logged work on HIVE-23699:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 22:58
Start Date: 26/Jun/20 22:58
Worklog Time Spent: 10m 
  Work Description: jfsii commented on a change in pull request #1119:
URL: https://github.com/apache/hive/pull/1119#discussion_r44614



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -101,6 +102,65 @@
   private volatile boolean isSparkConfigUpdated = false;
   private static final int LOG_PREFIX_LENGTH = 64;
 
+  interface HiveConfEnum<T extends Enum<T>> {
+    public static <T extends Enum<T>> T from(Class<T> cls, String value, T invalidEnum) {
+      try {
+        return T.valueOf(cls, value.toUpperCase());
+      } catch (Exception e) {
+        return invalidEnum;
+      }
+    }
+  }

Review comment:
   Yeah, I'll remove it. I don't even remember why it was a question for me.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451785)
Time Spent: 40m  (was: 0.5h)

> Cleanup HIVEQUERYRESULTFILEFORMAT handling
> --
>
> Key: HIVE-23699
> URL: https://issues.apache.org/jira/browse/HIVE-23699
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> HIVEQUERYRESULTFILEFORMAT handling has grown over the years and has become 
> somewhat messy code-wise in SemanticAnalyzer and TaskCompiler. There are 
> special cases where the HIVEQUERYRESULTFILEFORMAT setting gets changed at 
> runtime, which may cause issues if a user changes execution engines between 
> queries, and probably other corner cases.
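The review on this change suggests replacing the interface with a plain utility method. A sketch of what that could look like; the names (EnumParseUtil, ResultFormat) are illustrative, not Hive's committed API:

```java
import java.util.Locale;

// Sketch of the standalone utility method suggested in the review above
// (instead of wrapping it in an interface). Names are illustrative only.
public final class EnumParseUtil {
  private EnumParseUtil() {}

  /** Parses {@code value} case-insensitively, falling back to {@code invalidEnum}. */
  public static <T extends Enum<T>> T from(Class<T> cls, String value, T invalidEnum) {
    if (value == null) {
      return invalidEnum;
    }
    try {
      return Enum.valueOf(cls, value.toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      return invalidEnum;
    }
  }

  // Illustrative enum standing in for a result-file-format config value.
  enum ResultFormat { SEQUENCEFILE, TEXTFILE, INVALID }

  public static void main(String[] args) {
    System.out.println(from(ResultFormat.class, "textfile", ResultFormat.INVALID)); // TEXTFILE
    System.out.println(from(ResultFormat.class, "orc?", ResultFormat.INVALID));     // INVALID
  }
}
```

Using a sentinel "invalid" constant instead of throwing keeps config parsing total, which is the behavior the reviewed snippet also aims for.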



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23768) Metastore's update service wrongly strips partition column stats from the cache

2020-06-26 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis reassigned HIVE-23768:
--


> Metastore's update service wrongly strips partition column stats from the 
> cache
> ---
>
> Key: HIVE-23768
> URL: https://issues.apache.org/jira/browse/HIVE-23768
> Project: Hive
>  Issue Type: Bug
>Reporter: Stamatis Zampetakis
>Assignee: Stamatis Zampetakis
>Priority: Critical
>
> Metastore's update service wrongly strips partition column stats from the 
> cache in an attempt to update them. The issue may go unnoticed since missing 
> stats do not lead to query failures. 
> However, they can significantly alter the query plan, affecting performance. 
> Moreover, they lead to flakiness since sometimes the stats are present and 
> sometimes they are not, leading to a query whose plan differs over time. 
> Normally, missing elements from the cache shouldn't be a correctness problem 
> since we can always fall back to the raw stats. Unfortunately, there are many 
> interconnections with other parts of the code (e.g., code to obtain aggregate 
> statistics) where this contract breaks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23699) Cleanup HIVEQUERYRESULTFILEFORMAT handling

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23699?focusedWorklogId=451782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451782
 ]

ASF GitHub Bot logged work on HIVE-23699:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 22:52
Start Date: 26/Jun/20 22:52
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1119:
URL: https://github.com/apache/hive/pull/1119#discussion_r446442884



##
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##
@@ -101,6 +102,65 @@
   private volatile boolean isSparkConfigUpdated = false;
   private static final int LOG_PREFIX_LENGTH = 64;
 
+  interface HiveConfEnum<T extends Enum<T>> {
+    public static <T extends Enum<T>> T from(Class<T> cls, String value, T invalidEnum) {
+      try {
+        return T.valueOf(cls, value.toUpperCase());
+      } catch (Exception e) {
+        return invalidEnum;
+      }
+    }
+  }

Review comment:
   I am not sure we need the interface indeed; probably the utility method would suffice?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451782)
Time Spent: 0.5h  (was: 20m)

> Cleanup HIVEQUERYRESULTFILEFORMAT handling
> --
>
> Key: HIVE-23699
> URL: https://issues.apache.org/jira/browse/HIVE-23699
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 4.0.0
>Reporter: John Sherman
>Assignee: John Sherman
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HIVEQUERYRESULTFILEFORMAT handling has grown over the years and has become 
> somewhat messy code-wise in SemanticAnalyzer and TaskCompiler. There are 
> special cases where the HIVEQUERYRESULTFILEFORMAT setting gets changed at 
> runtime, which may cause issues if a user changes execution engines between 
> queries, and probably other corner cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (HIVE-23665) Rewrite last_value to first_value to enable streaming results

2020-06-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23665?focusedWorklogId=451666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-451666
 ]

ASF GitHub Bot logged work on HIVE-23665:
-

Author: ASF GitHub Bot
Created on: 26/Jun/20 17:42
Start Date: 26/Jun/20 17:42
Worklog Time Spent: 10m 
  Work Description: jcamachor commented on a change in pull request #1177:
URL: https://github.com/apache/hive/pull/1177#discussion_r446323133



##
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/PTFTranslator.java
##
@@ -142,6 +143,19 @@ public PTFDesc translate(WindowingSpec wdwSpec, 
SemanticAnalyzer semAly, HiveCon
   UnparseTranslator unparseT)
   throws SemanticException {
 init(semAly, hCfg, inputRR, unparseT);
+for (int i = 0; i < wdwSpec.getWindowExpressions().size(); ++i) {

Review comment:
   Should we do this rewriting via Calcite rule instead of adding this 
custom logic over here?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 451666)
Time Spent: 40m  (was: 0.5h)

> Rewrite last_value to first_value to enable streaming results
> -
>
> Key: HIVE-23665
> URL: https://issues.apache.org/jira/browse/HIVE-23665
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23665.1.patch, HIVE-23665.2.patch, 
> HIVE-23665.3.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Rewrite last_value to first_value to enable streaming results
> last_value cannot be streamed because the intermediate results need to be 
> buffered to determine the window result until we get the last row in the 
> window. But if we rewrite it to first_value we can stream the results, 
> although the order of results will not be guaranteed (also not important)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23665) Rewrite last_value to first_value to enable streaming results

2020-06-26 Thread Ramesh Kumar Thangarajan (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146476#comment-17146476
 ] 

Ramesh Kumar Thangarajan commented on HIVE-23665:
-

[~jcamachorodriguez] [~vgarg] Can you please review the attached PR and let me 
know your thoughts?

> Rewrite last_value to first_value to enable streaming results
> -
>
> Key: HIVE-23665
> URL: https://issues.apache.org/jira/browse/HIVE-23665
> Project: Hive
>  Issue Type: Bug
>Reporter: Ramesh Kumar Thangarajan
>Assignee: Ramesh Kumar Thangarajan
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23665.1.patch, HIVE-23665.2.patch, 
> HIVE-23665.3.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Rewrite last_value to first_value to enable streaming results
> last_value cannot be streamed because the intermediate results need to be 
> buffered to determine the window result until we get the last row in the 
> window. But if we rewrite it to first_value we can stream the results, 
> although the order of results will not be guaranteed (also not important)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582

2020-06-26 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17146422#comment-17146422
 ] 

Syed Shameerur Rahman commented on HIVE-23751:
--

[~jcamachorodriguez] [~kgyrtkirk] Could you please review?

> QTest: Override #mkdirs() method in ProxyFileSystem To Align After 
> HADOOP-16582
> ---
>
> Key: HIVE-23751
> URL: https://issues.apache.org/jira/browse/HIVE-23751
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-23751.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HADOOP-16582 has changed the way mkdirs() works:
> *Before HADOOP-16582:*
> All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which then 
> re-routed them to the mkdirs(p, permission) method. For ProxyFileSystem the 
> call would look like
> {code:java}
> FileUtils.mkdir(p) ---> FileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission)
> {code}
> An implementation of FileSystem only needed to implement mkdirs(p, 
> permission).
> *After HADOOP-16582:*
> Since FilterFileSystem overrides the mkdirs(p) method, the new call path for 
> ProxyFileSystem would look like
> {code:java}
> FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --->
> {code}
> This will make all the qtests fail with the below exception:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1,
>  expected: file:///
> {code}
> Note: We will hit this issue when we bump up the hadoop version in hive.
> So, as per the discussion in HADOOP-16963, ProxyFileSystem needs to override 
> the mkdirs(p) method in order to solve the above problem. The new flow would 
> look like
> {code:java}
> FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission) --->
> {code}
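The dispatch problem described above can be reproduced with minimal stand-ins for the Hadoop classes (stubs for illustration, not the real FileSystem API), showing why ProxyFileSystem has to override the single-argument mkdirs(p):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the Hadoop classes discussed above (not the real
// API) to show why ProxyFileSystem must override mkdirs(p) after HADOOP-16582.
public class MkdirsDispatchDemo {
  static class RawFs {
    final List<String> created = new ArrayList<>();
    boolean mkdirs(String p, String perm) {
      if (p.startsWith("pfile:")) {
        throw new IllegalArgumentException("Wrong FS: " + p + ", expected: file:///");
      }
      created.add(p);
      return true;
    }
    // The base class routes the single-argument overload here...
    boolean mkdirs(String p) { return mkdirs(p, "rwxr-xr-x"); }
  }

  static class FilterFs extends RawFs {
    final RawFs raw;
    FilterFs(RawFs raw) { this.raw = raw; }
    @Override boolean mkdirs(String p, String perm) { return raw.mkdirs(p, perm); }
    // ...but the filter short-circuits mkdirs(p) straight to the wrapped fs,
    // so the proxying overload below would never run for a pfile: path.
    @Override boolean mkdirs(String p) { return raw.mkdirs(p); }
  }

  static class ProxyFs extends FilterFs {
    ProxyFs(RawFs raw) { super(raw); }
    @Override boolean mkdirs(String p, String perm) {
      return raw.mkdirs(p.replaceFirst("^pfile:", "file:"), perm); // un-proxy the scheme
    }
    // The fix: route mkdirs(p) back through the proxying two-argument overload.
    @Override boolean mkdirs(String p) { return mkdirs(p, "rwxr-xr-x"); }
  }

  public static void main(String[] args) {
    RawFs raw = new RawFs();
    new ProxyFs(raw).mkdirs("pfile:/warehouse/dest1");
    System.out.println(raw.created); // [file:/warehouse/dest1]
  }
}
```

Without the final override, the call would take FilterFs's mkdirs(p) and hit the wrapped fs with the unproxied pfile: path, raising the same "Wrong FS" error quoted above.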



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HIVE-23767) Send ValidWriteIDList in request for all the new HMS get_* APIs that are in request/response form

2020-06-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das reassigned HIVE-23767:
-

Assignee: Kishen Das

> Send ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form
> -
>
> Key: HIVE-23767
> URL: https://issues.apache.org/jira/browse/HIVE-23767
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Kishen Das
>Assignee: Kishen Das
>Priority: Major
>
> We recently introduced a new set of HMS APIs that take ValidWriteIdList in 
> the request, as part of HIVE-22017.
> We should switch to these new APIs, wherever required and start sending 
> ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HIVE-23766) Send ValidWriteIDList in request for all the new HMS get_* APIs that are in request/response form

2020-06-26 Thread Kishen Das (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishen Das resolved HIVE-23766.
---
Resolution: Duplicate

> Send ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form
> -
>
> Key: HIVE-23766
> URL: https://issues.apache.org/jira/browse/HIVE-23766
> Project: Hive
>  Issue Type: Task
>  Components: HiveServer2
>Reporter: Kishen Das
>Priority: Major
>
> We recently introduced a new set of HMS APIs that take ValidWriteIdList in 
> the request, as part of HIVE-22017.
> We should switch to these new APIs, wherever required and start sending 
> ValidWriteIDList in request for all the new HMS get_* APIs that are in 
> request/response form.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-26 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Attachment: HIVE-22957.03.patch

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 





[jira] [Commented] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-26 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146462#comment-17146462
 ] 

Syed Shameerur Rahman commented on HIVE-22957:
--

Rebased on master in HIVE-22957.03.patch

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 





[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-26 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Attachment: HIVE-22957.03.patch

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch, HIVE-22957.03.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 





[jira] [Updated] (HIVE-22957) Support Partition Filtering In MSCK REPAIR TABLE Command

2020-06-26 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-22957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-22957:
-
Attachment: (was: HIVE-22957.03.patch)

> Support Partition Filtering In MSCK REPAIR TABLE Command
> 
>
> Key: HIVE-22957
> URL: https://issues.apache.org/jira/browse/HIVE-22957
> Project: Hive
>  Issue Type: Improvement
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Design Doc_ Partition Filtering In MSCK REPAIR 
> TABLE.pdf, HIVE-22957.01.patch, HIVE-22957.02.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> *Design Doc:*
> [^Design Doc_ Partition Filtering In MSCK REPAIR TABLE.pdf] 





[jira] [Assigned] (HIVE-23760) Upgrading to Kafka 2.5 Clients

2020-06-26 Thread Andras Katona (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Katona reassigned HIVE-23760:


Assignee: Karen Coppage  (was: Karen)

> Upgrading to Kafka 2.5 Clients
> --
>
> Key: HIVE-23760
> URL: https://issues.apache.org/jira/browse/HIVE-23760
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Andras Katona
>Assignee: Karen Coppage
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (HIVE-23760) Upgrading to Kafka 2.5 Clients

2020-06-26 Thread Andras Katona (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Katona reassigned HIVE-23760:


Assignee: Karen  (was: Andras Katona)

> Upgrading to Kafka 2.5 Clients
> --
>
> Key: HIVE-23760
> URL: https://issues.apache.org/jira/browse/HIVE-23760
> Project: Hive
>  Issue Type: Improvement
>  Components: kafka integration
>Reporter: Andras Katona
>Assignee: Karen
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Issue Comment Deleted] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582

2020-06-26 Thread Syed Shameerur Rahman (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Syed Shameerur Rahman updated HIVE-23751:
-
Comment: was deleted

(was: [~kgyrtkirk] Could you please review?)

> QTest: Override #mkdirs() method in ProxyFileSystem To Align After 
> HADOOP-16582
> ---
>
> Key: HIVE-23751
> URL: https://issues.apache.org/jira/browse/HIVE-23751
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-23751.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HADOOP-16582 changed the way mkdirs() works:
> *Before HADOOP-16582:*
> All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which then 
> re-routed them to the mkdirs(p, permission) method. For ProxyFileSystem the 
> call chain looked like
> {code:java}
> FileUtils.mkdir(p) ---> FileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission)
> {code}
> An implementation of FileSystem only needed to implement mkdirs(p, 
> permission).
> *After HADOOP-16582:*
> Since FilterFileSystem now overrides the mkdirs(p) method, the call chain for 
> ProxyFileSystem looks like
> {code:java}
> FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --->
> {code}
> This makes all the qtests fail with the exception below:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1,
>  expected: file:///
> {code}
> Note: we will hit this issue when we bump up the Hadoop version in Hive.
> As per the discussion in HADOOP-16963, ProxyFileSystem needs to override the 
> mkdirs(p) method in order to solve the above problem. The new flow then looks 
> like
> {code:java}
> FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission) --->
> {code}





[jira] [Commented] (HIVE-23751) QTest: Override #mkdirs() method in ProxyFileSystem To Align After HADOOP-16582

2020-06-26 Thread Syed Shameerur Rahman (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-23751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17146421#comment-17146421
 ] 

Syed Shameerur Rahman commented on HIVE-23751:
--

As suspected, TestPigHBaseStorageHandler#testPigHBaseSchema is flaky: 
HIVE-23762


> QTest: Override #mkdirs() method in ProxyFileSystem To Align After 
> HADOOP-16582
> ---
>
> Key: HIVE-23751
> URL: https://issues.apache.org/jira/browse/HIVE-23751
> Project: Hive
>  Issue Type: Task
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.2.0
>
> Attachments: HIVE-23751.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> HADOOP-16582 changed the way mkdirs() works:
> *Before HADOOP-16582:*
> All calls to mkdirs(p) were fast-tracked to FileSystem.mkdirs, which then 
> re-routed them to the mkdirs(p, permission) method. For ProxyFileSystem the 
> call chain looked like
> {code:java}
> FileUtils.mkdir(p) ---> FileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission)
> {code}
> An implementation of FileSystem only needed to implement mkdirs(p, 
> permission).
> *After HADOOP-16582:*
> Since FilterFileSystem now overrides the mkdirs(p) method, the call chain for 
> ProxyFileSystem looks like
> {code:java}
> FileUtils.mkdir(p) ---> FilterFileSystem.mkdirs(p) --->
> {code}
> This makes all the qtests fail with the exception below:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: Wrong FS: 
> pfile:/media/ebs1/workspace/hive-3.1-qtest/group/5/label/HiveQTest/hive-1.2.0/itests/qtest/target/warehouse/dest1,
>  expected: file:///
> {code}
> Note: we will hit this issue when we bump up the Hadoop version in Hive.
> As per the discussion in HADOOP-16963, ProxyFileSystem needs to override the 
> mkdirs(p) method in order to solve the above problem. The new flow then looks 
> like
> {code:java}
> FileUtils.mkdir(p) ---> ProxyFileSystem.mkdirs(p) ---> 
> ProxyFileSystem.mkdirs(p, permission) --->
> {code}


