[jira] [Work logged] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer
[ https://issues.apache.org/jira/browse/HIVE-25741?focusedWorklogId=686946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686946 ]

ASF GitHub Bot logged work on HIVE-25741:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 23:40
Start Date: 26/Nov/21 23:40
Worklog Time Spent: 10m
Work Description: pvary commented on a change in pull request #2819:
URL: https://github.com/apache/hive/pull/2819#discussion_r757717860

## File path: ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
##
@@ -314,6 +314,7 @@ private void writeEvent(HiveHookEventProtoPartialBuilder builder) {
       try {
         if (eventPerFile) {
           if (!maybeRolloverWriterForDay()) {
+            IOUtils.closeQuietly(writer);

Review comment:
   Why not just call `maybeRolloverWriterForDay()`? That method makes sure that the writer is open and ready, the same way as for the normal case. Like:
   ```
   maybeRolloverWriterForDay();
   LOG.debug("Event per file enabled. New proto event file: {}", writer.getPath());
   writer.writeProto(event);
   IOUtils.closeQuietly(writer);
   writer = null;
   ```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
---

Worklog Id: (was: 686946)
Remaining Estimate: 0h
Time Spent: 10m

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
> Issue Type: Bug
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for the S3A filesystem),
> the Hive proto {{EventLogger}} will create a new file for each proto event.
> However, if we already had an appropriate writer (i.e.
> maybeRolloverWriterForDay() returns false) from some previous operation, we
> don't close the previous writer instance before creating a new one.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
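The review suggests always going through `maybeRolloverWriterForDay()` and closing the writer after every event. The following is a minimal sketch of that pattern under stated assumptions. `ProtoWriter`, `maybeRollover()` and `writeEvent()` are hypothetical stand-ins, not Hive's real classes, and the proto payload is reduced to a string. In the sketch, a writer left open by an earlier operation is reused for the next event and then closed, so no file handle is leaked.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of an event-per-file logger. ProtoWriter, maybeRollover()
 * and writeEvent() are illustrative stand-ins, not Hive's actual classes.
 */
public class EventPerFileLoggerSketch {

    /** Minimal stand-in for a proto file writer. */
    static final class ProtoWriter implements Closeable {
        final String path;
        final List<String> events = new ArrayList<>();
        boolean closed = false;

        ProtoWriter(String path) { this.path = path; }

        void writeProto(String event) {
            if (closed) throw new IllegalStateException("writer already closed: " + path);
            events.add(event);
        }

        @Override public void close() { closed = true; }
    }

    final List<ProtoWriter> allWriters = new ArrayList<>(); // every file ever opened
    private ProtoWriter writer;                              // null when no file is open
    private int fileCounter = 0;

    /** Opens a writer only if none is open; returns true if a new one was created. */
    boolean maybeRollover() {
        if (writer != null) return false;
        writer = new ProtoWriter("events_" + fileCounter++);
        allWriters.add(writer);
        return true;
    }

    /** Event-per-file mode: write one event, then always close and drop the writer. */
    void writeEvent(String event) {
        maybeRollover();              // reuse a leftover writer or open a fresh one
        try {
            writer.writeProto(event);
        } finally {
            closeQuietly(writer);     // never leak the file handle, even if the write fails
            writer = null;            // the next event gets a new file
        }
    }

    /** Mirrors the intent of IOUtils.closeQuietly: close and ignore IOExceptions. */
    private static void closeQuietly(Closeable c) {
        if (c == null) return;
        try {
            c.close();
        } catch (IOException ignored) {
            // best effort
        }
    }

    public static void main(String[] args) {
        EventPerFileLoggerSketch logger = new EventPerFileLoggerSketch();
        logger.maybeRollover();       // simulate a writer left open by a previous operation
        logger.writeEvent("query-start");
        logger.writeEvent("query-end");
        for (ProtoWriter w : logger.allWriters) {
            System.out.println(w.path + " " + w.events + " closed=" + w.closed);
        }
    }
}
```

Running `main` should print `events_0 [query-start] closed=true` followed by `events_1 [query-end] closed=true`. Closing in a `finally` block is a choice made in this sketch. The snippet in the review closes the writer after a successful write.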
[jira] [Updated] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer
[ https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25741:
--
    Labels: pull-request-available  (was: )
[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
[ https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=686945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686945 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 23:36
Start Date: 26/Nov/21 23:36
Worklog Time Spent: 10m
Work Description: pvary commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r757717380

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
      * @throws Exception
      */
     @Override public void close() throws Exception {
+      shutdownHeartbeater();

Review comment:
   We might just tell the heartbeater to ignore exceptions during the shutdown phase.

Issue Time Tracking
---

Worklog Id: (was: 686945)
Time Spent: 0.5h (was: 20m)

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
> Issue Type: Bug
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes,
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only
> interrupted after that. This can lead to a race condition: the txn has
> already been deleted from the backend DB via commit/abort, but the
> concurrently running heartbeater thread still attempts to send one last
> heartbeat. The txn id is then no longer found in the DB, which leads to
> {{NoSuchTxnException}}.
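pvary's suggestion can be read as follows: before committing or aborting, tell the heartbeater that shutdown has started, and have it swallow the errors that follow. The sketch below models that idea under explicit assumptions. `FakeTxnStore`, `Heartbeater`, `markStopping()` and this `NoSuchTxnException` class are hypothetical stand-ins for the metastore client and its exception; only the `java.util.concurrent` classes are real API. The heartbeater checks the flag after a heartbeat fails. As a result, a heartbeat that was already in flight when the txn was committed is also tolerated, which is the "lost signal" case raised in marton-bod's reply.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model: the heartbeater tolerates failures once shutdown has begun. */
public class QuietHeartbeaterSketch {

    /** Stand-in for the metastore's "unknown transaction" error. */
    static final class NoSuchTxnException extends Exception {
        NoSuchTxnException(String msg) { super(msg); }
    }

    /** Stand-in txn store: heartbeats fail once the txn is committed or aborted. */
    static final class FakeTxnStore {
        private final AtomicBoolean open = new AtomicBoolean(true);

        void heartbeat(long txnId) throws NoSuchTxnException {
            if (!open.get()) throw new NoSuchTxnException("No such transaction: " + txnId);
        }

        void commit(long txnId) { open.set(false); }
    }

    static final class Heartbeater implements Runnable {
        private final FakeTxnStore store;
        private final long txnId;
        private final AtomicBoolean stopping = new AtomicBoolean(false);
        final AtomicInteger reportedErrors = new AtomicInteger();

        Heartbeater(FakeTxnStore store, long txnId) {
            this.store = store;
            this.txnId = txnId;
        }

        /** Called by the worker right before it commits or aborts the txn. */
        void markStopping() { stopping.set(true); }

        @Override public void run() {
            try {
                store.heartbeat(txnId);
            } catch (NoSuchTxnException e) {
                // The flag is read after the failure, so a heartbeat that was already
                // in flight when the txn was committed is treated as expected.
                if (!stopping.get()) {
                    reportedErrors.incrementAndGet();
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeTxnStore store = new FakeTxnStore();
        Heartbeater hb = new Heartbeater(store, 42L);
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleAtFixedRate(hb, 0, 1, TimeUnit.MILLISECONDS);

        Thread.sleep(20);      // compaction work
        hb.markStopping();     // 1. expect failures from now on
        store.commit(42L);     // 2. commit (or abort) the compaction txn
        Thread.sleep(20);      // late heartbeats may still hit the missing txn
        pool.shutdown();       // 3. stop the heartbeater afterwards
        pool.awaitTermination(1, TimeUnit.SECONDS);

        System.out.println("errors reported: " + hb.reportedErrors.get()); // prints "errors reported: 0"
    }
}
```

The flag is set before the commit, and both values are atomics. Any heartbeat that observes the committed txn therefore also observes `stopping == true`.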
[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
[ https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=686852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686852 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 14:01
Start Date: 26/Nov/21 14:01
Worklog Time Spent: 10m
Work Description: marton-bod commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r757361747

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
      * @throws Exception
      */
     @Override public void close() throws Exception {
+      shutdownHeartbeater();

Review comment:
   > "Theoretically this has the same issue as before the patch, just the other way around. We stop the heartbeat, the transaction times out, and we try to commit / abort."

   That's true, this was my first thought as well. However, I think shutting down the heartbeater should be really fast and should not cause problems in healthy systems. If it's waiting to be scheduled by the executor (which is most of the time), it will be shut down immediately. Otherwise it'll send one more heartbeat, but that heartbeat would need to take minutes (in line with the value of `hive.txn.timeout`) to cause any problems. If heartbeating takes that long, then we have other issues in the system anyway.

   > "How complicated would it be to turn off exception handling in the heartbeater instead first, and stop it after abort / commit?"

   Can you elaborate on what you mean by turning off exception handling? I think the general problem would remain: we commit/abort the txn and then send a signal to the heartbeater thread to stop doing whatever it's currently doing, but if it has already called the `msc.heartbeat()` method by that point (or it's just about to call it), the signal will get "lost" and it will lead to failure.

Issue Time Tracking
---

Worklog Id: (was: 686852)
Time Spent: 20m (was: 10m)
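The patch instead calls `shutdownHeartbeater()` at the start of `close()`, so no heartbeat can run once the txn is committed or aborted. Below is a minimal sketch of that ordering. It assumes the heartbeater is a periodic task on a `ScheduledExecutorService`; the real `Worker` internals may differ. The txn store is reduced to a flag, and the class name is hypothetical. By default, `shutdown()` cancels future periodic runs, and `awaitTermination()` waits for an in-flight run to finish. This matches the reasoning above: shutdown is immediate when the task is only waiting to be scheduled.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a compaction worker that stops heartbeating before commit/abort. */
public class StopHeartbeatFirstSketch implements AutoCloseable {
    private final ScheduledExecutorService heartbeatPool =
            Executors.newSingleThreadScheduledExecutor();
    private final AtomicBoolean txnOpen = new AtomicBoolean(true);
    final AtomicInteger heartbeats = new AtomicInteger();
    final AtomicInteger heartbeatsAfterCommit = new AtomicInteger();

    StopHeartbeatFirstSketch(long intervalMillis) {
        heartbeatPool.scheduleAtFixedRate(this::heartbeat, 0, intervalMillis, TimeUnit.MILLISECONDS);
    }

    private void heartbeat() {
        heartbeats.incrementAndGet();
        if (!txnOpen.get()) {
            heartbeatsAfterCommit.incrementAndGet(); // would surface as NoSuchTxnException
        }
    }

    private void shutdownHeartbeater() {
        // shutdown() cancels future periodic runs but lets an in-flight run finish;
        // awaitTermination() then waits for that last run to complete.
        heartbeatPool.shutdown();
        try {
            if (!heartbeatPool.awaitTermination(10, TimeUnit.SECONDS)) {
                heartbeatPool.shutdownNow(); // a heartbeat is stuck; interrupt it
            }
        } catch (InterruptedException e) {
            heartbeatPool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    @Override public void close() {
        shutdownHeartbeater();  // 1. stop heartbeating first
        txnOpen.set(false);     // 2. then commit/abort the txn (modelled as a flag)
    }

    public static void main(String[] args) {
        StopHeartbeatFirstSketch worker = new StopHeartbeatFirstSketch(1);
        while (worker.heartbeats.get() < 5) {
            Thread.yield();     // let a few heartbeats happen while "compacting"
        }
        worker.close();
        System.out.println("heartbeats after commit: " + worker.heartbeatsAfterCommit.get());
    }
}
```

This should print `heartbeats after commit: 0`. The trade-off discussed in the review still applies. If a single heartbeat blocks for longer than `hive.txn.timeout`, the txn can time out before the commit.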
[jira] [Work logged] (HIVE-25736) Close ORC readers
[ https://issues.apache.org/jira/browse/HIVE-25736?focusedWorklogId=686846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686846 ]

ASF GitHub Bot logged work on HIVE-25736:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 13:26
Start Date: 26/Nov/21 13:26
Worklog Time Spent: 10m
Work Description: marton-bod commented on a change in pull request #2813:
URL: https://github.com/apache/hive/pull/2813#discussion_r757496016

## File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java
##
@@ -154,6 +154,9 @@ private void processKeyValuePairs(Object key, Object value)
       // next file in the path
       if (!k.getInputPath().equals(prevPath)) {
+        if (reader != null) {

Review comment:
   Do we need to do this on line 111 as well? Or is it guaranteed to be the first reader instantiation at that point?

Issue Time Tracking
---

Worklog Id: (was: 686846)
Time Spent: 1h 20m (was: 1h 10m)

> Close ORC readers
> -
>
> Key: HIVE-25736
> URL: https://issues.apache.org/jira/browse/HIVE-25736
> Project: Hive
> Issue Type: Bug
> Reporter: Peter Vary
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> After ORC-498, ORC readers should be closed explicitly. One of the cases
> was HIVE-25683, but there are several places where the ORC readers are still
> not closed.
> We should go through the code and make sure that the readers are closed.
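Below is a minimal sketch of the reader-closing pattern from the diff. It uses a hypothetical `FakeReader` in place of the ORC `Reader`, and the merge work itself is elided. The previous reader is closed whenever the input path changes, and the last reader is closed in `close()`. With the `reader != null` guard, the same check is harmless at a site where a reader is opened for the first time. That makes it a safe default wherever `reader` is reassigned.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical model of a merge operator that opens one reader per input file. */
public class MergeReaderSketch implements Closeable {

    /** Stand-in for an ORC file reader that holds open resources. */
    static final class FakeReader implements Closeable {
        final String path;
        boolean closed;

        FakeReader(String path) { this.path = path; }

        @Override public void close() { closed = true; }
    }

    final List<FakeReader> opened = new ArrayList<>(); // every reader ever created
    private FakeReader reader;                          // reader for the current file
    private String prevPath;

    /** Called once per key/value pair; the input path changes when the next file starts. */
    void process(String inputPath) {
        if (!Objects.equals(inputPath, prevPath)) {
            if (reader != null) {
                reader.close();  // release the previous file's reader before opening the next
            }
            reader = new FakeReader(inputPath);
            opened.add(reader);
            prevPath = inputPath;
        }
        // ... merge stripes from `reader` here ...
    }

    /** Closes the reader of the last file, which no later path change will close. */
    @Override public void close() {
        if (reader != null) {
            reader.close();
            reader = null;
        }
    }

    public static void main(String[] args) {
        MergeReaderSketch op = new MergeReaderSketch();
        for (String p : new String[] {"f1.orc", "f1.orc", "f2.orc", "f3.orc"}) {
            op.process(p);
        }
        op.close();
        op.opened.forEach(r -> System.out.println(r.path + " closed=" + r.closed));
    }
}
```

`main` should print `f1.orc closed=true`, `f2.orc closed=true` and `f3.orc closed=true`, one per line.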
[jira] [Commented] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer
[ https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449565#comment-17449565 ]

Marton Bod commented on HIVE-25741:
---

PR: [https://github.com/apache/hive/pull/2819]
[jira] [Assigned] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer
[ https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Bod reassigned HIVE-25741:
-
[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
[ https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=686791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686791 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 10:11
Start Date: 26/Nov/21 10:11
Worklog Time Spent: 10m
Work Description: marton-bod commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r757361747

## File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
      * @throws Exception
      */
     @Override public void close() throws Exception {
+      shutdownHeartbeater();

Review comment:
   > "Theoretically this has the same issue as before the patch, just the other way around. We stop the heartbeat, the transaction times out, and we try to commit / abort."

   That's true, this was my first thought as well. However, I think shutting down the heartbeater should be really fast and should not cause problems in healthy systems. If it's waiting to be scheduled by the executor (which is most of the time), it will be shut down immediately. Otherwise it'll send one more heartbeat, but that heartbeat would need to take minutes (in line with the value of `hive.txn.timeout`) to cause any problems. If heartbeating takes that long, then we have other issues in the system anyway.

   > "How complicated would it be to turn off exception handling in the heartbeater instead first, and stop it after abort / commit?"

   Can you elaborate on what you mean by turning off exception handling? I think the general problem would remain: we commit/abort the txn and then send a signal to the heartbeater thread to stop doing whatever it's currently doing, but if it has already called the `msc.heartbeat()` method by that point (or it's just about to call it), there's not much we can do and it will lead to failure.

Issue Time Tracking
---

Worklog Id: (was: 686791)
Remaining Estimate: 0h
Time Spent: 10m
[jira] [Updated] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
[ https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HIVE-25740:
--
    Labels: pull-request-available  (was: )
[jira] [Commented] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
[ https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449492#comment-17449492 ]

Marton Bod commented on HIVE-25740:
---

PR: [https://github.com/apache/hive/pull/2817]
[jira] [Assigned] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater
[ https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marton Bod reassigned HIVE-25740:
-
[jira] [Work logged] (HIVE-25739) Support Alter Partition Properties
[ https://issues.apache.org/jira/browse/HIVE-25739?focusedWorklogId=686773=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686773 ]

ASF GitHub Bot logged work on HIVE-25739:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 08:31
Start Date: 26/Nov/21 08:31
Worklog Time Spent: 10m
Work Description: southernriver commented on pull request #2818:
URL: https://github.com/apache/hive/pull/2818#issuecomment-979783452

   LGTM +1

Issue Time Tracking
---

Worklog Id: (was: 686773)
Time Spent: 20m (was: 10m)

> Support Alter Partition Properties
> --
>
> Key: HIVE-25739
> URL: https://issues.apache.org/jira/browse/HIVE-25739
> Project: Hive
> Issue Type: New Feature
> Affects Versions: All Versions
> Reporter: xiepengjie
> Assignee: xiepengjie
> Priority: Major
> Labels: pull-request-available
> Fix For: 2.3.8
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Support altering partition properties like:
> {code:java}
> alter table alter1 partition(insertdate='2008-01-01') set tblproperties ('a'='1', 'c'='3');
> alter table alter1 partition(insertdate='2008-01-01') unset tblproperties if exists ('c'='3');{code}
>
> Relates to https://issues.apache.org/jira/browse/HIVE-14261
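The intended semantics of the two statements can be modelled on a partition's parameter map. This sketch rests on an assumption: that the partition-level syntax behaves like the existing table-level `ALTER TABLE ... SET/UNSET TBLPROPERTIES`. Under that assumption, SET adds or overwrites keys and UNSET removes them. Only `IF EXISTS` lets UNSET name a key that is not present. `setProperties` and `unsetProperties` are hypothetical helpers, not Hive or metastore APIs, and only the keys of the UNSET list are used.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of SET/UNSET on a partition's parameters. */
public class PartitionPropsSketch {

    /** SET: adds new keys and overwrites existing ones. */
    static void setProperties(Map<String, String> params, Map<String, String> props) {
        params.putAll(props);
    }

    /** UNSET: removes keys; without IF EXISTS, a missing key is an error. */
    static void unsetProperties(Map<String, String> params, Collection<String> keys, boolean ifExists) {
        if (!ifExists) {
            for (String k : keys) {
                if (!params.containsKey(k)) {
                    throw new IllegalArgumentException("Property '" + k + "' does not exist");
                }
            }
        }
        params.keySet().removeAll(keys); // validated first, so a failed UNSET changes nothing
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>(); // parameters of partition insertdate='2008-01-01'
        setProperties(params, Map.of("a", "1", "c", "3"));
        unsetProperties(params, List.of("c"), true);
        unsetProperties(params, List.of("zzz"), true); // IF EXISTS: a missing key is ignored
        System.out.println(params); // prints {a=1}
    }
}
```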
[jira] [Work logged] (HIVE-25561) Killed task should not commit file.
[ https://issues.apache.org/jira/browse/HIVE-25561?focusedWorklogId=686770=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686770 ]

ASF GitHub Bot logged work on HIVE-25561:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 08:19
Start Date: 26/Nov/21 08:19
Worklog Time Spent: 10m
Work Description: abstractdog commented on pull request #2674:
URL: https://github.com/apache/hive/pull/2674#issuecomment-979775926

   thanks @kgyrtkirk for the review and @zhengchenyu for the patch!

Issue Time Tracking
---

Worklog Id: (was: 686770)
Time Spent: 1h 50m (was: 1h 40m)

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
> Issue Type: Bug
> Components: Tez
> Affects Versions: 1.2.1, 2.3.8, 2.4.0
> Reporter: zhengchenyu
> Assignee: zhengchenyu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> With the Tez engine in our cluster, I found some duplicate lines, especially when
> Tez speculation is enabled. In the partition directory, I found that both 02_0 and
> 02_1 exist.
> It's a very low-probability event. HIVE-10429 fixed some bugs related to interrupts,
> but some exceptions were not caught.
> In our cluster, the task received SIGTERM, then ClientFinalizer (a Hadoop class) was
> called and the HDFS client was closed. This raised an exception, but abort may not
> have been set to true.
> removeTempOrDuplicateFiles may then fail because of the inconsistency, and the
> duplicate file is retained.
> (Note: the driver first lists the directory, then the task commits its file, then the
> driver removes duplicate files. This is the inconsistent case.)
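The issue title states the rule: a killed task should not commit its file. Below is a minimal sketch of one way to enforce that rule; it is not the actual patch, and `TaskAttempt`, `closeOp` and `flushAndCloseWriters` are hypothetical names. Any exception thrown while closing sets `abort`, and the output is committed only if `abort` is still false. The thrown `IOException` simulates the HDFS client having already been closed by Hadoop's `ClientFinalizer` after SIGTERM.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: a task attempt commits its output only if it closed cleanly. */
public class AbortOnCloseFailureSketch {

    /** Output files visible to the driver (stand-in for the partition directory). */
    final List<String> committed = new ArrayList<>();

    final class TaskAttempt {
        private final String outputFile;
        private boolean abort = false;

        TaskAttempt(String outputFile) { this.outputFile = outputFile; }

        void closeOp(boolean fileSystemAlreadyClosed) {
            try {
                flushAndCloseWriters(fileSystemAlreadyClosed);
            } catch (IOException | RuntimeException e) {
                abort = true;                // any failure while closing aborts the attempt
            }
            if (!abort) {
                committed.add(outputFile);   // only a cleanly closed attempt commits its file
            }
        }

        private void flushAndCloseWriters(boolean fileSystemAlreadyClosed) throws IOException {
            if (fileSystemAlreadyClosed) {
                // Simulated: SIGTERM ran Hadoop's ClientFinalizer, which closed the HDFS client
                throw new IOException("Filesystem closed");
            }
        }
    }

    public static void main(String[] args) {
        AbortOnCloseFailureSketch dir = new AbortOnCloseFailureSketch();
        dir.new TaskAttempt("02_0").closeOp(false); // original attempt finishes normally
        dir.new TaskAttempt("02_1").closeOp(true);  // speculative attempt killed mid-close
        System.out.println(dir.committed);          // prints [02_0]
    }
}
```

In this model, the attempt that failed during close leaves no committed file behind. The driver then has no duplicate to reconcile, regardless of when it lists the directory.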