[jira] [Work logged] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25741?focusedWorklogId=686946&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686946
 ]

ASF GitHub Bot logged work on HIVE-25741:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 23:40
Start Date: 26/Nov/21 23:40
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2819:
URL: https://github.com/apache/hive/pull/2819#discussion_r757717860



##
File path: ql/src/java/org/apache/hadoop/hive/ql/hooks/HiveProtoLoggingHook.java
##
@@ -314,6 +314,7 @@ private void writeEvent(HiveHookEventProtoPartialBuilder builder) {
 try {
   if (eventPerFile) {
 if (!maybeRolloverWriterForDay()) {
+  IOUtils.closeQuietly(writer);

Review comment:
   Why not just call `maybeRolloverWriterForDay()`? That method makes sure 
that the writer is open and ready, the same way as in the normal case.
   Like:
   ```
   maybeRolloverWriterForDay();
   LOG.debug("Event per file enabled. New proto event file: {}", writer.getPath());
   writer.writeProto(event);
   IOUtils.closeQuietly(writer);
   writer = null;
   ```
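
The pattern under discussion, closing any leftover writer before replacing it, can be sketched in isolation. This is a minimal illustration under stated assumptions, not the Hive implementation: `EventWriter`, `PerEventLogger`, `openWriter`, and `leaveWriterOpen` are hypothetical names standing in for the real proto writer and the code paths that create it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proto event writer; not a Hive class.
class EventWriter implements AutoCloseable {
  final List<String> events = new ArrayList<>();
  boolean closed = false;

  void write(String event) {
    if (closed) {
      throw new IllegalStateException("writer already closed");
    }
    events.add(event);
  }

  @Override
  public void close() {
    closed = true;
  }
}

// Sketch of a file-per-event logger that never leaks an old writer.
class PerEventLogger {
  // Every writer ever created, kept only so the sketch can be inspected.
  final List<EventWriter> opened = new ArrayList<>();
  // May still hold a writer left over from a previous operation.
  private EventWriter writer;

  private EventWriter openWriter() {
    EventWriter w = new EventWriter();
    opened.add(w);
    return w;
  }

  // Simulates a writer left open by some earlier code path.
  void leaveWriterOpen() {
    writer = openWriter();
  }

  void writeEvent(String event) {
    // Always close whatever writer is still around before replacing it.
    if (writer != null) {
      writer.close();
    }
    writer = openWriter();
    try {
      writer.write(event);
    } finally {
      // One file per event: close right after the write and drop the reference.
      writer.close();
      writer = null;
    }
  }
}
```

The point of the sketch is that the close happens unconditionally, so the outcome does not depend on which branch produced the current writer.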
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 686946)
Remaining Estimate: 0h
Time Spent: 10m

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already have an appropriate writer from some previous 
> operation (i.e. {{maybeRolloverWriterForDay()}} returns false), we don't 
> close the previous writer instance before creating a new one.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25741:
--
Labels: pull-request-available  (was: )

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already have an appropriate writer from some previous 
> operation (i.e. {{maybeRolloverWriterForDay()}} returns false), we don't 
> close the previous writer instance before creating a new one.





[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=686945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686945
 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 23:36
Start Date: 26/Nov/21 23:36
Worklog Time Spent: 10m 
  Work Description: pvary commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r757717380



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
  * @throws Exception
  */
 @Override public void close() throws Exception {
+  shutdownHeartbeater();

Review comment:
   We might just tell the heartbeater to ignore exceptions during the 
shutdown phase.
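
One way to read this suggestion is a flag that the worker sets before it commits or aborts, which the heartbeat task consults when a call fails. The sketch below is an illustration only: `TolerantHeartbeater`, `beginShutdown`, and `lastFailure` are hypothetical names, and the metastore heartbeat call is modeled as a plain `Runnable`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical heartbeat task; the metastore call is modeled as a Runnable.
class TolerantHeartbeater implements Runnable {
  private final Runnable heartbeatCall;
  private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
  private volatile RuntimeException lastFailure;

  TolerantHeartbeater(Runnable heartbeatCall) {
    this.heartbeatCall = heartbeatCall;
  }

  // Called by the worker BEFORE it commits or aborts the transaction.
  void beginShutdown() {
    shuttingDown.set(true);
  }

  // Last failure that was not attributed to shutdown, or null.
  RuntimeException lastFailure() {
    return lastFailure;
  }

  @Override
  public void run() {
    try {
      heartbeatCall.run();
    } catch (RuntimeException e) {
      if (shuttingDown.get()) {
        // Expected race: the txn may already be committed or aborted.
        return;
      }
      // A real failure while the txn should still be open.
      lastFailure = e;
    }
  }
}
```

Because the flag is set before the commit or abort, any failure caused by the transaction already being gone is observed with the flag already true, so it is ignored rather than reported.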






Issue Time Tracking
---

Worklog Id: (was: 686945)
Time Spent: 0.5h  (was: 20m)

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=686852&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686852
 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 14:01
Start Date: 26/Nov/21 14:01
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r757361747



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
  * @throws Exception
  */
 @Override public void close() throws Exception {
+  shutdownHeartbeater();

Review comment:
   > "Theoretically this has the same issue as before the patch, just the 
other way around. We stop the heartbeat, the transaction times out, and we try 
to commit / abort."
   
   That's true, this was my first thought as well. However, I think shutting 
down the heartbeater should be really fast and not cause problems in healthy 
systems. If it's waiting to be scheduled by the executor (which is the case 
most of the time), it will be shut down immediately. Otherwise it'll send one 
more heartbeat, but that heartbeat would need to take minutes (in line with 
the value of `hive.txn.timeout`) to cause any problems. If a heartbeat 
takes that long, then we have other issues in the system anyway.
   
   > "How complicated would it be to turn off exception handling in the 
heartbeater first instead, and stop it after abort / commit?"
   
   Can you elaborate on what you mean by turning off exception handling? I 
think the general problem would remain: we commit/abort the txn and then send a 
signal to the heartbeater thread to stop doing whatever it's currently doing, 
but if it has already called the `msc.heartbeat()` method by that point (or 
it's just about to call it), the signal will get "lost" and it will lead to 
failure.
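
The ordering argued for in this comment, stopping the heartbeater and waiting for any in-flight heartbeat before committing or aborting, can be sketched with a plain `ScheduledExecutorService`. This is an illustration under stated assumptions rather than the actual `Worker` code: `CompactionTxnSketch`, its `log` list, the `committed` flag, and the 10-second wait are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical compaction transaction; records the order of operations.
class CompactionTxnSketch implements AutoCloseable {
  final List<String> log = Collections.synchronizedList(new ArrayList<>());
  private final ScheduledExecutorService heartbeatExecutor =
      Executors.newSingleThreadScheduledExecutor();
  private volatile boolean committed = false;

  void open(long heartbeatIntervalMs) {
    heartbeatExecutor.scheduleAtFixedRate(() -> {
      if (committed) {
        // Models the failure from the issue: a heartbeat after the txn is gone.
        log.add("heartbeat-after-commit");
      } else {
        log.add("heartbeat");
      }
    }, 0, heartbeatIntervalMs, TimeUnit.MILLISECONDS);
  }

  private void shutdownHeartbeater() {
    // Cancels pending runs and interrupts a heartbeat that is in progress.
    heartbeatExecutor.shutdownNow();
    try {
      // Wait for an in-flight heartbeat, if any, to finish before moving on.
      if (!heartbeatExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("heartbeater did not stop in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while stopping heartbeater", e);
    }
    log.add("heartbeater-stopped");
  }

  @Override
  public void close() {
    shutdownHeartbeater(); // stop heartbeating first
    committed = true;      // only then commit (or abort) the txn
    log.add("commit");
  }
}
```

With this ordering no heartbeat can run after the commit, at the cost of the window discussed above: the transaction is no longer being heartbeated between the shutdown and the commit.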






Issue Time Tracking
---

Worklog Id: (was: 686852)
Time Spent: 20m  (was: 10m)

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Work logged] (HIVE-25736) Close ORC readers

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25736?focusedWorklogId=686846&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686846
 ]

ASF GitHub Bot logged work on HIVE-25736:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 13:26
Start Date: 26/Nov/21 13:26
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2813:
URL: https://github.com/apache/hive/pull/2813#discussion_r757496016



##
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/OrcFileMergeOperator.java
##
@@ -154,6 +154,9 @@ private void processKeyValuePairs(Object key, Object value)
 
   // next file in the path
   if (!k.getInputPath().equals(prevPath)) {
+if (reader != null) {

Review comment:
   Do we need to do this on line 111 as well? Or is it guaranteed at that 
point to be the first reader instantiation?






Issue Time Tracking
---

Worklog Id: (was: 686846)
Time Spent: 1h 20m  (was: 1h 10m)

> Close ORC readers
> -
>
> Key: HIVE-25736
> URL: https://issues.apache.org/jira/browse/HIVE-25736
> Project: Hive
>  Issue Type: Bug
>Reporter: Peter Vary
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> After ORC-498, ORC readers should be closed explicitly. One of the cases 
> was HIVE-25683, but there are several places where the ORC readers are still 
> not closed. 
> We should go through the code and make sure that the readers are closed.





[jira] [Commented] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449565#comment-17449565
 ] 

Marton Bod commented on HIVE-25741:
---

PR: [https://github.com/apache/hive/pull/2819]

 

> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already have an appropriate writer from some previous 
> operation (i.e. {{maybeRolloverWriterForDay()}} returns false), we don't 
> close the previous writer instance before creating a new one.





[jira] [Assigned] (HIVE-25741) HiveProtoLoggingHook EventLogger should always close old writer

2021-11-26 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25741:
-


> HiveProtoLoggingHook EventLogger should always close old writer
> ---
>
> Key: HIVE-25741
> URL: https://issues.apache.org/jira/browse/HIVE-25741
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> If {{hive.hook.proto.file.per.event=true}} (recommended for S3A filesystem), 
> the Hive proto {{EventLogger}} will create a new file for each proto event. 
> However, if we already have an appropriate writer from some previous 
> operation (i.e. {{maybeRolloverWriterForDay()}} returns false), we don't 
> close the previous writer instance before creating a new one.





[jira] [Work logged] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?focusedWorklogId=686791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686791
 ]

ASF GitHub Bot logged work on HIVE-25740:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 10:11
Start Date: 26/Nov/21 10:11
Worklog Time Spent: 10m 
  Work Description: marton-bod commented on a change in pull request #2817:
URL: https://github.com/apache/hive/pull/2817#discussion_r757361747



##
File path: ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
##
@@ -748,6 +736,7 @@ void wasSuccessful() {
  * @throws Exception
  */
 @Override public void close() throws Exception {
+  shutdownHeartbeater();

Review comment:
   > "Theoretically this has the same issue as before the patch, just the 
other way around. We stop the heartbeat, the transaction times out, and we try 
to commit / abort."
   
   That's true, this was my first thought as well. However, I think shutting 
down the heartbeater should be really fast and not cause problems in healthy 
systems. If it's waiting to be scheduled by the executor (which is the case 
most of the time), it will be shut down immediately. Otherwise it'll send one 
more heartbeat, but that heartbeat would need to take minutes (in line with 
the value of `hive.txn.timeout`) to cause any problems. If a heartbeat 
takes that long, then we have other issues in the system anyway.
   
   > "How complicated would it be to turn off exception handling in the 
heartbeater first instead, and stop it after abort / commit?"
   
   Can you elaborate on what you mean by turning off exception handling? I 
think the general problem would remain: we commit/abort the txn and then send a 
signal to the heartbeater thread to stop doing whatever it's currently doing, 
but if it has already called the `msc.heartbeat()` method by that point (or 
it's just about to call it), there's nothing much we can do and it will lead to 
failure.






Issue Time Tracking
---

Worklog Id: (was: 686791)
Remaining Estimate: 0h
Time Spent: 10m

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Updated] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-25740:
--
Labels: pull-request-available  (was: )

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Commented] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread Marton Bod (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17449492#comment-17449492
 ] 

Marton Bod commented on HIVE-25740:
---

PR: [https://github.com/apache/hive/pull/2817]

 

> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Assigned] (HIVE-25740) Handle race condition between compaction txn abort/commit and heartbeater

2021-11-26 Thread Marton Bod (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod reassigned HIVE-25740:
-


> Handle race condition between compaction txn abort/commit and heartbeater
> -
>
> Key: HIVE-25740
> URL: https://issues.apache.org/jira/browse/HIVE-25740
> Project: Hive
>  Issue Type: Bug
>Reporter: Marton Bod
>Assignee: Marton Bod
>Priority: Major
>
> The issue is the following: once the compaction worker finishes, 
> commitTxn/abortTxn is invoked first, and the heartbeater thread is only 
> interrupted after that. This can lead to a race condition: the txn has 
> already been deleted from the backend DB via commit/abort, but the 
> concurrently running heartbeater thread still attempts to send a last 
> heartbeat. The txn id is then not found in the DB, leading to 
> {{NoSuchTxnException}}.





[jira] [Work logged] (HIVE-25739) Support Alter Partition Properties

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25739?focusedWorklogId=686773&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686773
 ]

ASF GitHub Bot logged work on HIVE-25739:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 08:31
Start Date: 26/Nov/21 08:31
Worklog Time Spent: 10m 
  Work Description: southernriver commented on pull request #2818:
URL: https://github.com/apache/hive/pull/2818#issuecomment-979783452


   LGTM + 1




Issue Time Tracking
---

Worklog Id: (was: 686773)
Time Spent: 20m  (was: 10m)

> Support Alter Partition Properties
> --
>
> Key: HIVE-25739
> URL: https://issues.apache.org/jira/browse/HIVE-25739
> Project: Hive
>  Issue Type: New Feature
>Affects Versions: All Versions
>Reporter: xiepengjie
>Assignee: xiepengjie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.3.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Support alter partition properties like:
> {code:java}
> alter table alter1 partition(insertdate='2008-01-01') set tblproperties ('a'='1', 'c'='3');
> alter table alter1 partition(insertdate='2008-01-01') unset tblproperties if exists ('c'='3');
> {code}
>  
> relates to https://issues.apache.org/jira/browse/HIVE-14261





[jira] [Work logged] (HIVE-25561) Killed task should not commit file.

2021-11-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-25561?focusedWorklogId=686770&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-686770
 ]

ASF GitHub Bot logged work on HIVE-25561:
-

Author: ASF GitHub Bot
Created on: 26/Nov/21 08:19
Start Date: 26/Nov/21 08:19
Worklog Time Spent: 10m 
  Work Description: abstractdog commented on pull request #2674:
URL: https://github.com/apache/hive/pull/2674#issuecomment-979775926


   thanks @kgyrtkirk for the review and @zhengchenyu for the patch!




Issue Time Tracking
---

Worklog Id: (was: 686770)
Time Spent: 1h 50m  (was: 1h 40m)

> Killed task should not commit file.
> ---
>
> Key: HIVE-25561
> URL: https://issues.apache.org/jira/browse/HIVE-25561
> Project: Hive
>  Issue Type: Bug
>  Components: Tez
>Affects Versions: 1.2.1, 2.3.8, 2.4.0
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> For the Tez engine in our cluster, I found some duplicate lines, especially 
> when Tez speculation is enabled. In the partition dir, I found that both 
> 02_0 and 02_1 exist.
> It's a very low-probability event. HIVE-10429 fixed some bugs related to 
> interrupts, but some exceptions were not caught.
> In our cluster, the task receives SIGTERM, then ClientFinalizer (a Hadoop 
> class) is called and the HDFS client is closed. An exception is then raised, 
> but abort may not be set to true.
> Then removeTempOrDuplicateFiles may fail because of the inconsistency, and 
> the duplicate file is retained. 
> (Note: the driver first lists the dir, then the task commits the file, then 
> the driver removes duplicate files. This is an inconsistent case.)


