[jira] [Comment Edited] (HIVE-27884) LLAP: Reuse FileSystem objects from cache across different tasks in the same LLAP daemon

2023-11-20 Thread Jira


[ https://issues.apache.org/jira/browse/HIVE-27884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17787941#comment-17787941 ]

László Bodor edited comment on HIVE-27884 at 11/20/23 10:59 AM:


minor note: it's time to reconsider using the autoDelete = fs.deleteOnExit() kind of pattern

we tend to do something like:
1. autoDelete = fs.deleteOnExit(path)
2. fs.deleteOnExit returns true if the file/dir exists and false if it doesn't
3. we later delete the file only if !autoDelete - i.e. only if it didn't exist 
when fs.deleteOnExit was called - instead of simply calling fs.delete on that 
file (ignoring whether it exists by then or not; delete() will take care of 
that, right?)

my point is that checking the return value of deleteOnExit doesn't make any 
sense for deciding whether to delete files later :)

the conditional fs.delete is currently called in Operator.closeOp, which is 
supposed to run regardless of the task attempt's outcome
worst case, if a fatal JVM error happens and we cannot rely on 
Operator.closeOp, then we cannot rely on hadoop's FileSystem.deleteOnExit either
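The pattern in question can be sketched with a minimal stand-in for Hadoop's FileSystem (the SketchFs type and the paths below are hypothetical simplifications for illustration, not Hive or Hadoop code):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal stand-in mimicking FileSystem.deleteOnExit semantics:
// it registers the path, but the returned boolean only reflects
// whether the path existed *at registration time*.
class SketchFs {
    private final Set<String> existing = new HashSet<>();
    private final Set<String> deleteOnExit = new HashSet<>();

    void create(String path) { existing.add(path); }

    boolean exists(String path) { return existing.contains(path); }

    boolean deleteOnExit(String path) {
        deleteOnExit.add(path);
        return existing.contains(path); // true only if it already exists
    }

    boolean delete(String path) { return existing.remove(path); }
}

public class AutoDeleteSketch {
    // The questioned pattern: skip the later delete when deleteOnExit
    // returned true. When the file is created *after* registration,
    // autoDelete is false and we fall back to an explicit delete anyway.
    static void closeOp(SketchFs fs, String path, boolean autoDelete) {
        if (!autoDelete) {
            fs.delete(path);
        }
    }

    public static void main(String[] args) {
        SketchFs fs = new SketchFs();
        // Registered before the file exists, so deleteOnExit returns false...
        boolean autoDelete = fs.deleteOnExit("/tmp/out_1");
        fs.create("/tmp/out_1");
        closeOp(fs, "/tmp/out_1", autoDelete);
        // ...and the explicit delete ran; an unconditional fs.delete
        // would have had exactly the same effect.
        System.out.println(fs.exists("/tmp/out_1")); // false
    }
}
```

Either way the file ends up deleted, which is the point of the comment: the boolean carries no information that should gate the later cleanup.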



> LLAP: Reuse FileSystem objects from cache across different tasks in the same 
> LLAP daemon
> 
>
> Key: HIVE-27884
> URL: https://issues.apache.org/jira/browse/HIVE-27884
> Project: Hive
>  Issue Type: Improvement
>Reporter: László Bodor
>Assignee: László Bodor
>Priority: Major
>
> Originally, when the task runner was added to HIVE-10028 
> ([here|https://github.com/apache/hive/blob/23f40bd88043db3cb4efe3a763cbfd5c01a81d2f/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java#L202]),
>  the FileSystem.closeAllForUGI was commented out for some reason, and then, 
> in the scope of HIVE-9898 it was simply added back, 
> [here|https://github.com/apache/hive/commit/91c46a44dd9fbb68d01f22e93c4ce0931a4598e0#diff-270dbe6639879ca543ae21c44a239af6145390726d45fee832be809894bfc88eR236]
> A FileSystem.close call basically does the 
> [following|https://github.com/apache/hadoop/blob/0c10bab7bb77aa4ea3ca26c899ab28131561e052/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L2700-L2710]:
> 1. deletes all paths that were marked as delete-on-exit
> 2. removes the instance from the cache
> I saw that we 
> [call|https://github.com/apache/hive/blob/eb6f0b0c57dd55335927b7dde08cd47f4d00e74d/llap-server/src/java/org/apache/hadoop/hive/llap/daemon/impl/TaskRunnerCallable.java#L302]
>  
> {code}
> FileSystem.closeAllForUGI
> {code}
> at the end of all task attempts, so we almost completely disable hadoop's 
> filesystem cache during a long-running LLAP daemon lifecycle
> some investigations on azure showed that creating a filesystem can be quite 
> expensive, as it involves the recreation of a whole object hierarchy like:
> {code}
> AzureBlobFileSystem -> AzureBlobFileSystemStore --> AbfsClient -> 
> TokenProvider(MsiTokenProvider)
> {code}
> which ends up pinging the token auth endpoint of azure, leading to e.g. an 
> HTTP 429 (Too Many Requests) response
> We need to check whether we can remove this closeAllForUGI in LLAP, and 
> additionally check and remove all deleteOnExit calls that belong to hadoop 
> FileSystem objects (doesn't necessarily apply to java.io.File.deleteOnExit 
> calls):
> {code}
> grep -iRH "deleteOnExit" --include="*.java" | grep -v "test"
> ...
> ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:    // in recent hadoop versions, use deleteOnExit to clean tmp files.
> ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java:    autoDelete = fs.deleteOnExit(fsp.outPaths[filesIdx]);
> ql/src/java/org/apache/hadoop/hive/ql/exec/repl/bootstrap/load/util/PathInfo.java:    fileSystem.deleteOnExit(dir);
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:    parentDir.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/RowContainer.java:    tmpFile.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:    parentDir.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/KeyValueContainer.java:    tmpFile.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/ObjectContainer.java:    tmpFile.deleteOnExit();
> ql/src/java/org/apache/hadoop/hive/ql/exec/AbstractFileMergeOperator.java:    autoDelete =
> {code}
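The cache-eviction cost the description refers to can be illustrated with a toy cache keyed the way Hadoop's FileSystem cache is (scheme, authority, UGI). Everything below is a hypothetical stand-in for illustration, not Hadoop code; the point is only that per-attempt closeAllForUGI forces the expensive construction to repeat:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a filesystem cache keyed by (scheme, authority, ugi).
class FsCacheSketch {
    static final AtomicInteger creations = new AtomicInteger();

    static class Fs { /* stands in for an expensive client hierarchy */ }

    private final Map<String, Fs> cache = new HashMap<>();

    private static String key(String scheme, String authority, String ugi) {
        return scheme + "://" + authority + "#" + ugi;
    }

    // Like FileSystem.get: reuse a cached instance when one exists,
    // otherwise pay the (potentially expensive) construction cost,
    // e.g. the ABFS client and token-provider setup described above.
    Fs get(String scheme, String authority, String ugi) {
        return cache.computeIfAbsent(key(scheme, authority, ugi), k -> {
            creations.incrementAndGet();
            return new Fs();
        });
    }

    // Like FileSystem.closeAllForUGI: evicts every entry for that UGI,
    // so the next get() for the same UGI rebuilds from scratch.
    void closeAllForUGI(String ugi) {
        cache.keySet().removeIf(k -> k.endsWith("#" + ugi));
    }
}

public class CachePressureSketch {
    public static void main(String[] args) {
        FsCacheSketch cache = new FsCacheSketch();
        for (int task = 0; task < 3; task++) {
            cache.get("abfs", "container@account", "query-user");
            cache.closeAllForUGI("query-user"); // end of each task attempt
        }
        // With per-attempt eviction, every task pays the setup cost again.
        System.out.println(FsCacheSketch.creations.get()); // 3
    }
}
```

Without the closeAllForUGI call inside the loop, the same three tasks would trigger a single construction, which is the reuse the issue title proposes.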
