[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls

Joe McDonnell (Code Review) Thu, 08 Nov 2018 09:54:23 -0800

Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11874 )


Change subject: IMPALA-7738: Implement timeouts for HDFS open calls
......................................................................


Patch Set 4:

(1 comment)

> Thanks!
 >
 > I was a little surprised to see hdfs-monitored-ops have different
 > signatures for the cached vs. non-cached file handles. Ultimately,
 > we're trying to implement hdfsOpenWithTimeout(), and I would have
 > expected existing code to be amended to use that. Equivalently,
 > could we be using lambdas or similar to have a totally generic
 > "timeout-pool" for this sort of operation?

I'm flexible about the interface in terms of file handle vs hdfs open, etc. The 
hdfsOpenWithTimeout() might be a better interface.

A lot of the changes for the file handle cache and disk io mgr are more about 
plumbing status around. The code used to represent failure as a null pointer, 
and now there are multiple failure modes that we want to distinguish between. 
All that is needed regardless of the specific interface.

I feel like the various HDFS operations that have a timeout implementation 
should all be defined in one place, so you would end up with something like an 
HdfsMonitor even if you implemented generic lambda stuff under the covers. What 
other non-HDFS operations were you planning on adding timeouts for? Would they 
have their own pools? Would they share a pool with HDFS?

http://gerrit.cloudera.org:8080/#/c/11874/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11874/4//COMMIT_MSG@14
PS4, Line 14: and submitting it to a thread pool. The original caller waits
> High level Qs about the thread pool:
Answers:
 - Yes, the pool is fixed-size.
 - We print to the WARNING log when we see any timeout. We print to the ERROR 
log if we can't even submit the task to the thread pool (i.e. the system is 
hosed). The impalad log would make it pretty clear. The error would be in the 
query profile as the query's status.
 - That is the big question. We could crash the impalad if we are stuck. We 
could do something to notify the statestore. The code uploaded at the moment 
just keeps failing.



--
To view, visit http://gerrit.cloudera.org:8080/11874
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454
Gerrit-Change-Number: 11874
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Thu, 08 Nov 2018 02:04:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7738: Implement timeouts for HDFS open calls

Reply via email to