Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/11874 )
Change subject: IMPALA-7738: Implement timeouts for HDFS open calls ...................................................................... Patch Set 4: (1 comment) > Thanks! > > I was a little surprised to see hdfs-monitored-ops have different > signatures for the cached vs. non-cached file handles. Ultimately, > we're trying to implement hdfsOpenWithTimeout(), and I would have > expected existing code to be amended to use that. Equivalently, > could we be using lambdas or similar to have a totally generic > "timeout-pool" for this sort of operation? I'm flexible about the interface in terms of file handle vs hdfs open, etc. The hdfsOpenWithTimeout() might be a better interface. A lot of the changes for the file handle cache and disk io mgr are more about plumbing status around. The code used to represent failure as a null pointer, and now there are multiple failure modes that we want to distinguish between. All that is needed regardless of the specific interface. I feel like the various HDFS operations that have a timeout implementation should all be defined in one place, so you would end up with something like an HdfsMonitor even if you implemented generic lambda stuff under the covers. What other non-HDFS operations were you planning on adding timeouts for? Would they have their own pools? Would they share a pool with HDFS? http://gerrit.cloudera.org:8080/#/c/11874/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11874/4//COMMIT_MSG@14 PS4, Line 14: and submitting it to a thread pool. The original caller waits > High level Qs about the thread pool: Answers: - Yes, the pool is fixed-size. - We print to the WARNING log when we see any timeout. We print to the ERROR log if we can't even submit the task to the thread pool (i.e. the system is hosed). The impalad log would make it pretty clear. The error would be in the query profile as the query's status. - That is the big question. We could crash the impalad if we are stuck. We could do something to notify the statestore. The code uploaded at the moment just keeps failing. -- To view, visit http://gerrit.cloudera.org:8080/11874 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia14403ca5f3f19c6d5f61b9ab2306b0ad3267454 Gerrit-Change-Number: 11874 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Thu, 08 Nov 2018 02:04:19 +0000 Gerrit-HasComments: Yes
