Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 )
Change subject: IMPALA-4623: [DOCS] Document file handle caching
......................................................................

Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml@338
PS1, Line 338: continuously appended by an HDFS mechanism
This also applies if an HDFS file is overwritten in place.

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
: adds overhead that might lessen the benefit of the caching.
I'm not familiar with this overhead. What is this referring to?

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
Just curious: How do you decide to use "20 thousand" vs "20,000"?

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
The file handles won't actually be evicted directly. The new metadata will mean that new statements will no longer use that file handle and eventually it will get aged out. I'm not sure if this distinction is important for documentation, but I think the important thing is that the memory may not be freed immediately. (This is something we are likely to change in a future release.)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
: <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
: profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
: (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
: Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
: because the first time each data file is accessed is always recorded as a cache miss.
I'm not sure this belongs here, but information about the cache across the whole impalad is available via the metrics page under impala-server:

impala-server.io.mgr.cached-file-handles-miss-count
impala-server.io.mgr.cached-file-handles-hit-count

The total number of file handles in the cache is:

impala-server.io.mgr.num-cached-file-handles

--
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:37:37 +0000
Gerrit-HasComments: Yes
