Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8200 )

Change subject: IMPALA-4623: [DOCS] Document file handle caching
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml
File docs/topics/impala_known_issues.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_known_issues.xml@338
PS1, Line 338: continuously appended by an HDFS mechanism
This also applies if an HDFS file is overwritten in place.


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@967
PS1, Line 967: although the encryption layer
             :         adds overhead that might lessen the benefit of the caching.
I'm not familiar with this overhead. What is this referring to?


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@973
PS1, Line 973: 20 thousand
Just curious: How do you decide to use "20 thousand" vs "20,000"?


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@991
PS1, Line 991: evict any stale file handles from the cache
The file handles won't actually be evicted directly. Once the new metadata is 
loaded, new statements will no longer use the stale file handle, and it will 
eventually be aged out of the cache. I'm not sure if this distinction is 
important for documentation, but I think the important thing is that the 
memory may not be freed immediately. (This is something we are likely to 
change in a future release.)
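
To illustrate the distinction, here is a toy model in Python. It is not 
Impala's implementation: the class name, the (path, mtime) key, and the 
age-based cleanup are assumptions made only for the sketch. It shows a stale 
entry lingering after a metadata change until it ages out.

```python
class ToyHandleCache:
    """Toy model only (hypothetical): entries are keyed by (path, mtime)
    and are removed by age, never evicted directly on a metadata change."""

    def __init__(self, max_age_s):
        self.max_age_s = max_age_s
        self._last_used = {}  # (path, mtime) -> time of last use

    def get(self, path, mtime, now):
        """Return True on a cache hit; either way, record the use."""
        key = (path, mtime)
        hit = key in self._last_used
        self._last_used[key] = now
        return hit

    def age_out(self, now):
        """Drop entries unused for longer than max_age_s; return the count."""
        stale = [k for k, t in self._last_used.items()
                 if now - t > self.max_age_s]
        for k in stale:
            del self._last_used[k]
        return len(stale)

    def __len__(self):
        return len(self._last_used)


cache = ToyHandleCache(max_age_s=60)
cache.get("/warehouse/t/f1", mtime=1, now=0)   # first access: miss, entry cached
cache.get("/warehouse/t/f1", mtime=2, now=10)  # new metadata: new key, old entry lingers
print(len(cache))             # 2: the stale entry still occupies memory
print(cache.age_out(now=65))  # 1: only the stale entry has aged out
print(len(cache))             # 1
```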


http://gerrit.cloudera.org:8080/#/c/8200/1/docs/topics/impala_scalability.xml@995
PS1, Line 995: To evaluate the effectiveness of file handle caching for a particular workload, issue the
             :         <codeph>PROFILE</codeph> statement in <cmdname>impala-shell</cmdname> or examine query
             :         profiles in the Impala web UI. Look for the ratio of <codeph>CachedFileHandlesHitCount</codeph>
             :         (ideally, should be high) to <codeph>CachedFileHandlesMissCount</codeph> (ideally, should be low).
             :         Before starting any evaluation, run some representative queries to <q>warm up</q> the cache,
             :         because the first time each data file is accessed is always recorded as a cache miss.
I'm not sure this belongs here, but information about the cache across the 
whole impalad is available via the metrics page under impala-server:
impala-server.io.mgr.cached-file-handles-miss-count
impala-server.io.mgr.cached-file-handles-hit-count

The total number of file handles in the cache is:
impala-server.io.mgr.num-cached-file-handles
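
As a rough sketch of how the two counters above could be turned into a 
single hit ratio: the function below assumes the metric values have already 
been copied from the metrics page into a plain dict. The function name and 
the dict shape are hypothetical; only the metric names come from the 
metrics page.

```python
HIT = "impala-server.io.mgr.cached-file-handles-hit-count"
MISS = "impala-server.io.mgr.cached-file-handles-miss-count"


def file_handle_hit_ratio(metrics):
    """Return hits / (hits + misses), or None if there were no lookups.

    `metrics` is assumed to be a dict mapping metric name to its current
    integer value, e.g. copied by hand from the impalad metrics page.
    """
    hits = metrics[HIT]
    misses = metrics[MISS]
    total = hits + misses
    if total == 0:
        return None  # cache not exercised yet; a ratio is undefined
    return hits / total


sample = {HIT: 900, MISS: 100}  # made-up values for illustration
print(file_handle_hit_ratio(sample))  # 0.9
```

Because these counters cover the whole impalad rather than one query, 
comparing the values before and after a workload gives the ratio for just 
that workload.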



--
To view, visit http://gerrit.cloudera.org:8080/8200
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I261c29eff80dc376528bba29ffb7d8e0f895e25f
Gerrit-Change-Number: 8200
Gerrit-PatchSet: 1
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Comment-Date: Thu, 05 Oct 2017 02:37:37 +0000
Gerrit-HasComments: Yes
