Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11420 )
Change subject: [docs] Add basic advice on setting block cache size
......................................................................


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/11420/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11420/1//COMMIT_MSG@9
PS1, Line 9: improving
> Nit: "improving" is an odd choice here, since the subject is "the block cac
Done


http://gerrit.cloudera.org:8080/#/c/11420/1//COMMIT_MSG@11
PS1, Line 11: are so
            : workload dependent (e.g. consider a workload doing full table scans vs.
            : one mostly re-scanning a small rang
> Worth noting that the scanner API has a SetCacheBlocks() method to control
Should that go in the commit message or in the docs, or are you just letting me know? (Thanks, btw. I knew we didn't cache blocks when doing a checksum, but I didn't know we had a scanner API for the same thing.)


http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc
File docs/troubleshooting.adoc:

http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc@538
PS1, Line 538: _cache_capacity_mb` above the memory pressure
> Hrm, at first glance I read this as either:
I removed this clause and joined the first part of this sentence to the second instead.


http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc@542
PS1, Line 542: re threshold is recommended. With the defaults, thi
> Nit: clearer as "`--memory_pressure_percentage` of `--memory_limit_hard_byt
Done (but I also rewrote the surrounding text a little)


http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc@543
PS1, Line 543: `--block_cache_capacity_mb` should not exceed 30% of
             : `--memory_limit_hard_bytes`. On Ku
> Nit: wouldn't it be clearer to use "block cache capacity" and "memory hard
No, I think this is clearest when referring to the concrete values of the gflags. I rewrote the text before this sentence a little, which may have helped.


http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc@557
PS1, Line 557: metrics
> metrics?
Oops. Tangentially, why comment as a question if you know the text is wrong?


http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc@596
PS1, Line 596: first wait until
> Won't block_cache_usage be 100% of block_cache_capacity_mb given enough upt
Right, and we don't want users to extrapolate steady-state cache behavior from that case. I rewrote the section to emphasize that users should look at the cache once the server has been running for a while, when the cache is expected to be full.


http://gerrit.cloudera.org:8080/#/c/11420/1/docs/troubleshooting.adoc@602
PS1, Line 602: Kudu expected to read from cache but which weren't found in the cache. If a
> I'm trying to think of a case where this isn't true, but the above is true.
Looking at how the metrics behave on my local cluster, loading a block fresh counts as a miss, so the initial load of N blocks counts as N misses. Ergo, we need to see evictions to confirm there is churn, because miss counts alone aren't sufficient.


-- 
To view, visit http://gerrit.cloudera.org:8080/11420
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idc7411c38b6fcc8694509ec89c32e2fe74e6c0db
Gerrit-Change-Number: 11420
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Attila Bukor <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 13 Sep 2018 18:01:59 +0000
Gerrit-HasComments: Yes
