John Russell has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )
Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(19 comments)

Almost finished with the comments. I'll touch base with Alex to get a little more clarification about which stats it is safe, or makes sense, to run DROP INCREMENTAL STATS for.

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224: For a particular table, use either <codeph>COMPUTE STATS</codeph> or
> Yes!
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
: <codeph>DROP INCREMENTAL STATS</codeph>)
> They are not required if you *exactly* what you are doing, but that does no
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
: <codeph>DROP INCREMENTAL STATS</codeph>)
> are these drops required?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234: the statistics are computed again from scratch regardless of whether you previously ran
> regardless of whether the table has existing stats.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236: for scanning the entire table when switching from <codeph>COMPUTE STATS</codeph> to
> when running COMPUTE INCREMENTAL STATS for the first time on a given table.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243: be cached on every <cmdname>impalad</cmdname> host. If this metadata for a table exceeds
> more specifically, impalads that are also coordinators?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243: be cached on every <cmdname>impalad</cmdname> host. If this metadata for a table exceeds
> Yes
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244: 2 GB, a serious error can occur. If only a limited number of partitions are actively being
> If the aggregate metadata of all tables exceeds 2 GB you may experience ser
Done. "Serious error" was the compromise I always used for MySQL, where the open source tradition leaned towards saying "crash" but the enterprise focus suggested something more euphemistic.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Fine with me to expand this to add my earlier explanation of what the "incr
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> does that mean lack of stats has not affect on optimization or something el
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> fair pointer for me, but my comment is about whether this wording is clear
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Please see my explanation on what "incremental" stats is in previous patch
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248: optimizations such as partition pruning.
> such as partition pruning or join ordering.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248: optimizations such as partition pruning.
> Actually I would remove partition pruning because stats have nothing to do
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@611
PS2, Line 611: frequently
> remove
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@623
PS2, Line 623: is a shortcut
> I don't know what "shortcut" means here. I'd remove it.
I'm looking for a way to convey that it's faster to run COMPUTE INCREMENTAL STATS on a partitioned table than COMPUTE STATS. But the time savings only materialize if you run C.I.S. multiple times, that is, because the table keeps getting new partitions.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. That situation is where you switch
> Rephrase to avoid "switch" since switching is bad
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: That situation is where you switch
> I'd reword this part ("That situation is where ..."). Suggestion:
I used wording similar to Vuk's suggestion, but unless the text says "do a CTAS into a whole new table and throw away the old table", the user is likely to follow their intuition and switch from C.S. to C.I.S. on the same table.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@412
PS2, Line 412: >COMPUTE INCREMENTAL STAT
> docs in impala_common mention "drop stats" before making a switch. that's n
The conref= lines in the <note> above will pull in the same text as in impala_common.xml, with all the extra warnings and instructions.


--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Greg Rahn <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Silvius Rus <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:09:52 +0000
Gerrit-HasComments: Yes
