Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224: For a particular table, use either <codeph>COMPUTE STATS</codeph> or
Yes!


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1233
PS2, Line 1233: When you run <codeph>COMPUTE INCREMENTAL STATS</codeph> on a table for the first time,
I suggest some minor rephrasing to drive home the "don't switch mantra" a little more, see comments.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234: the statistics are computed again from scratch regardless of whether you previously ran
regardless of whether the table has existing stats.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236: for scanning the entire table when switching from <codeph>COMPUTE STATS</codeph> to
when running COMPUTE INCREMENTAL STATS for the first time on a given table.

(do not mention switching... not supposed to do that)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244: 2 GB, a serious error can occur. If only a limited number of partitions are actively being
If the aggregate metadata of all tables exceeds 2 GB you may experience service downtime (daemon crashes).

("serious error" really isn't clear to me)


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1245
PS2, Line 1245: added or inserted into, you can run <codeph>COMPUTE INCREMENTAL STATS</codeph> for the active
Sorry my phrasing might have been misleading. By "active" partitions I meant those partitions that are being queried (i.e. read)... if you query some partitions very infrequently then there is no point in keeping incremental stats for them.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248: optimizations such as partition pruning.
such as partition pruning or join ordering.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@624
PS2, Line 624: subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
Need to be careful here because "large tables" could be misinterpreted to mean "tables with many partitions". I'd prefer to avoid the word "suitable" and instead use a phrasing that states it enables updating the stats as partitions are added.

Whether incremental stats is "suitable" for anything is questionable because of the huge memory downside. I'd agree that incremental stats could be suitable in situations where you have a huge partitioned table with a small rolling window of "active" partitions, so you only ever need to keep incremental stats on let's say <100 partitions.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: <codeph>COMPUTE STATS</codeph> statement might take hours, or even days. That situation is where you switch
Rephrase to avoid "switch" since switching is bad


-- 
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Greg Rahn <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Silvius Rus <[email protected]>
Gerrit-Comment-Date: Fri, 06 Oct 2017 06:43:06 +0000
Gerrit-HasComments: Yes
