John Russell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7999 )

Change subject: [DOCS] Tighten up advice about first COMPUTE INCREMENTAL STATS
......................................................................


Patch Set 2:

(19 comments)

Almost finished with the comments. I'll touch base with Alex to get a little 
more clarification about which stats it is safe, or sensible, to run DROP 
INCREMENTAL STATS for.

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml
File docs/shared/impala_common.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1224
PS2, Line 1224:         For a particular table, use either <codeph>COMPUTE 
STATS</codeph> or
> Yes!
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> They are not required if you *exactly* what you are doing, but that does no
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1228
PS2, Line 1228: DROP STATS</codeph> and
              :         <codeph>DROP INCREMENTAL STATS</codeph>)
> are these drops required?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1234
PS2, Line 1234:         the statistics are computed again from scratch 
regardless of whether you previously ran
> regardless of whether the table has existing stats.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1236
PS2, Line 1236:         for scanning the entire table when switching from 
<codeph>COMPUTE STATS</codeph> to
> when running COMPUTE INCREMENTAL STATS for the first time on a given table.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If 
this metadata for a table exceeds
> more specifically, impalads that are also coordinators?
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1243
PS2, Line 1243:         be cached on every <cmdname>impalad</cmdname> host. If 
this metadata for a table exceeds
> Yes
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1244
PS2, Line 1244:         2 GB, a serious error can occur. If only a limited 
number of partitions are actively being
> If the aggregate metadata of all tables exceeds 2 GB you may experience ser
Done. "Serious error" was the compromise I always used for MySQL, where the 
open source tradition leaned toward saying "crash" but the enterprise focus 
called for something more euphemistic.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Fine with me to expand this to add my earlier explanation of what the "incr
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> does that mean lack of stats has not affect on optimization or something el
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> fair pointer for me, but my comment is about whether this wording is clear
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1247
PS2, Line 1247: does not affect
> Please see my explanation on what "incremental" stats is in previous patch
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> such as partition pruning or join ordering.
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/shared/impala_common.xml@1248
PS2, Line 1248:         optimizations such as partition pruning.
> Actually I would remove partition pruning because stats have nothing to do
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml
File docs/topics/impala_partitioning.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@611
PS2, Line 611: frequently
> remove
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_partitioning.xml@623
PS2, Line 623: is a shortcut
> I don't know what "shortcut" means here. I'd remove it.
I'm looking for a way to convey that it's faster to run COMPUTE INCREMENTAL 
STATS on a partitioned table than COMPUTE STATS. But the time savings only 
materialize if you run COMPUTE INCREMENTAL STATS multiple times, that is, when 
the table keeps getting new partitions.
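
A rough way to picture that trade-off is a toy cost model that counts 
partitions scanned. It assumes, as discussed in this review, that COMPUTE STATS 
rescans every partition on each run, while COMPUTE INCREMENTAL STATS scans the 
whole table on its first run and only the newly added partitions afterwards. 
The function names and numbers are hypothetical, and real cost also depends on 
partition sizes and cluster load, so treat this as a sketch only.

```python
def partitions_scanned_full(initial, added_per_run, runs):
    """Hypothetical model: every run is a full COMPUTE STATS."""
    total = 0
    present = initial
    for _ in range(runs):
        total += present          # full rescan of every partition present
        present += added_per_run  # new partitions arrive before the next run
    return total


def partitions_scanned_incremental(initial, added_per_run, runs):
    """Hypothetical model: every run is COMPUTE INCREMENTAL STATS."""
    if runs == 0:
        return 0
    # Assumption: the first incremental run scans the entire table.
    total = initial
    # Assumption: each later run scans only partitions added since the last run.
    total += added_per_run * (runs - 1)
    return total


if __name__ == "__main__":
    for runs in (1, 10):
        full = partitions_scanned_full(100, 1, runs)
        incr = partitions_scanned_incremental(100, 1, runs)
        print(f"runs={runs}: full={full} incremental={incr}")
```

With 100 partitions and one new partition per run, a single run costs the same 
either way (100 partitions scanned); over ten runs the model gives 1045 
partitions scanned for COMPUTE STATS versus 109 for COMPUTE INCREMENTAL STATS.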


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml
File docs/topics/impala_perf_stats.xml:

http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361:           <codeph>COMPUTE STATS</codeph> statement might take 
hours, or even days. That situation is where you switch
> Rephrase to avoid "switch" since switching is bad
Done


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@361
PS2, Line 361: That situation is where you switch
> I'd reword this part ("That situation is where ..."). Suggestion:
I used wording similar to Vuk's suggestion. But unless we say "do a CTAS into 
a whole new table and throw away the old table", the user is likely to follow 
their intuition and switch from COMPUTE STATS to COMPUTE INCREMENTAL STATS on 
the same table.


http://gerrit.cloudera.org:8080/#/c/7999/2/docs/topics/impala_perf_stats.xml@412
PS2, Line 412: >COMPUTE INCREMENTAL STAT
> docs in impala_common mention "drop stats" before making a switch. that's n
The conref= lines in the <note> above will pull in the same text as in 
impala_common.xml, with all the extra warnings and instructions.



--
To view, visit http://gerrit.cloudera.org:8080/7999
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia53a6518ce5541e5c9a2cd896856ce042a599b03
Gerrit-Change-Number: 7999
Gerrit-PatchSet: 2
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Greg Rahn <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Silvius Rus <[email protected]>
Gerrit-Reviewer: Vuk Ercegovac <[email protected]>
Gerrit-Comment-Date: Fri, 06 Oct 2017 18:09:52 +0000
Gerrit-HasComments: Yes
