Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8003 )
Change subject: IMPALA-3200: [DOCS] Document user-facing aspects of new buffer pool ...................................................................... Patch Set 4: (18 comments) http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml File docs/topics/impala_buffer_pool_limit.xml: http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@39 PS4, Line 39: Defines the size in bytes of an internal buffer for allocating memory during queries. First sentence isn't accurate. Something like this might be better: "Defines a limit on the amount of memory that the query can allocate from the internal buffer pool." http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@53 PS4, Line 53: his default is represented by a value : of 0. The default is actually that the query option is unset. 0 means a limit of 0. I guess that impala-shell reported that the default was 0, which is incorrect: see IMPALA-5589 http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@69 PS4, Line 69: set buffer_pool_limit=8GB; It can also be expressed as a percentage of mem_limit (e.g. 80%), which may be more useful in some cases. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml File docs/topics/impala_default_spillable_buffer_size.xml: http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@44 PS4, Line 44: <p conref="../shared/impala_common.xml#common/type_integer"/> Do we document somewhere the supported byte suffixes (e.g. B, KB, MB, GB plus variations). http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@49 PS4, Line 49: <p> I think it's confusing to say that the default varies, since the query option value doesn't vary, it just sets an upper bound for the sizes that can be picked by the planner. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@50 PS4, Line 50: <codeph>65536</codeph> to <codeph>2097152</codeph>, depending on the Feel free to ignore, but might be easier to read if they were human-readable (e.g. 2MB). http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@59 PS4, Line 59: consider increasing the <codeph>DEFAULT_SPILLABLE_BUFFER_SIZE</codeph> setting. Would it be helpful to elaborate here? E.g. "Larger buffer sizes will result in Impala issuing larger I/O operators to storage devices, which may result in higher throughput, particularly on rotational disks". http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@61 PS4, Line 61: </p> It's implied but would it make sense to state the inverse. I.e. decreasing the option may reduce memory consumption. This all depends on whether the planner has actually chosen a spillable-buffer_size that is capped by this query option (visible when explain_level=2). It seems worth mentioning that since it's a precondition to this having any effect. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml File docs/topics/impala_max_row_size.xml: http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@47 PS4, Line 47: 524288 512K to be human-readable? http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@56 PS4, Line 56: accomodate sp: accommodate http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@57 PS4, Line 57: the total bytes stored in the largest row. It's left ambiguous whether queries processing rows larger than MAX_ROW_SIZE *will* fail or whether they *may* fail. It's really the latter - in many case they will succeed. I've sometimes described it as best-effort processing for rows > max_row_size. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@68 PS4, Line 68: <codeblock><![CDATA[ This example is really helpful! Hopefully this will head off some support cases (I know it comes up with some frequency). http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_min_spillable_buffer_size.xml File docs/topics/impala_min_spillable_buffer_size.xml: http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_min_spillable_buffer_size.xml@38 PS4, Line 38: <indexterm audience="hidden">MIN_SPILLABLE_BUFFER_SIZE query option</indexterm> I think most of the comments I made on DEFAULT_SPILLABLE_BUFFER_SIZE have analogues here. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml File docs/topics/impala_scalability.xml: http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@425 PS4, Line 425: allocated it's reserved at the beginning of the query, not allocated. The distinction is made in query profiles etc so we should make sure that it's clear here. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@560 PS4, Line 560: The infrastructure of the spilling feature affects the way the affected SQL operators, such as A lot of the discussion in this paragraph and the one below is describing the "small buffers" part of the old spill-to-disk code, which is no longer relevant. Now all of the memory required to spill to disk for these operators is reserved upfront and is displayed in the explain plan. There's no threshold now - operators may expand their memory consumption if there's memory available but can successfully spill to disk otherwise. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@615 PS4, Line 615: >BlockMgr.BytesWritten This still needs updating - IMPALA-5656 - should I keep an eye out for a different gerrit review? http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@701 PS4, Line 701: <b>Testing performance implications of spilling to disk:</b> This section is stale too. The scenario still seems valid but the profile output is very different. http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@745 PS4, Line 745: Do not specify a memory limit lower than about 300 MB, because with such a : low limit, queries could fail to start for other reasons. Now try the memory-intensive query again. This is one of the things that was fixed :) -- To view, visit http://gerrit.cloudera.org:8080/8003 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49323f8ffbff3e195058e88762eedbb1fcb1bc0e Gerrit-Change-Number: 8003 Gerrit-PatchSet: 4 Gerrit-Owner: John Russell <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Wed, 04 Oct 2017 21:23:45 +0000 Gerrit-HasComments: Yes
