Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8003 )

Change subject: IMPALA-3200: [DOCS] Document user-facing aspects of new buffer 
pool
......................................................................


Patch Set 4:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml
File docs/topics/impala_buffer_pool_limit.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@39
PS4, Line 39:       Defines the size in bytes of an internal buffer for 
allocating memory during queries.
First sentence isn't accurate. Something like this might be better: "Defines a 
limit on the amount of memory that the query can allocate from the internal 
buffer pool."


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@53
PS4, Line 53: his default is represented by a value
            :       of 0.
The default is actually that the query option is unset. 0 means a limit of 0. I 
guess that impala-shell reported that the default was 0, which is incorrect: 
see IMPALA-5589


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_buffer_pool_limit.xml@69
PS4, Line 69: set buffer_pool_limit=8GB;
It can also be expressed as a percentage of mem_limit (e.g. 80%), which may be 
more useful in some cases.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml
File docs/topics/impala_default_spillable_buffer_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@44
PS4, Line 44:     <p conref="../shared/impala_common.xml#common/type_integer"/>
Do we document somewhere the supported byte suffixes (e.g. B, KB, MB, GB plus 
variations).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@49
PS4, Line 49:     <p>
I think it's confusing to say that the default varies, since the query option 
value doesn't vary, it just sets an upper bound for the sizes that can be 
picked by the planner.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@50
PS4, Line 50:       <codeph>65536</codeph> to <codeph>2097152</codeph>, 
depending on the
Feel free to ignore, but might be easier to read if they were human-readable 
(e.g. 2MB).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@59
PS4, Line 59:       consider increasing the 
<codeph>DEFAULT_SPILLABLE_BUFFER_SIZE</codeph> setting.
Would it be helpful to elaborate here? E.g. "Larger buffer sizes will result in 
Impala issuing larger I/O operators to storage devices, which may result in 
higher throughput, particularly on rotational disks".


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_default_spillable_buffer_size.xml@61
PS4, Line 61:     </p>
It's implied but would it make sense to state the inverse. I.e. decreasing the 
option may reduce memory consumption.

This all depends on whether the planner has actually chosen a 
spillable-buffer_size that is capped by this query option (visible when 
explain_level=2). It seems worth mentioning that since it's a precondition to 
this having any effect.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml
File docs/topics/impala_max_row_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@47
PS4, Line 47: 524288
512K to be human-readable?


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@56
PS4, Line 56: accomodate
sp: accommodate


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@57
PS4, Line 57:       the total bytes stored in the largest row.
It's left ambiguous whether queries processing rows larger than MAX_ROW_SIZE 
*will* fail or whether they *may* fail. It's really the latter - in many case 
they will succeed. I've sometimes described it as best-effort processing for 
rows > max_row_size.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_max_row_size.xml@68
PS4, Line 68: <codeblock><![CDATA[
This example is really helpful! Hopefully this will head off some support cases 
(I know it comes up with some frequency).


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_min_spillable_buffer_size.xml
File docs/topics/impala_min_spillable_buffer_size.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_min_spillable_buffer_size.xml@38
PS4, Line 38:       <indexterm audience="hidden">MIN_SPILLABLE_BUFFER_SIZE 
query option</indexterm>
I think most of the comments I made on DEFAULT_SPILLABLE_BUFFER_SIZE have 
analogues here.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml
File docs/topics/impala_scalability.xml:

http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@425
PS4, Line 425: allocated
it's reserved at the beginning of the query, not allocated. The distinction is 
made in query profiles etc so we should make sure that it's clear here.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@560
PS4, Line 560:         The infrastructure of the spilling feature affects the 
way the affected SQL operators, such as
A lot of the discussion in this paragraph and the one below is describing the 
"small buffers" part of the old spill-to-disk code, which is no longer 
relevant. Now all of the memory required to spill to disk for these operators 
is reserved upfront and is displayed in the explain plan. There's no threshold 
now - operators may expand their memory consumption if there's memory available 
but can successfully spill to disk otherwise.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@615
PS4, Line 615: >BlockMgr.BytesWritten
This still needs updating - IMPALA-5656 - should I keep an eye out for a 
different gerrit review?


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@701
PS4, Line 701:         <b>Testing performance implications of spilling to 
disk:</b>
This section is stale too. The scenario still seems valid but the profile 
output is very different.


http://gerrit.cloudera.org:8080/#/c/8003/4/docs/topics/impala_scalability.xml@745
PS4, Line 745: Do not specify a memory limit lower than about 300 MB, because 
with such a
             :         low limit, queries could fail to start for other 
reasons. Now try the memory-intensive query again.
This is one of the things that was fixed :)



--
To view, visit http://gerrit.cloudera.org:8080/8003
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49323f8ffbff3e195058e88762eedbb1fcb1bc0e
Gerrit-Change-Number: 8003
Gerrit-PatchSet: 4
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 04 Oct 2017 21:23:45 +0000
Gerrit-HasComments: Yes

Reply via email to