Tim Armstrong has posted comments on this change.
( http://gerrit.cloudera.org:8080/10511 )

Change subject: IMPALA-5740: [DOCS] Correct the max length of STRING
......................................................................


Patch Set 2:

(7 comments)

Thanks for writing this up! This comes up a lot.

http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml
File docs/topics/impala_string.xml:

http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@68
PS2, Line 68:         The hard limit is 2 GB on <codeph>STRING</codeph>.
Maybe "The hard limit on the size of a STRING and the total size of a row is 
2 GB", since strings that add up to more than 2 GB within a single row also hit 
this limit.

Maybe mention the consequences. E.g. "If a query tries to process or create a 
string larger than this, it will always return an error to the user."
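
To illustrate the row-level point, here is a rough sketch of the check I have 
in mind. It is only a model, not Impala code: the function name is made up, 
and reading "2 GB" as 2 * 1024**3 bytes is an assumption.

```python
# Illustrative model only; this is not how Impala implements the check.
HARD_LIMIT_BYTES = 2 * 1024**3  # assumption: "2 GB" read as 2 GiB


def row_exceeds_hard_limit(column_sizes):
    """Return True if the summed column sizes (in bytes) exceed the hard limit."""
    return sum(column_sizes) > HARD_LIMIT_BYTES


# Each string alone is under the limit, but the row as a whole is not.
three_strings = [1024**3, 1024**3, 1]
print(row_exceeds_hard_limit(three_strings))
```

The point is that no single STRING needs to reach the limit for the row to 
exceed it.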


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@83
PS2, Line 83: will
Maybe "may"? "will" sounds a little too scary.


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@88
PS2, Line 88:         The row size, i.e. total size of all string and other columns, is
            :         limited by various factors, such as:
I think this could be clearer (my fault for vague wording in the original 
JIRA). Maybe something like:

"The row size, i.e. total size of all string and other columns, is subject to 
lower limits at various points in query execution that support spill-to-disk, 
such as:"


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@97
PS2, Line 97: spilling hash join
Not sure if we use "spilling" elsewhere in the docs. Maybe

  "of a hash join that spills to disk".


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@101
PS2, Line 101:             Rows being sorted
Maybe "being sorted by the SORT operator without a limit".


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@112
PS2, Line 112:  the row
             :       size is 8 MB.
"of row size in the above places is 8MB", just to be clearer that it's tied to 
the above list.


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@118
PS2, Line 118:  2 MB
The default value of MAX_ROW_SIZE is actually 512 KB. This gets a bit confusing 
because MAX_ROW_SIZE is a lower bound on the upper bound. I.e. setting 
MAX_ROW_SIZE guarantees we can always process rows <= MAX_ROW_SIZE in the 
spill-to-disk operators, but it doesn't always prevent processing rows > 
MAX_ROW_SIZE.
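
A small sketch of the semantics I mean, in case it helps with the wording. 
This is a hypothetical model, not Impala code; the function name and the 
"guaranteed" / "best-effort" labels are mine.

```python
# Hypothetical model of the MAX_ROW_SIZE guarantee; not Impala code.
DEFAULT_MAX_ROW_SIZE = 512 * 1024  # default value of MAX_ROW_SIZE, in bytes


def spill_guarantee(row_bytes, max_row_size=DEFAULT_MAX_ROW_SIZE):
    """Classify a row for the spill-to-disk operators.

    Rows up to max_row_size can always be processed; larger rows are
    neither guaranteed to succeed nor guaranteed to be rejected.
    """
    if row_bytes <= max_row_size:
        return "guaranteed"
    return "best-effort"


print(spill_guarantee(100 * 1024))       # below the default
print(spill_guarantee(4 * 1024 * 1024))  # above the default
```

Raising MAX_ROW_SIZE only widens the "guaranteed" range; it never turns a 
larger row into a guaranteed failure.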

Maybe we should refer to the MAX_ROW_SIZE docs and keep it to a brief summary 
here. Maybe you have some ideas on how to express this. Here's my attempt:

"In <keyword keyref="impala210"/> and higher, rows up to MAX_ROW_SIZE (which 
defaults to 512 KB) can always be processed in the above cases. Rows larger 
than MAX_ROW_SIZE are processed on a best-effort basis. See the MAX_ROW_SIZE 
documentation (link) for more details."



--
To view, visit http://gerrit.cloudera.org:8080/10511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43c5a2819c8a3db33a8ce3a6bbde6a1d823ec9b2
Gerrit-Change-Number: 10511
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 01 Jun 2018 17:25:39 +0000
Gerrit-HasComments: Yes
