Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10511 )
Change subject: IMPALA-5740: [DOCS] Correct the max length of STRING
......................................................................


Patch Set 2:

(7 comments)

Thanks for writing this up! This comes up a lot.

http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml
File docs/topics/impala_string.xml:

http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@68
PS2, Line 68: The hard limit is 2 GB on <codeph>STRING</codeph>.
Maybe "The hard limit on the size of a STRING and the total size of a row is 2GB", since strings that add up to more than 2GB for a row also hit this limit.

Maybe mention the consequences. E.g. "If a query tries to process or create a string larger than this, it will always return an error to the user."


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@83
PS2, Line 83: will
Maybe "may". "will" is a little too scary.


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@88
PS2, Line 88: The row size, i.e. total size of all string and other columns, is
            : limited by various factors, such as:
I think this could be clearer (my fault for vague wording in the original JIRA). Maybe something like:

"The row size, i.e. total size of all string and other columns, is subject to lower limits at various points in query execution that support spill-to-disk, such as:"


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@97
PS2, Line 97: spilling hash join
Not sure if we use "spilling" elsewhere in the docs. Maybe "of a hash join that spills to disk".


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@101
PS2, Line 101: Rows being sorted
Maybe "being sorted by the "SORT" operator without a limit."


http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@112
PS2, Line 112: the row
             : size is 8 MB.
"of row size in the above places is 8MB", just to be clearer that it's tied to the above list.

http://gerrit.cloudera.org:8080/#/c/10511/2/docs/topics/impala_string.xml@118
PS2, Line 118: 2 MB
The default value of MAX_ROW_SIZE is actually 512 KB.

This gets a bit confusing because MAX_ROW_SIZE is a lower bound on the upper bound. I.e. setting MAX_ROW_SIZE guarantees we can always process rows <= MAX_ROW_SIZE in the spill-to-disk operators, but it doesn't always prevent processing rows > MAX_ROW_SIZE.

Maybe we should refer to the MAX_ROW_SIZE docs and keep it to a brief summary here. Maybe you have some ideas on how to express this. Here's my attempt:

"In <keyword keyref="impala210"/> and higher, rows up to MAX_ROW_SIZE (which defaults to 512 KB) can always be processed in the above cases. Rows larger than MAX_ROW_SIZE are processed on a best-effort basis. See the MAX_ROW_SIZE documentation (link) for more details."


-- 
To view, visit http://gerrit.cloudera.org:8080/10511
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I43c5a2819c8a3db33a8ce3a6bbde6a1d823ec9b2
Gerrit-Change-Number: 10511
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Alex Rodoni <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 01 Jun 2018 17:25:39 +0000
Gerrit-HasComments: Yes
