Matthew Jacobs has posted comments on this change.

Change subject: Updates to DML statements for Impala + Kudu
......................................................................

Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/5646/3/docs/topics/impala_delete.xml
File docs/topics/impala_delete.xml:

PS3, Line 89: There
the


http://gerrit.cloudera.org:8080/#/c/5646/4/docs/topics/impala_delete.xml
File docs/topics/impala_delete.xml:

PS4, Line 79: <p>
            :         The conditions in the <codeph>WHERE</codeph> clause can refer to
            :         any combination of primary key columns or other columns.
            :       </p>
maybe worth mentioning that predicates on the PK will be faster- this still has a scan in it, so we push some predicates to the scan (that's described somewhere else)


PS4, Line 89: There
The


PS4, Line 88:
            :         There <codeph>WHERE</codeph> clause can refer to any combination of columns,
            :         regardless of whether the columns are part of the primary key.
seems to duplicate the stmt 2 above


PS4, Line 93: <p>
            :         If some rows cannot be deleted because their
            :         some primary key columns are not found, due to their being deleted
            :         by a concurrent <codeph>DELETE</codeph> operation,
            :         the statement succeeds but returns a warning.
            :       </p>
            :
            :       <p>
            :         After the statement finishes, there might be more or fewer rows than expected in the table,
            :         due to other <codeph>INSERT</codeph>, <codeph>DELETE</codeph>, <codeph>UPDATE</codeph>,
            :         or <codeph>UPSERT</codeph> statements running concurrently on the same table.
            :       </p>
these could be combined I think and made more clear, e.g.

Because DML statements may conflict with one another (ref consistency?), a DELETE statement may attempt to delete rows that have already been deleted in which case the statement succeeds but a warning is returned. A DELETE statement may also conflict with an INSERT statement resulting in more rows than expected in the target table.


PS4, Line 108: No message or return value indicates how many rows were deleted by the statement.
This is not true, we show it in the shell and in the profile (not *DBC/HS2).

Query: select * from t
Query submitted at: 2017-01-25 11:09:23 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=c410195daa4fa5e:aa39998900000000
+----+---------+
| id | int_col |
+----+---------+
| 1  | 1       |
| 5  | 1       |
| 6  | 0       |
| 7  | 1       |
| 0  | 0       |
| 2  | 0       |
| 4  | 0       |
| 3  | 1       |
+----+---------+
Fetched 8 row(s) in 5.56s
[localhost:21000] > delete t where id < 3;
Query: delete t where id < 3
Query submitted at: 2017-01-25 11:11:04 (Coordinator: http://mj-desktop.ca.cloudera.com:25000)
Query progress can be monitored at: http://mj-desktop.ca.cloudera.com:25000/query_plan?query_id=5b47f095b366a4ac:c972171800000000
Modified 3 row(s), 0 row error(s) in 0.13s


PS4, Line 140: DELETE FROM time_series WHERE
             :   year = 2016 AND month IN (11,12) AND day > 15;
maybe worth mentioning this one would be fastest assuming year, month, day are PK. in the above examples we cannot push anything with "OR" to the scan.


http://gerrit.cloudera.org:8080/#/c/5646/4/docs/topics/impala_update.xml
File docs/topics/impala_update.xml:

PS4, Line 62: The conditions in the <codeph>WHERE</codeph> clause are the same ones allowed
            :       for the <codeph>SELECT</codeph> statement.
same comment as in delete case about predicates on PKs will be faster.


PS4, Line 77: their
            :         some
the


PS4, Line 77: If some rows cannot be updated because their
            :         some primary key columns are not found, due to their being deleted
            :         by a concurrent <codeph>DELETE</codeph> operation,
            :         the statement succeeds but returns a warning.
            :       </p>
            :
            :       <p>
            :         The result set of this statement is always the empty set (zero rows).
            :         No message or return value indicates how many rows were deleted by the statement.
same comment about combining these as in DELETE


PS4, Line 84: The result set of this statement is always the empty set (zero rows).
            :         No message or return value indicates how many rows were deleted by the statement.
same as delete this should return a message in the shell but not *DBC/HS2. it is in the profile too. it holds for all DML.


PS4, Line 85: deleted
this should be updated


PS4, Line 98: <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
as we discussed in the mtg this probably doesn't apply to DML (please update the other DML stmts as well)


PS4, Line 144: but more efficient.
note this is still not pushed down


http://gerrit.cloudera.org:8080/#/c/5646/4/docs/topics/impala_upsert.xml
File docs/topics/impala_upsert.xml:

PS4, Line 41: <indexterm audience="hidden">UPSERT statement</indexterm>
            :       Acts as a combination of the <codeph>INSERT</codeph>
            :       and <codeph>UPDATE</codeph> statements.
not sure if we should state this in docs


PS4, Line 78: (Note: the square brackets are part of the syntax.)
this ends up formatted oddly in the pdf, maybe next line or out of the code block


PS4, Line 104: <p conref="../shared/impala_common.xml#common/sync_ddl_blurb"/>
same as other stmts


--
To view, visit http://gerrit.cloudera.org:8080/5646
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I60512b7957fb53d86d3123a4f1d46fbb355f4665
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Ambreen Kazi <[email protected]>
Gerrit-Reviewer: Dimitris Tsirogiannis <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: John Russell <[email protected]>
Gerrit-Reviewer: Matthew Jacobs <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
