Matthew Jacobs has posted comments on this change. Change subject: IMPALA-5799: Kudu DML can crash if schema has changed ......................................................................
Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/7688/1//COMMIT_MSG Commit Message: PS1, Line 14: its it's PS1, Line 18: DML DML is really a SQL concept, technically this should be 'WriteOp' http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-table-sink.cc File be/src/exec/kudu-table-sink.cc: PS1, Line 144: output_expr_evals_.size() We can't assume this is true. E.g. see the comment in Send(). We actually don't necessarily have the full Table descriptor in the BE (sorry I didn't think of that when we first started talking about this JIRA), so we could either (a) ship that from the FE to the BE or (b) just check the cols that are referenced. I think the latter makes sense, and that you can do something like this: // output_expr_evals_ only contains the columns that the op // applies to, i.e. columns explicitly mentioned in the query, and // referenced_columns is then used to map to actual column positions. int col = kudu_table_sink_.referenced_columns.empty() ? j : kudu_table_sink_.referenced_columns[j]; http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: PS1, Line 37: DataType = why can't you just have using kudu::client::...::DataType; as we do above? http://gerrit.cloudera.org:8080/#/c/7688/1/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 81: ColumnType KuduDataTypeToColumnType(kudu::client::KuduColumnSchema::DataType type); comment http://gerrit.cloudera.org:8080/#/c/7688/1/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 399: def test_concurrent_schema_change(self, cursor, unique_database): is this test reproducing a crash before your change? Line 410: client.execute("insert into %s values (0, 0), (1, 1)" % table_name) I was thinking we could do these operations in a loop to get more opportunities to hit issues. PS1, Line 416: try: : client = self.create_impala_client() : client.execute("alter table %s drop column col1" % table_name) : client.execute("alter table %s add columns (col1 string)" % table_name) : except Exception as e: : threading.current_thread().error = e this you can do on the main thread while the insert thread is inserting PS1, Line 432: 'Kudu error(s) reported' this is the error in the case we sent a write op to Kudu that was w/ the old schema? Is there any more of the error message we can match? This looks very generic. -- To view, visit http://gerrit.cloudera.org:8080/7688 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fd6bf164310df0041144f75f5ee722665e9f587 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
