[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Reviewed-on: http://gerrit.cloudera.org:8080/5840 Reviewed-by: Matthew JacobsTested-by: Impala Public Jenkins --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/common/kudu_test_suite.py M tests/query_test/test_kudu.py 3 files changed, 227 insertions(+), 6 deletions(-) Approvals: Impala Public Jenkins: Verified Matthew Jacobs: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Hello Thomas Tauber-Marshall, Sailesh Mukil, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5840 to look at the new patch set (#9). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/common/kudu_test_suite.py M tests/query_test/test_kudu.py 3 files changed, 227 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/9 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 9: Code-Review+2 fixed spelling and carry +2 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/5840/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 101: def test_kudu_col_changed(self, cursor, kudu_client, unique_database): > Tests lgtm, but there seems to be a lot of repeated boilerplate stuff. Mayb I did the refactoring below but there are enough small differences between these functions that it's not working out nicely to refactor. Line 106: kudu_tbl_name = "impala::%s.foo" % unique_database > seems worth factoring out this function and adding a comment that it must b Done -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Hello Thomas Tauber-Marshall, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5840 to look at the new patch set (#8). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/common/kudu_test_suite.py M tests/query_test/test_kudu.py 3 files changed, 227 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/8 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Alex Behm has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/5840/7/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 101: def test_kudu_col_changed(self, cursor, kudu_client, unique_database): Tests lgtm, but there seems to be a lot of repeated boilerplate stuff. Maybe you can take another pass and see if it makes sense to factor out some parts to condense the code? Line 106: kudu_tbl_name = "impala::%s.foo" % unique_database seems worth factoring out this function and adding a comment that it must be kept in sync with the FE table-name-generation behavior -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Hello Thomas Tauber-Marshall, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5840 to look at the new patch set (#7). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner (with the exception of KUDU-1881), but the issue is that during planning, Impala assumes the types/nullability of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. Test cases were added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/query_test/test_kudu.py 2 files changed, 216 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/7 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 149: > I think Thomas is right and the query may crash if there is a difference in I couldn't reproduce any failures, but looking at TupleDescriptor I agree it seems like it could be possible to crash. Good call Thomas & Alex. I'll check that here too. Will the Kudu-side validation check that the projection is 100% identical to what the scan token expects? Nullability etc. included? In theory it should, thought it looks like it's missing on the Kudu side as well, so I filed a JIRA on them https://issues.apache.org/jira/browse/KUDU-1881 . Thanks for pointing this out. http://gerrit.cloudera.org:8080/#/c/5840/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 161: if (!colType.matchesType(kuduColType)) { > Shouldn't this use equals() instead of matchesType()? Done -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Alex Behm has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 149: > Good point, though I'll just leave this as a TODO for now because we don't I think Thomas is right and the query may crash if there is a difference in column nullability. We do rely on the nullability. The SlotDescriptors are set to nullable or non-nullable which affects the memory layout of the tuple. If the tuples returned by Kudu are not 100% identical to what Impala expects, you may get non-deterministic wrong results (and maybe a crash). See the class comment in TupleDescriptor.java regarding the Kudu layout. In response to one of your comments: "After discussing with Dan on the Kudu team, we can make this simpler by checking at plan time because the kudu scan token encodes the col metadata and deserializing it will fail if the projection schema is no longer valid. The issue for us was that we did not check the Kudu col type matched our col type at plan time." Will the Kudu-side validation check that the projection is 100% identical to what the scan token expects? Nullability etc. included? http://gerrit.cloudera.org:8080/#/c/5840/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java: Line 161: if (!colType.matchesType(kuduColType)) { Shouldn't this use equals() instead of matchesType()? -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 6: Code-Review+1 carrying Sailesh's +1 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/5840/5//COMMIT_MSG Commit Message: PS5, Line 22: Kudu scanner > nit: I got confused thinking it was our KuduScanner. It's happening on the Done -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Hello Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5840 to look at the new patch set (#6). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner, but the issue is that during planning, Impala assumes the types of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu client to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. A test case was added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/query_test/test_kudu.py 2 files changed, 116 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/6 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 5: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/5840/5//COMMIT_MSG Commit Message: PS5, Line 22: Kudu scanner nit: I got confused thinking it was our KuduScanner. It's happening on the Kudu side right? If so, maybe change this to "Kudu's client scanner"? -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has uploaded a new patch set (#5). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. The Kudu scan token contains the projection schema, and that is validated when opening the Kudu scanner, but the issue is that during planning, Impala assumes the types of columns haven't changed when creating the scan tokens. This is fixed by adding a check when creating the scan token, and failing the query if the column types changed. Impala then relies on the Kudu scanner to properly validate that the underlying schema is still represented by the scan token, and that deserialization will fail if it no longer matches. A test case was added for this particular crash scenario, which now fails during planning as expected. This does not attempt to validate the Kudu client validation at deserialization time, though that would be valuable coverage to add in the future. Columns being removed don't produce a crash; the query fails gracefully. A test was added for this case. Columns being added should not affect this scenario, but a test was added anyway. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java M tests/query_test/test_kudu.py 2 files changed, 116 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/5 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 2: After discussing with Dan on the Kudu team, we can make this simpler by checking at plan time because the kudu scan token encodes the col metadata and deserializing it will fail if the projection schema is no longer valid. The issue for us was that we did not check the Kudu col type matched our col type at plan time. -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS2, Line 140: } else if (scanner_schema.num_columns() > tuple_desc->slots().size()) { : state_->LogError(ErrorMsg::Init(TErrorCode::KUDU_COL_MISMATCH, : scan_node_->table_->name(), scanner_schema.num_columns(), : tuple_desc->slots().size())); After thinking about this more, I don't think it should be possible to find more than we expected in planning. The reason is that in planning we get a scan token which knows about the expected columns. That gets deserialized here, so there shouldn't be any new cols that could be added to it. Even if new cols were added to the table. I made this a DCHECK but I'll double check my thinking with the Kudu team. -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 149: if (slot->type().type != KuduColTypeToPrimitiveType(col.type())) { This doesn't address the case where you removed a column and then added a new one with the same type but with other attributes changed, eg. name and nullability. Do we keep our column names consistent with the column names in Kudu? If so, maybe we should log an error in that case, though probably not fail the query. http://gerrit.cloudera.org:8080/#/c/5840/2/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: PS2, Line 317: KUDU_COL_MISMATCH > nit: maybe call this 'KUDU_NUM_COLS_MISMATCH', else it sounds like it could Maybe also explicitly mention refreshing here, as above. http://gerrit.cloudera.org:8080/#/c/5840/2/tests/query_test/test_kudu.py File tests/query_test/test_kudu.py: Line 148: nit: extra line -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has uploaded a new patch set (#2). Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Neither Impala nor the Kudu client validates the schema immediately before reading, so Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. This change adds validation after opening a scan token to ensure the projection schema matches the expected schema. The scan is guaranteed to be valid while the KuduScanner is open, even if the schema changes concurrently. This is the earliest point and coursest granularity with which Impala can detect and gracefully handle a schema mismatch. Also handles the cases where columns were added or removed. Testing: Adds test cases for these scenarios. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/runtime/descriptors.h M common/thrift/generate_error_codes.py M tests/query_test/test_kudu.py 7 files changed, 179 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/2 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Sailesh Mukil has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/5840/1//COMMIT_MSG Commit Message: Line 20: open, even if the schema changes concurrently. It would probably be worth adding that this is the earliest point and the coarsest granularity at which we can successfully detect a schema change if there is any, without leaving room for error, just to assuage any concerns of whether we're doing this check too often. http://gerrit.cloudera.org:8080/#/c/5840/1/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: PS1, Line 133: for (int i = 0; i < tuple_desc->slots().size(); ++i) { As we spoke, if an ALTER TABLE is done and a column is removed, a crash might occur in this loop. Another point I wanted to call out: What's the expected behavior when a column is added? 1) Currently, this might pass and we will probably return the results for N-1 columns successfully. Is that what we want? 2) Or should we fail the query? 3) Or should we return the results and also give a warning asking the user to REFRESH? My vote is for the 3rd option, but I'm open to others take on this. http://gerrit.cloudera.org:8080/#/c/5840/1/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: PS1, Line 77: PrimitiveType KuduColTypeToPrimitiveType( : const KuduColumnSchema::DataType& type) { This may be an over optimization, but what if we had an array mapping KuduColumnSchemas to PrimitiveTypes? Since they both are just enums. Just worried about the case where there are a large number of columns and we bounce on the switch statement for every one of them for every scan token. If you don't think it's too pressing, we can forego that. http://gerrit.cloudera.org:8080/#/c/5840/1/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: PS1, Line 314: is type nit: is of type -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Dan Hecht has posted comments on this change. Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/5840/1/be/src/exec/kudu-scanner.cc File be/src/exec/kudu-scanner.cc: Line 150: "Unable to deserialize scan token"); a quick comment explaining why this is needed would be helpful. -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew JacobsGerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read
Matthew Jacobs has uploaded a new change for review. http://gerrit.cloudera.org:8080/5840 Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read .. IMPALA-4828: Alter Kudu schema outside Impala may crash on read Creating a table in Impala, changing the column schema outside of Impala, and then reading again in Impala may result in a crash. Neither Impala nor the Kudu client validates the schema immediately before reading, so Impala may attempt to dereference pointers that aren't there. This happens if a string column is dropped and then a new, non string column is added with the old string column's name. This change adds validation after opening a scan token to ensure the projection schema matches the expected schema. The scan is guaranteed to be valid while the KuduScanner is open, even if the schema changes concurrently. Testing: Adds a test case that previously would crash Impala. Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a --- M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/runtime/descriptors.h M common/thrift/generate_error_codes.py M tests/query_test/test_kudu.py 7 files changed, 99 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/1 -- To view, visit http://gerrit.cloudera.org:8080/5840 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Matthew Jacobs