[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Reviewed-on: http://gerrit.cloudera.org:8080/5840
Reviewed-by: Matthew Jacobs 
Tested-by: Impala Public Jenkins
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
3 files changed, 227 insertions(+), 6 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Matthew Jacobs: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 9: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Matthew Jacobs (Code Review)
Hello Thomas Tauber-Marshall, Sailesh Mukil, Alex Behm,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5840

to look at the new patch set (#9).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
3 files changed, 227 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/9
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-17 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 9: Code-Review+2

fixed spelling and carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-16 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5840/7/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 101:   def test_kudu_col_changed(self, cursor, kudu_client, 
unique_database):
> Tests lgtm, but there seems to be a lot of repeated boilerplate stuff. Mayb
I did the refactoring below but there are enough small differences between 
these functions that it's not working out nicely to refactor.


Line 106: kudu_tbl_name = "impala::%s.foo" % unique_database
> seems worth factoring out this function and adding a comment that it must b
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-16 Thread Matthew Jacobs (Code Review)
Hello Thomas Tauber-Marshall, Sailesh Mukil,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5840

to look at the new patch set (#8).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/common/kudu_test_suite.py
M tests/query_test/test_kudu.py
3 files changed, 227 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/8
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-15 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5840/7/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 101:   def test_kudu_col_changed(self, cursor, kudu_client, 
unique_database):
Tests lgtm, but there seems to be a lot of repeated boilerplate stuff. Maybe 
you can take another pass and see if it makes sense to factor out some parts to 
condense the code?


Line 106: kudu_tbl_name = "impala::%s.foo" % unique_database
seems worth factoring out this function and adding a comment that it must be 
kept in sync with the FE table-name-generation behavior


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-15 Thread Matthew Jacobs (Code Review)
Hello Thomas Tauber-Marshall, Sailesh Mukil,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5840

to look at the new patch set (#7).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner (with the
exception of KUDU-1881), but the issue is that during
planning, Impala assumes the types/nullability of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. Test
cases were added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/query_test/test_kudu.py
2 files changed, 216 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/7
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-15 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 149: 
> I think Thomas is right and the query may crash if there is a difference in
I couldn't reproduce any failures, but looking at TupleDescriptor I agree it 
seems like it could be possible to crash. Good call Thomas & Alex. I'll check 
that here too.

Will the Kudu-side validation check that the projection is 100% identical to 
what the scan token expects? Nullability etc. included?
In theory it should, thought it looks like it's missing on the Kudu side as 
well, so I filed a JIRA on them https://issues.apache.org/jira/browse/KUDU-1881 
. Thanks for pointing this out.


http://gerrit.cloudera.org:8080/#/c/5840/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 161:   if (!colType.matchesType(kuduColType)) {
> Shouldn't this use equals() instead of matchesType()?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-14 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 149: 
> Good point, though I'll just leave this as a TODO for now because we don't 
I think Thomas is right and the query may crash if there is a difference in 
column nullability.

We do rely on the nullability. The SlotDescriptors are set to nullable or 
non-nullable which affects the memory layout of the tuple. If the tuples 
returned by Kudu are not 100% identical to what Impala expects, you may get 
non-deterministic wrong results (and maybe a crash). See the class comment in 
TupleDescriptor.java regarding the Kudu layout.

In response to one of your comments:
"After discussing with Dan on the Kudu team, we can make this simpler by 
checking at plan time because the kudu scan token encodes the col metadata and 
deserializing it will fail if the projection schema is no longer valid. The 
issue for us was that we did not check the Kudu col type matched our col type 
at plan time."

Will the Kudu-side validation check that the projection is 100% identical to 
what the scan token expects? Nullability etc. included?


http://gerrit.cloudera.org:8080/#/c/5840/6/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

Line 161:   if (!colType.matchesType(kuduColType)) {
Shouldn't this use equals() instead of matchesType()?


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-10 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 6: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 6: Code-Review+1

carrying Sailesh's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-10 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5840/5//COMMIT_MSG
Commit Message:

PS5, Line 22: Kudu scanner
> nit: I got confused thinking it was our KuduScanner. It's happening on the 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-10 Thread Matthew Jacobs (Code Review)
Hello Sailesh Mukil,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/5840

to look at the new patch set (#6).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner, but the issue is
that during planning, Impala assumes the types of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu client to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. A test
case was added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/query_test/test_kudu.py
2 files changed, 116 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-10 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 5: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5840/5//COMMIT_MSG
Commit Message:

PS5, Line 22: Kudu scanner
nit: I got confused thinking it was our KuduScanner. It's happening on the Kudu 
side right? If so, maybe change this to "Kudu's client scanner"?


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#5).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Impala may attempt to dereference
pointers that aren't there. This happens if a string column
is dropped and then a new, non string column is added with
the old string column's name.

The Kudu scan token contains the projection schema, and that
is validated when opening the Kudu scanner, but the issue is
that during planning, Impala assumes the types of columns
haven't changed when creating the scan tokens. This is fixed
by adding a check when creating the scan token, and failing
the query if the column types changed. Impala then relies on
the Kudu scanner to properly validate that the underlying
schema is still represented by the scan token, and that
deserialization will fail if it no longer matches. A test
case was added for this particular crash scenario, which now
fails during planning as expected. This does not attempt to
validate the Kudu client validation at deserialization time,
though that would be valuable coverage to add in the future.

Columns being removed don't produce a crash; the query fails
gracefully. A test was added for this case.

Columns being added should not affect this scenario, but a
test was added anyway.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M tests/query_test/test_kudu.py
2 files changed, 116 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-08 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 2:

After discussing with Dan on the Kudu team, we can make this simpler by 
checking at plan time because the kudu scan token encodes the col metadata and 
deserializing it will fail if the projection schema is no longer valid. The 
issue for us was that we did not check the Kudu col type matched our col type 
at plan time.

-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-07 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS2, Line 140:   } else if (scanner_schema.num_columns() > 
tuple_desc->slots().size()) {
 : 
state_->LogError(ErrorMsg::Init(TErrorCode::KUDU_COL_MISMATCH,
 : scan_node_->table_->name(), scanner_schema.num_columns(),
 : tuple_desc->slots().size()));
After thinking about this more, I don't think it should be possible to find 
more than we expected in planning. The reason is that in planning we get a scan 
token which knows about the expected columns. That gets deserialized here, so 
there shouldn't be any new cols that could be added to it. Even if new cols 
were added to the table. I made this a DCHECK but I'll double check my thinking 
with the Kudu team.


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-07 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/5840/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 149: if (slot->type().type != KuduColTypeToPrimitiveType(col.type())) {
This doesn't address the case where you removed a column and then added a new 
one with the same type but with other attributes changed, eg. name and 
nullability.

Do we keep our column names consistent with the column names in Kudu? If so, 
maybe we should log an error in that case, though probably not fail the query.


http://gerrit.cloudera.org:8080/#/c/5840/2/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS2, Line 317: KUDU_COL_MISMATCH
> nit: maybe call this 'KUDU_NUM_COLS_MISMATCH', else it sounds like it could
Maybe also explicitly mention refreshing here, as above.


http://gerrit.cloudera.org:8080/#/c/5840/2/tests/query_test/test_kudu.py
File tests/query_test/test_kudu.py:

Line 148: 
nit: extra line


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-03 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new patch set (#2).

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Neither Impala nor the Kudu client
validates the schema immediately before reading, so Impala
may attempt to dereference pointers that aren't there. This
happens if a string column is dropped and then a new, non
string column is added with the old string column's name.

This change adds validation after opening a scan token to
ensure the projection schema matches the expected schema.
The scan is guaranteed to be valid while the KuduScanner is
open, even if the schema changes concurrently. This is the
earliest point and coursest granularity with which Impala
can detect and gracefully handle a schema mismatch.

Also handles the cases where columns were added or removed.

Testing: Adds test cases for these scenarios.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/runtime/descriptors.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_kudu.py
7 files changed, 179 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/2
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-03 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/5840/1//COMMIT_MSG
Commit Message:

Line 20: open, even if the schema changes concurrently.
It would probably be worth adding that this is the earliest point and the 
coarsest granularity at which we can successfully detect a schema change if 
there is any, without leaving room for error, just to assuage any concerns of 
whether we're doing this check too often.


http://gerrit.cloudera.org:8080/#/c/5840/1/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

PS1, Line 133: for (int i = 0; i < tuple_desc->slots().size(); ++i) {
As we spoke, if an ALTER TABLE is done and a column is removed, a crash might 
occur in this loop.

Another point I wanted to call out: What's the expected behavior when a column 
is added?
 1) Currently, this might pass and we will probably return the results for N-1 
columns successfully. Is that what we want?

 2) Or should we fail the query?

 3) Or should we return the results and also give a warning asking the user to 
REFRESH?


My vote is for the 3rd option, but I'm open to others take on this.


http://gerrit.cloudera.org:8080/#/c/5840/1/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

PS1, Line 77: PrimitiveType KuduColTypeToPrimitiveType(
: const KuduColumnSchema::DataType& type) {
This may be an over optimization, but what if we had an array mapping 
KuduColumnSchemas to PrimitiveTypes? Since they both are just enums.

Just worried about the case where there are a large number of columns and we 
bounce on the switch statement for every one of them for every scan token. If 
you don't think it's too pressing, we can forego that.


http://gerrit.cloudera.org:8080/#/c/5840/1/common/thrift/generate_error_codes.py
File common/thrift/generate_error_codes.py:

PS1, Line 314: is type
nit: is of type


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-02-03 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/5840/1/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

Line 150:   "Unable to deserialize scan token");
a quick comment explaining why this is needed would be helpful.


-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4828: Alter Kudu schema outside Impala may crash on read

2017-01-31 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/5840

Change subject: IMPALA-4828: Alter Kudu schema outside Impala may crash on read
..

IMPALA-4828: Alter Kudu schema outside Impala may crash on read

Creating a table in Impala, changing the column schema
outside of Impala, and then reading again in Impala may
result in a crash. Neither Impala nor the Kudu client
validates the schema immediately before reading, so Impala
may attempt to dereference pointers that aren't there. This
happens if a string column is dropped and then a new, non
string column is added with the old string column's name.

This change adds validation after opening a scan token to
ensure the projection schema matches the expected schema.
The scan is guaranteed to be valid while the KuduScanner is
open, even if the schema changes concurrently.

Testing: Adds a test case that previously would crash Impala.

Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/runtime/descriptors.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_kudu.py
7 files changed, 99 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/40/5840/1
-- 
To view, visit http://gerrit.cloudera.org:8080/5840
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6d43f5bb9811e728ad592933066d006c8fb4553a
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Matthew Jacobs