[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-04-01 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column, key, hash
   partition, and range partition
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns
** kudu-scan-node.test Tests basic predicates with VARCHAR columns

Follow on work:
- IMPALA-9580: Add min-max runtime filter support/tests
- IMPALA-9581: Pushdown string predicates
- IMPALA-9583: Automated multibyte truncation tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Reviewed-on: http://gerrit.cloudera.org:8080/14197
Tested-by: Impala Public Jenkins 
Reviewed-by: Thomas Tauber-Marshall 
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
17 files changed, 806 insertions(+), 639 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Thomas Tauber-Marshall: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 18
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-04-01 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 17: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Apr 2020 15:48:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-04-01 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 17: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Apr 2020 06:55:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 17:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5596/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Apr 2020 02:26:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 17:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5676/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Apr 2020 02:20:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/16//COMMIT_MSG@50
PS16, Line 50: IMPALA-9580
> IMPALA-9581
Done


http://gerrit.cloudera.org:8080/#/c/14197/16/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14197/16/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@389
PS16, Line 389: null
> Should this be a non-null value? The null case is tested above
Done



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 01 Apr 2020 01:37:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new patch set (#17) to the change originally created 
by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column, key, hash
   partition, and range partition
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns
** kudu-scan-node.test Tests basic predicates with VARCHAR columns

Follow on work:
- IMPALA-9580: Add min-max runtime filter support/tests
- IMPALA-9581: Pushdown string predicates
- IMPALA-9583: Automated multibyte truncation tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
17 files changed, 806 insertions(+), 639 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/17
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 17
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/16//COMMIT_MSG@50
PS16, Line 50: IMPALA-9580
IMPALA-9581


http://gerrit.cloudera.org:8080/#/c/14197/16/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14197/16/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@389
PS16, Line 389: null
Should this be a non-null value? The null case is tested above



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 23:44:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 15: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 22:09:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 16: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 21:58:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 16:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5668/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 20:51:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new patch set (#16) to the change originally created 
by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column, key, hash
   partition, and range partition
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns
** kudu-scan-node.test Tests basic predicates with VARCHAR columns

Follow on work:
- IMPALA-9580: Add min-max runtime filter support/tests
- IMPALA-9580: Pushdown string predicates
- IMPALA-9583: Automated multibyte truncation tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
17 files changed, 806 insertions(+), 640 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/16
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 16
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test:

http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@147
PS15, Line 147: INSERT INTO kudu_varchar_pred VALUES
> nit: trailing whitespace
Done


http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@427
PS15, Line 427: # hash partition and range partition.
> nit: trailing whitespace
Done



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 20:10:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 15: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
File testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test:

http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test@147
PS15, Line 147: INSERT INTO kudu_varchar_pred VALUES
nit: trailing whitespace


http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/14197/15/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@427
PS15, Line 427: # hash partition and range partition.
nit: trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 18:57:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 15:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5588/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 17:38:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 15:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5660/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 17:16:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to 
VARCHAR
 : columns and marked the truncation code with a TODO that 
references
 : that Jira.
> UTF-8 has the nice property that sorting by (unsigned) byte values is equiv
Done


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
 :   and tested that length truncation resolve that issue.
> This would be a good sanity check, if only to prevent future breakage. But 
I started writing a test in test_kudu.py, but it looks like I would need to 
update the testing framework to handle UTF-8 as well:

   common/test_result_verifier.py:493: in parse_result_rows
  col = cols[i].encode('unicode_escape')
   E   UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: 
ordinal not in range(128)

I will work on this automated test in a follow on patch. I opened IMPALA-9583 
to track this.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> I don't think there are major changes needed, but definitely some changes,
I opened jiras to track this follow on work and will update the commit message 
to reference them.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 31 Mar 2020 16:56:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-31 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new patch set (#15) to the change originally created 
by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column, key, hash
   partition, and range partition
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns
** kudu-scan-node.test Tests basic predicates with VARCHAR columns

Follow on work:
- IMPALA-9580: Add min-max runtime filter support/tests
- IMPALA-9580: Pushdown string predicates
- IMPALA-9583: Automated multibyte truncation tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu-scan-node.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
17 files changed, 806 insertions(+), 640 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/15
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 15
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-30 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

(3 comments)

The code changes and testing seem generally fine, the min-max filtering needs 
fixing though.

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to 
VARCHAR
 : columns and marked the truncation code with a TODO that 
references
 : that Jira.
> I don't expect any additional sorting or predicate issues outside of any th
UTF-8 has the nice property that sorting by (unsigned) byte values is 
equivalent to sorting in codepoint order so this should just work as far as I 
can see it.

I.e. I think there aren't any additional ripple effects from the truncation, 
just everything behaves as you would expect from the values being truncated.

There are other fancier collation algorithms, but as far as I'm aware nothing 
in our stack uses them by default.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
 :   and tested that length truncation resolve that issue.
> I will look at implementing it. The main challenge is inserting data direct
This would be a good sanity check, if only to prevent future breakage. But if 
it's not feasible right now, seems like we could live with the manual testing.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> I am under the impression they should work (given it's just strings), but n
I don't think there are major changes needed, but definitely some changes, 
since it crashes now. If we want to push this out we can disable the filters 
for now, e.g. DECIMAL filters were previously disabled like this: 
https://gerrit.cloudera.org/#/c/12113/19/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java

  Query submitted at: 2020-03-30 15:11:00 (Coordinator: 
http://tarmstrong-box2:25000)
  Query progress can be monitored at: 
http://tarmstrong-box2:25000/query_plan?query_id=194eb5100ef2d6ed:b373b842
  +--+
  | count(*) |
  +--+
  | 0|
  +--+
  Fetched 1 row(s) in 0.37s
  [localhost.EXAMPLE.COM:21000] default> select straight_join count(*) from 
ctas_varchar join functional.chars_medium t on vc = t.varchar_col;
  Query: select straight_join count(*) from ctas_varchar join 
functional.chars_medium t on vc = t.varchar_col
  Query submitted at: 2020-03-30 15:11:12 (Coordinator: 
http://tarmstrong-box2:25000)
  Query progress can be monitored at: 
http://tarmstrong-box2:25000/query_plan?query_id=cc476b56d10a741e:99f338be
  [ 
   ] 0%
  ERROR: Failed due to unreachable impalad(s): tarmstrong-box2:22001


from impalad_node1.ERROR:

  F0330 15:11:17.178375  8395 min-max-filter.cc:65] 
cc476b56d10a741e:99f338be0002] Check failed: llvm_class != 
MIN_MAX_FILTER_LLVM_CLASS_NAMES.end() Not a valid type: 16
  *** Check failure stack trace: ***
  @  0x4f92d8c  google::LogMessage::Fail()
  @  0x4f94631  google::LogMessage::SendToLog()
  @  0x4f92766  google::LogMessage::Flush()
  @  0x4f95d2d  google::LogMessageFatal::~LogMessageFatal()
  @  0x2618bd0  impala::MinMaxFilter::GetLlvmClassName()
  @  0x28aa046  impala::FilterContext::CodegenInsert()
  @  0x2810a2e  
impala::PhjBuilderConfig::CodegenInsertRuntimeFilters()
  @  0x280e26b  impala::PhjBuilderConfig::Codegen()
  @  0x280dfd5  impala::PhjBuilder::Codegen()
  @  0x281ba6a  impala::PartitionedHashJoinNode::Codegen()
  @  0x272288f  impala::ExecNode::Codegen()
  @  0x287b2c6  impala::AggregationNodeBase::Codegen()
  @  0x22d26f7  impala::FragmentInstanceState::Open()
  @  0x22cf787  impala::FragmentInstanceState::Exec()
  @  0x22e391e  impala::QueryState::ExecFInstance()
  @  0x22e1c3d  
_ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
  @  0x22e5506  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
  @  0x20c079b  boost::function0<>::operator()()
  @  0x26828ea  impala::Thread::SuperviseThread()
  @  0x268ab6e  boost::_bi::list5<>::operator()<>()
  @  0x268aa92  boost::_bi::bind_t<>::operator()()
  @  0x268aa55  boost::detail::thread_data<>::run()
  @  0x3eb9379  thread_proxy
  @ 0x7f2a077f06da  start_thread
  @ 0x7f2a0423e88e  clone



[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
 :   and tested that length truncation resolve that issue.
> I will look at implementing it. The main challenge is inserting data direct
We have a bunch of tests in test_kudu.py that already directly use the Kudu 
python client to simulate situations where an external client has done 
something we don't expect, so hopefully you should be able to do the same thing.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> I am under the impression they should work (given it's just strings), but n
My expectation is that yes, they should work with this patch as-is, but of 
course its definitely worth testing.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Mar 2020 22:10:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-30 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to 
VARCHAR
 : columns and marked the truncation code with a TODO that 
references
 : that Jira.
I don't expect any additional sorting or predicate issues outside of any that 
may already exist. VARCHAR is effectively a STRING (which Kudu has had for some 
time) with a length limit.

Impala warns users the UTF-8 functionality is effectively undefined: 
https://impala.apache.org/docs/build/html/topics/impala_string.html

> For full support in all Impala subsystems, restrict string values to the 
> ASCII character set. Although some UTF-8 character data can be stored in 
> Impala and retrieved through queries, UTF-8 strings containing non-ASCII 
> characters are not guaranteed to work properly in combination with many SQL 
> aspects, including but not limited to:

- String manipulation functions.
- Comparison operators.
- The ORDER BY clause.
- Values in partition key columns.

If these edge cases and tests look important. We should prioritize UTF-8 
functionality as a whole in Impala.

Note: Hive, Parquet, and ORC all support UTF-8


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
 :   and tested that length truncation resolve that issue.
> If this test is very hard to integrate into the Impala environment, then I
I will look at implementing it. The main challenge is inserting data directly 
via a Kudu client give Impala doesn't support UTF-8 strings.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
> What is the current state of min/max runtime filters for varchars? Are they
I am under the impression they should work (given it's just strings), but need 
tests. Thomas would likely know.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Mar 2020 21:39:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-30 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

(3 comments)

The code looks good to me, but I would prefer to add some testing for all 
features added in this test, e.g. for partitioning on varchar(N)

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@27
PS14, Line 27: IMPALA-5675 tracks adding UTF-8 Character length support to 
VARCHAR
 : columns and marked the truncation code with a TODO that 
references
 : that Jira.
One more thing came up related to UTF-8: does it affect the ordering of 
strings? e.g. is it possible that s1 < s2 according to Impala, but as they 
contain multybyte characters, their ordering is different in Kudu? This could 
cause proplems in predicate push down.

I don't think that we can do much if this issue exists, but at least we should 
note somewhere that it is a known issue.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@33
PS14, Line 33: * Manually reproduced a check failure due to multi-byte 
characters
 :   and tested that length truncation resolve that issue.
If this test is very hard to integrate into the Impala environment, then I am 
ok with moving that to a follow up jira, but I think that we should test this 
in Impala in the long term.

If the code to test it is not too complex, it would be great to add it here or 
into the follow up jira.


http://gerrit.cloudera.org:8080/#/c/14197/14//COMMIT_MSG@47
PS14, Line 47: support
What is the current state of min/max runtime filters for varchars? Are they 
turned off?



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 30 Mar 2020 19:41:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 19:13:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5635/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 15:03:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5563/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 14:22:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-28 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new patch set (#14) to the change originally created 
by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column and key
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns

Follow on work:
- Add min-max runtime filter tests/support
- Add predicate pushdown and partition tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
14 files changed, 760 insertions(+), 640 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/14
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 14
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-28 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 09:21:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5562/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 04:26:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5631/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 04:25:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
File testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl:

http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl@5
PS13, Line 5: # The flags below require unsafe flags to be unlocked.
I realize I accidentally pushed this file. I will remove it in my next 
iteration based on reviews.


http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
File testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl:

http://gerrit.cloudera.org:8080/#/c/14197/13/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl@7
PS13, Line 7: # The flags below require unsafe flags to be unlocked.
I realize I accidentally pushed this file. I will remove it in my next 
iteration based on reviews.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 28 Mar 2020 03:54:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-27 Thread Grant Henke (Code Review)
Grant Henke has uploaded a new patch set (#13) to the change originally created 
by Attila Bukor. ( http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu.
This commit adds support for Kudu's VARCHAR type to Impala.

The length of a Kudu varchar is applied as a character length as opposed
to a byte length like Impala currently uses.

When writing data to Kudu, the VARCHAR length is not an issue because
Impala only officially supports ASCII characters and those characters are
the same size in bytes and characters. Additionally, extra bytes would be
truncated by the Kudu client if somehow a value was too long.

When reading data from Kudu, it is possible that the value written by
some other application is wider in bytes than Impala expects and can
handle. This can happen due to multi-byte UTF-8 characters. In that
case, we adjust the length in Impala to truncate the extra bytes of the
value. This isn’t a great solution, but one other integrations have taken
as well given Impala doesn’t support UTF-8 values.

IMPALA-5675 tracks adding UTF-8 Character length support to VARCHAR
columns and marked the truncation code with a TODO that references
that Jira.

Testing:
* Performed manual testing of standard DDL and DML interaction
* Manually reproduced a check failure due to multi-byte characters
  and tested that length truncation resolve that issue.
* Added/adjusted the following automated tests:
** AnalyzeDDLTest: CTAS into Kudu with varchar type
** AnalyzeKuduDDLTest: CREATE TABLE in Kudu with VARCHAR type
** kudu_create.test: Create table with VARCHAR column and key
** kudu_describe.test: Describe table with VARCHAR column and key
** kudu_insert.test: Insert with VARCHAR columns including null and
   non-null defaults
** kudu_update.test: Updates with VARCHAR column
** kudu_upsert.test: Upserts with VARCHAR column
** kudu_delete.test Deletes with VARCHAR columns

Follow on work:
- Add min-max runtime filter tests/support
- Add predicate pushdown and partition tests

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
16 files changed, 776 insertions(+), 642 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/13
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 13
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 12: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/5462/


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 10 Mar 2020 20:47:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG@12
PS12, Line 12: VARCHAR length is enforced in the Kudu client, Impala doesn't 
have to
 : enforce it further. The length of the VARCHAR is given in 
characters
 : (UTF8) so a VARCHAR(n) is stored in "n" to "4n" bytes.
> Relatedly, are there any test cases that cover strings with multi byte char
Yeah we need to test the hell out of this, cause it's violating an existing 
invariant and can affect a bunch of other code. This patch is assuming a lot of 
stuff is just gonna work and I don't have much confidence in that.

E.g. here's a DCHECK assuming that VARCHARs are
https://github.com/apache/impala/blob/b5805de/be/src/exprs/anyval-util.h#L302

I also wouldn't be surprised at all if Impala inserts casts to varchar in 
various places, e.g. when inserting into tables.

I also have no idea whether we might be using the VARCHAR length to truncate.

In general the idea of having a different definition of length in the storage 
engine vs the rest of query execution seems extremely messy, but I understand 
where it coming from so maybe we can live with it if we're clear about the 
behaviour in a bunch of different circumstances and have a lot of testing.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 10 Mar 2020 18:30:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG@12
PS12, Line 12: VARCHAR length is enforced in the Kudu client, Impala doesn't 
have to
 : enforce it further. The length of the VARCHAR is given in 
characters
 : (UTF8) so a VARCHAR(n) is stored in "n" to "4n" bytes.
> This actually means that Impala has to enforce it during reading, as the le
Relatedly, are there any test cases that cover strings with multi byte 
characters?


http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG@19
PS12, Line 19: TODO: add tests for the scenario where the primary key is 
varchar +
 : predicates and partitioning
> ok, we can add these tests in a later commit from my side
Another test case I'd like to see (can also be left for followup work): min-max 
runtime filters, i.e. in 
testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-Comment-Date: Tue, 10 Mar 2020 18:16:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 12: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG@12
PS12, Line 12: VARCHAR length is enforced in the Kudu client, Impala doesn't 
have to
 : enforce it further. The length of the VARCHAR is given in 
characters
 : (UTF8) so a VARCHAR(n) is stored in "n" to "4n" bytes.
This actually means that Impala has to enforce it during reading, as the length 
is the number of bytes in Impala, right? So we have to truncate it after 
receiving the string from Kudu if it has multy byte characters.


http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG@17
PS12, Line 17: in client-side.
You mean on Impala side?


http://gerrit.cloudera.org:8080/#/c/14197/12//COMMIT_MSG@19
PS12, Line 19: TODO: add tests for the scenario where the primary key is 
varchar +
 : predicates and partitioning
ok, we can add these tests in a later commit from my side



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 10 Mar 2020 16:22:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5462/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 10 Mar 2020 15:51:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Attila Bukor (Code Review)
Hello Tamas Mate, Grant Henke, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14197

to look at the new patch set (#12).

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu. This commit adds
support for Kudu's VARCHAR type to Impala as well.

VARCHAR length is enforced in the Kudu client, Impala doesn't have to
enforce it further. The length of the VARCHAR is given in characters
(UTF8) so a VARCHAR(n) is stored in "n" to "4n" bytes.

The values are truncated in predicate pushdown also as the truncation is
handled in client-side.

TODO: add tests for the scenario where the primary key is varchar +
predicates and partitioning

Testing:

* performed manual testing
* added automated tests that passed

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
12 files changed, 736 insertions(+), 637 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/12
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 12
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Attila Bukor (Code Review)
Attila Bukor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14197/10/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

http://gerrit.cloudera.org:8080/#/c/14197/10/be/src/exec/kudu-util.cc@139
PS10, Line 139: "Could not set Ku
> nit: Indentation should be 4 in these cases.
Done


http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@a2217
PS10, Line 2217:
> Should this be move to AnalyzesOk just below the decimal one above?
Done


http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@344
PS10, Line 344: }
> nit: Maybe call it `valvc` to follow the pattern of the other columns. It m
Done


http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test@7
PS10, Line 7:decimal16 decimal(38, 0) null, valvc varchar(10) null)
> Do any of these tests validate truncation behavior?
added one


http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test:

http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test@413
PS10, Line 413:
> nit: trailing whitespace
Done



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 10 Mar 2020 14:41:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-03-10 Thread Attila Bukor (Code Review)
Hello Tamas Mate, Grant Henke, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14197

to look at the new patch set (#11).

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu. This commit adds
support for Kudu's VARCHAR type to Impala as well.

VARCHAR length is enforced in the Kudu client, Impala doesn't have to
enforce it further. The length of the VARCHAR is given in characters
(UTF8) so a VARCHAR(n) is stored in "n" to "4n" bytes.

The values are truncated in predicate pushdown also as the truncation is
handled in client-side.

Testing:

* performed manual testing
* added automated tests that passed

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
12 files changed, 736 insertions(+), 637 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/11
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 11
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-18 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java@a2217
PS10, Line 2217:
Should this be move to AnalyzesOk just below the decimal one above?


http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java:

http://gerrit.cloudera.org:8080/#/c/14197/10/fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java@344
PS10, Line 344: "valvarchar varchar(10) default null) " +
nit: Maybe call it `valvc` to follow the pattern of the other columns. It might 
also be a bit easier to read.


http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test:

http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test@7
PS10, Line 7:valdec16 decimal(38, 0) null, valvarchar varchar(500))
Any reason you use 500 here and 10 in the insert test?


http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test:

http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test@7
PS10, Line 7:decimal16 decimal(38, 0) null, valvarchar varchar(10) null)
Do any of these tests validate truncation behavior?



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 18 Feb 2020 16:33:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-18 Thread Tamas Mate (Code Review)
Tamas Mate has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 10:

(2 comments)

Thank you for the change. I did a first pass and found two small nits.

Will do a quick test locally as well later today/tomorrow.

http://gerrit.cloudera.org:8080/#/c/14197/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/10//COMMIT_MSG@11
PS10, Line 11:
Usually, Impala commit messages also have a section "Testing" to mention the 
tests added by this commit. This will help reviewers understand the testing 
done for the patch. See this for example. https://gerrit.cloudera.org/#/c/15157/


http://gerrit.cloudera.org:8080/#/c/14197/10/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

http://gerrit.cloudera.org:8080/#/c/14197/10/be/src/exec/kudu-util.cc@139
PS10, Line 139:
nit: Indentation should be 4 in these cases.



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Tue, 18 Feb 2020 09:38:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-17 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/14197/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14197/10//COMMIT_MSG@9
PS10, Line 9: KUDU-1938 added VARCHAR column type support to Kudu
Can you add some information about how Kudu supports VARCHAR? It is also ok to 
add a link. Some topics I am interested in:
- Does Kudu treat VARCHAR differently than STRING? Does it enforce enforce the 
length constraint (so Impala doesn't have to)? (I guess yes, but it would be 
useful to highlight it here)
- Does Kudu care about the encoding of the string? Is the length of the string 
always the number of bytes? (I remember that we discussed in the path, but I 
don't remember how it ended up).
- How do we push down min/max predicates? It is valid in Impala to compare a 
VARCHAR to a STRING, and it matters whether the string is truncated before push 
down. Do we push it down to Kudu in this case? I think that the correct way is 
pushing it down without truncation.


http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test:

http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test@361
PS10, Line 361: select * from ctas_varchar;
If I didn't miss something then this was the only test where the primary key is 
varchar. We should also test the case when  there are predicates against it 
(both comparing it to STRINGs and VARCHARs) + the partitioned case also seems 
interesting.


http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
File testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test:

http://gerrit.cloudera.org:8080/#/c/14197/10/testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test@413
PS10, Line 413:
nit: trailing whitespace



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 17 Feb 2020 18:41:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 17 Feb 2020 12:27:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5235/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 17 Feb 2020 08:34:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/5234/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 17 Feb 2020 08:33:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/5345/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tamas Mate 
Gerrit-Comment-Date: Mon, 17 Feb 2020 08:03:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14197 )

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14197/9/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

http://gerrit.cloudera.org:8080/#/c/14197/9/be/src/exec/kudu-util.cc@239
PS9, Line 239: case DataType::VARCHAR: return 
ColumnType::CreateVarcharType(type_attributes.length());
line too long (91 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 17 Feb 2020 07:51:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-16 Thread Attila Bukor (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14197

to look at the new patch set (#10).

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu. This commit adds
support for Kudu's VARCHAR type to Impala as well.

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
12 files changed, 713 insertions(+), 637 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/10
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 10
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-5092 Add support for VARCHAR in Kudu tables

2020-02-16 Thread Attila Bukor (Code Review)
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14197

to look at the new patch set (#9).

Change subject: IMPALA-5092 Add support for VARCHAR in Kudu tables
..

IMPALA-5092 Add support for VARCHAR in Kudu tables

KUDU-1938 added VARCHAR column type support to Kudu. This commit adds
support for Kudu's VARCHAR type to Impala as well.

Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
---
M be/src/exec/kudu-util.cc
M fe/src/main/java/org/apache/impala/service/KuduCatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/KuduUtil.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeKuduDDLTest.java
M testdata/workloads/functional-query/queries/QueryTest/kudu_create.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_delete.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_describe.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_update.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_upsert.test
M tests/query_test/test_kudu.py
12 files changed, 712 insertions(+), 637 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/97/14197/9
--
To view, visit http://gerrit.cloudera.org:8080/14197
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0d4959410fdd882bfa980cb55e8a7837c7823da8
Gerrit-Change-Number: 14197
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins