[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a
supported Impala column type. This change introduces the following
mapping -
INT_8 -> TINYINT
INT_16 -> SMALLINT
INT_32 -> INT
INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -
schema {
  optional int32 id;
  optional int32 tinyint_col (INT_8);
  optional int32 smallint_col (INT_16);
  optional int32 int_col;
  optional int64 bigint_col;
}

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Reviewed-on: http://gerrit.cloudera.org:8080/8548
Reviewed-by: Tim Armstrong
Reviewed-by: Tianyi Wang
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 99 insertions(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, but someone else must approve
  Tianyi Wang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 4
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
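For readers skimming the thread, the mapping in the commit message above can be sketched as a small lookup. This is only an illustration in Python, not the code from the patch (which touches hdfs-parquet-table-writer.cc and CreateTableLikeFileStmt.java). The names SIGNED_INT_LOGICAL_TO_IMPALA, PHYSICAL_TO_IMPALA and impala_type_for are hypothetical, and the fallback for unannotated int32/int64 columns is an assumption based on the test schema.

```python
# Hypothetical sketch of the mapping described in the commit message.
# None of these names come from the Impala code base.

# Signed integer logical types -> Impala column types (from the commit message).
SIGNED_INT_LOGICAL_TO_IMPALA = {
    "INT_8": "TINYINT",
    "INT_16": "SMALLINT",
    "INT_32": "INT",
    "INT_64": "BIGINT",
}

# Assumption: columns without an annotation fall back to their physical type.
PHYSICAL_TO_IMPALA = {
    "int32": "INT",
    "int64": "BIGINT",
}


def impala_type_for(physical_type, logical_type=None):
    """Return the Impala column type for one Parquet column."""
    if logical_type is None:
        return PHYSICAL_TO_IMPALA[physical_type]
    return SIGNED_INT_LOGICAL_TO_IMPALA[logical_type]


# The test schema from the commit message, as (name, physical, logical) tuples.
TEST_SCHEMA = [
    ("id", "int32", None),
    ("tinyint_col", "int32", "INT_8"),
    ("smallint_col", "int32", "INT_16"),
    ("int_col", "int32", None),
    ("bigint_col", "int64", None),
]

derived_columns = [
    (name, impala_type_for(physical, logical))
    for name, physical, logical in TEST_SCHEMA
]
```

Under these assumptions, derived_columns pairs each column of the test file with the Impala type a reader would expect to see for it.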
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 3: Verified+1

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Tue, 09 Jan 2018 04:55:58 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1686/

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:16:39 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Tianyi Wang has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 3: Code-Review+2

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:43:54 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 3: Code-Review+1

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:11:26 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310
PS2, Line 310: if line_split[0] == "id":
> Can you make this an elif chain and assert that the column name is one of t
Done

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339
PS2, Line 339: if line_split[0] == "id":
> Same here
Done

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349
PS2, Line 349:
> nit: missing space after %
Done

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351
PS2, Line 351: c
> same here
Done

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:57:48 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Hello Tianyi Wang, Tim Armstrong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8548

to look at the new patch set (#3).

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a
supported Impala column type. This change introduces the following
mapping -
INT_8 -> TINYINT
INT_16 -> SMALLINT
INT_32 -> INT
INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -
schema {
  optional int32 id;
  optional int32 tinyint_col (INT_8);
  optional int32 smallint_col (INT_16);
  optional int32 int_col;
  optional int64 bigint_col;
}

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 99 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8548/3

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 2: Code-Review+1

(4 comments)

Thanks, this is looking good. Couple of minor comments then I'll let tianyi +2

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310
PS2, Line 310: if line_split[0] == "id":
Can you make this an elif chain and assert that the column name is one of the expected column names? Just to make it less likely that an error in the tests results in it silently failing. I.e. something like

  if line_split[0] == 'id':
    ...
  elif line_split[0] == 'int':
    ...
  else:
    assert line_split[0] == 'bigint_col'
    ...

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339
PS2, Line 339: if line_split[0] == "tinyint_col":
Same here

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349
PS2, Line 349: %src_tbl)
nit: missing space after %

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351
PS2, Line 351: %
same here

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Fri, 05 Jan 2018 19:09:19 +
Gerrit-HasComments: Yes
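The elif-chain suggestion in the review above could look roughly like the following. This is a minimal sketch, not the code that landed in test_insert_parquet.py: the function name expected_annotation and the shape of line_split (a list whose first element is the column name) are assumptions, and the expected annotations are taken from the test schema in the commit message.

```python
# Hypothetical sketch of the reviewer's suggestion: an exhaustive elif chain
# whose final branch asserts on the column name, so an unexpected or
# misspelled column fails loudly instead of being skipped silently.

def expected_annotation(line_split):
    """Return the logical-type annotation expected for one schema line.

    line_split is assumed to be a list whose first element is the column
    name; the real test may split its input differently.
    """
    name = line_split[0]
    if name == "id":
        return None
    elif name == "tinyint_col":
        return "INT_8"
    elif name == "smallint_col":
        return "INT_16"
    elif name == "int_col":
        return None
    else:
        # Last expected column; anything else is a bug in the test or data.
        assert name == "bigint_col", "unexpected column: %s" % name
        return None
```

With independent if statements, a renamed or unexpected column would simply match nothing and the test could pass without checking it; the trailing assert turns that case into a failure.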
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
anujphadke has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8548/1/testdata/data/signed_integer_logical_types.parquet
File testdata/data/signed_integer_logical_types.parquet:

PS1:
> Can you add a description of this file to the readme - i.e. what it has in
Done

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@295
PS1, Line 295: column
> column
Done

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@303
PS1, Line 303: stored as parquet""".format(src_tbl, hdfs_path)
> Why is there a space after {1}?
Removed

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@321
PS1, Line 321: result = self.execute_query_expect_success(self.client, insert_stmt)
> remove semicolon
Done

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@325
PS1, Line 325: ame values in
> should these (above and below as well) be execute_query_expect_success?
Done

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@331
PS1, Line 331: dst_tbl = "{0}.{1}".format(unique_database, "read_write_logical_type_dst")
> +1.
> I think the test would be a little easier to understand too if we asser
Done

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Reviewer: anujphadke
Gerrit-Comment-Date: Fri, 05 Jan 2018 02:45:33 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Hello Tianyi Wang, Tim Armstrong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8548

to look at the new patch set (#2).

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a
supported Impala column type. This change introduces the following
mapping -
INT_8 -> TINYINT
INT_16 -> SMALLINT
INT_32 -> INT
INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -
schema {
  optional int32 id;
  optional int32 tinyint_col (INT_8);
  optional int32 smallint_col (INT_16);
  optional int32 int_col;
  optional int64 bigint_col;
}

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 94 insertions(+), 1 deletion(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8548/2

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in Parquet
..

Patch Set 1:

Ping?

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 1
Gerrit-Owner: anujphadke
Gerrit-Reviewer: Tianyi Wang
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Wed, 03 Jan 2018 19:15:39 +
Gerrit-HasComments: No