[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -

  INT_8  -> TINYINT
  INT_16 -> SMALLINT
  INT_32 -> INT
  INT_64 -> BIGINT
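
For illustration only (this is not code from the patch), the mapping above can be sketched as a lookup table. The names SIGNED_INT_LOGICAL_TO_IMPALA and impala_type_for are hypothetical and do not correspond to Impala identifiers; the real logic lives in the files listed in this change.

```python
# Hypothetical sketch of the mapping described in the commit message.
# None of these names are Impala identifiers.
SIGNED_INT_LOGICAL_TO_IMPALA = {
    "INT_8": "TINYINT",
    "INT_16": "SMALLINT",
    "INT_32": "INT",
    "INT_64": "BIGINT",
}


def impala_type_for(logical_type):
    """Return the Impala column type for a signed integer logical type."""
    try:
        return SIGNED_INT_LOGICAL_TO_IMPALA[logical_type]
    except KeyError:
        # Anything outside the four signed types is out of scope for this sketch.
        raise ValueError("unsupported logical type: %s" % logical_type)
```

A lookup that raises on unknown keys keeps unsupported annotations from being mapped silently.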

Also, added a parquet file with the following schema for testing -

  schema {
    optional int32 id;
    optional int32 tinyint_col (INT_8);
    optional int32 smallint_col (INT_16);
    optional int32 int_col;
    optional int64 bigint_col;
  }

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Reviewed-on: http://gerrit.cloudera.org:8080/8548
Reviewed-by: Tim Armstrong 
Reviewed-by: Tianyi Wang 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 99 insertions(+), 1 deletion(-)

Approvals:
  Tim Armstrong: Looks good to me, but someone else must approve
  Tianyi Wang: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 4
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 04:55:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1686/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Tue, 09 Jan 2018 01:16:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Tianyi Wang (Code Review)
Tianyi Wang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:43:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3: Code-Review+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 23:11:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310
PS2, Line 310:   if line_split[0] == "id":
> Can you make this an elif chain and assert that the column name is one of t
Done


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339
PS2, Line 339:   if line_split[0] == "id":
> Same here
Done


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349
PS2, Line 349:
> nit: missing space after %
Done


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351
PS2, Line 351: c
> same here
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Mon, 08 Jan 2018 22:57:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-08 Thread anujphadke (Code Review)
Hello Tianyi Wang, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8548

to look at the new patch set (#3).

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -

  INT_8  -> TINYINT
  INT_16 -> SMALLINT
  INT_32 -> INT
  INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -

  schema {
    optional int32 id;
    optional int32 tinyint_col (INT_8);
    optional int32 smallint_col (INT_16);
    optional int32 int_col;
    optional int64 bigint_col;
  }

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 99 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8548/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 3
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-05 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 2: Code-Review+1

(4 comments)

Thanks, this is looking good. Couple of minor comments then I'll let tianyi +2

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@310
PS2, Line 310:   if line_split[0] == "id":
Can you make this an elif chain and assert that the column name is one of the 
expected column names? That makes it less likely that an error in the tests 
results in it silently failing. I.e. something like

  if line_split[0] == 'id':
    ...
  elif line_split[0] == 'int':
    ...
  else:
    assert line_split[0] == 'bigint_col'
    ...
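
A minimal runnable sketch of the pattern suggested above, assuming each schema line has been split so that line_split[0] is the column name. The function name expected_annotation and its return values are hypothetical; the column names and annotations are taken from the test file's schema in the commit message.

```python
def expected_annotation(line_split):
    """Return the logical-type annotation expected for one schema column.

    Hypothetical helper: line_split[0] is assumed to hold the column name.
    """
    name = line_split[0]
    if name == "id":
        return None
    elif name == "tinyint_col":
        return "INT_8"
    elif name == "smallint_col":
        return "INT_16"
    elif name == "int_col":
        return None
    else:
        # The final branch asserts instead of falling through, so an
        # unexpected column name fails the test loudly.
        assert name == "bigint_col", "unexpected column: %s" % name
        return None
```

With independent if statements, a misspelled column name would match no branch and the check would be skipped without any failure; the elif chain with a closing assert rules that out.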


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@339
PS2, Line 339:   if line_split[0] == "tinyint_col":
Same here


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@349
PS2, Line 349: %src_tbl)
nit: missing space after %


http://gerrit.cloudera.org:8080/#/c/8548/2/tests/query_test/test_insert_parquet.py@351
PS2, Line 351: %
same here




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 19:09:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-04 Thread anujphadke (Code Review)
anujphadke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8548/1/testdata/data/signed_integer_logical_types.parquet
File testdata/data/signed_integer_logical_types.parquet:

PS1:
> Can you add a description of this file to the readme - i.e. what it has in
Done


http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py
File tests/query_test/test_insert_parquet.py:

http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@295
PS1, Line 295: column
> column
Done


http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@303
PS1, Line 303: stored as parquet""".format(src_tbl, hdfs_path)
> Why is there a space after {1}?
Removed


http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@321
PS1, Line 321: result = self.execute_query_expect_success(self.client, insert_stmt)
> remove semicolon
Done


http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@325
PS1, Line 325: ame values in
> should these (above and below as well) be execute_query_expect_success?
Done


http://gerrit.cloudera.org:8080/#/c/8548/1/tests/query_test/test_insert_parquet.py@331
PS1, Line 331: dst_tbl = "{0}.{1}".format(unique_database, "read_write_logical_type_dst")
> +1. I think the test would be a little easier to understand too if we asser
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: anujphadke 
Gerrit-Comment-Date: Fri, 05 Jan 2018 02:45:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-04 Thread anujphadke (Code Review)
Hello Tianyi Wang, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8548

to look at the new patch set (#2).

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..

IMPALA-5052: Read and write signed integer logical types in Parquet

This patch maps a signed integer logical type in parquet to a supported
Impala column type. This change introduces the following mapping -

  INT_8  -> TINYINT
  INT_16 -> SMALLINT
  INT_32 -> INT
  INT_64 -> BIGINT

Also, added a parquet file with the following schema for testing -

  schema {
    optional int32 id;
    optional int32 tinyint_col (INT_8);
    optional int32 smallint_col (INT_16);
    optional int32 int_col;
    optional int64 bigint_col;
  }

Change-Id: I47a8371858c9597c6a440808cf6f933532468927
---
M be/src/exec/hdfs-parquet-table-writer.cc
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M testdata/data/README
A testdata/data/signed_integer_logical_types.parquet
M tests/query_test/test_insert_parquet.py
5 files changed, 94 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/8548/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 2
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5052: Read and write signed integer logical types in Parquet

2018-01-03 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8548 )

Change subject: IMPALA-5052: Read and write signed integer logical types in 
Parquet
..


Patch Set 1:

Ping?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I47a8371858c9597c6a440808cf6f933532468927
Gerrit-Change-Number: 8548
Gerrit-PatchSet: 1
Gerrit-Owner: anujphadke 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 03 Jan 2018 19:15:39 +
Gerrit-HasComments: No