Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11268 )

Change subject: IMPALA-6373: Allow primitive type widening on parquet tables
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11268/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11268/1//COMMIT_MSG@12
PS1, Line 12: tinyint
I think that it would be useful to mention that tinyint/smallint/int are all 
encoded as int32 in Parquet. For example the encoded size could be mentioned in 
parentheses.


http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h@260
PS1, Line 260:   return ByteSize(*v);
Decode() should return the number of bytes read from the original buffer, so in 
this case not the size of int64_t but the size of int32_t. I checked some 
places where it is called (dict-encoding.h, parquet-column-readers.cc) and it 
is used to advance a pointer to the original data. The comment in this file at 
line 186 also supports this.

I guess that this didn't cause an issue because the test table has only one 
row. Note that two rows with distinct values are probably needed to cause an 
issue, because the default encoding is dictionary.


http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h@271
PS1, Line 271: ByteSize
Same as line 260.


http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-common.h@282
PS1, Line 282:   return ByteSize(*v);
Same as line 260.


http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-plain-test.cc
File be/src/exec/parquet-plain-test.cc:

http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-plain-test.cc@113
PS1, Line 113:     int decoded_size = 
ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>(
This function should return the encoded size.


http://gerrit.cloudera.org:8080/#/c/11268/1/be/src/exec/parquet-plain-test.cc@145
PS1, Line 145:   int decoded_size = 
ParquetPlainEncoder::Decode<WidenInternalType, PARQUET_TYPE>(
Same as line 113.


http://gerrit.cloudera.org:8080/#/c/11268/1/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/11268/1/testdata/data/README@192
PS1, Line 192: single row
Please add a second row with different values.



--
To view, visit http://gerrit.cloudera.org:8080/11268
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If93394b035c64cf6fc5f37b54d29c034cc1f86e4
Gerrit-Change-Number: 11268
Gerrit-PatchSet: 1
Gerrit-Owner: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Fredy Wijaya <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 21 Aug 2018 10:24:04 +0000
Gerrit-HasComments: Yes

Reply via email to