Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12163 )

Change subject: IMPALA-7087: Read Parquet decimal columns with lower 
precision/scale
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12163/1//COMMIT_MSG@25
PS1, Line 25: If the underlying Parquet files contain rows that cannot be 
converted to
            : the higher scale without overflow, an error is thrown and the 
query
            : fails. This is different from other engines such as Hive which 
set the
            : row to NULL and allow the query to run to completion without 
error.
I do not know the use case, so I may be wrong, but I dislike the idea of 
failing the query if the Parquet file is not corrupt and only the data is "too 
large". If a file with such data is added to a table, the table can become 
unusable and Impala offers no tool to get rid of the problematic file.

Would it make sense to be stricter during scale + precision checking to exclude 
the possibility of overflow? If I understand correctly, overflow can only occur 
if the scale in the file is lower then the expected scale: e.g value 2 in 
decimal (1,0) would overflow when converted to decimal(1, 1). If precision_file 
- scale_file <= precision_expected - scale_expected, then the conversion should 
happen without overflow, shouldn't it?


http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc@243
PS1, Line 243: // TODO: we could allow a mismatch and do a conversion at this 
step.
This TODO could be probably updated.


http://gerrit.cloudera.org:8080/#/c/12163/1/be/src/exec/parquet/parquet-metadata-utils.cc@255
PS1, Line 255: // TODO: we could allow a mismatch and do a conversion at this 
step.
This TODO could be probably updated.


http://gerrit.cloudera.org:8080/#/c/12163/1/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test:

http://gerrit.cloudera.org:8080/#/c/12163/1/testdata/workloads/functional-query/queries/QueryTest/parquet-decimal-precision-and-scale-widening.test@85
PS1, Line 85: ====
Please add some tests with 'where col=const' clauses to exercise the dictionary 
and stat filtering code paths.

Dictionary filtering is disabled if 'needs_conversion_' is true, but it may be 
enabled in the future if conversion will be moved to dictionary construction.

I think that stat filtering won't work correctly with the current 
implementation - the stat value will be interpreted as if it had the same 
precision/scale as the SQL column, which can lead to incorrectly dropping row 
groups.

I had to deal with somewhat similar problems in 
https://gerrit.cloudera.org/#/c/11057/


http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@842
PS1, Line 842: decimal_stored_as_int32.parquet
I do not see this file in the patch and it is not mentioned in the README.


http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@848
PS1, Line 848: decimal_stored_as_int64.parquet
I do not see this file in the patch and it is not mentioned in the README.


http://gerrit.cloudera.org:8080/#/c/12163/1/tests/query_test/test_scanners.py@856
PS1, Line 856: binary_decimal_no_dictionary.parquet
I do not see this file in the patch and it is not mentioned in the README.



--
To view, visit http://gerrit.cloudera.org:8080/12163
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iafc8efd12379a39756e3e70f022a81a636dadb61
Gerrit-Change-Number: 12163
Gerrit-PatchSet: 1
Gerrit-Owner: Sahil Takiar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Comment-Date: Mon, 07 Jan 2019 15:14:25 +0000
Gerrit-HasComments: Yes

Reply via email to