Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22922 )

Change subject: IMPALA-13625: Allow reading Parquet int32/int64 as decimal 
without logical types
......................................................................


Patch Set 10:

(4 comments)

Thanks for working on this!

http://gerrit.cloudera.org:8080/#/c/22922/10/be/src/exec/parquet/parquet-data-converter.h
File be/src/exec/parquet/parquet-data-converter.h:

http://gerrit.cloudera.org:8080/#/c/22922/10/be/src/exec/parquet/parquet-data-converter.h@94
PS10, Line 94: GetPrecision() <= 0
This change might not needed if we change GetPrecision(), see comment in the 
.cc file.


http://gerrit.cloudera.org:8080/#/c/22922/10/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/22922/10/be/src/exec/parquet/parquet-metadata-utils.cc@377
PS10, Line 377: GetPrecision(schema_element);
The code could be greatly simplified if we changed GetPrecision() to return 
ColumnType::MAX_DECIMAL4_PRECISION+1 and ColumnType::MAX_DECIMAL8_PRECISION+1 
when the schema_element.type is INT32 or INT64.

The downside is that we would lose the check for precision - scale < File 
precision - File scale. I think it is a bug that we don't check it right now, 
as for possible overflows we should raise an error and not return NULLs, see 
"Overflow behavior" at 
https://impala.apache.org/docs/build/html/topics/impala_decimal.html

Though adding this check might work current workloads.. Maybe we could just add 
the check when the precision is not explicitly set, so we don't break existing 
workloads.


http://gerrit.cloudera.org:8080/#/c/22922/10/be/src/exec/parquet/parquet-metadata-utils.cc@423
PS10, Line 423:     if (!is_converted_type_decimal) {
              :       // TODO: is this validation useful? It is not required at 
all to read the data and
              :       // might only serve to reject otherwise perfectly 
readable files.
              :       ErrorMsg msg(TErrorCode::PARQUET_BAD_CONVERTED_TYPE, 
filename,
              :           schema_element.name);
              :       RETURN_IF_ERROR(state->LogOrReturnError(msg));
I think we shouldn't report an error from now on if the file type is 
INT32/INT64.


http://gerrit.cloudera.org:8080/#/c/22922/10/testdata/workloads/functional-query/queries/QueryTest/parquet-type-widening.test
File 
testdata/workloads/functional-query/queries/QueryTest/parquet-type-widening.test:

http://gerrit.cloudera.org:8080/#/c/22922/10/testdata/workloads/functional-query/queries/QueryTest/parquet-type-widening.test@11
PS10, Line 11: # IMPALA-13625: Allow reading Parquet int32/int64 as decimal 
without logical types
Please add tests for INT_MIN, INT_MAX, and INT64_MIN, INT64_MAX.



--
To view, visit http://gerrit.cloudera.org:8080/22922
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I56006eb3cca28c81ec8467d77b35005fbf669680
Gerrit-Change-Number: 22922
Gerrit-PatchSet: 10
Gerrit-Owner: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Daniel Vanko <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 19 Jun 2025 14:08:01 +0000
Gerrit-HasComments: Yes

Reply via email to