Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15087 )
Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly handle narrowed int types ...................................................................... Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@10 PS3, Line 10: add nit: adds http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@14 PS3, Line 14: Manual test Please create an automatic test for this. Checking the example in the jira with <, =, > predicates + the sames with smallint instead of int seems enough to me. A possible place for the test: https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@16 PS3, Line 16: smallint (int16), make sure the query return correct number of rows when nit: please wrap the commit message at 72 http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc File be/src/exec/parquet/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@54 PS3, Line 54: stats.__isset.min_value This check is needed with max_value before using stats.max_value for paired stats http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@106 PS3, Line 106: check_paired_stats I think that we should return false even if the paired stats do not exist - e.g. assume the case in the jira with [1,201] where 201 was not written to the stats for some reason (I do not know of such a writer) and the predicate is "i < 1". Even if 1 was the smallest value as int32, large values that overflow during the conversion can become negative, so dropping the whole row group is not correct. So I think that min/max values can be only valid if both the stat and its paired stat exist and within the valid range. -- To view, visit http://gerrit.cloudera.org:8080/15087 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1 Gerrit-Change-Number: 15087 Gerrit-PatchSet: 3 Gerrit-Owner: Wenzhe Zhou <[email protected]> Gerrit-Reviewer: Bikramjeet Vig <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Mon, 27 Jan 2020 16:58:30 +0000 Gerrit-HasComments: Yes
