Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15087 )

Change subject: IMPALA-8110: Fix the Parquet stats filtering issue to correctly 
handle narrowed int types
......................................................................


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@10
PS3, Line 10: add
nit: adds


http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@14
PS3, Line 14: Manual test
Please create an automatic test for this. Checking the example in the jira with 
<, =, > predicates + the sames with smallint instead of int seems enough to me.

A possible place for the test: 
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test


http://gerrit.cloudera.org:8080/#/c/15087/3//COMMIT_MSG@16
PS3, Line 16:    smallint (int16), make sure the query return correct number of 
rows when
nit: please wrap the commit message at 72


http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc
File be/src/exec/parquet/parquet-column-stats.cc:

http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@54
PS3, Line 54: stats.__isset.min_value
This check is needed with max_value before using stats.max_value for paired 
stats


http://gerrit.cloudera.org:8080/#/c/15087/3/be/src/exec/parquet/parquet-column-stats.cc@106
PS3, Line 106: check_paired_stats
I think that we should return false even if the paired stats do not exist - 
e.g. assume the case in the jira with [1,201] where 201 was not written to the 
stats for some reason (I do not know of such a writer) and the predicate is "i 
< 1". Even if 1 was the smallest value as int32, large values that overflow 
during the conversion can become negative, so dropping the whole row group is 
not correct.

So I think that min/max values can be only valid if both the stat and its 
paired stat exist and within the valid range.



--
To view, visit http://gerrit.cloudera.org:8080/15087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id8bdaf4c4b2d0c6ea26d6e9bf013afca647e53a1
Gerrit-Change-Number: 15087
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <[email protected]>
Gerrit-Reviewer: Bikramjeet Vig <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Mon, 27 Jan 2020 16:58:30 +0000
Gerrit-HasComments: Yes

Reply via email to