[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Thu, 22 Feb 2018 00:57:44 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If min and max are both NaN, we can't use the statistics at all. If only one of them is NaN, the other still can be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 Reviewed-on: http://gerrit.cloudera.org:8080/9358 Reviewed-by: Tim Armstrong Tested-by: Impala Public Jenkins --- M be/src/exec/parquet-column-stats.cc M testdata/data/README A testdata/data/min_max_is_nan.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-invalid-minmax-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_parquet_stats.py 6 files changed, 137 insertions(+), 2 deletions(-) Approvals: Tim Armstrong: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 10 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1976/ -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 21:17:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 9 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 21:16:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 21:16:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/6/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/9358/6/testdata/data/README@153 PS6, Line 153: hacking wasn't this the default behavior before this fix? -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 19:33:21 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Lars Volker, Gabor Kaszab, Zoltan Ivanfi, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#7). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If min and max are both NaN, we can't use the statistics at all. If only one of them is NaN, the other still can be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/data/README A testdata/data/min_max_is_nan.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-invalid-minmax-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_parquet_stats.py 6 files changed, 137 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/7 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Lars Volker, Gabor Kaszab, Zoltan Ivanfi, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#8). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If min and max are both NaN, we can't use the statistics at all. If only one of them is NaN, the other still can be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/data/README A testdata/data/min_max_is_nan.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-invalid-minmax-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_parquet_stats.py 6 files changed, 137 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/8 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 7: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/7/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/9358/7/testdata/data/README@153 PS7, Line 153: Generated by Impala's Parquet writer before the fix for IMPALA-6527. Nit: Maybe also add the first 8 characters of the git hash (current HEAD) to make it easy to repro. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 20:23:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/6/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/9358/6/testdata/data/README@153 PS6, Line 153: hacking > wasn't this the default behavior before this fix? Yes, I just saw that almost every Parquet file has this description that are generated by Impala. Improved the comment. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 19:51:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/7/testdata/data/README File testdata/data/README: http://gerrit.cloudera.org:8080/#/c/9358/7/testdata/data/README@153 PS7, Line 153: Generated by Impala's Parquet writer before the fix for IMPALA-6527. Git hash: 3a049a53 > Nit: Maybe also add the first 8 characters of the git hash (current HEAD) t Done -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 8 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 20:37:06 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 7 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 20:35:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/5/testdata/data/min_max_is_nan.parquet File testdata/data/min_max_is_nan.parquet: PS5: > Please also update the README file in the same folder. Updated the README file. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 19:32:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Lars Volker, Gabor Kaszab, Zoltan Ivanfi, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#6). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If min and max are both NaN, we can't use the statistics at all. If only one of them is NaN, the other still can be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/data/README A testdata/data/min_max_is_nan.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-invalid-minmax-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_parquet_stats.py 6 files changed, 137 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/6 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 6 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/5/testdata/data/min_max_is_nan.parquet File testdata/data/min_max_is_nan.parquet: PS5: Please also update the README file in the same folder. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 18:26:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 16:45:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Ivanfi has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 15:12:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 4: (2 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/9358/4/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9358/4/be/src/exec/parquet-column-stats.cc@99 PS4, Line 99: col_chunk.meta_data.type)) return false; > nit: Not sure if it makes it more readable, but you could write this as I like it, re-structured the code. http://gerrit.cloudera.org:8080/#/c/9358/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@494 PS4, Line 494: QUERY > This is a good point. We should consider checking in a file written with th Thanks for the suggestions. I added min_max_is_nan.parquet, and created a new test in test_parquet_stats.py. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Wed, 21 Feb 2018 14:27:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Lars Volker, Gabor Kaszab, Zoltan Ivanfi, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#5). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If min and max are both NaN, we can't use the statistics at all. If only one of them is NaN, the other still can be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc A testdata/data/min_max_is_nan.parquet A testdata/workloads/functional-query/queries/QueryTest/parquet-invalid-minmax-stats.test M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test M tests/query_test/test_parquet_stats.py 5 files changed, 130 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/5 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 5 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/9358/4/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9358/4/be/src/exec/parquet-column-stats.cc@99 PS4, Line 99: col_chunk.meta_data.type)) return false; nit: Not sure if it makes it more readable, but you could write this as return (DecodePlainValue(..) && !isnan(..)); http://gerrit.cloudera.org:8080/#/c/9358/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@494 PS4, Line 494: QUERY > This is a good test in general, but it does not specifically test for the f This is a good point. We should consider checking in a file written with the current code. You can find examples in testdata/data. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Tue, 20 Feb 2018 18:00:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Ivanfi has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 4: Code-Review+1 (1 comment) This workaround in the read path seems to be a good quick-fix, but I think the write path should also have a quick fix to make the written stats independent of the data order, i.e. it should not matter whether a NaN is the first value or not. http://gerrit.cloudera.org:8080/#/c/9358/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/4/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@494 PS4, Line 494: QUERY This is a good test in general, but it does not specifically test for the fix. Both the read path and the write path can be modified to make this test pass and once we will have fixes in both paths, this test won't notice if one of them has a regression. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoltan Ivanfi Gerrit-Comment-Date: Tue, 20 Feb 2018 15:41:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Lars Volker, Gabor Kaszab, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#4). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If min and max are both NaN, we can't use the statistics at all. If only one of them is NaN, the other still can be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test 2 files changed, 86 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/4 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 4 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 3: (6 comments) Thanks for the comments! http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc@29 PS2, Line 29: const ColumnType& col_type, const parquet::ColumnOrder* col_order, > 1: Can this be part of ColumnStatsBase? Removed this function. http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc@109 PS2, Line 109: col_chunk.meta_data.type); > +1 for returning false here. I added comments here. The function is general for all the datatypes and I think the current comment does a good job explaining its behavior. I'm not sure if this edge case that only affects one type should be mentioned there. http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc@109 PS2, Line 109: col_chunk.meta_data.type); > I think we can just return false here if a value is NaN. That way the calle Good idea, thanks. http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@496 PS2, Line 496: # When the first value is NaN in a column chunk, Impala chooses it as min_value and > Could you add a description here why we test this? (to check that when the Added some extra details. http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@497 PS2, Line 497: # max_value for statistics. In this case the min/max filter should be ignored. > Can you also add tests where NaN is inserted in the middle of two values? F Added couple of tests for the case when NaN is in the middle. http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@520 PS2, Line 520: # test predicate that is true for NaN > Maybe also test !=? Added tests for != -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 20 Feb 2018 11:18:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Lars Volker, Gabor Kaszab, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#3). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If max is NaN, then Impala will interpret it as infinity. If min is NaN, then -infinity will be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test 2 files changed, 86 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/3 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 3 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc@109 PS2, Line 109: ChangeNaNToInf(slot, stats_field); > I think we can just return false here if a value is NaN. That way the calle +1 for returning false here. Please update the function comment in the header file accordingly. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 20 Feb 2018 08:40:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc@109 PS2, Line 109: ChangeNaNToInf(slot, stats_field); I think we can just return false here if a value is NaN. That way the caller will know that it shouldn't use the stats (see comment of the ReadFromThrift()). http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@497 PS2, Line 497: insert into test_nan values (cast('NaN' as double)), (42); Can you also add tests where NaN is inserted in the middle of two values? For those, "where not val >= 0" should be covered, too. -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Feb 2018 22:17:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@520 PS2, Line 520: select * from test_nan where not val >= 0 Maybe also test !=? -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Feb 2018 20:59:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/9358 ) Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. Patch Set 2: (2 comments) Seems fine in general. I have some minor comments. http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc File be/src/exec/parquet-column-stats.cc: http://gerrit.cloudera.org:8080/#/c/9358/2/be/src/exec/parquet-column-stats.cc@29 PS2, Line 29: void ChangeNaNToInf(void *slot, ColumnStatsBase::StatsField stats_field) { 1: Can this be part of ColumnStatsBase? 2: Could you add some description to the function? e.g. can we expect changes on slot parameter? http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test: http://gerrit.cloudera.org:8080/#/c/9358/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@496 PS2, Line 496: create table test_nan(val double) stored as parquet; Could you add a description here why we test this? (to check that when the first item in the row group is NaN then it doesn't ruin min/max stats and as a result it doesn't rule out the whole row group) -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 19 Feb 2018 15:40:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Hello Gabor Kaszab, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9358 to look at the new patch set (#2). Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If max is NaN, then Impala will interpret it as infinity. If min is NaN, then -infinity will be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test 2 files changed, 48 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/2 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 2 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN
Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9358 Change subject: IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN .. IMPALA-6538: Fix read path when Parquet min/max statistics contain NaN If the first number in a row group written by Impala is NaN, then Impala writes incorrect statistics in the metadata. This will result in incorrect results when filtering the data. This commit fixes the read path when encountering NaNs in Parquet min/max statistics. If max is NaN, then Impala will interpret it as infinity. If min is NaN, then -infinity will be used. I added some tests to QueryTest/parqet-stats.test Change-Id: If3897fc1426541239223670812f59e2bed32f455 --- M be/src/exec/parquet-column-stats.cc M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test 2 files changed, 48 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/9358/1 -- To view, visit http://gerrit.cloudera.org:8080/9358 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If3897fc1426541239223670812f59e2bed32f455 Gerrit-Change-Number: 9358 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy