Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materialization of columns for filtered out 
rows in Parquet table.
......................................................................


Patch Set 12:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc
File be/src/exec/scratch-tuple-batch-test.cc:

http://gerrit.cloudera.org:8080/#/c/17860/12/be/src/exec/scratch-tuple-batch-test.cc@69
PS12, Line 69: 2, 4, 8, 16, 32
> I see. Let us assume the following:
Ah, got it! It may not be sufficient though. For instance,

0 1 2 3 4 5 6 7 8 9 0 1 2 3
T T F T F F F T T T T T F F

-> the check would accept the two batches [1, 3] and [10, 11] with a gap of 5 
as a correct result even though they are not. Some extra conditions are 
probably needed.
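
One possible extra condition is a coverage check: every selected row must fall 
inside some batch. Below is a minimal sketch of such a verifier. The names 
`Batch` and `BatchesCoverSelection` are hypothetical and are not part of the 
patch or of Impala. The sketch deliberately leaves out the gap rule, since 
that depends on the exact merge semantics used in the test.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper type, not part of Impala: a micro batch is an
// inclusive [start, end] range of row indexes.
using Batch = std::pair<std::size_t, std::size_t>;

// Returns true only if the batches are in range, ordered, non-overlapping,
// start and end on selected rows, and together cover every selected row.
bool BatchesCoverSelection(const std::vector<bool>& selected,
                           const std::vector<Batch>& batches) {
  std::vector<bool> covered(selected.size(), false);
  std::size_t next_free = 0;  // first index not yet claimed by a batch
  for (const Batch& b : batches) {
    if (b.first > b.second || b.second >= selected.size()) return false;
    if (b.first < next_free) return false;  // out of order or overlapping
    if (!selected[b.first] || !selected[b.second]) return false;
    for (std::size_t i = b.first; i <= b.second; ++i) covered[i] = true;
    next_free = b.second + 1;
  }
  // The extra condition: no selected row may fall outside every batch.
  for (std::size_t i = 0; i < selected.size(); ++i) {
    if (selected[i] && !covered[i]) return false;
  }
  return true;
}
```

With the selection vector from the example above, this check rejects the 
batches [1, 3] and [10, 11] because rows 0, 7, 8 and 9 are selected but not 
covered by any batch.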


http://gerrit.cloudera.org:8080/#/c/17860/12/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17860/12/testdata/workloads/functional-query/queries/QueryTest/min_max_filters.test@436
PS12, Line 436: row_regex:.* RF00.\[min_max\] -. .\.wr_item_sk.*
> In addition, I wonder if we can grab a few counters on late materialized ro
I had commented on the issue with counters earlier (pasting it below). Let me 
know your thoughts:

--- PASTED ---
Thanks, Qifan, for the review. The counter suggestion is a good one and 
something I pondered earlier. The issue is that we don't skip decoding rows; 
instead we skip decoding values, where one row may consist of hundreds of 
values, some of which will be read and others skipped. We cannot accurately 
keep track of the number of values skipped in the current scheme of things 
without incurring a significant performance penalty. For instance, we 
sometimes skip pages without decompressing them: if a skipped page has a page 
index with candidate rows, we would need to decompress the page to get the 
accurate number of values skipped due to late materialisation. Even if the 
page is not compressed, figuring out the number of values for the 
corresponding candidate range can be time-consuming in the scenario where we 
skip pages directly. Hence, timed counters, which are already present, would 
be more appropriate here.
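
To illustrate the trade-off, here is a minimal sketch of a timed counter. 
`TimeCounter`, `ScopedTimer` and `SkipValues` are hypothetical names used only 
for illustration; they are not Impala's actual counters or scanner code. The 
point is that a whole skip operation is timed once, so no per-value 
bookkeeping (and no page decompression) is needed to maintain the counter.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a profile counter holding elapsed nanoseconds.
struct TimeCounter {
  std::int64_t total_ns = 0;
};

// RAII timer: adds the lifetime of this object to the counter on destruction.
class ScopedTimer {
 public:
  explicit ScopedTimer(TimeCounter* counter)
      : counter_(counter), start_(std::chrono::steady_clock::now()) {}
  ~ScopedTimer() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    counter_->total_ns +=
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
  }
  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  TimeCounter* counter_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical skip routine: the work is timed as a whole, so the counter
// costs two clock reads per call regardless of how many values are skipped.
std::int64_t SkipValues(std::int64_t num_values, TimeCounter* skip_time) {
  ScopedTimer timer(skip_time);
  std::int64_t position = 0;
  position += num_values;  // placeholder for the real skip work
  return position;
}
```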



--
To view, visit http://gerrit.cloudera.org:8080/17860

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 12
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 26 Oct 2021 18:02:14 +0000
Gerrit-HasComments: Yes
