Amogh Margoor has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17860 )

Change subject: IMPALA-9873: Avoid materilization of columns for filtered out 
rows in Parquet table.
......................................................................


Patch Set 12:

> Wrote the following code for randomly generating selected rows.
 > Have not get a chance to test it out  yet (some runtime issue with
 > by dev box).
 >
 > // This tests checks conversion of 'selected_rows' with randomly
 > generated
 > // 'true' values to 'ScratchMicroBatch';
 > TEST_F(ScratchTupleBatchTest, TestRandomGeneratedMicroBatches) {
 > const int BATCH_SIZE = 1024;
 > scoped_ptr<ScratchTupleBatch> scratch_batch(
 > new ScratchTupleBatch(*desc_, BATCH_SIZE, &tracker_));
 > scratch_batch->num_tuples = BATCH_SIZE;
 > // gaps to try
 > vector<int> gaps = {5, 16, 29, 37, 1025};
 > for (auto n : gaps) {
 > // Set randomly locations as selected.
 > srand (time(NULL));
 > for (int batch_idx = 0; batch_idx < BATCH_SIZE; ++batch_idx) {
 > scratch_batch->selected_rows[batch_idx] = rand() < (RAND_MAX / 2);
 > }
 > ScratchMicroBatch micro_batches[BATCH_SIZE];
 > int batches = scratch_batch->GetMicroBatches(n, micro_batches);
 > EXPECT_TRUE(batches > 1);
 > EXPECT_TRUE(batches <= BATCH_SIZE);
 > // Verify every batch
 > for (int i = 0; i < batches; i++) {
 > const ScratchMicroBatch& batch = micro_batches[i];
 > EXPECT_TRUE(batch.start <= batch.end);
 > EXPECT_TRUE(batch.length == batch.end - batch.start + 1);
 > EXPECT_TRUE(batch.start);
 > EXPECT_TRUE(batch.end);
 > int last_true_idx = batch.start;
 > for (int j = batch.start + 1; j < batch.end; j++) {
 > if (scratch_batch->selected_rows[j]) {
 > EXPECT_TRUE(j - last_true_idx + 1 <= n);
 > last_true_idx = j;
 > }
 > }
 > }
 > // Verify any two consecutive batches i and i+1
 > for (int i = 0; i < batches - 1; i++) {
 > const ScratchMicroBatch& batch = micro_batches[i];
 > const ScratchMicroBatch& nbatch = micro_batches[i + 1];
 > EXPECT_TRUE(batch.end < nbatch.start);
 > EXPECT_TRUE(nbatch.start - batch.end >= n);
 > // Any row in betweeen the two batches should not be selected
 > for (int j=batch.end+1; j<nbatch.start; j++) {
 > EXPECT_FALSE(scratch_batch->selected_rows[j]);
 > }
 > }
 > }
 > }

hey Qifan, Thanks a lot for this snippet. I almost wrote the code - will merge 
your snippet to it. Huge thanks for both  - detailed description of the 
verfication algo earlier and also for this snippet.


--
To view, visit http://gerrit.cloudera.org:8080/17860
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I46406c913297d5bbbec3ccae62a83bb214ed2c60
Gerrit-Change-Number: 17860
Gerrit-PatchSet: 12
Gerrit-Owner: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Amogh Margoor <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Wed, 27 Oct 2021 14:47:58 +0000
Gerrit-HasComments: No

Reply via email to