Hi,

We are trying to run a simple query with a WHERE clause predicate. The Parquet files are created using Python and stored on HDFS. The Apache Drill version used is 1.17.
The options below, which are required for predicate push down, are set to their defaults:

[image: image.png]

The Drill query scans a directory with multiple Parquet files (total size 1 GB). We expect that, if predicate push down works, it will help reduce the scan time, which is currently 97%. If predicate push down works, the row group scan should fetch only 70,840 records instead of 14,162,187.

[image: image.png]

Minor Fragment | NUM_ROWGROUPS | ROWGROUPS_PRUNED | NUM_DICT_PAGE_LOADS | NUM_DATA_PAGE_lOADS | NUM_DATA_PAGES_DECODED | NUM_DICT_PAGES_DECOMPRESSED | NUM_DATA_PAGES_DECOMPRESSED | TOTAL_DICT_PAGE_READ_BYTES | TOTAL_DATA_PAGE_READ_BYTES | TOTAL_DICT_DECOMPRESSED_BYTES | TOTAL_DATA_DECOMPRESSED_BYTES | TIME_DICT_PAGE_LOADS | TIME_DATA_PAGE_LOADS | TIME_DATA_PAGE_DECODE | TIME_DICT_PAGE_DECODE | TIME_DICT_PAGES_DECOMPRESSED | TIME_DATA_PAGES_DECOMPRESSED | TIME_DISK_SCAN_WAIT | TIME_DISK_SCAN | TIME_FIXEDCOLUMN_READ | TIME_VARCOLUMN_READ | TIME_PROCESS
01-00-04 | 7 | 0 | 77 | 0 | 77 | 77 | 77 | 0 | 0 | 7,147,852 | 8,884,071 | 598,070 | 0 | 97,822 | 11,440,739 | 2,081,514 | 17,694,740 | 598,070 | 0 | 112,108,259 | 703,103,096 | 815,245,307
01-01-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 2,115,860 | 4,316,153 | 1,778,468 | 0 | 144,320 | 3,665,957 | 775,403 | 8,693,618 | 1,778,468 | 0 | 105,066,657 | 776,807,232 | 882,070,408
01-02-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 6,835,560 | 8,630,174 | 337,404 | 0 | 100,190 | 10,876,145 | 1,970,521 | 11,789,061 | 337,404 | 0 | 102,833,433 | 655,338,696 | 758,203,357
01-03-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 2,242,112 | 4,516,183 | 1,586,562 | 0 | 164,398 | 3,827,371 | 877,814 | 8,604,307 | 1,586,562 | 0 | 112,745,628 | 758,634,132 | 871,586,588
01-04-04 | 6 | 0 | 66 | 2 | 66 | 66 | 64 | 0 | 1,420 | 5,407,178 | 7,175,446 | 2,216,935 | 3,181 | 74,956 | 8,754,425 | 1,650,970 | 11,241,636 | 2,216,935 | 0 | 97,180,713 | 668,249,966 | 765,461,684
01-05-04 | 6 | 0 | 66 | 1 | 66 | 66 | 65 | 0 | 92 | 1,378,260 | 3,595,638 | 3,394,196 | 1,571 | 204,833 | 2,726,005 | 1,357,297 | 6,843,717 | 3,394,196 | 0 | 150,560,569 | 704,154,215 | 854,928,393
01-06-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 4,748,302 | 6,547,215 | 471,679 | 0 | 114,270 | 7,739,335 | 1,537,805 | 10,571,215 | 471,679 | 0 | 97,392,926 | 667,056,499 | 764,478,811
01-07-04 | 6 | 0 | 68 | 0 | 66 | 64 | 66 | 180 | 0 | 769,746 | 3,128,730 | 292,603 | 0 | 130,814 | 1,574,574 | 425,133 | 6,563,457 | 286,300 | 0 | 168,501,325 | 716,135,483 | 884,850,308
01-08-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 8,356,637 | 9,264,223 | 582,946 | 0 | 101,103 | 13,332,669 | 2,422,705 | 13,340,100 | 582,946 | 0 | 109,932,913 | 691,400,457 | 801,374,949
01-09-04 | 6 | 0 | 66 | 2 | 66 | 66 | 64 | 0 | 133 | 1,453,953 | 2,953,546 | 19,563,820 | 1,920 | 149,257 | 2,553,666 | 632,461 | 5,886,238 | 19,563,820 | 0 | 81,854,819 | 557,612,832 | 639,664,370
01-10-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 6,634,676 | 8,081,684

Please advise whether any specific options are required to enable predicate push down. We also expected the Filter operator to filter out the records, but this is done later by the SELECTION_VECTOR_REMOVER operator. The documentation site does not give enough detail on when this operation is triggered.

Thanks,
Navin
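
P.S. To make our expectation concrete, here is a minimal Python sketch of how we understand row-group pruning to work: a row group can be skipped when the filter value lies outside the min/max statistics stored for that column. This is only our mental model and not Drill's actual code. RowGroupStats, may_contain and prune are hypothetical names, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max statistics for one column in one row group (hypothetical model)."""
    num_rows: int
    min_value: int
    max_value: int


def may_contain(stats: RowGroupStats, target: int) -> bool:
    """An equality predicate can only match if target lies within [min, max]."""
    return stats.min_value <= target <= stats.max_value


def prune(row_groups, target):
    """Return the row groups that still have to be scanned for `col = target`."""
    return [rg for rg in row_groups if may_contain(rg, target)]


# Illustrative values only, not taken from our files.
sorted_groups = [
    RowGroupStats(1000, 0, 99),
    RowGroupStats(1000, 100, 199),
    RowGroupStats(1000, 200, 299),
]
# Same row counts, but every row group spans the full value range,
# as happens when the data is not sorted on the filter column.
unsorted_groups = [RowGroupStats(1000, 0, 299) for _ in range(3)]

kept_sorted = prune(sorted_groups, 150)
kept_unsorted = prune(unsorted_groups, 150)

print(len(kept_sorted), sum(rg.num_rows for rg in kept_sorted))      # 1 1000
print(len(kept_unsorted), sum(rg.num_rows for rg in kept_unsorted))  # 3 3000
```

If this model is right, ROWGROUPS_PRUNED = 0 in every fragment above would suggest that either the statistics in our files are missing or unusable, or every row group's value range covers the filter value. We have not been able to confirm which.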