Hi

We are trying to run a simple WHERE clause query and expect predicate push
down to apply. The Parquet files are created using Python and stored on HDFS.
The Apache Drill version used is 1.17.


The options below, which are required for predicate push down, are set to
their defaults:
[image: image.png]

The Drill query scans a directory containing multiple Parquet files (total
size 1 GB).
We expect that, if predicate push down works, it will help reduce the scan
time, which is currently 97%.
If predicate push down works, the row group scan should fetch only 70,840
records instead of 14,162,187.
[image: image.png]
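
To make our expectation concrete, here is a minimal sketch of how we understand row group pruning to work, assuming each row group carries min/max statistics for the filtered column. The class and function names are hypothetical and this is not Drill's code.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max statistics for one column in one row group (hypothetical layout)."""
    min_value: int
    max_value: int
    num_rows: int


def can_prune_equals(stats: RowGroupStats, literal: int) -> bool:
    """True when no row in the group can satisfy `column = literal`."""
    return literal < stats.min_value or literal > stats.max_value


def rows_to_scan(row_groups, literal):
    """Total rows in the row groups that survive pruning."""
    return sum(rg.num_rows for rg in row_groups if not can_prune_equals(rg, literal))


# Sorted data: value ranges do not overlap, so most groups can be pruned.
sorted_groups = [
    RowGroupStats(0, 99, 1000),
    RowGroupStats(100, 199, 1000),
    RowGroupStats(200, 299, 1000),
]
# Unsorted data: every group spans the full range, so nothing can be pruned.
unsorted_groups = [RowGroupStats(0, 299, 1000)] * 3

print(rows_to_scan(sorted_groups, 150))    # 1000
print(rows_to_scan(unsorted_groups, 150))  # 3000
```

Under this model pruning works on whole row groups, so the scan would still read every row of each surviving group. It also only helps when the filtered value falls outside the min/max range of some groups.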

Minor Fragment | NUM_ROWGROUPS | ROWGROUPS_PRUNED | NUM_DICT_PAGE_LOADS | NUM_DATA_PAGE_lOADS | NUM_DATA_PAGES_DECODED | NUM_DICT_PAGES_DECOMPRESSED | NUM_DATA_PAGES_DECOMPRESSED | TOTAL_DICT_PAGE_READ_BYTES | TOTAL_DATA_PAGE_READ_BYTES | TOTAL_DICT_DECOMPRESSED_BYTES | TOTAL_DATA_DECOMPRESSED_BYTES | TIME_DICT_PAGE_LOADS | TIME_DATA_PAGE_LOADS | TIME_DATA_PAGE_DECODE | TIME_DICT_PAGE_DECODE | TIME_DICT_PAGES_DECOMPRESSED | TIME_DATA_PAGES_DECOMPRESSED | TIME_DISK_SCAN_WAIT | TIME_DISK_SCAN | TIME_FIXEDCOLUMN_READ | TIME_VARCOLUMN_READ | TIME_PROCESS
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
01-00-04 | 7 | 0 | 77 | 0 | 77 | 77 | 77 | 0 | 0 | 7,147,852 | 8,884,071 | 598,070 | 0 | 97,822 | 11,440,739 | 2,081,514 | 17,694,740 | 598,070 | 0 | 112,108,259 | 703,103,096 | 815,245,307
01-01-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 2,115,860 | 4,316,153 | 1,778,468 | 0 | 144,320 | 3,665,957 | 775,403 | 8,693,618 | 1,778,468 | 0 | 105,066,657 | 776,807,232 | 882,070,408
01-02-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 6,835,560 | 8,630,174 | 337,404 | 0 | 100,190 | 10,876,145 | 1,970,521 | 11,789,061 | 337,404 | 0 | 102,833,433 | 655,338,696 | 758,203,357
01-03-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 2,242,112 | 4,516,183 | 1,586,562 | 0 | 164,398 | 3,827,371 | 877,814 | 8,604,307 | 1,586,562 | 0 | 112,745,628 | 758,634,132 | 871,586,588
01-04-04 | 6 | 0 | 66 | 2 | 66 | 66 | 64 | 0 | 1,420 | 5,407,178 | 7,175,446 | 2,216,935 | 3,181 | 74,956 | 8,754,425 | 1,650,970 | 11,241,636 | 2,216,935 | 0 | 97,180,713 | 668,249,966 | 765,461,684
01-05-04 | 6 | 0 | 66 | 1 | 66 | 66 | 65 | 0 | 92 | 1,378,260 | 3,595,638 | 3,394,196 | 1,571 | 204,833 | 2,726,005 | 1,357,297 | 6,843,717 | 3,394,196 | 0 | 150,560,569 | 704,154,215 | 854,928,393
01-06-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 4,748,302 | 6,547,215 | 471,679 | 0 | 114,270 | 7,739,335 | 1,537,805 | 10,571,215 | 471,679 | 0 | 97,392,926 | 667,056,499 | 764,478,811
01-07-04 | 6 | 0 | 68 | 0 | 66 | 64 | 66 | 180 | 0 | 769,746 | 3,128,730 | 292,603 | 0 | 130,814 | 1,574,574 | 425,133 | 6,563,457 | 286,300 | 0 | 168,501,325 | 716,135,483 | 884,850,308
01-08-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 8,356,637 | 9,264,223 | 582,946 | 0 | 101,103 | 13,332,669 | 2,422,705 | 13,340,100 | 582,946 | 0 | 109,932,913 | 691,400,457 | 801,374,949
01-09-04 | 6 | 0 | 66 | 2 | 66 | 66 | 64 | 0 | 133 | 1,453,953 | 2,953,546 | 19,563,820 | 1,920 | 149,257 | 2,553,666 | 632,461 | 5,886,238 | 19,563,820 | 0 | 81,854,819 | 557,612,832 | 639,664,370
01-10-04 | 6 | 0 | 66 | 0 | 66 | 66 | 66 | 0 | 0 | 6,634,676 | 8,081,684 | | | | | | | | | | |
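
As a quick check on the numbers above, the short script below totals the NUM_ROWGROUPS and ROWGROUPS_PRUNED columns and works out what fraction of the scanned rows we expect the filter to keep. The values are copied by hand from this message; the variable names are only for illustration.

```python
# Values copied from the NUM_ROWGROUPS and ROWGROUPS_PRUNED columns,
# one entry per minor fragment (01-00-04 through 01-10-04).
num_rowgroups = [7, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]
rowgroups_pruned = [0] * 11

total_groups = sum(num_rowgroups)
total_pruned = sum(rowgroups_pruned)

# Row counts quoted earlier in this message.
expected_rows = 70_840
scanned_rows = 14_162_187
expected_pct = 100 * expected_rows / scanned_rows

print(f"row groups: {total_groups}, pruned: {total_pruned}")
print(f"expected rows are {expected_pct:.2f}% of scanned rows")
```

So none of the 67 row groups is reported as pruned, while the rows we expect back are about 0.5% of what is scanned.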

Please advise if there are any specific options required to enable predicate
push down.

Also, we expected the Filter operator to filter out the records, but this is
done later by the SELECTION_VECTOR_REMOVER operator.
The documentation site does not give enough detail on when this operation is
triggered.
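
Our current understanding, sketched below, is that the filter only marks the matching rows in a selection vector and the remover then copies those rows into a compact batch. This is a simplified illustration with hypothetical function names, not Drill's implementation.

```python
def filter_to_selection_vector(batch, predicate):
    """Filter step: record the indices of matching rows; the batch is not copied."""
    return [i for i, row in enumerate(batch) if predicate(row)]


def remove_selection_vector(batch, selection):
    """Remover step: copy only the selected rows into a new, dense batch."""
    return [batch[i] for i in selection]


batch = [
    {"id": 1, "v": 10},
    {"id": 2, "v": 25},
    {"id": 3, "v": 7},
    {"id": 4, "v": 30},
]

selection = filter_to_selection_vector(batch, lambda row: row["v"] > 20)
compact = remove_selection_vector(batch, selection)

print(selection)  # [1, 3]
print(compact)    # [{'id': 2, 'v': 25}, {'id': 4, 'v': 30}]
```

If that is correct, the two operators would be two halves of the same filtering step, and neither would reduce what the scan reads.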

Thanks,
Navin
