Hi all,

I am implementing a custom DataSource V1 and would like to enforce a pushdown
filter on every query. But when I run a simple count such as df.count(), Spark
ignores the filter and accumulates the row count of each block directly from
the metadata in the parquet footer, which returns an unexpected result for my
use case. Is there any way I can skip the parquet metadata and enforce the
filter for every query?

Thank you.

Best,
Gary
