Hey guys,

Are there any benefits of generic partitioning for non-restrictive count(*) queries in Drill over Parquet files partitioned on some base criteria (by state, month, etc.)?

Let's say I am running:

    select count(*) from dfs.tmp.`claims_parquet`;

where claims_parquet exists in both a plain and a partitioned version. For example, is there maybe some scatter-gather parallelisation?

(We are about to benchmark this, but I would like to know the theory behind it too.)

Thank you,
Edmon
