Dear All,
I have found it difficult to understand the behavior of a part of Drill's
execution planner.
My query looks like this:
SELECT COUNT(DISTINCT ...) FROM dfs.`...` WHERE dir0 in ('...', '...',
'...');
The result is a single number, of course. If I request the plan for the
query above using "EXPLAIN PLAN FOR ...", the physical plan looks fine.
My problem is the following: if I submit that physical plan through the
web interface, the result is a different (!) number. This number is equal to
the result of the same query without the partition filter (the dir0
condition).
Additional details:
- Drill 1.4.0
- The data source is gzipped Parquet
Can any of you give some insight into what is happening?
If it's a bug, does it have an open/closed issue already? Should I open one
if not?
I'm going to check whether it also happens in 1.6.0.
Thank you,
Sándor
Side note: the query is also painfully slow if the partition list in the
filter happens to contain every partition, but I see that this has already
been reported as DRILL-2287.