Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22313
Thank you for the review, @kiszk.
First, I don't want to hold on to the memory after query completion. If we
do, it will be a regression. So, I wanted `time` first.
Second, it's difficult to estimate a sufficient limit for the number of
filters.
- As we know, there is the codegen JVM limit issue, and there have been
several attempts to generate a single complex query for wide tables
(thousands of columns).
- Spark optimizer rules like `InferFiltersFromConstraints` add more
constraints like `NotNull(col1)`. Usually, the number of filters doubles
here.
- Also, it's not a good design if we need to increase this limit
whenever we add a new optimizer rule like `InferFiltersFromConstraints`.
- If the limit is too high, we waste memory. If the limit is too low,
eviction will bite us again.
In short, `time` was sufficient and the simplest option for this purpose.
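
To illustrate the trade-off above, here is a minimal sketch in Python. It is not the code in this PR; `TimeExpiringCache`, `SizeBoundedCache`, and the injectable `clock` parameter are hypothetical names made up for the example. It only contrasts a cache that expires entries by idle time with one that evicts by entry count.

```python
import time
from collections import OrderedDict


class TimeExpiringCache:
    """Hypothetical cache: an entry is dropped once it has been idle for `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the sketch can be exercised without sleeping
        self._entries = {}   # key -> (value, last_access_time)

    def _purge(self):
        # Remove every entry whose last access is at least `ttl_seconds` old.
        now = self._clock()
        expired = [k for k, (_, t) in self._entries.items() if now - t >= self._ttl]
        for k in expired:
            del self._entries[k]

    def put(self, key, value):
        self._purge()
        self._entries[key] = (value, self._clock())

    def get(self, key):
        self._purge()
        if key not in self._entries:
            return None
        value, _ = self._entries[key]
        self._entries[key] = (value, self._clock())  # refresh the access time
        return value

    def __len__(self):
        self._purge()
        return len(self._entries)


class SizeBoundedCache:
    """Hypothetical LRU cache: evicts the least recently used entry above `max_entries`."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._entries = OrderedDict()

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # drop the oldest entry

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def __len__(self):
        return len(self._entries)
```

With four user filters plus four inferred `NotNull` filters, a size limit guessed at 4 evicts half of the entries while the query is still running, whereas the time-based cache keeps all eight and releases them once they have been idle for the configured duration.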