[GitHub] spark issue #22313: [SPARK-25306][SQL] Use cache to speed up `createFilter` ...

dongjoon-hyun Sat, 01 Sep 2018 22:42:01 -0700

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22313
  
    Thank you for the review, @kiszk.
    
    First, I don't want to hold on to the memory after the query completes.
    If we did, it would be a regression. So, I wanted `time` first.
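
A minimal sketch of the time-based idea, not the code in this PR: entries are dropped once they have gone unused for a fixed interval, so the cache stops holding memory some time after the last query touches it. The class name `ExpiringCache`, the `ttl_seconds` parameter, and the injectable `clock` are all hypothetical names chosen for illustration.

```python
import time


class ExpiringCache:
    """Cache whose entries expire `ttl_seconds` after their last access.

    Hypothetical illustration only; names and eviction strategy are assumptions.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (value, last_access_time)

    def _evict_expired(self, now):
        # Drop every entry that has been idle for at least the TTL.
        expired = [k for k, (_, t) in self._entries.items()
                   if now - t >= self._ttl]
        for k in expired:
            del self._entries[k]

    def get(self, key, loader):
        now = self._clock()
        self._evict_expired(now)
        if key in self._entries:
            value = self._entries[key][0]
        else:
            value = loader(key)      # compute on a miss
        self._entries[key] = (value, now)  # refresh the access time
        return value

    def __len__(self):
        self._evict_expired(self._clock())
        return len(self._entries)
```

With a policy like this there is no entry-count limit to tune; the only knob is how long an idle entry may live.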
    
    Second, it's difficult to estimate a sufficient limit for the number of
    filters.
      - As we know from the codegen JVM limit issue, there are several
        attempts to generate a single complex query for wide tables
        (thousands of columns).
      - Spark optimizer rules like `InferFiltersFromConstraints` add more
        constraints like `NotNull(col1)`. Usually, the number of filters
        doubles here.
        - Also, it's not a good design if we need to raise this limit
          whenever we add a new optimizer rule like
          `InferFiltersFromConstraints`.
      - If the limit is too high, we waste memory. If the limit is too low,
        eviction will bite us again.
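
The sizing problem described in the list above can be sketched with a toy LRU cache. This is an illustration under assumptions, not Spark code: `LruCache`, `hit_rate`, the filter strings, and the limit of 100 are all made up for the example. It shows how a limit that fits the user-written filters stops helping once inferred constraints double the working set.

```python
from collections import OrderedDict


class LruCache:
    """Size-bounded cache that evicts the least recently used entry."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = loader(key)
        self._entries[key] = value
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict the oldest entry
        return value


def hit_rate(max_entries, filters, passes=3):
    """Scan the same filter list `passes` times and report the hit ratio."""
    cache = LruCache(max_entries)
    for _ in range(passes):
        for f in filters:
            cache.get(f, lambda k: k)
    return cache.hits / (cache.hits + cache.misses)


# Hypothetical workload: 100 user filters, plus one inferred constraint each.
user_filters = [f"col{i} > 0" for i in range(100)]
inferred = [f"NotNull(col{i})" for i in range(100)]

fits = hit_rate(100, user_filters)                # working set fits the limit
thrash = hit_rate(100, user_filters + inferred)   # working set is 2x the limit
```

In this toy setup `fits` is about two thirds, while `thrash` is zero: a repeated scan over twice as many keys as the limit evicts every entry before it is reused. Raising the limit to 200 restores the hit rate but holds twice as many entries.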
    
    In short, `time` was sufficient and the simplest option for this purpose.



---
