Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22313
Thank you for the review and advice, @cloud-fan . It turns out that my initial
assessment was not sufficient.
First of all, from the beginning,
[SPARK-2883](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-6cac9bc2656e3782b0312dceb8c55d47R75)
was designed as a recursive function like the following. Please see `tryLeft`
and `tryRight`. It is purely a computation to check whether building succeeds;
there is no reuse there. So I tried to cache the first two `tryLeft` and
`tryRight` operations, since they can be reused.
```scala
// Evaluate both sides once up front so the results can be reused
// instead of recomputing them inside the for-comprehension.
val tryLeft = buildSearchArgument(left, newBuilder)
val tryRight = buildSearchArgument(right, newBuilder)

val conjunction = for {
  _ <- tryLeft
  _ <- tryRight
  lhs <- buildSearchArgument(left, builder.startAnd())
  rhs <- buildSearchArgument(right, lhs)
} yield rhs.end()
```
However, before that happens, `createFilter` builds the target tree with
[reduceOption(And)](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-6cac9bc2656e3782b0312dceb8c55d47R35),
which produces a deeply skewed tree. That was the root cause. I'll update this PR.
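To make the "deeply skewed tree" point concrete, here is a minimal sketch using hypothetical `Leaf`/`And` case classes (not Spark's actual `sources.Filter` hierarchy). Folding a flat predicate list with `reduceOption(And)` nests every earlier result under the left child, so the depth of the tree grows linearly with the number of predicates, and a recursive checker that evaluates each child twice per `And` node then does exponential work on that left-leaning spine:

```scala
// Hypothetical filter tree, for illustration only.
sealed trait Filter
case class Leaf(name: String) extends Filter
case class And(left: Filter, right: Filter) extends Filter

object SkewDemo {
  // Depth of the tree: 1 for a leaf, 1 + the deeper child for an And node.
  def depth(f: Filter): Int = f match {
    case Leaf(_)   => 1
    case And(l, r) => 1 + math.max(depth(l), depth(r))
  }

  def main(args: Array[String]): Unit = {
    val filters: Seq[Filter] = Seq(Leaf("a"), Leaf("b"), Leaf("c"), Leaf("d"))
    // reduceOption folds left-to-right, yielding
    // And(And(And(Leaf(a), Leaf(b)), Leaf(c)), Leaf(d)) -- a left-skewed spine.
    val combined = filters.reduceOption(And.apply)
    println(combined.map(depth)) // Some(4): depth equals the predicate count
  }
}
```

With n predicates the left spine has depth n, so any recursion that revisits the left child twice per node (one "try" pass plus one real pass) blows up as 2^n on such input.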