[GitHub] [spark] viirya commented on pull request #28328: [SPARK-31553][SQL] Fix isInCollection for collection sizes above the optimisation threshold

2020-04-27 Thread GitBox

viirya commented on pull request #28328: URL: https://github.com/apache/spark/pull/28328#issuecomment-620101304 > An `In` with many values is slow to analyze, as the type coercion rules or `In.resolved` are very slow. That's a pain point. But when we merge `In` and `InSet`, we can hav

[GitHub] [spark] viirya commented on pull request #28328: [SPARK-31553][SQL] Fix isInCollection for collection sizes above the optimisation threshold

2020-04-27 Thread GitBox

viirya commented on pull request #28328: URL: https://github.com/apache/spark/pull/28328#issuecomment-619793314 > Actually this PR shows we still need `InSet`, to make the analyzer fast... What does that mean? We optimize `In` into `InSet` in the optimizer, right?