[GitHub] [spark] cloud-fan commented on pull request #28328: [SPARK-31553][SQL] Fix isInCollection for collection sizes above the optimisation threshold

2020-04-27 Thread GitBox

cloud-fan commented on pull request #28328: URL: https://github.com/apache/spark/pull/28328#issuecomment-619795854 An `In` with many values is slow to analyze, because the type coercion rules or `In.resolved` are very slow. This

[GitHub] [spark] cloud-fan commented on pull request #28328: [SPARK-31553][SQL] Fix isInCollection for collection sizes above the optimisation threshold

2020-04-27 Thread GitBox

cloud-fan commented on pull request #28328: URL: https://github.com/apache/spark/pull/28328#issuecomment-619781105 Actually this PR shows we still need `InSet`, to make the analyzer fast...

[GitHub] [spark] cloud-fan commented on pull request #28328: [SPARK-31553][SQL] Fix isInCollection for collection sizes above the optimisation threshold

2020-04-27 Thread GitBox

cloud-fan commented on pull request #28328: URL: https://github.com/apache/spark/pull/28328#issuecomment-619753268 cc @viirya, this is another instance where merging `InSet` and `Set` can fix the issue.
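
For readers unfamiliar with the trade-off discussed in these comments, here is a minimal sketch in plain Python (not Spark's API) of the general idea behind switching from an `In`-style list of values to an `InSet`-style hash set once the collection grows past a threshold. The names `IN_SET_THRESHOLD` and `build_predicate` and the threshold value are hypothetical, chosen only for illustration; the real logic lives in Spark's Scala code and may differ.

```python
# Illustrative sketch only: models the idea of choosing between a list-based
# membership test ("In") and a set-based one ("InSet") by collection size.
# IN_SET_THRESHOLD and build_predicate are hypothetical names, not Spark APIs.

IN_SET_THRESHOLD = 10  # assumed value, for illustration only


def build_predicate(values, threshold=IN_SET_THRESHOLD):
    """Return (kind, predicate) for testing membership in `values`."""
    items = list(values)
    distinct = set(items)
    if len(distinct) > threshold:
        # Large collection: keep one hash set and do a single lookup.
        return "InSet", lambda x: x in distinct
    # Small collection: compare against each value in turn, the way a list
    # of per-value expressions would be evaluated one by one.
    return "In", lambda x: any(x == v for v in items)


small_kind, small_pred = build_predicate(range(5))
large_kind, large_pred = build_predicate(range(100))
```

With these assumptions, `small_kind` is `"In"` and `large_kind` is `"InSet"`, while both predicates answer the same membership question. The set-based form avoids creating and processing one element per value, which is the kind of per-value cost the comments above attribute to analyzing a large `In`.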