Hi All,
For the use case where an expensive UDF has only constant inputs (literals),
we have proposed the following JIRA and PR, which compute the UDF only once
in the driver:
https://issues.apache.org/jira/browse/SPARK-27692
https://github.com/apache/spark/pull/24593
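To illustrate the idea (this is not the code in the PR above), here is a toy model in plain Python of folding a deterministic UDF call whose arguments are all literals into a single precomputed literal, so the function runs once instead of once per row. `Literal`, `Column`, `UdfCall`, `fold_constants` and `evaluate` are hypothetical names for this sketch, not Spark or Catalyst APIs. It assumes the UDF is deterministic; a non-deterministic UDF must not be folded.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Literal:
    value: object


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class UdfCall:
    fn: Callable
    args: Tuple
    deterministic: bool = True  # assumption: only deterministic UDFs are folded


def fold_constants(expr):
    """Replace deterministic UDF calls whose arguments are all literals with a Literal."""
    if isinstance(expr, UdfCall):
        args = tuple(fold_constants(a) for a in expr.args)
        if expr.deterministic and all(isinstance(a, Literal) for a in args):
            # Evaluated exactly once, before any rows are processed
            # (a stand-in for the driver-side evaluation described above).
            return Literal(expr.fn(*(a.value for a in args)))
        return UdfCall(expr.fn, args, expr.deterministic)
    return expr


def evaluate(expr, row):
    """Evaluate an expression against one row (a dict of column values)."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Column):
        return row[expr.name]
    return expr.fn(*(evaluate(a, row) for a in expr.args))


calls = 0


def expensive(x, y):
    global calls
    calls += 1
    return x * y


rows = [{"id": i} for i in range(1000)]
folded = fold_constants(UdfCall(expensive, (Literal(6), Literal(7))))
results = [evaluate(folded, r) for r in rows]
print(calls)  # the UDF ran once, not 1000 times
```

Without the folding step, `evaluate` would call `expensive` once per row; a call that references a column, such as `UdfCall(expensive, (Column("id"), Literal(2)))`, is left untouched because its result depends on the row.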
If considering revisiting the
Hi all,
I found this jira for an issue I ran into recently:
https://issues.apache.org/jira/browse/SPARK-28771
My initial idea for a fix is to change the requiredChildDistribution of
SortMergeJoinExec (and ShuffledHashJoinExec).
At least if all below conditions are met, we could only require a subset