Re: [DISCUSS] Expensive deterministic UDFs

2019-12-31 Thread Guy Khazma
Hi All, For the use case where the expensive UDF has constant inputs (literals), we have proposed the following JIRA and PR, which evaluate the UDF only once, in the driver: https://issues.apache.org/jira/browse/SPARK-27692 https://github.com/apache/spark/pull/24593 If considering revisiting the

SortMergeJoinExec: Utilizing child partitioning when joining

2019-12-31 Thread Brett Marcott
Hi all, I found this JIRA for an issue I ran into recently: https://issues.apache.org/jira/browse/SPARK-28771 My initial idea for a fix is to change the requiredChildDistribution of SortMergeJoinExec (and ShuffledHashJoinExec). At least if all of the conditions below are met, we could require only a subset