Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 It is a good question. Based on previous discussion, I think Join operator has no unique result in the non-deterministic case. The migration issue from Hive is because this kind of queries can't run by current SparkSQL. Whether we exactly follow Hive behavior is not real pain, if I didn't miss something. Take the non-deterministic join in the query seen in the discussion thread at http://apache-spark-developers-list.1001551.n3.nabble.com/SQL-Syntax-quot-case-when-quot-doesn-t-be-supported-in-JOIN-tc21953.html#a21973 as an example: SELECT a.col1 FROM tbl1 a LEFT OUTER JOIN tbl2 b ON CASE WHEN a.col2 IS NULL THEN cast(rand(9)*1000 - 9999999999 as string) ELSE a.col2 END = b.col3; Where the different join result (the randomized value replacing NULL values of a.col2) actually doesn't matter, because it only retain a.col1 finally. The purpose of this query is to mitigate skew (huge NULLs) data when joining. That said the columns from non-deterministic join conditions are not really useful at this cases. I even think it may be also the reason Hive doesn't put special consideration on non-deterministic expression when joining. As there's no unique result, users go to take care what they do with it. I am not sure if I convey the idea clearly. As an analogy, for a query like `SELECT a.col1 FROM table a WHERE rand() > 0.5`. Even our rand generates different numbers than Hive so the final result is different. It still helps migration issue from Hive workloads using rand.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org