Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18652#discussion_r127918499
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -1912,6 +1913,26 @@ class Analyzer(
               nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
             }.copy(child = newChild)
     
    +      case j: Join if j.condition.isDefined && !j.condition.get.deterministic =>
    +        j match {
    +          // We can push down non-deterministic joining keys.
    --- End diff ---
    
    Most RDBMS systems allow non-deterministic join conditions. To support them correctly in Spark, we need to check how the other systems behave. Once we decide on the rule, we can't break it, so we have to be very careful when designing the initial version.
    
    At this stage, I do not think we have the bandwidth to make it perfect. If you want to continue the PR, could you just check how Hive works? Adding an extra flag for Hive users could simplify their migration task. By default, it should be turned off.
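    
    To make the suggestion concrete, such a flag could follow the existing SQLConf pattern. This is only a hypothetical sketch (the config key, name, and doc wording are mine, not part of this PR), meant to sit alongside the other entries in `object SQLConf`:
    
    ```scala
    // Hypothetical flag, illustrative only; not actual Spark code.
    val ALLOW_NONDETERMINISTIC_JOIN_CONDITION =
      buildConf("spark.sql.allowNondeterministicJoinCondition")
        .doc("When true, allow non-deterministic expressions in join conditions, " +
          "matching Hive's behavior, to simplify migration. Off by default.")
        .booleanConf
        .createWithDefault(false)
    ```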

