Github user maryannxue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22259#discussion_r224295469
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -47,7 +48,8 @@ case class ScalaUDF(
inputTypes: Seq[DataType] = Nil,
udfName: Option[String] = None,
nullable: Boolean = true,
- udfDeterministic: Boolean = true)
+ udfDeterministic: Boolean = true,
+ nullableTypes: Seq[Boolean] = Nil)
--- End diff ---
Yes, the test should not pass after removing the `isInstanceOf[KnownNotNull]` condition
from the `needsNullCheck` check
(https://github.com/apache/spark/pull/22259/files#diff-57b3d87be744b7d79a9beacf8e5e5eb2L2160).
The idea was to add a `KnownNotNull` node on top of the original node to mark
it as null-checked, so that the rule won't add redundant null checks even if it is
accidentally applied again. I'm not sure of the exact reason why you removed the
`isInstanceOf[KnownNotNull]` condition in this PR, but I think it should be
kept there alongside the new nullable type check.
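For what it's worth, here is a minimal, self-contained sketch of that idempotence
argument (toy expression classes, not Spark's actual Catalyst API): with a
`KnownNotNull`-style marker, applying the null-check rule a second time is a no-op.
```
// Toy expression tree (NOT Spark's Catalyst classes) sketching why a
// KnownNotNull-style marker keeps the null-check rule idempotent.
sealed trait Expr
case class Attr(name: String) extends Expr
case class KnownNotNull(child: Expr) extends Expr  // marks an input as already null-checked

// The "rule": wrap every UDF input that is not already marked as null-checked.
def handleNullInputs(inputs: Seq[Expr]): Seq[Expr] = inputs.map {
  case marked: KnownNotNull => marked       // already checked, leave it alone
  case other                => KnownNotNull(other)
}

val once  = handleNullInputs(Seq(Attr("a"), Attr("b")))
val twice = handleNullInputs(once)

// Without the `case marked: KnownNotNull` guard, the second application would
// wrap the inputs again, so plan.analyze and plan.analyze.analyze would differ,
// which is exactly what the reproduction test below checks.
assert(once == twice)
```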
After adding the `nullableTypes` parameter in the test, the issue can be
reproduced:
```
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  val a = testRelation.output(0)
  val func = (x: Int, y: Int) => x + y
  val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
  val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil, nullableTypes = false :: false :: Nil)
  val plan = Project(Alias(udf2, "")() :: Nil, testRelation)
  comparePlans(plan.analyze, plan.analyze.analyze)
}
```
BTW, I'm just curious: it looks like `nullableTypes` indicates the opposite of
"nullable" as used in a schema. I would assume that when `nullableTypes` is
`Seq(false)`, the type is not nullable and we don't need to add the null
check, and vice versa. Did I miss something here?
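To make the reading I assumed concrete, here's a tiny hypothetical sketch (just
restating my assumption, not necessarily what the PR implements):
```
// Hypothetical sketch of the reading I assumed (not necessarily the PR's semantics):
// nullableTypes(i) == false -> the i-th input type is not nullable -> no null check needed
// nullableTypes(i) == true  -> the i-th input type is nullable     -> null check needed
def assumedNeedsNullCheck(nullableTypes: Seq[Boolean]): Seq[Boolean] =
  nullableTypes // under this reading, needsNullCheck(i) simply equals nullableTypes(i)
```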
---