wangshuo128 edited a comment on issue #25902: [SPARK-29213][SQL] Make it consistent when get notnull output and generate null checks in FilterExec
URL: https://github.com/apache/spark/pull/25902#issuecomment-534381006
 
 
   @cloud-fan @viirya 
   Sorry, my earlier description and fix are not good enough.
   
   The real problem is here:
   When generating code for `Filter ((length(cast(x#7L as string)) > 0) AND isnotnull(cast(x#7L as string)))`, no null check is generated for `(length(cast(x#7L as string)) > 0)`. The only attribute that predicate references is `x#7L`, and `x#7L` is not semantically equal to `cast(x#7L as string)`, the child of the IsNotNull predicate `isnotnull(cast(x#7L as string))`. So the lookup below never finds a matching IsNotNull:
   ```scala
       val generated = otherPreds.map { c =>
         val nullChecks = c.references.map { r =>
           val idx = notNullPreds.indexWhere { n => n.asInstanceOf[IsNotNull].child.semanticEquals(r)}
           if (idx != -1 && !generatedIsNotNullChecks(idx)) {
             generatedIsNotNullChecks(idx) = true
             // Use the child's output. The nullability is what the child produced.
             genPredicate(notNullPreds(idx), input, child.output)
           } else {
             ""
           }
         }.mkString("\n").trim
   
         // Here we use *this* operator's output with this output's nullability since we already
         // enforced them with the IsNotNull checks above.
         s"""
            |$nullChecks
            |${genPredicate(c, input, output)}
          """.stripMargin.trim
       }.mkString("\n")
   ```
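   A minimal, self-contained Scala sketch of why the `indexWhere` lookup misses here. It does not use Spark: `Expr`, `Attr`, `CastToString`, `Length`, `GreaterThan`, `Lit`, `IsNotNull` and `NullCheckLookup.lookup` are hypothetical toy stand-ins, and plain structural equality is assumed in place of Catalyst's `semanticEquals`.

```scala
// Toy stand-ins for Catalyst expressions (hypothetical, not Spark's classes).
sealed trait Expr {
  def children: Seq[Expr]
  // Every attribute referenced anywhere in this expression tree.
  def references: Seq[Attr] = children.flatMap(_.references).distinct
  // Assumption: structural equality stands in for Catalyst's semanticEquals.
  def semanticEquals(other: Expr): Boolean = this == other
}
final case class Attr(name: String, exprId: Long) extends Expr {
  def children: Seq[Expr] = Nil
  override def references: Seq[Attr] = Seq(this)
}
final case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
final case class CastToString(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
final case class Length(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }
final case class GreaterThan(left: Expr, right: Expr) extends Expr {
  def children: Seq[Expr] = Seq(left, right)
}
final case class IsNotNull(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) }

object NullCheckLookup {
  // For each attribute referenced by `pred`, the index of an IsNotNull predicate
  // whose child is semantically equal to that attribute (-1 if there is none),
  // mirroring the indexWhere lookup in the FilterExec loop.
  def lookup(pred: Expr, notNullPreds: Seq[Expr]): Seq[Int] =
    pred.references.map { r =>
      notNullPreds.indexWhere {
        case IsNotNull(child) => child.semanticEquals(r)
        case _                => false
      }
    }

  def main(args: Array[String]): Unit = {
    val x = Attr("x", 7L)
    // (length(cast(x#7L as string)) > 0): its only reference is x#7L.
    val pred = GreaterThan(Length(CastToString(x)), Lit(0))
    // isnotnull(cast(x#7L as string)): the child is the cast, so index -1.
    println(lookup(pred, Seq(IsNotNull(CastToString(x)))))
    // isnotnull(x#7L): the child is the attribute itself, so index 0.
    println(lookup(pred, Seq(IsNotNull(x))))
  }
}
```

   With `isnotnull(cast(x#7L as string))` the lookup yields -1 for `x#7L`, so in this model no IsNotNull check would be emitted before the predicate; with `isnotnull(x#7L)` it would match.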
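   At the same time, the filter output attribute `x#7L` is marked as not nullable, because `notNullAttributes` is built from the references of `notNullPreds`, and `isnotnull(cast(x#7L as string))` references `x#7L`. As a result, nullability is not checked when generating code that reads `x#7L`, so an NPE is thrown when the generated code for `(length(cast(x#7L as string)) > 0)` reads null data.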
   
   To fix this, I think we can exclude attributes that are referenced by both `notNullPreds` and `otherPreds` when computing `notNullAttributes`:
   ```scala
     private val notNullAttributes = notNullPreds.flatMap(_.references).distinct.map(_.exprId)
       .diff(otherPreds.flatMap(_.references).distinct.map(_.exprId))
   ```
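   A small Scala sketch comparing the current and the proposed `notNullAttributes` for this filter. `Pred` and `NotNullAttributesSketch` are hypothetical: each predicate is reduced to the expression IDs it references (7 stands for `x#7L`, and 8 for a hypothetical `y#8L`), since that is all the computation looks at.

```scala
// Hypothetical stand-in: a predicate reduced to its SQL text and the exprIds it references.
final case class Pred(sql: String, referenceIds: Seq[Long])

object NotNullAttributesSketch {
  // Current behaviour: every attribute referenced by an IsNotNull predicate.
  def before(notNullPreds: Seq[Pred]): Seq[Long] =
    notNullPreds.flatMap(_.referenceIds).distinct

  // Proposed: additionally drop attributes that otherPreds also reference.
  def after(notNullPreds: Seq[Pred], otherPreds: Seq[Pred]): Seq[Long] =
    notNullPreds.flatMap(_.referenceIds).distinct
      .diff(otherPreds.flatMap(_.referenceIds).distinct)

  def main(args: Array[String]): Unit = {
    val notNull = Seq(Pred("isnotnull(cast(x#7L as string))", Seq(7L)))
    val other = Seq(Pred("(length(cast(x#7L as string)) > 0)", Seq(7L)))
    println(before(notNull))       // contains 7: x#7L is marked not nullable
    println(after(notNull, other)) // empty: x#7L keeps its nullability
    // An attribute referenced only by an IsNotNull predicate is still kept.
    println(after(Seq(Pred("isnotnull(y#8L)", Seq(8L))), other))
  }
}
```

   In this model, `x#7L` no longer lands in `notNullAttributes`, so the filter output would keep it nullable and the code generated for `(length(cast(x#7L as string)) > 0)` would read it with a null check.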

----------------------------------------------------------------
This is an automated message from the Apache Git Service.