wangshuo128 edited a comment on issue #25902: [SPARK-29213][SQL] Make it consistent when get notnull output and generate null checks in FilterExec
URL: https://github.com/apache/spark/pull/25902#issuecomment-534381006

@cloud-fan @viirya Sorry, my earlier description and fix were not good enough.

The real problem is this: when generating code for `Filter ((length(cast(x#7L as string)) > 0) AND isnotnull(cast(x#7L as string)))`, no null check is generated for `(length(cast(x#7L as string)) > 0)`, because its reference `x#7L` and `cast(x#7L as string)` (the child of the IsNotNull predicate `isnotnull(cast(x#7L as string))`) are not semantically equal.

```
val generated = otherPreds.map { c =>
  val nullChecks = c.references.map { r =>
    val idx = notNullPreds.indexWhere { n => n.asInstanceOf[IsNotNull].child.semanticEquals(r)}
    if (idx != -1 && !generatedIsNotNullChecks(idx)) {
      generatedIsNotNullChecks(idx) = true
      // Use the child's output. The nullability is what the child produced.
      genPredicate(notNullPreds(idx), input, child.output)
    } else {
      ""
    }
  }.mkString("\n").trim

  // Here we use *this* operator's output with this output's nullability since we already
  // enforced them with the IsNotNull checks above.
  s"""
     |$nullChecks
     |${genPredicate(c, input, output)}
   """.stripMargin.trim
}.mkString("\n")
```

At the same time, the filter's output attribute `x#7L` is marked as not nullable, so nullability is not checked when generating code for `x#7L`. As a result, an NPE is thrown when the generated code for `(length(cast(x#7L as string)) > 0)` reads null data.

To fix this, I think we can take the attributes of both `notNullPreds` and `otherPreds` into account when computing `notNullAttributes`:

```
private val notNullAttributes = notNullPreds.flatMap(_.references).distinct.map(_.exprId)
  .diff(otherPreds.flatMap(_.references).distinct.map(_.exprId))
```
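
The mismatch described above can be reproduced with a small toy model. This is only a sketch under stated assumptions: the `Expr` classes, `FilterNullCheckSketch`, `nullCheckIndex`, and `proposedNotNullAttributes` are hypothetical stand-ins written for illustration, not Spark's Catalyst classes. Structural equality stands in for `semanticEquals`, and attribute names stand in for `exprId`.

```scala
// Toy expression model (hypothetical; not Spark's Catalyst classes).
sealed trait Expr { def references: Set[String] }
case class Attr(name: String) extends Expr { def references: Set[String] = Set(name) }
case class Cast(child: Expr) extends Expr { def references: Set[String] = child.references }
case class Length(child: Expr) extends Expr { def references: Set[String] = child.references }
case class GreaterThanZero(child: Expr) extends Expr { def references: Set[String] = child.references }
case class IsNotNull(child: Expr) extends Expr { def references: Set[String] = child.references }

object FilterNullCheckSketch {
  val x: Attr = Attr("x")

  // isnotnull(cast(x as string))
  val notNullPreds: Seq[IsNotNull] = Seq(IsNotNull(Cast(x)))
  // length(cast(x as string)) > 0
  val otherPreds: Seq[Expr] = Seq(GreaterThanZero(Length(Cast(x))))

  // Current behaviour as described in the comment: every attribute referenced
  // by an IsNotNull predicate is treated as non-nullable in the output.
  val notNullAttributes: Seq[String] = notNullPreds.flatMap(_.references).distinct

  // Mirrors the indexWhere lookup in the quoted snippet: an IsNotNull predicate
  // is only found when its child equals the referenced attribute itself.
  def nullCheckIndex(ref: String): Int =
    notNullPreds.indexWhere(_.child == Attr(ref))

  // The proposal from the comment: drop attributes that otherPreds also reference.
  val proposedNotNullAttributes: Seq[String] =
    notNullAttributes.diff(otherPreds.flatMap(_.references).distinct)

  def main(args: Array[String]): Unit = {
    println(s"null check index for x: ${nullCheckIndex("x")}")
    println(s"attributes marked non-nullable today: $notNullAttributes")
    println(s"attributes marked non-nullable with the proposal: $proposedNotNullAttributes")
  }
}
```

In this model the lookup for `x` finds no matching IsNotNull predicate, because the predicate's child is `Cast(x)` rather than `x`, yet `x` still ends up in `notNullAttributes`. With the proposed `diff`, `x` is no longer treated as non-nullable. The sketch covers only this bookkeeping, not the code generation itself.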
