wangshuo128 edited a comment on issue #25902: [SPARK-29213][SQL] Make it consistent when get notnull output and generate null checks in FilterExec
URL: https://github.com/apache/spark/pull/25902#issuecomment-534381006

@cloud-fan @viirya Sorry, my earlier description and fix were not good enough.

The real problem is this: when generating code for `Filter ((length(cast(x#7L as string)) > 0) AND isnotnull(cast(x#7L as string)))`, no null check is generated for `(length(cast(x#7L as string)) > 0)`, because `(length(cast(x#7L as string)) > 0)` and `isnotnull(cast(x#7L as string))` are not semantically equal.

At the same time, the filter output attribute `x#7L` is marked as not nullable, so nullability is not checked when generating code for `x#7L`. As a result, an NPE is thrown when the generated code for `(length(cast(x#7L as string)) > 0)` reads null data.

To fix this, I think we can exclude attributes that are referenced by both `notNullPreds` and `otherPreds` when computing `notNullAttributes`:

```
private val notNullAttributes = notNullPreds.flatMap(_.references).distinct.map(_.exprId)
  .diff(otherPreds.flatMap(_.references).distinct.map(_.exprId))
```
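
To make the effect of the `diff` concrete, here is a minimal sketch on plain Scala collections. It is only an illustration under assumptions: `Attr` and `Pred` are hypothetical stand-ins for Catalyst's attribute and expression classes, and the second attribute `y#8L` is invented for contrast. It does not reproduce the actual `FilterExec` code.

```scala
// Hypothetical, simplified stand-ins for Catalyst attributes and predicates
// (assumptions for illustration only, not Spark classes).
final case class Attr(name: String, exprId: Long)
final case class Pred(sql: String, references: Seq[Attr])

object NotNullAttributesSketch {
  val x = Attr("x", 7L)
  val y = Attr("y", 8L) // invented second column, not part of the original report

  // Conjuncts of the filter condition that are IsNotNull(...) predicates.
  val notNullPreds = Seq(
    Pred("isnotnull(cast(x#7L as string))", Seq(x)),
    Pred("isnotnull(y#8L)", Seq(y)))

  // All remaining conjuncts of the filter condition.
  val otherPreds = Seq(
    Pred("(length(cast(x#7L as string)) > 0)", Seq(x)))

  // Current behaviour: every attribute referenced by an IsNotNull predicate
  // is treated as not nullable in the filter output.
  val before: Seq[Long] =
    notNullPreds.flatMap(_.references).distinct.map(_.exprId)

  // Proposed behaviour: drop attributes that are also referenced by other predicates.
  val after: Seq[Long] =
    before.diff(otherPreds.flatMap(_.references).distinct.map(_.exprId))

  def main(args: Array[String]): Unit = {
    println(s"before: $before")
    println(s"after:  $after")
  }
}
```

In this sketch `x#7L` (exprId 7) no longer counts as a not-null attribute because `otherPreds` also references it, while the invented `y#8L` still does.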