wangshuo128 edited a comment on issue #25902: [SPARK-29213][SQL] Make it 
consistent when get notnull output and generate null checks in FilterExec
URL: https://github.com/apache/spark/pull/25902#issuecomment-534381006
 
 
   @cloud-fan @viirya 
   Sorry, my earlier description and fix were not good enough.
   
   The real problem is here:
   No null check is generated for `(length(cast(x#7L as string)) > 0)` because
   `(length(cast(x#7L as string)) > 0)` and `isnotnull(cast(x#7L as string))`
   are not semantically equal when generating code for
   `Filter ((length(cast(x#7L as string)) > 0) AND isnotnull(cast(x#7L as string)))`.
   At the same time, the filter's output attribute `x#7L` is marked as not nullable,
   so nullability is not checked when generating code for `x#7L`.
   As a result, an NPE is thrown when the generated code for
   `(length(cast(x#7L as string)) > 0)` reads null data.
   
   To fix this, I think we can filter out the attributes that appear in both
   `notNullPreds` and `otherPreds` when computing `notNullAttributes`:
   ```
     private val notNullAttributes =
       notNullPreds.flatMap(_.references).distinct.map(_.exprId)
         .diff(otherPreds.flatMap(_.references).distinct.map(_.exprId))
   ```
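
   To make the effect of the `.diff` concrete, here is a minimal, self-contained sketch that does not depend on Spark. `Attr`, `Pred`, and the object below are hypothetical stand-ins for Catalyst's attribute and predicate classes, and the attribute `y#8L` is invented purely for contrast; only the two expressions over `_.references` and `_.exprId` mirror the snippet above.

```scala
// Hypothetical stand-ins for Catalyst classes; this is NOT Spark's real API.
final case class Attr(name: String, exprId: Long)
final case class Pred(description: String, references: Seq[Attr])

object NotNullAttributesSketch {
  val x: Attr = Attr("x", 7L)
  val y: Attr = Attr("y", 8L) // invented attribute, not part of the reported query

  // Predicates of the form isnotnull(...)
  val notNullPreds: Seq[Pred] = Seq(
    Pred("isnotnull(cast(x#7L as string))", Seq(x)),
    Pred("isnotnull(y#8L)", Seq(y))
  )

  // All remaining predicates
  val otherPreds: Seq[Pred] = Seq(
    Pred("(length(cast(x#7L as string)) > 0)", Seq(x))
  )

  // First half of the proposed expression: every attribute referenced
  // by an isnotnull predicate is treated as not nullable.
  def withoutDiff(notNull: Seq[Pred]): Seq[Long] =
    notNull.flatMap(_.references).distinct.map(_.exprId)

  // Full proposed expression: attributes also referenced by other
  // predicates are dropped, so they keep their nullable status.
  def proposed(notNull: Seq[Pred], other: Seq[Pred]): Seq[Long] =
    withoutDiff(notNull)
      .diff(other.flatMap(_.references).distinct.map(_.exprId))
}
```

   In this toy setup `withoutDiff` reports both `x#7L` and `y#8L` as not null, while `proposed` keeps only `y#8L`, because `x#7L` is also referenced by a predicate in `otherPreds`.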

