[ https://issues.apache.org/jira/browse/SPARK-42384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bruce Robbins updated SPARK-42384:
----------------------------------
    Affects Version/s: 3.4.0

> Mask function's generated code does not handle null input
> ---------------------------------------------------------
>
>                 Key: SPARK-42384
>                 URL: https://issues.apache.org/jira/browse/SPARK-42384
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.4.0, 3.5.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> Example:
> {noformat}
> create or replace temp view v1 as
> select * from values
> (null),
> ('AbCD123-@$#')
> as data(col1);
> cache table v1;
> select mask(col1) from v1;
> {noformat}
> This query results in a {{NullPointerException}}:
> {noformat}
> 23/02/07 16:36:06 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.lang.NullPointerException
>       at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110)
>       at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>       at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>       at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:760)
> {noformat}
> The generated code calls {{UnsafeWriter.write(0, value_0)}} whether or not 
> {{Mask.transformInput}} returns null, because the null check is emitted as 
> a constant {{if (false)}}. The {{UnsafeWriter.write}} method for 
> {{UTF8String}} does not expect a null pointer.
> {noformat}
> /* 031 */     boolean isNull_1 = i.isNullAt(0);
> /* 032 */     UTF8String value_1 = isNull_1 ?
> /* 033 */     null : (i.getUTF8String(0));
> /* 034 */
> /* 035 */
> /* 036 */
> /* 037 */
> /* 038 */     UTF8String value_0 = null;
> /* 039 */     value_0 = org.apache.spark.sql.catalyst.expressions.Mask.transformInput(value_1, ((UTF8String) references[0] /* literal */), ((UTF8String) references[1] /* literal */), ((UTF8String) references[2] /* literal */), ((UTF8String) references[3] /* literal */));;
> /* 040 */     if (false) {
> /* 041 */       mutableStateArray_0[0].setNullAt(0);
> /* 042 */     } else {
> /* 043 */       mutableStateArray_0[0].write(0, value_0);
> /* 044 */     }
> /* 045 */     return (mutableStateArray_0[0].getRow());
> /* 046 */   }
> {noformat}
> A literal null input does not exercise the bug, because an optimization 
> appears to replace the entire function call with a null literal:
> {noformat}
> spark-sql> explain SELECT mask(NULL);
> == Physical Plan ==
> *(1) Project [null AS mask(NULL, X, x, n, NULL)#47]
> +- *(1) Scan OneRowRelation[]
> Time taken: 0.026 seconds, Fetched 1 row(s)
> spark-sql> SELECT mask(NULL);
> NULL
> Time taken: 0.042 seconds, Fetched 1 row(s)
> spark-sql> 
> {noformat}
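
For illustration only, the null guard that the generated code in the description skips can be sketched in plain Java. This is not Spark code and not the actual fix: {{RowWriter}}, {{transformInput}} and {{writeGuarded}} are hypothetical stand-ins for {{UnsafeWriter}}, {{Mask.transformInput}} and the generated projection, and the replacement characters (X, x, n, others unchanged) are taken from the defaults visible in the explain output above.

```java
import java.util.Objects;

public class NullSafeWriteSketch {

    // Hypothetical stand-in for the row writer; this is NOT Spark's UnsafeWriter.
    static final class RowWriter {
        final String[] values;
        final boolean[] nulls;

        RowWriter(int numFields) {
            values = new String[numFields];
            nulls = new boolean[numFields];
        }

        void setNullAt(int ordinal) {
            nulls[ordinal] = true;
            values[ordinal] = null;
        }

        void write(int ordinal, String value) {
            // Mimics the behavior in the report: a null value is not expected here.
            Objects.requireNonNull(value);
            nulls[ordinal] = false;
            values[ordinal] = value;
        }
    }

    // Hypothetical stand-in for Mask.transformInput: null in, null out.
    static String transformInput(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder masked = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isUpperCase(c)) {
                masked.append('X');
            } else if (Character.isLowerCase(c)) {
                masked.append('x');
            } else if (Character.isDigit(c)) {
                masked.append('n');
            } else {
                masked.append(c); // other characters are left unchanged
            }
        }
        return masked.toString();
    }

    // Checks the result for null before writing, instead of a constant "if (false)".
    static void writeGuarded(RowWriter writer, String input) {
        String value = transformInput(input);
        if (value == null) {
            writer.setNullAt(0);
        } else {
            writer.write(0, value);
        }
    }

    public static void main(String[] args) {
        RowWriter writer = new RowWriter(1);
        writeGuarded(writer, null);
        System.out.println(writer.nulls[0]);   // true
        writeGuarded(writer, "AbCD123-@$#");
        System.out.println(writer.values[0]);  // XxXXnnn-@$#
    }
}
```

The guard sits on the result of the transform rather than on the input, so it still holds if the transform returns null for some other reason.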


