GitHub user pierre-borckmans opened a pull request:
https://github.com/apache/spark/pull/10429
[SPARK-12477][SQL] Tungsten projection fails for null values in array fields
Accessing a null element in an array field fails with a
NullPointerException when Tungsten is enabled.
It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
This PR fixes the problem by making the generated code check whether the
accessed array element is null.
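The kind of guard described above can be sketched in plain Java, the language of the generated projection code. This is only an illustration of the pattern, not the code this PR generates: the class `NullSafeElementAccess` and the methods `byteSize`, `unguardedSize` and `sizeOrNull` are hypothetical names, and `byteSize` merely stands in for a size computation that dereferences its argument.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names exist in Spark.
class NullSafeElementAccess {

    // Stand-in for a size computation that dereferences its argument.
    static int byteSize(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // Unguarded access: throws NullPointerException when the element is null.
    static int unguardedSize(String[] array, int index) {
        return byteSize(array[index]);
    }

    // Guarded access: test the element for null before dereferencing it,
    // and report a null result instead of computing a size.
    static Integer sizeOrNull(String[] array, int index) {
        String element = array[index];
        if (element == null) {
            return null;
        }
        return byteSize(element);
    }

    public static void main(String[] args) {
        String[] as = {"a", null, "b"};
        for (int i = 0; i < as.length; i++) {
            System.out.println(i + " = " + sizeOrNull(as, i));
        }
    }
}
```

Calling `unguardedSize` on index 1 of the same array would throw a NullPointerException, which is the failure mode the null check avoids.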
Example:
```
// Array of String
case class AS( as: Seq[String] )
val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
dfAS.registerTempTable("T_AS")
for (i <- 0 to 2) {
  println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))
}
```
With Tungsten disabled:
```
0 = [a]
1 = [null]
2 = [b]
```
With Tungsten enabled:
```
15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
java.lang.NullPointerException
at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pierre-borckmans/spark SPARK-12477_Tungsten-Projection-Null-Element-In-Array
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10429.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10429
----
commit b6a79e7fe73b5a1cabbc39a50fa4e47dd4f2a079
Author: pierre-borckmans <[email protected]>
Date: 2015-12-22T08:43:55Z
CHECK if element in array field is null
----