Bruce Robbins created SPARK-45896:
-------------------------------------

             Summary: Expression encoding fails for Seq/Map of Option[Seq]
                 Key: SPARK-45896
                 URL: https://issues.apache.org/jira/browse/SPARK-45896
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.0, 3.4.1
            Reporter: Bruce Robbins


The following action fails on 3.4.1, 3.5.0, and master:
{noformat}
scala> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
val df = Seq(Seq(Some(Seq(0)))).toDF("a")
org.apache.spark.SparkRuntimeException: [EXPRESSION_ENCODING_FAILED] Failed to 
encode a value of the expressions: mapobjects(lambdavariable(MapObject, 
ObjectType(class java.lang.Object), true, -1), 
mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, 
-2), assertnotnull(validateexternaltype(lambdavariable(MapObject, 
ObjectType(class java.lang.Object), true, -2), IntegerType, IntegerType)), 
unwrapoption(ObjectType(interface scala.collection.immutable.Seq), 
validateexternaltype(lambdavariable(MapObject, ObjectType(class 
java.lang.Object), true, -1), ArrayType(IntegerType,false), ObjectType(class 
scala.Option))), None), input[0, scala.collection.immutable.Seq, true], None) 
AS value#0 to a row. SQLSTATE: 42846
...
Caused by: java.lang.RuntimeException: scala.Some is not a valid external type 
for schema of array<int>
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown Source)
...
{noformat}
However, it succeeds on 3.3.3:
{noformat}
scala> val df = Seq(Seq(Some(Seq(0)))).toDF("a")
df: org.apache.spark.sql.DataFrame = [a: array<array<int>>]

scala> df.collect
res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(WrappedArray(0))])
{noformat}
A Map whose values are Option[Seq] also fails on 3.4.1, 3.5.0, and master:
{noformat}
scala> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
org.apache.spark.SparkRuntimeException: [EXPRESSION_ENCODING_FAILED] Failed to 
encode a value of the expressions: 
externalmaptocatalyst(lambdavariable(ExternalMapToCatalyst_key, 
ObjectType(class java.lang.Object), false, -1), 
assertnotnull(validateexternaltype(lambdavariable(ExternalMapToCatalyst_key, 
ObjectType(class java.lang.Object), false, -1), IntegerType, IntegerType)), 
lambdavariable(ExternalMapToCatalyst_value, ObjectType(class java.lang.Object), 
true, -2), mapobjects(lambdavariable(MapObject, ObjectType(class 
java.lang.Object), true, -3), 
assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class 
java.lang.Object), true, -3), IntegerType, IntegerType)), 
unwrapoption(ObjectType(interface scala.collection.immutable.Seq), 
validateexternaltype(lambdavariable(ExternalMapToCatalyst_value, 
ObjectType(class java.lang.Object), true, -2), ArrayType(IntegerType,false), 
ObjectType(class scala.Option))), None), input[0, 
scala.collection.immutable.Map, true]) AS value#0 to a row. SQLSTATE: 42846
...
Caused by: java.lang.RuntimeException: scala.Some is not a valid external type 
for schema of array<int>
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.MapObjects_0$(Unknown Source)
...
{noformat}
As with the first example, this succeeds on 3.3.3:
{noformat}
scala> val df = Seq(Map(0 -> Some(Seq(0)))).toDF("a")
df: org.apache.spark.sql.DataFrame = [a: map<int,array<int>>]

scala> df.collect
res0: Array[org.apache.spark.sql.Row] = Array([Map(0 -> WrappedArray(0))])
{noformat}
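For reference, the 3.3.3 schemas above (array<array<int>> and map<int,array<int>>) show the Option layer being dropped from the element and value types. The following is a minimal plain-Scala sketch of that shape change, with no Spark dependency. The object name UnwrapOptionSketch and its values are hypothetical, chosen only for illustration. Unwrapping by hand like this before calling toDF might sidestep the failure on the affected versions, but that is an assumption and has not been verified here.

```scala
// Sketch only: models the Option-unwrapping implied by the 3.3.3 schemas.
// Nothing here calls Spark; all names are hypothetical.
object UnwrapOptionSketch {
  // Same shapes as the two failing inputs in this report.
  val seqInput: Seq[Seq[Option[Seq[Int]]]] = Seq(Seq(Some(Seq(0))))
  val mapInput: Seq[Map[Int, Option[Seq[Int]]]] = Seq(Map(0 -> Some(Seq(0))))

  // Some(x) becomes x and None becomes null, so the Option layer disappears
  // from the static type (compare array<array<int>> on 3.3.3).
  val seqUnwrapped: Seq[Seq[Seq[Int]]] = seqInput.map(_.map(_.orNull))

  // Same idea for map values (compare map<int,array<int>> on 3.3.3).
  val mapUnwrapped: Seq[Map[Int, Seq[Int]]] =
    mapInput.map(_.map { case (k, v) => k -> v.orNull })

  def main(args: Array[String]): Unit = {
    assert(seqUnwrapped == Seq(Seq(Seq(0))))
    assert(mapUnwrapped == Seq(Map(0 -> Seq(0))))
  }
}
```

In a spark-shell session, seqUnwrapped and mapUnwrapped would be the values passed to toDF("a") in place of the original inputs. Whether the resulting columns match the 3.3.3 output exactly is not confirmed by this report.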



--
This message was sent by Atlassian Jira
(v8.20.10#820010)