Li Jin created SPARK-25213:
------------------------------

             Summary: DataSourceV2 doesn't seem to produce unsafe rows
                 Key: SPARK-25213
                 URL: https://issues.apache.org/jira/browse/SPARK-25213
             Project: Spark
          Issue Type: Task
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Li Jin
To reproduce (the test classes must be compiled first):

bin/pyspark --driver-class-path sql/core/target/scala-2.11/test-classes

{code:java}
datasource_v2_df = spark.read \
    .format("org.apache.spark.sql.sources.v2.SimpleDataSourceV2") \
    .load()

result = datasource_v2_df.withColumn('x', udf(lambda x: x, 'int')(datasource_v2_df['i']))
result.show()
{code}

The above code fails with:

{code:java}
Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericInternalRow cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeRow
	at org.apache.spark.sql.execution.python.EvalPythonExec$$anonfun$doExecute$1$$anonfun$5.apply(EvalPythonExec.scala:127)
	at org.apache.spark.sql.execution.python.EvalPythonExec$$anonfun$doExecute$1$$anonfun$5.apply(EvalPythonExec.scala:126)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
{code}

It looks like the Data Source V2 scan does not produce UnsafeRows here, while EvalPythonExec assumes its input rows are already in the unsafe format.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
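To make the failure mode concrete: the stack trace points at a downcast in EvalPythonExec that assumes every incoming row is an UnsafeRow, while the v2 scan hands it GenericInternalRow instances. The following is a toy model in plain Python (not Spark code; the class and function names only mimic the Spark ones) showing the failing cast and the kind of row-conversion step, analogous to an UnsafeProjection, that would avoid it:

```python
# Toy model (plain Python, NOT Spark code) of the failure described above.
# The classes mimic Spark's row hierarchy in name only.

class InternalRow:
    """Stand-in for org.apache.spark.sql.catalyst.InternalRow."""

class GenericInternalRow(InternalRow):
    """What the DataSourceV2 scan actually emits."""

class UnsafeRow(InternalRow):
    """What EvalPythonExec expects to receive."""

def eval_python_exec(rows):
    # Mirrors the asInstanceOf[UnsafeRow] at EvalPythonExec.scala:126-127:
    # any non-UnsafeRow input blows up with a cast error.
    for row in rows:
        if not isinstance(row, UnsafeRow):
            raise TypeError(
                f"{type(row).__name__} cannot be cast to UnsafeRow")
    return list(rows)

def unsafe_projection(rows):
    # Stand-in for the missing conversion step: project generic rows into
    # the unsafe format before an operator that requires it sees them.
    return [row if isinstance(row, UnsafeRow) else UnsafeRow()
            for row in rows]

# Feeding generic rows in directly raises the TypeError (the analogue of the
# ClassCastException); inserting the projection first succeeds.
```

This is only a sketch of the shape of the bug: either the v2 scan should emit unsafe rows, or the plan should insert a conversion between the scan and operators that require the unsafe format.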