Github user BryanCutler commented on a diff in the pull request:
https://github.com/apache/spark/pull/19325#discussion_r140852903
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala ---
@@ -51,10 +51,12 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], output: Seq[Attribute], chi
       outputIterator.map(new ArrowPayload(_)), context)

     // Verify that the output schema is correct
-    val schemaOut = StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
-      .map { case (attr, i) => attr.withName(s"_$i") })
-    assert(schemaOut.equals(outputRowIterator.schema),
-      s"Invalid schema from pandas_udf: expected $schemaOut, got ${outputRowIterator.schema}")
+    if (outputIterator.nonEmpty) {
+      val schemaOut = StructType.fromAttributes(output.drop(child.output.length).zipWithIndex
+        .map { case (attr, i) => attr.withName(s"_$i") })
+      assert(schemaOut.equals(outputRowIterator.schema),
+        s"Invalid schema from pandas_udf: expected $schemaOut, got ${outputRowIterator.schema}")
--- End diff ---
Yeah, I tried to make one, but since we are now casting the returned Series with
`astype` in `ArrowPandasSerializer.dumps`, I have not found a case that triggers
it. I think it would still be good to keep this check, just in case there is
some way it could happen; also, if we upgrade to Arrow 0.7 then we won't need
the `astype` logic and this check will be used instead.
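As a minimal sketch of why the assert is hard to trigger: coercing the returned
Series with pandas `astype` to the declared type happens before the Arrow batch
is built, so a dtype mismatch is fixed up on the Python side first. (This is a
standalone pandas illustration, not the actual serializer code.)

```python
import numpy as np
import pandas as pd

# A udf might return int64 values even though the declared return
# type of the pandas_udf is double (float64).
result = pd.Series([1, 2, 3])
assert result.dtype == np.dtype("int64")

# Casting with astype, as the serializer does before building the
# Arrow record batch, silently coerces the values to the declared
# type, so the downstream schema check rarely sees a mismatch.
coerced = result.astype(np.float64)
assert coerced.dtype == np.dtype("float64")
assert coerced.tolist() == [1.0, 2.0, 3.0]
```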
---