peter-toth commented on a change in pull request #31955:
URL: https://github.com/apache/spark/pull/31955#discussion_r601496353
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
##########
@@ -124,10 +124,10 @@ case class ScalaUDF(
val toRow = enc.createSerializer().asInstanceOf[Any => Any]
if (enc.isSerializedAsStructForTopLevel) {
value: Any =>
- if (value == null) null else toRow(value).asInstanceOf[InternalRow]
+ if (value == null) null else
toRow(value).asInstanceOf[InternalRow].copy()
} else {
value: Any =>
- if (value == null) null else
toRow(value).asInstanceOf[InternalRow].get(0, dataType)
+ if (value == null) null else
toRow(value).asInstanceOf[InternalRow].copy().get(0, dataType)
Review comment:
> I think the current fix can cause performance regression in case of
ScalaUDF + non higher order functions.
Well, yes, in case of *typed* `ScalaUDF`s my change can cause regression.
But your suggested change can cause regression in case of other `UDF`s +
`higher order functions` which is admittedly rarer than typed `ScalaUDF` + `non
higher order functions`.
So I'm not against fixing this in all affected higher order functions, but I
wouldn't expect `ScalaUDF.eval()` to return with something that needs to be
copied first to make sure another invocation doesn't change it.
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
##########
@@ -124,10 +124,10 @@ case class ScalaUDF(
val toRow = enc.createSerializer().asInstanceOf[Any => Any]
if (enc.isSerializedAsStructForTopLevel) {
value: Any =>
- if (value == null) null else toRow(value).asInstanceOf[InternalRow]
+ if (value == null) null else
toRow(value).asInstanceOf[InternalRow].copy()
} else {
value: Any =>
- if (value == null) null else
toRow(value).asInstanceOf[InternalRow].get(0, dataType)
+ if (value == null) null else
toRow(value).asInstanceOf[InternalRow].copy().get(0, dataType)
Review comment:
> I think the current fix can cause performance regression in case of
ScalaUDF + non higher order functions.
Well, yes, in case of *typed* `ScalaUDF`s my change can cause regression.
But your suggested change can cause regression in case of other `UDF`s +
`higher order functions` which is admittedly rarer than typed `ScalaUDF` + `non
higher order functions`.
So I'm not against fixing this in all affected higher order functions, but I
wouldn't expect `ScalaUDF.eval()` to return with something that needs to be
copied first to make sure another invocation doesn't change it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]