xinrong-meng commented on PR #39585:
URL: https://github.com/apache/spark/pull/39585#issuecomment-1385357961
Thanks @grundprinzip for the insightful comments. I will address them.
As for the message `PythonFunction`, it was a placeholder for all the
information required to construct a PySpark SimplePythonFunction, as shown
below.
```scala
private[spark] case class SimplePythonFunction(
    command: Seq[Byte],
    envVars: JMap[String, String],
    pythonIncludes: JList[String],
    pythonExec: String,
    pythonVer: String,
    broadcastVars: JList[Broadcast[PythonBroadcast]],
    accumulator: PythonAccumulatorV2)
```
Another reason to keep PythonFunction separate from PythonUDF is that the
information in PythonFunction cannot be changed after the user-defined
function is created, whereas the information in PythonUDF can be changed by
users at runtime, as in the example below:
```py
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import LongType
>>> @udf(returnType='int')
... def f(x):
...     return x + 1
...
>>> f.returnType
IntegerType()
>>> f.returnType = LongType()
>>> f.returnType
LongType()
```
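To make the intended split concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not the actual proto message or any PySpark class: `FunctionPayload` and `UdfDescriptor` are hypothetical names, the fields are a simplified subset of the ones listed above, and the return type is modeled as a plain string.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class FunctionPayload:
    """Hypothetical stand-in for the immutable PythonFunction information."""
    command: bytes
    env_vars: Tuple[Tuple[str, str], ...]
    python_exec: str
    python_ver: str


@dataclass
class UdfDescriptor:
    """Hypothetical stand-in for the mutable PythonUDF information."""
    function: FunctionPayload
    return_type: str


payload = FunctionPayload(b"pickled-bytes", (("KEY", "value"),), "python3", "3.9")
udf_desc = UdfDescriptor(function=payload, return_type="int")

# The UDF-level field can be reassigned after creation.
udf_desc.return_type = "bigint"

# The function-level fields cannot: a frozen dataclass rejects assignment.
try:
    payload.python_ver = "3.10"
    mutation_allowed = True
except FrozenInstanceError:
    mutation_allowed = False
```

Under these assumptions, everything fixed at creation time lives in the frozen object, and only the outer wrapper carries fields that users may still change.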
Please correct me if I'm wrong.