xinrong-meng commented on PR #39585:
URL: https://github.com/apache/spark/pull/39585#issuecomment-1385357961

   Thanks @grundprinzip for the insightful comments. I will address them.
   
   As for the `PythonFunction` message, it was a placeholder for all the 
information required to construct a PySpark `SimplePythonFunction`, as shown 
below.
   ```scala
   private[spark] case class SimplePythonFunction(
       command: Seq[Byte],
       envVars: JMap[String, String],
       pythonIncludes: JList[String],
       pythonExec: String,
       pythonVer: String,
       broadcastVars: JList[Broadcast[PythonBroadcast]],
       accumulator: PythonAccumulatorV2)
   ```
   Another reason to keep PythonFunction separate from PythonUDF is that the 
information in PythonFunction cannot be changed after a user-defined function 
is created, whereas the information in PythonUDF can be changed by users at 
runtime, as shown below:
   ```py
   >>> from pyspark.sql.functions import udf
   >>> from pyspark.sql.types import LongType
   >>> @udf(returnType='int')
   ... def f(x):
   ...   return x + 1
   ... 
   >>> f.returnType
   IntegerType()
   >>> f.returnType = LongType()
   >>> f.returnType
   LongType()
   ```
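
   To illustrate the split described above, here is a minimal sketch in plain 
Python (it is not PySpark or Spark Connect code). The names 
`PythonFunctionSketch`, `PythonUDFSketch`, and `try_set_python_ver` are 
hypothetical, and using a frozen dataclass is only an assumption about how the 
"fixed after creation" part could be modeled next to the mutable part.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Tuple


@dataclass(frozen=True)
class PythonFunctionSketch:
    """Hypothetical stand-in for the payload that is fixed at creation time."""
    command: bytes
    python_includes: Tuple[str, ...]
    python_exec: str
    python_ver: str


@dataclass
class PythonUDFSketch:
    """Hypothetical stand-in for the wrapper whose fields users may change."""
    func: PythonFunctionSketch
    return_type: str


def try_set_python_ver(fn: PythonFunctionSketch, ver: str) -> bool:
    """Attempt to rebind a field on the frozen payload; report whether it worked."""
    try:
        fn.python_ver = ver  # frozen dataclasses reject attribute assignment
        return True
    except FrozenInstanceError:
        return False


fn = PythonFunctionSketch(
    command=b"pickled-function-bytes",  # placeholder bytes, not a real pickle
    python_includes=(),
    python_exec="python3",
    python_ver="3.9",
)
sketch_udf = PythonUDFSketch(func=fn, return_type="int")

# The wrapper can be changed after creation, like f.returnType above.
sketch_udf.return_type = "bigint"

# The payload cannot be changed after creation.
changed = try_set_python_ver(fn, "3.11")
```

   Keeping the two in separate messages would let the fixed part be built once 
and reused, while the part users can change stays on the UDF.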
   
   Please correct me if I'm wrong.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


