[GitHub] [spark] xinrong-meng commented on pull request #39585: [WIP] Unregistered Python UDF in Spark Connect

GitBox Tue, 17 Jan 2023 04:33:44 -0800


xinrong-meng commented on PR #39585:
URL: https://github.com/apache/spark/pull/39585#issuecomment-1385357961


   Thanks @grundprinzip for the insightful comments. I will address them.
   
   As for the message `PythonFunction`, it is a placeholder for all the 
information required to construct a PySpark `SimplePythonFunction`, as shown 
below.
   ```scala
   private[spark] case class SimplePythonFunction(
       command: Seq[Byte],
       envVars: JMap[String, String],
       pythonIncludes: JList[String],
       pythonExec: String,
       pythonVer: String,
       broadcastVars: JList[Broadcast[PythonBroadcast]],
       accumulator: PythonAccumulatorV2)
   ```
   Another reason to keep `PythonFunction` separate from `PythonUDF` is 
mutability: the information in `PythonFunction` cannot be changed after a 
user-defined function is created, whereas the information in `PythonUDF` can 
be changed by users at runtime, as shown below:
   ```py
   >>> from pyspark.sql.functions import udf
   >>> from pyspark.sql.types import LongType
   >>> @udf(returnType='int')
   ... def f(x):
   ...   return x + 1
   ... 
   >>> f.returnType
   IntegerType()
   >>> f.returnType = LongType()
   >>> f.returnType
   LongType()
   ```
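
To make the proposed split concrete, here is a minimal sketch in plain Python, not the actual proto definition. The class names `PythonFunctionInfo` and `PythonUdfInfo` and their fields are hypothetical stand-ins chosen for illustration: the function-level data is frozen at creation, while the UDF-level data stays adjustable.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class PythonFunctionInfo:
    """Hypothetical stand-in for the `PythonFunction` message: fixed at creation."""
    command: bytes      # serialized function body (placeholder bytes here)
    python_exec: str    # interpreter used to run the function
    python_ver: str     # Python version the function was created under


@dataclass
class PythonUdfInfo:
    """Hypothetical stand-in for the `PythonUDF` message: user-adjustable."""
    function: PythonFunctionInfo
    return_type: str


fn = PythonFunctionInfo(command=b"<pickled f>", python_exec="python3", python_ver="3.9")
udf_info = PythonUdfInfo(function=fn, return_type="int")

# The UDF-level field can be changed after creation...
udf_info.return_type = "bigint"

# ...while the function-level fields reject reassignment.
try:
    fn.python_ver = "3.10"
    function_is_mutable = True
except FrozenInstanceError:
    function_is_mutable = False
```

Under this assumption, a client could send the function-level part once and only resend the UDF-level fields when a user changes them.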
   
   Please correct me if I'm wrong.


