[GitHub] [flink] dianfu opened a new pull request #11014: [FLINK-15897][python] Defer the deserialization of the Python UDF execution results

GitBox Tue, 04 Feb 2020 00:27:56 -0800

dianfu opened a new pull request #11014: [FLINK-15897][python] Defer the 
deserialization of the Python UDF execution results
URL: https://github.com/apache/flink/pull/11014
 
 
   ## What is the purpose of the change
   
   *Currently, the Python UDF execution results are deserialized as soon as they 
are received from the Python worker and then buffered in a collection. The 
deserialization can be deferred until the results are sent to the downstream 
operator. That is, the operator buffers the serialized bytes instead of the 
deserialized Java objects, which could reduce the memory footprint of the Java 
operator.*
   
   ## Brief change log
   
     - *Update AbstractPythonFunctionRunner so that it no longer deserializes the 
UDF execution results when they are received from the Python worker*
     - *Deserialize the UDF execution results in PythonScalarFunctionOperator when 
they are sent to the downstream operator*
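
The buffering pattern described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this pull request: the actual change lives in the Java classes named in the change log, and `DeferredResultBuffer`, `on_result_received`, `buffered_bytes`, and `emit_all` are hypothetical names. `pickle` is assumed here as a stand-in for whatever serializer the operator really uses.

```python
import pickle
from collections import deque
from typing import Any, Callable, Deque, Iterator


class DeferredResultBuffer:
    """Hypothetical buffer that stores serialized results and
    deserializes them only when they are emitted downstream."""

    def __init__(self, deserialize: Callable[[bytes], Any] = pickle.loads) -> None:
        # Assumption: pickle stands in for the real result serializer.
        self._deserialize = deserialize
        self._buffer: Deque[bytes] = deque()

    def on_result_received(self, raw: bytes) -> None:
        # Keep the raw bytes as-is; no result object is created here.
        self._buffer.append(raw)

    def buffered_bytes(self) -> int:
        # Total size of the serialized results still waiting in the buffer.
        return sum(len(raw) for raw in self._buffer)

    def emit_all(self) -> Iterator[Any]:
        # Deserialize one result at a time, right before it is handed on.
        while self._buffer:
            yield self._deserialize(self._buffer.popleft())


# Usage: results arrive as bytes and are only decoded on emit.
buf = DeferredResultBuffer()
for row in [(1, "a"), (2, "b")]:
    buf.on_result_received(pickle.dumps(row))
emitted = list(buf.emit_all())
```

Because `emit_all` is a generator, at most one deserialized result exists at a time in this sketch, while everything still waiting in the buffer stays in its serialized form.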
   
   
   ## Verifying this change
   
   This change is an improvement and is already covered by existing tests, such 
as *test_udf.py*.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
