[GitHub] [spark] zhengruifeng commented on pull request #40520: [SPARK-42896][SQL][PYSPARK] Make `mapInPandas` / `mapInArrow` support barrier mode execution

via GitHub Wed, 22 Mar 2023 19:22:47 -0700


zhengruifeng commented on PR #40520:
URL: https://github.com/apache/spark/pull/40520#issuecomment-1480490580


   > Barrier mode is only used in a specific ML case, i.e. in the model training routine, where we will only use it in one pattern:
   > 
   > dataset.mapInPandas(..., is_barrier=True).collect()
   
   > To simplify the implementation, we can implement a barrierMapInPandasAndCollect instead, and define an execution plan stage like BarrierMapInPandasAndCollectExec
   
   If that is the only use case, I think it will be safe to add a dedicated 
logical plan and physical plan for it.
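
For reference, the single usage pattern quoted above could be wrapped roughly as follows. This is only a sketch under assumptions, not the implementation proposed in the PR: `barrier_map_in_pandas_and_collect` and `train_partition` are hypothetical names, and the keyword is assumed to be `barrier` (the quoted snippet spells it `is_barrier`), so adjust it to whatever the API actually exposes.

```python
from typing import Any, Callable, Iterator, List


def train_partition(batches: Iterator["pd.DataFrame"]) -> Iterator["pd.DataFrame"]:
    """Hypothetical per-task routine: stands in for a model training step.

    It only counts the rows seen by this task and yields a one-row result.
    """
    import pandas as pd  # imported lazily so it resolves on the executor

    n_rows = sum(len(batch) for batch in batches)
    yield pd.DataFrame({"rows": [n_rows]})


def barrier_map_in_pandas_and_collect(df: Any, func: Callable, schema: str) -> List[Any]:
    """Hypothetical helper mirroring the one pattern from the discussion.

    Assumption: `df.mapInPandas` accepts a `barrier` keyword; the thread
    writes it as `is_barrier`.
    """
    # Run the mapping stage in barrier mode, then collect right away,
    # so no other operator is ever planned after the barrier stage.
    return df.mapInPandas(func, schema, barrier=True).collect()
```

With an existing DataFrame `dataset`, a call would look like `barrier_map_in_pandas_and_collect(dataset, train_partition, "rows long")`. Keeping the map and the collect in one helper is what would let a dedicated plan node cover the whole pattern.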


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


