hui730 commented on PR #43621: URL: https://github.com/apache/spark/pull/43621#issuecomment-1790565946
> > My plan is to create a new binary from the executor binary, using System.arraycopy().
>
> This is what is effectively happening currently, right? The underlying serialized task array is immutable, and is repeatedly read to deserialize into the task closure.
>
> I want to make sure I understand the proposal, and how it is different from what Spark is currently doing.

Assume there are currently n tasks (from the same stage) running simultaneously in the executor. Today, the executor reads the remote broadcast, deserializes it, and only then starts running those n tasks; this step is serial. In my modification, reading the remote broadcast and deserializing it into an Array[Byte] happens asynchronously, in parallel with launchTask. This can save time.
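To make the proposed overlap concrete, here is a minimal, hypothetical JVM sketch (not Spark's actual executor code): the serialized task payload is fetched on a background thread via a shared future, so per-task launch work can proceed while the fetch is in flight, and each task blocks only at the point it actually needs the bytes. `fetchSerializedTaskBinary`, the task count, and the payload are all stand-ins.

```java
import java.util.concurrent.*;

public class AsyncTaskBinaryFetch {
    // Hypothetical stand-in for reading the remote broadcast into a byte array.
    static byte[] fetchSerializedTaskBinary() {
        return new byte[]{1, 2, 3}; // placeholder payload
    }

    public static void main(String[] args) throws Exception {
        // Proposed flow: start the fetch asynchronously so task launch can
        // proceed in parallel, instead of fetching serially up front.
        CompletableFuture<byte[]> taskBinary =
            CompletableFuture.supplyAsync(AsyncTaskBinaryFetch::fetchSerializedTaskBinary);

        int nTasks = 4; // the "n tasks of the same stage" from the discussion
        ExecutorService pool = Executors.newFixedThreadPool(nTasks);
        CountDownLatch done = new CountDownLatch(nTasks);
        for (int i = 0; i < nTasks; i++) {
            final int taskId = i;
            pool.submit(() -> {
                // launchTask-style setup could run here while the fetch is
                // still in flight; the task blocks only when it needs the bytes.
                byte[] bytes = taskBinary.join(); // shared, immutable payload
                System.out.println("task " + taskId + " got " + bytes.length + " bytes");
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
    }
}
```

All n tasks share one future, so the payload is fetched and deserialized once; only tasks that reach `join()` before the fetch completes actually wait.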
