tgravescs commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544977349
 
 
   Thanks for running the tests, @dbtsai. It's actually faster than I expected.
   
   Yes, it needs to distribute it, and this obviously uses memory on the
driver side. Normally, if the map status is of any size, it ends up being
broadcast to the hosts, and the message going over the wire just indicates that
it's a broadcast. To me this isn't much different from any other broadcast,
which normally has the spark.io.compression.codec config applied to choose the
compression. You may actually want it faster if you have ample network
bandwidth. I assume that originally, before we were broadcasting it, size was
definitely an issue, because if it went over the max message size it would just
fail. It also took a long time for large statuses and slower networks. That was
also before we had highly compressed statuses and the like.
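
   A minimal sketch of the direct-vs-broadcast decision described above, in
Python for brevity. Everything here is hypothetical and illustrative, not
Spark's API: `serialize_statuses`, `deserialize_statuses`,
`FakeBroadcastManager`, the `DIRECT`/`BROADCAST` tags, and the assumed 512 KiB
threshold. `zlib` stands in for zstd because zstd is not in the Python standard
library.

```python
import pickle
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical one-byte tags telling the receiver how to read the payload.
DIRECT = 0
BROADCAST = 1

# Hypothetical status shape: (host, block sizes) per map task.
Statuses = List[Tuple[str, List[int]]]


@dataclass
class FakeBroadcastManager:
    """Hypothetical stand-in for a broadcast mechanism: blobs stored by id."""
    store: Dict[int, bytes] = field(default_factory=dict)
    next_id: int = 0

    def publish(self, blob: bytes) -> int:
        bid = self.next_id
        self.store[bid] = blob
        self.next_id += 1
        return bid


def serialize_statuses(statuses: Statuses,
                       manager: FakeBroadcastManager,
                       min_broadcast_size: int = 512 * 1024,
                       compress: Callable[[bytes], bytes] = zlib.compress) -> bytes:
    # Serialize, then compress (zlib stands in for zstd; using another codec
    # only means passing a different compress/decompress pair).
    payload = compress(pickle.dumps(statuses))
    if len(payload) < min_broadcast_size:
        # Small: ship the compressed bytes directly in the message.
        return bytes([DIRECT]) + payload
    # Large: publish once via broadcast; the message carries only a reference.
    bid = manager.publish(payload)
    return bytes([BROADCAST]) + bid.to_bytes(8, "big")


def deserialize_statuses(msg: bytes,
                         manager: FakeBroadcastManager,
                         decompress: Callable[[bytes], bytes] = zlib.decompress) -> Statuses:
    tag, body = msg[0], msg[1:]
    if tag == DIRECT:
        payload = body
    elif tag == BROADCAST:
        # Resolve the reference to the broadcast blob.
        payload = manager.store[int.from_bytes(body, "big")]
    else:
        raise ValueError(f"unknown tag {tag}")
    return pickle.loads(decompress(payload))
```

   With the default threshold a small status list comes back tagged `DIRECT`;
passing `min_broadcast_size=1` forces the broadcast path, where the message
shrinks to a 9-byte tag plus reference no matter how large the statuses are.
`pickle` is used only to keep the sketch short and is not safe for untrusted
input.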
   
   This is definitely an improvement over what we had. Perhaps we just wait
and see whether it's an issue or whether someone wants to use something other
than zstd, and at that point we can make it configurable if needed. I'd rather
not add more configs unless they're really needed.
    
