[GitHub] [spark] cloud-fan commented on pull request #40629: [SPARK-42980][CORE] Implement a lightweight SmallBroadcast

via GitHub Mon, 03 Apr 2023 17:49:44 -0700


cloud-fan commented on PR #40629:
URL: https://github.com/apache/spark/pull/40629#issuecomment-1495187524


   @mridulm the use case we have found so far is broadcasting the Hadoop conf: 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L161-L162
   
   Simply inlining the variable does not improve performance because:
   1. The Hadoop conf is not that small.
   2. On a powerful executor, e.g. one with 16 cores, up to 16 copies of the 
Hadoop conf can exist in the executor JVM (one per concurrently running task), 
which is wasteful.
   
   We might find more cases in the future, as SQL operators sometimes need to 
broadcast small data (though not as small as a single integer).
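
   To illustrate the difference between inlining and broadcasting described 
above, here is a minimal sketch. It is not the code from this PR: it uses the 
existing `SparkContext.broadcast` API, and the `Map` named `conf` is a 
hypothetical stand-in for a Hadoop conf. The object name, key names, and 
`local[4]` master are assumptions chosen only for the example.

```scala
import org.apache.spark.sql.SparkSession

object BroadcastConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]") // assumption: local mode, just for the sketch
      .appName("broadcast-conf-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical stand-in for a Hadoop conf: a few hundred key/value pairs.
    val conf: Map[String, String] =
      (1 to 500).map(i => s"example.key.$i" -> s"value-$i").toMap

    // Inlined: `conf` is captured by the closure, so every task that
    // deserializes the closure gets its own copy of the map.
    val inlined = sc.parallelize(1 to 16, numSlices = 16)
      .map(i => conf(s"example.key.$i"))
      .collect()

    // Broadcast: the closure captures only the broadcast handle, and tasks
    // running in the same executor JVM read one shared value.
    val confBc = sc.broadcast(conf)
    val shared = sc.parallelize(1 to 16, numSlices = 16)
      .map(i => confBc.value(s"example.key.$i"))
      .collect()

    // Both variants compute the same result; only the memory footprint differs.
    assert(inlined.sameElements(shared))

    confBc.destroy()
    spark.stop()
  }
}
```

   The two jobs return identical results. The trade-off is that the inlined 
variant may hold one copy of the map per running task, while the broadcast 
variant pays the setup cost of a broadcast variable to keep a single copy per 
executor.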


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


