cloud-fan commented on PR #40629: URL: https://github.com/apache/spark/pull/40629#issuecomment-1495187524
@mridulm the use case we have found so far is broadcasting the Hadoop conf: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L161-L162

Simply inlining the variable does not improve performance, because:
1. The Hadoop conf is not that small.
2. If the executor is powerful, say 16 cores, the executor JVM ends up holding 16 copies of the Hadoop conf (one per concurrently running task), which is a waste.

We might find more cases in the future, as SQL operators sometimes need to broadcast small data (though not as small as a single integer).
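
To make the difference between the two approaches concrete, here is a minimal sketch, not the actual `ParquetFileFormat` code. It assumes Spark 3.x (where `org.apache.spark.util.SerializableConfiguration` is a public developer API) and a local master; the object name, the `fs.defaultFS` lookup, and the 16-partition RDD are made up for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession
import org.apache.spark.util.SerializableConfiguration

// Hypothetical example object, not part of Spark.
object BroadcastConfSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]") // assumption: local run, just for the sketch
      .appName("broadcast-conf-sketch")
      .getOrCreate()
    val sc = spark.sparkContext

    val hadoopConf: Configuration = sc.hadoopConfiguration
    // Hadoop's Configuration is not java.io.Serializable, so wrap it.
    val serializableConf = new SerializableConfiguration(hadoopConf)

    // Variant 1: inline the conf into the closure.
    // Every task deserializes its own copy of the closure, and with it
    // its own copy of the conf.
    val inlined = sc.parallelize(1 to 16, numSlices = 16).map { _ =>
      serializableConf.value.get("fs.defaultFS", "file:///")
    }.collect()

    // Variant 2: broadcast the conf once.
    // Tasks running in the same executor JVM read the same broadcast value.
    val broadcastConf = sc.broadcast(serializableConf)
    val shared = sc.parallelize(1 to 16, numSlices = 16).map { _ =>
      broadcastConf.value.value.get("fs.defaultFS", "file:///")
    }.collect()

    // Both variants compute the same result; they differ only in how
    // many copies of the conf live in each executor.
    assert(inlined.sameElements(shared))

    spark.stop()
  }
}
```

Both variants return the same values. The difference is memory footprint: with the closure variant, the number of conf copies in an executor grows with the number of concurrently running tasks, while the broadcast variant keeps one copy per executor.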
