mridulm commented on code in PR #37610: URL: https://github.com/apache/spark/pull/37610#discussion_r954071160
##########
docs/configuration.md:
##########
@@ -1007,6 +1007,28 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.service.db.enabled</code></td>
+  <td>true</td>
+  <td>
+    Store External Shuffle service state on local disk so that when the external shuffle service is restarted, it will
+    automatically reload info on current executors. This only affects standalone mode (yarn always has this behavior
+    enabled). You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
+    eventually gets cleaned up. This config may be removed in the future.
+  </td>
+  <td>3.0.0</td>
+</tr>

Review Comment:
   This should be in the section at the bottom, in the `External Shuffle service(server) side configuration options` section.
   
   Having said that, I was wrong about `spark.shuffle.service.db.enabled` - it is always enabled in yarn mode, so we cannot control it with the `.enabled` flag.
   The newly introduced `spark.shuffle.service.db.backend` is relevant though; can we add a blurb that it is relevant for both yarn and standalone - with the db enabled by default for yarn - and point to standalone.md for more details on how to configure it for standalone? Thx

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
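
As a usage illustration (not part of the patch under review), the properties discussed above could be set together in a standalone worker's `conf/spark-defaults.conf`. This is a sketch only: the `ROCKSDB` value for `spark.shuffle.service.db.backend` is an assumption about what the newly introduced config accepts, not something stated in this thread.

```properties
# Run the external shuffle service on the standalone worker.
spark.shuffle.service.enabled     true

# Persist shuffle service state to local disk so executor info is reloaded
# after a shuffle service restart. Standalone mode only - per the review
# comment, on yarn this behavior is always enabled and cannot be toggled.
spark.shuffle.service.db.enabled  true

# Backend for the persisted state; ROCKSDB is an assumed valid value for
# the newly introduced spark.shuffle.service.db.backend config.
spark.shuffle.service.db.backend  ROCKSDB

# Ensure the persisted state eventually gets cleaned up, as the docs
# addition recommends.
spark.worker.cleanup.enabled      true
```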
