mridulm commented on code in PR #37610:
URL: https://github.com/apache/spark/pull/37610#discussion_r954071160


##########
docs/configuration.md:
##########
@@ -1007,6 +1007,28 @@ Apart from these, the following properties are also available, and may be useful
   </td>
   <td>3.3.0</td>
 </tr>
+<tr>
+  <td><code>spark.shuffle.service.db.enabled</code></td>
+  <td>true</td>
+  <td>
+    Store External Shuffle service state on local disk so that when the external shuffle service is restarted, it will
+    automatically reload info on current executors.  This only affects standalone mode (yarn always has this behavior
+    enabled).  You should also enable <code>spark.worker.cleanup.enabled</code>, to ensure that the state
+    eventually gets cleaned up.  This config may be removed in the future.
+  </td>
+  <td>3.0.0</td>
+</tr>
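
For readers of this row, here is a minimal sketch of how the setting might be applied in a standalone worker's `spark-defaults.conf`. Only `spark.shuffle.service.db.enabled` and `spark.worker.cleanup.enabled` appear in the diff above; `spark.shuffle.service.enabled` is the standard flag for running the external shuffle service and is included as an assumption for completeness:

```properties
# Run the external shuffle service on this standalone worker
spark.shuffle.service.enabled       true
# Persist shuffle service state to local disk so that a restarted service
# automatically reloads info on current executors (the row above; default true)
spark.shuffle.service.db.enabled    true
# Recommended alongside the DB so the persisted state eventually gets cleaned up
spark.worker.cleanup.enabled        true
```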

Review Comment:
   This should be in the section at the bottom - in the `External Shuffle service(server) side configuration options` section.
   
   Having said that, I was wrong about `spark.shuffle.service.db.enabled` - it is always enabled in yarn mode - so we cannot control it with the `.enabled` flag.
   The newly introduced `spark.shuffle.service.db.backend` is relevant, though; can we add a blurb that it applies to both yarn and standalone - with the db enabled by default for yarn - and point to standalone.md for more details there?
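   As a hedged sketch of the kind of blurb suggested above, a standalone worker's `spark-defaults.conf` might pair the two properties like this; the backend values shown (LEVELDB, ROCKSDB) are an assumption, not something stated in this excerpt, and on yarn the db is always enabled so only the backend choice would apply there:

   ```properties
   # Standalone: opt in to persisting shuffle service state (yarn: always on)
   spark.shuffle.service.db.enabled    true
   # Disk-based store backing that state (assumed values: LEVELDB or ROCKSDB)
   spark.shuffle.service.db.backend    ROCKSDB
   ```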
   
   Currently, we have the yarn shuffle service config in `configuration.md`; so can we 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

