GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/21825#discussion_r206546611
--- Diff: docs/configuration.md ---
@@ -1215,6 +1215,14 @@ Apart from these, the following properties are also
available, and may be useful
if it is too small, <code>BlockManager</code> might take a performance
hit.
</td>
</tr>
+<tr>
+ <td><code>spark.broadcast.checksum</code></td>
+ <td>true</td>
+ <td>
+ Whether to enable checksum for broadcast.If it is enabled (default),
the broadcast will be more reliable.
--- End diff --
Nits: add a space after the period, and drop "(default)", since the default is
already documented in the column above. I think the description could also be
more useful. What about: "If enabled,
broadcasts will include a checksum, which can help detect corrupted blocks, at
the cost of computing and sending a little more data. It's possible to disable
it if the network has other mechanisms to guarantee data won't be corrupted
during broadcast."
CC @davies. Even I'm not sure when I would disable this. What would a network
have to guarantee to rule out whatever corruption is possible here? It isn't
clear yet when disabling the checksum is safe, that is, when it won't lead to
correctness issues.
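
To make the trade-off in the suggested wording concrete, here is a minimal sketch of per-block checksumming. It assumes Adler-32 via Python's standard `zlib` module. The helper names (`checksum`, `send_blocks`, `receive_block`) are hypothetical and exist only for illustration. This is not Spark's implementation. It only shows the general idea of sending a few extra bytes per block so the receiver can detect corruption.

```python
import zlib


def checksum(block: bytes) -> int:
    # Adler-32 is cheap to compute; masking keeps the value an unsigned 32-bit int.
    return zlib.adler32(block) & 0xFFFFFFFF


def send_blocks(blocks):
    # Hypothetical sender: pair every block with its checksum (the extra data sent).
    return [(block, checksum(block)) for block in blocks]


def receive_block(block: bytes, expected: int) -> bytes:
    # Hypothetical receiver: recompute the checksum and reject a mismatching block.
    if checksum(block) != expected:
        raise ValueError("corrupted broadcast block")
    return block


blocks = [b"spark", b"broadcast", b"payload"]
sent = send_blocks(blocks)

# Simulate corruption in transit by flipping one bit in the first byte of a block.
original, expected = sent[1]
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

try:
    receive_block(corrupted, expected)
    detected = False
except ValueError:
    detected = True
```

Without the checksum, the receiver in this sketch would accept `corrupted` silently, which is the correctness concern raised above.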
---