GitHub user squito opened a pull request:
https://github.com/apache/spark/pull/22881
[SPARK-25855][CORE] Don't use erasure coding for event logs by default
## What changes were proposed in this pull request?
This turns off HDFS erasure coding by default for event logs, regardless of the
filesystem defaults. Because this requires APIs only available in Hadoop 3,
it uses reflection. EC isn't a good choice for event logs, since hflush()
is a no-op on EC files, so updates to the file are not visible for a long time.
This can still be overridden by setting "spark.eventLog.allowErasureCoding=true",
which falls back to the filesystem defaults.
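For reference, a minimal sketch of how such reflection might look; the helper name `createReplicatedFile` and the Hadoop 2 fallback are assumptions for illustration, not necessarily the exact code in this PR, but `createFile()`/`replicate()`/`build()` are the Hadoop 3 stream-builder methods this approach relies on:
```scala
import org.apache.hadoop.fs.{FSDataOutputStream, FileSystem, Path}

// Hypothetical helper: open a file for writing with plain replication instead
// of the directory's erasure-coding policy, using reflection so the code still
// compiles against Hadoop 2.
def createReplicatedFile(fs: FileSystem, path: Path): FSDataOutputStream = {
  try {
    // FileSystem.createFile(Path) returns a stream builder, but only in Hadoop 3+.
    val createFile = fs.getClass.getMethod("createFile", classOf[Path])
    val builder = createFile.invoke(fs, path)
    val builderCls = builder.getClass
    // replicate() asks the builder for replication rather than erasure coding.
    val replicated = builderCls.getMethod("replicate").invoke(builder)
    builderCls.getMethod("build").invoke(replicated).asInstanceOf[FSDataOutputStream]
  } catch {
    case _: NoSuchMethodException =>
      // Hadoop 2 client: no builder API, but also no erasure coding, so a plain
      // create() is sufficient (assumed fallback behavior).
      fs.create(path)
  }
}
```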
## How was this patch tested?
Deployed a cluster with these changes and with HDFS EC enabled. By default, event
logs did not use EC, but setting the configuration still allowed EC.
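To illustrate that opt-in, a hypothetical SparkConf snippet using the key from the description (not code from this PR):
```scala
import org.apache.spark.SparkConf

// Opt back in to the filesystem defaults (which may include erasure coding)
// for event logs; spark.eventLog.allowErasureCoding is the key named above.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.allowErasureCoding", "true")
```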
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/squito/spark SPARK-25855
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22881.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22881
----
commit 005ee5494acd3d9f0721ad24ba3700d8905e2e26
Author: Imran Rashid <irashid@...>
Date: 2018-10-26T19:03:43Z
[SPARK-25855][CORE][STREAMING] Don't use HDFS EC for event logs and WAL
HDFS erasure coding doesn't support hflush(), hsync(), or append(),
which doesn't work well for event logs or the WAL, so make sure we never
use it for those files, regardless of the HDFS configuration.
commit 04b968a0223e195f1c7e6d6684274bd7f8484069
Author: Imran Rashid <irashid@...>
Date: 2018-10-26T20:22:11Z
fix
commit 8a9392c875b9b2aec048940a8ae7d03529bfc641
Author: Imran Rashid <irashid@...>
Date: 2018-10-29T15:56:20Z
make it configurable
commit cd28e61fe9232927ea66b3beb4af5c5d699bb6d3
Author: Imran Rashid <irashid@...>
Date: 2018-10-29T20:09:28Z
remove changes for WAL
----