Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/18805
> Why does this need to be in Spark?
@srowen you already asked that question, and it has been answered on the
JIRA as well as on the old PR. A user cannot plug zstd compression into Spark's
internal compression paths (spark.io.compression.codec). In this particular
case he is saying it's the shuffle output where it makes a big difference.
zstd is already included in other open source projects like Hadoop, but
again that doesn't cover Spark's internal compression code. zstd itself is BSD
licensed. It looks like this PR uses https://github.com/luben/zstd-jni,
which also appears to be BSD licensed. We need to decide whether it is OK
for us to use that directly. Hadoop wrote its own version, but I would say if
the zstd-jni version works, we should use it. Worst case, if something happens
where that maintainer won't fix an issue, we could fork it and be no worse off
than if we had started with our own copy.
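
For context, a rough sketch of what such a codec could look like if zstd-jni
is acceptable. The class name and wiring here are illustrative, not the actual
patch in this PR; it assumes zstd-jni's ZstdInputStream/ZstdOutputStream
classes and Spark's internal CompressionCodec trait:

```scala
import java.io.{InputStream, OutputStream}

import com.github.luben.zstd.{ZstdInputStream, ZstdOutputStream}

import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

// Hypothetical wrapper around zstd-jni; details are a sketch, not this PR.
class ZStdCompressionCodec(conf: SparkConf) extends CompressionCodec {

  override def compressedOutputStream(s: OutputStream): OutputStream = {
    // zstd-jni exposes a compression level; 1 favors speed, which is what
    // shuffle output typically wants.
    new ZstdOutputStream(s, 1)
  }

  override def compressedInputStream(s: InputStream): InputStream = {
    new ZstdInputStream(s)
  }
}
```

A user would then select it the same way as the existing codecs, e.g.
spark.io.compression.codec=zstd, assuming the codec gets registered under a
short name alongside lz4/lzf/snappy.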