GitHub user vanzin commented on the issue:
https://github.com/apache/spark/pull/17295
> Isn't it not simpler to transmit block contents in encrypted format
without decryption?
First, keep in mind that there's no metadata that tells the receiver
whether a block is encrypted or not. This means that methods like
`BlockManager.get`, which can read block data from either local or remote
sources, need to return data that is either always encrypted or always not
encrypted for the same block ID.
This leaves two choices:
- encrypt the data in all stores (memory & disk); this is what the current
code does, and it requires all code that uses the BlockManager to deal
with encryption. This is what caused SPARK-19520, and I filed SPARK-19556 to
cover yet another case of a code path that did not do the right thing when
encryption is enabled.
- make all non-shuffle block data read from the BlockManager not encrypted.
This means non-shuffle code calling the BlockManager does not have to care
about encryption, since it will always read unencrypted data, and can always
put unencrypted data in the BlockManager, and it will be encrypted when needed
(i.e., when it is written to disk).
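To make the second option concrete, here is a minimal toy sketch (not Spark's actual implementation — the class, the single-byte XOR "cipher", and all names are invented for illustration): encryption happens only at the disk boundary, so `get` always hands callers plaintext no matter where the block lives.

```python
KEY = b"\x42"  # toy single-byte XOR "cipher", for illustration only

def _xor(data: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts.
    return bytes(b ^ KEY[0] for b in data)

class ToyBlockManager:
    def __init__(self, encryption_enabled: bool = True):
        self.encryption_enabled = encryption_enabled
        self.memory = {}  # plaintext blocks
        self.disk = {}    # encrypted-at-rest blocks (stands in for files)

    def put(self, block_id: str, data: bytes, to_disk: bool = False):
        # Callers always hand in plaintext; encryption is applied only
        # when the block is written to the disk store.
        if to_disk:
            self.disk[block_id] = _xor(data) if self.encryption_enabled else data
        else:
            self.memory[block_id] = data

    def get(self, block_id: str) -> bytes:
        # get() always returns plaintext, so no caller needs to know
        # whether encryption is enabled or which store holds the block.
        if block_id in self.memory:
            return self.memory[block_id]
        raw = self.disk[block_id]
        return _xor(raw) if self.encryption_enabled else raw
```

The point of the sketch is the invariant: callers of `put` and `get` never see ciphertext, which is what removes the encryption-awareness burden from non-shuffle code paths.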
> Remote fetch of RDD blocks is not uncommon
That's fine. This change means data read from a BlockManager instance is not
encrypted. But when the data is transmitted to another executor, there is
RPC-level encryption (`spark.authenticate.enableSaslEncryption` or
`spark.network.crypto.enabled`), so the data is still encrypted on the
wire.
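For reference, the RPC-level encryption mentioned above is enabled through Spark's security settings; a sketch of the relevant `spark-defaults.conf` entries (values are illustrative, and `spark.authenticate` is a prerequisite for either option):

```
spark.authenticate                       true
# AES-based RPC encryption:
spark.network.crypto.enabled             true
# or the older SASL-based encryption:
spark.authenticate.enableSaslEncryption  true
```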