Github user winningsix commented on the pull request:
https://github.com/apache/spark/pull/8880#issuecomment-163075127
Hi @vanzin
Thanks for your comments. I mean to ping you and add reply some of your
previous comments but the build failed since the Chimera jar is not updated. I
will trigger another build to check it again.
>the block manager is still compressing encrypted streams, instead of
encrypting compressed streams, which would be more efficient.
Yes, I agree that we should encrypt the compressed stream when writing.
However, `BlockStoreShuffleReader` should be opposite to the write path which
decrypts the stream and then decompresses the stream in read path.
>the tests currently under yarn/ should be moved to core/ and adjusted as
needed, since that's where the code is being modified.
Agree. But I didn't find a good way to do it. Regards the test for
`DiskBlockObjectWriter` or `BlockManager`, we have to do it in the level of
Yarn module since we need to obtain the security key from the credentials
stored in user group information. It is allowed only in Yarn mode while return
null in Hadoop mode (`SparkHadoopUtil.getCurrentUserCredentials()`). I am
thinking about adding a common test Utils in the test code of core module and
use it directly in Yarn maven module. Do you have any better idea for that?
>I'd suggest finding a way to test the block manager changes without
needing to run a complete Spark app.
Yes, I am figuring out a way to do it. Can we do it as a follow up task and
we can merge this patch firstly.
In fact, we have one round code review internally for the Spark File
Shuffle Encryption feature. I have addressed all of your comments except for
what mentioned above and do the following updates.
1. Remove all unnecessary configurations which is used by Chimera internally
2. Add more comments or description for both code and configuration
3. Add more unit tests to cover my changes in the class
ShuffleEncryptionSuite
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]