Github user JoshRosen commented on the pull request:
https://github.com/apache/spark/pull/6397#issuecomment-105329288
Alright, I think that this is ready for review so I'm going to remove the
`WIP` tag. I'd like feedback on testing: I think that the tests here, combined
with the implicit testing when this path is used during other shuffle write
paths, should be sufficient to convince yourself that I haven't broken anything
seriously when porting the bypass code into its own file. It's worth calling
out what I _don't_ have good unit tests for, though: we don't have any tests
which explicitly try to read per-partition output from the combined output
file. AFAIK the `ExternalSorterSuite` didn't explicitly test this, either. I
might end up adding a separate test utility for emulating how
IndexShuffleBlockResolver will read chunks of the output file and index file in
order to serve shuffles. This is part of an effort to avoid writing new
end-to-end tests for testing basic shuffle functionality, since those tend to
be slow and aren't great for testing things like metrics.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]