Github user squito commented on the issue:
https://github.com/apache/spark/pull/16639
cc @kayousterhout @markhamstra @mateiz
This isn't just protecting against crazy user code -- I've seen users hit
this with Spark SQL (because of
https://github.com/apache/spark/blob/278fa1eb305220a85c816c948932d6af8fa619aa/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala#L214),
so it seems important to fix.
I attempted to write a larger integration test to reproduce the issue
in a "local-cluster" setup, but got stuck. ShuffleBlockFetcherIterator does
_some_ fetches on construction, before it's used as an iterator wrapped in
user code. If a failure happens during that initialization, it was already
handled correctly before this change. For the error to occur, the failure has
to happen inside the call to `shuffleBlockFetcherIterator.next()` when it's
called by the user's iterator. I was eventually able to reproduce it with this
https://github.com/squito/spark/commit/c2d27d10f32edf70e78d849967f7b7bf51495c4e
but it involved hacking internals and didn't seem easy to turn into a test. I
settled for a simpler unit test just on `Executor`, but I'm open to more
suggestions.
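
To make the timing distinction concrete, here is a toy model in plain Python (not Spark code). It assumes the problem is user code that catches the fetch failure and rethrows it wrapped in another exception, so a check on the top-level exception type no longer sees it. `FetchFailed`, `FetcherIterator`, `user_code` and `run_task` are all hypothetical names made up for this sketch.

```python
class FetchFailed(Exception):
    """Hypothetical stand-in for a shuffle fetch failure."""


class FetcherIterator:
    """Toy fetcher: fetches one block eagerly on construction, the rest in next()."""

    def __init__(self, blocks, fail_at):
        self._blocks = list(blocks)
        self._fail_at = fail_at  # index of the block whose fetch fails, or None
        self._pos = 0
        self._buffer = []
        # Eager fetch: runs before any user code wraps this iterator.
        if self._blocks:
            self._buffer.append(self._fetch())

    def _fetch(self):
        i = self._pos
        if i == self._fail_at:
            raise FetchFailed(f"block {i}")
        self._pos += 1
        return self._blocks[i]

    def __iter__(self):
        return self

    def __next__(self):
        if self._buffer:
            return self._buffer.pop(0)
        if self._pos >= len(self._blocks):
            raise StopIteration
        # Lazy fetch: runs while user code is consuming the iterator.
        return self._fetch()


def user_code(records):
    """Assumed user iterator: wraps any failure in its own exception type."""
    try:
        for record in records:
            yield record.upper()
    except Exception as e:
        raise RuntimeError("task failed while processing rows") from e


def run_task(blocks, fail_at):
    """What a handler sees if it only checks the top-level exception type."""
    try:
        source = FetcherIterator(blocks, fail_at)
        list(user_code(source))
        return "success"
    except FetchFailed:
        return "fetch failure"
    except Exception:
        return "generic task failure"
```

In this model, `run_task(["a", "b", "c"], 0)` fails during construction and is still reported as a fetch failure, while `run_task(["a", "b", "c"], 2)` fails inside `next()` under the user's iterator and is reported as a generic task failure. That is why a test has to force the failure past the initial fetches.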