Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/21402
> StreamRequest will not block the server netty handler thread.
Hmm, I'm not so sure that's accurate. The main difference, I think, is that
there is currently no code path that sends a `StreamRequest` to the shuffle
service. But if, for example, many executors request files from the driver
simultaneously, you could end up in the same situation. It's
a less serious issue since I think it's a lot less common for large files to be
transferred that way, at least after startup.
I took a look at the code and it seems the actual change that avoids the
disk thrashing is the synchronization done in the chunk fetch handler; it only
allows a certain number of threads to actually do disk reads simultaneously.
That's an improvement already, but a couple of questions popped into my head
when I read your comment:
- how does that relate to maxChunksBeingTransferred()? Aren't both settings
effectively a limit on the number of requests being serviced, making the
existing one a little redundant?
- would there be a benefit in adding some sort of disk affinity to these
threads? E.g., send fetch requests hitting different disks to different
queues.
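To illustrate the disk-affinity idea, here's a minimal sketch in Java. This is not Spark's actual API; `DiskAffinityRouter`, `queueIndex`, and `submitRead` are hypothetical names. The idea is to route each fetch to a single-threaded executor chosen by the disk the file lives on, so reads hitting the same spindle are serialized while reads on different disks proceed in parallel:

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical sketch only: route read tasks to per-disk single-thread
// executors. Requests for the same disk are serialized (avoiding seek
// thrashing); requests for different disks run concurrently.
public class DiskAffinityRouter {
    private final List<String> diskRoots;      // e.g. "/data1", "/data2"
    private final ExecutorService[] queues;    // one worker thread per disk

    public DiskAffinityRouter(List<String> diskRoots) {
        this.diskRoots = diskRoots;
        this.queues = new ExecutorService[diskRoots.size()];
        for (int i = 0; i < queues.length; i++) {
            queues[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Pick the queue whose disk root is a prefix of the file path;
    // fall back to a hash of the path if no root matches.
    public int queueIndex(String filePath) {
        for (int i = 0; i < diskRoots.size(); i++) {
            if (filePath.startsWith(diskRoots.get(i))) {
                return i;
            }
        }
        return Math.floorMod(filePath.hashCode(), queues.length);
    }

    // Submit a read task to the executor owning the file's disk.
    public Future<byte[]> submitRead(String filePath, Callable<byte[]> task) {
        return queues[queueIndex(filePath)].submit(task);
    }

    public void shutdown() {
        for (ExecutorService q : queues) {
            q.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        DiskAffinityRouter router =
            new DiskAffinityRouter(Arrays.asList("/data1", "/data2"));
        System.out.println(router.queueIndex("/data1/shuffle_0_0.data")); // 0
        System.out.println(router.queueIndex("/data2/shuffle_0_1.data")); // 1
        Future<byte[]> f = router.submitRead("/data1/shuffle_0_0.data",
            () -> new byte[]{1, 2, 3});
        System.out.println(f.get().length); // 3
        router.shutdown();
    }
}
```

A global semaphore could still cap total in-flight reads on top of this, but the per-disk serialization is what would address the thrashing directly.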
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]