Github user andrewor14 commented on the pull request:
https://github.com/apache/spark/pull/5385#issuecomment-113348477
@harishreedharan Yes, the receiver is a long-running task, so we don't want
to disrupt it. According to TD, the main reason the batch queue builds up is
that execution can't catch up with receiving, not that we have too few
receivers. I think it's simplest to limit dynamic allocation in streaming to
adding / removing executors that don't run receivers.
> Not sure what you mean here - can you explain a bit? In the receiver case
> partition count depends on amount of data received, while in the direct
> stream case it is fixed, so we'd have to rethink since the task count
> is constant.
Having a long batch queue means execution can't keep up with receiving, so
we need more executors to speed up the execution. Note that in streaming, the
scheduler's task queue may fluctuate quite a bit, because the batch interval may
not be long enough for the timeouts to elapse, i.e. the task queue probably
won't persist across batches. Maybe @tdas can explain this better.
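To make the idea concrete, here is a minimal sketch of the kind of scaling heuristic being discussed: when average batch processing time exceeds the batch interval, the queue builds up and we request more (non-receiver) executors. The function name, thresholds, and proportional-growth rule are all assumptions for illustration, not Spark's actual API or policy.

```python
import math

# Assumed thresholds for illustration, not real Spark configs.
SCALE_UP_RATIO = 1.1    # processing time > 110% of the batch interval: scale up
SCALE_DOWN_RATIO = 0.5  # processing time < 50% of the batch interval: scale down

def desired_executor_delta(avg_processing_ms, batch_interval_ms, current_executors):
    """Return how many executors to add (positive) or remove (negative).

    Hypothetical helper: in this sketch, scaling decisions come from batch
    statistics rather than the scheduler's task queue, since the task queue
    fluctuates with each batch.
    """
    ratio = avg_processing_ms / batch_interval_ms
    if ratio >= SCALE_UP_RATIO:
        # Batches take longer than the interval, so the queue builds up;
        # grow proportionally to the backlog ratio.
        return max(1, math.ceil(current_executors * (ratio - 1)))
    if ratio <= SCALE_DOWN_RATIO:
        # Plenty of headroom: release one executor (never one hosting a
        # receiver, per the restriction discussed above).
        return -1
    return 0
```

For example, with a 1s batch interval and 2s average processing time on 4 executors, the sketch would ask for 4 more executors; with 400ms processing time it would release one.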