Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/5385#issuecomment-113348477
  
    @harishreedharan Yes, a receiver is a long-running task, so we don't want 
to disrupt it. According to TD, the main reason the batch queue builds up is 
that processing can't keep up with receiving, not that we have too few 
receivers. I think it's simplest to limit dynamic allocation in streaming to 
adding and removing only executors that do not host receivers.
    
    > Not sure what you mean here - can you explain a bit? In the receiver case
    partition count depends on amount of data received while in the direct
    stream case it is fixed, so that we'd have to rethink since the task count
    is constant.
    
    Having a long batch queue means execution can't keep up with receiving, so 
we need more executors to speed up the execution. Note that in streaming, the 
scheduler task queue may fluctuate quite a bit, because the batch interval may 
not be long enough for the idle timeouts to elapse between batches, i.e. the 
task queue probably won't persist across batches. Maybe @tdas can explain this 
better.
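
    The policy sketched in this thread (scale executors on batch-queue growth, 
never touch executors hosting receivers) could look roughly like the following. 
This is a minimal illustration only, not Spark's actual 
`ExecutorAllocationManager` logic; all names (`plan_scaling`, 
`receiver_hosts`, `scale_up_step`) are invented for the example:

    ```python
    def plan_scaling(queued_batches, prev_queued, executors, receiver_hosts,
                     scale_up_step=2):
        """Return (executors_to_add, executor_ids_to_remove).

        Hypothetical heuristic: a growing batch queue means processing
        can't keep up with receiving, so request more executors; receivers
        are long-running tasks, so their host executors are never removed.
        """
        # Queue is growing: execution lags behind receiving -> scale up.
        if queued_batches > prev_queued and queued_batches > 0:
            return scale_up_step, []
        # Queue fully drained: idle executors may be released, but skip
        # any executor that hosts a receiver.
        if queued_batches == 0:
            removable = [e for e in executors if e not in receiver_hosts]
            return 0, removable
        # Queue shrinking but nonempty: hold steady.
        return 0, []
    ```

    For example, a growing queue would request more executors, while a drained 
queue would offer up only the receiver-free executors for removal.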

