Github user bOOm-X commented on the issue:
https://github.com/apache/spark/pull/18004
@cloud-fan The block strategy is not the default one ! It is safe to block
while any listener does not add messages to the queue (which is the case
currently). I am not fan of blocking the queue too, but in my mind this
strategy is a safety net while all the performance issues of the listener bus
have been solved. It is a temporary and clearly not ideal solution (which is
solving all the performance issue) but it is better than dropping messages
without failing the job.
@vanzin I think that approach like #16291 is a very bad idea. It make the
design more complex, force to duplicate queues, change the synchronization
paradigm between listener, ... The current design is the good one I think. It
is a kind of "reactor pattern", which works pretty well (see nodejs, vertx.io,
...) But the golden rule is to not block the dequeing process ! and this is the
responsibility of each listener unfortunately. We can try to enforce this by
providing an abstract asynchronous listener for example of by mesuring in real
time the listener execution time and logging error or warning for the most time
consuming listeners. Some listeners which make trivial work (in terms of
performances) do not need to be asynchronous (overkill, and less efficient). It
is the case for every listener which react only to very rare messages (like
only to job start, executor add or remove, ...) So Listerner should be migrate
to an asynchronous version only after case by case analysis.
I agree that other listeners should become asynchronous too, and it is
obvious for all the GUI-related one (yeah the GUI should not slow down the
deuqueing process !!) But the more consuming one is the EventLoggingListener
one, and it was very simple to transform it to an asynchronous version. It is
why I did only this one in this PR. I plan to do the other GUI related ones in
another PR.
I updated the PR description to mention the EventLoggingListener change.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]