GitHub user bOOm-X commented on the issue:
https://github.com/apache/spark/pull/18004
I am against the other approach for several reasons. The main one is that it
would change the synchronization paradigm, which would clearly change the
behavior of the current listeners and might introduce bugs. For example, the
StorageStatusListener and the StorageListener are dependent: the second one
uses the "result" of the first one. If you put them in different threads, it
will certainly change the current behavior. Whether that would cause a fatal
bug, I do not know.
The asynchronous mechanism would have to be implemented very differently for
each listener. No global approach can be used, because the message types and
their frequencies differ so much. What you can leverage at the listener level
is the types of messages the listener is interested in (the logging listener
ignores the blockUpdated messages, by far the most frequent ones), how the
messages are processed (the logging listener processes every message type the
same way), and the dependencies of the listener (the logging listener has
none).
For the other listener that is significant in terms of performance, the
StorageListener, all of these key aspects are very different:
- It processes the blockUpdated messages.
- Each message type is processed differently.
- The StorageListener depends on the StorageStatusListener (they have to
  process messages synchronously).
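
To make the dependency concrete, here is a minimal sketch of two coupled
listeners. All names (StatusTracker, StorageView, CoupledDispatch, and the
simplified BlockUpdated event) are hypothetical stand-ins, not Spark's actual
classes. The only assumption is that the downstream listener reads state the
upstream listener has just updated for the same event.

```scala
// Hypothetical, simplified event; not Spark's SparkListenerBlockUpdated.
final case class BlockUpdated(executorId: String, blockId: String, memSize: Long)

// Upstream listener (stand-in for a status listener): tracks memory per executor.
final class StatusTracker {
  private var memByExecutor = Map.empty[String, Long]

  def onBlockUpdated(e: BlockUpdated): Unit = {
    val current = memByExecutor.getOrElse(e.executorId, 0L)
    memByExecutor = memByExecutor.updated(e.executorId, current + e.memSize)
  }

  def memUsed(executorId: String): Long = memByExecutor.getOrElse(executorId, 0L)
}

// Downstream listener: reads the upstream state while handling the same event.
final class StorageView(status: StatusTracker) {
  var lastSeenTotal: Long = 0L

  def onBlockUpdated(e: BlockUpdated): Unit =
    lastSeenTotal = status.memUsed(e.executorId)
}

object CoupledDispatch {
  // Each event goes to the upstream listener first, then to the downstream
  // one, on the calling thread, so the downstream never sees stale state.
  def dispatch(events: Seq[BlockUpdated], up: StatusTracker, down: StorageView): Unit =
    events.foreach { e =>
      up.onBlockUpdated(e)
      down.onBlockUpdated(e)
    }
}
```

If the two listeners consumed events from separate threads, the downstream
read could run before or after the matching upstream update, which is the
behavior change described above.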
The key thing in the event logging listener is the ability to not queue the
blockUpdated messages, and so to skip them entirely.
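
A minimal sketch of that idea, assuming a per-listener queue that checks a
predicate before enqueueing. The Event types, FilteringQueue and
FilteringQueueDemo are hypothetical names for illustration only; they are not
part of Spark or of this pull request.

```scala
import scala.collection.mutable

// Hypothetical, simplified event types; stand-ins for Spark's listener events.
sealed trait Event
final case class BlockUpdated(blockId: String) extends Event
final case class StageSubmitted(stageId: Int) extends Event

// A per-listener queue that rejects uninteresting events before queueing them.
final class FilteringQueue(accepts: Event => Boolean) {
  private val queue = mutable.Queue.empty[Event]

  // Returns true if the event was queued, false if it was skipped.
  def post(event: Event): Boolean =
    if (accepts(event)) {
      queue += event
      true
    } else {
      false
    }

  def size: Int = queue.size

  // Removes and returns everything queued so far, in arrival order.
  def drain(): List[Event] = {
    val out = queue.toList
    queue.clear()
    out
  }
}

object FilteringQueueDemo {
  // In this sketch the logging listener never queues BlockUpdated events.
  def loggingQueue(): FilteringQueue = new FilteringQueue({
    case _: BlockUpdated => false
    case _               => true
  })
}
```

Because the check happens in post, the most frequent message type never
occupies queue space for this listener.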
For the StorageStatusListener / StorageListener pair, I think the key thing
is that you can batch consecutive blockUpdated messages (the other messages,
like SparkListenerStageSubmitted, act as barriers) to decrease the processing
time. This optimization would be much more complex than the logging listener
one, and would bring a much smaller performance improvement.
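
The batching idea can be sketched as follows, assuming events arrive as an
ordered list. The Event and Work types and the Batching.group helper are
hypothetical names introduced for illustration; this is not code from the
pull request.

```scala
// Hypothetical, simplified event types; stand-ins for Spark's listener events.
sealed trait Event
final case class BlockUpdated(blockId: String) extends Event
final case class StageSubmitted(stageId: Int) extends Event

// Units of work handed to the listener: a batch of updates or a single event.
sealed trait Work
final case class BlockBatch(updates: List[BlockUpdated]) extends Work
final case class Single(event: Event) extends Work

object Batching {
  // Groups runs of consecutive BlockUpdated events into one BlockBatch.
  // Any other event acts as a barrier: it closes the current run and is
  // emitted on its own, so relative ordering across barriers is preserved.
  def group(events: List[Event]): List[Work] = {
    val start = (List.empty[Work], List.empty[BlockUpdated])
    val (done, pending) = events.foldLeft(start) {
      case ((acc, run), b: BlockUpdated) =>
        (acc, b :: run)
      case ((acc, run), other) =>
        val flushed = if (run.isEmpty) acc else BlockBatch(run.reverse) :: acc
        (Single(other) :: flushed, Nil)
    }
    // Flush a trailing run of updates, then restore arrival order.
    val all = if (pending.isEmpty) done else BlockBatch(pending.reverse) :: done
    all.reverse
  }
}
```

For example, two updates followed by a StageSubmitted and one more update
yield a batch of two, the single barrier event, and a batch of one.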