Github user bOOm-X commented on the issue:
https://github.com/apache/spark/pull/18004
There is no global issue with the current queue / listener implementation.
On the contrary, it works pretty well and it is pretty simple!
There are just 2 precise performance issues:
- the EventLoggingListener, which takes 50% of the "dequeuing time"
- the StorageListener, which takes 40% of the "dequeuing time"
But for all the other listeners it works very well. That is why I think that
it is not worth changing the global mechanism, but rather just fixing the 2
precise issues.
Fixing even one of the 2 performance issues will greatly increase the
dequeuing rate, allowing Spark 2.x to be used with a pretty decent volume of
data (which is clearly not the case right now).
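
As a rough check on that claim, here is a minimal sketch of the arithmetic. It assumes the percentages quoted above are shares of the per-event dequeuing time, and that a fix removes the listener's cost from the dequeuing thread entirely. The class and method names are hypothetical and are not part of Spark.

```java
// Hypothetical helper for a back-of-the-envelope estimate; not part of Spark.
final class DequeueSpeedup {
    // If a listener accounts for `share` of the per-event dequeuing time and
    // that cost is removed from the dequeuing thread entirely, the dequeuing
    // rate is multiplied by 1 / (1 - share).
    static double speedup(double share) {
        if (share < 0.0 || share >= 1.0) {
            throw new IllegalArgumentException("share must be in [0, 1): " + share);
        }
        return 1.0 / (1.0 - share);
    }
}
```

Under those assumptions, removing the 50% share gives about 2x, removing the 40% share gives about 1.67x, and removing both (90%) gives about 10x.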
For the EventLoggingListener, I think that I have in this changelist a pretty
simple fix, very localized, with no impact on the rest of the code, which
completely fixes the performance issue in a quite standard way (writing
asynchronously to a file is very standard).
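
To illustrate what "writing asynchronously to a file" usually means, here is a minimal generic sketch. It is not the code from this changelist: the class AsyncLineWriter and its methods are hypothetical. The caller only enqueues a line, and a single background thread does the actual file I/O.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper, not a Spark class: callers enqueue lines and return
// immediately; a single background thread performs the file I/O.
final class AsyncLineWriter implements AutoCloseable {
    // Sentinel, compared by identity, that tells the writer thread to stop.
    private static final String STOP = new String("<stop>");

    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile IOException failure;

    AsyncLineWriter(Path file) {
        worker = new Thread(() -> drainTo(file), "async-line-writer");
        worker.setDaemon(true);
        worker.start();
    }

    // Called on the hot path (e.g. a dequeuing thread): no file I/O here.
    void write(String line) {
        pending.add(line);
    }

    // Runs on the worker thread: writes queued lines in order until STOP.
    private void drainTo(Path file) {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (true) {
                String line = pending.take();
                if (line == STOP) {
                    return;
                }
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            failure = e; // reported to the caller in close()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Writes everything queued before this call, then stops the worker.
    @Override
    public void close() {
        pending.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (failure != null) {
            throw new UncheckedIOException(failure);
        }
    }
}
```

The queue in this sketch is unbounded, so a slow disk shows up as memory growth instead of a slow caller, and an I/O error is only reported at close(). A real fix would have to pick its own policy for both.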
For the StorageStatusListener / StorageListener pair, I did indeed notice
that the classes are flagged as deprecated. I do not plan to modify them yet (I
do not really need that). I will wait for your new listener. If you want, I can
test its performance on my test case (10 000 partitions, 1 000 executors).
As for the issue that could appear with external listeners:
I think that this should not be addressed in this PR.
I think that no complex mechanism should be implemented to avoid it. PR
#18083 will introduce metrics on the dequeuing process (including per-listener
timings), and a simple warning / error message in the log that makes it easy
to spot the faulty listener is enough.
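
To make the idea concrete, here is a minimal sketch of per-listener timing with a warning for a slow listener. It is not the implementation from PR #18083 and does not use Spark's listener bus: the class TimedDispatcher, its methods, and the warning text are all hypothetical. The clock is injectable only so that the behaviour can be checked deterministically.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Spark's listener bus: it times each listener call
// and reports listeners whose share of the total time exceeds a threshold.
final class TimedDispatcher<E> {
    private final Map<String, Consumer<E>> listeners = new LinkedHashMap<>();
    private final Map<String, Long> nanosByListener = new LinkedHashMap<>();
    private final LongSupplier clock;

    TimedDispatcher() {
        this(System::nanoTime);
    }

    TimedDispatcher(LongSupplier clock) {
        this.clock = clock;
    }

    void register(String name, Consumer<E> listener) {
        listeners.put(name, listener);
        nanosByListener.put(name, 0L);
    }

    // Delivers one event to every listener, accumulating time per listener.
    void post(E event) {
        for (Map.Entry<String, Consumer<E>> entry : listeners.entrySet()) {
            long start = clock.getAsLong();
            entry.getValue().accept(event);
            nanosByListener.merge(entry.getKey(), clock.getAsLong() - start, Long::sum);
        }
    }

    long nanosFor(String name) {
        return nanosByListener.getOrDefault(name, 0L);
    }

    // One warning line per listener whose share of total time is above maxShare.
    List<String> slowListenerWarnings(double maxShare) {
        long total = 0L;
        for (long nanos : nanosByListener.values()) {
            total += nanos;
        }
        List<String> warnings = new ArrayList<>();
        if (total == 0L) {
            return warnings;
        }
        for (Map.Entry<String, Long> entry : nanosByListener.entrySet()) {
            double share = (double) entry.getValue() / total;
            if (share > maxShare) {
                warnings.add(String.format("WARN listener %s used %.0f%% of dispatch time",
                        entry.getKey(), share * 100));
            }
        }
        return warnings;
    }
}
```

With something along these lines, a faulty external listener shows up by name in the log, which is all that is being argued for here.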