GitHub user tdas opened a pull request:
https://github.com/apache/spark/pull/16186
[SPARK-18758][SS] StreamingQueryListener events from a StreamingQuery
should be sent only to the listeners in the same session as the query
## What changes were proposed in this pull request?
Listeners added with `sparkSession.streams.addListener(l)` are registered on a
specific SparkSession, so a listener should receive events only from queries
running in the same session. Currently, all events get rerouted through
Spark's main listener bus, that is:
- A StreamingQuery posts its events to the StreamingQueryListenerBus; only
  queries associated with the same session as the bus post events to it.
- The StreamingQueryListenerBus posts the event to Spark's main LiveListenerBus
  as a SparkEvent.
- The StreamingQueryListenerBus also subscribes to LiveListenerBus events, thus
  getting the posted event back on a different thread.
- The received event is then posted to the registered listeners.
The problem is that *all StreamingQueryListenerBuses in all sessions* get
the events and post them to their listeners. This is wrong; see the sketch below.
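For concreteness, here is a minimal, hypothetical reproduction of the symptom
(assuming a Spark build with the built-in `rate` source; the object and
variable names are made up for this example): a listener registered on one
session receives events from a query started on a different session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

object CrossSessionListenerDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("cross-session-listener-demo")
      .getOrCreate()

    // A second, independent session sharing the same SparkContext.
    val otherSession = spark.newSession()

    // Listener registered ONLY on otherSession.
    otherSession.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit =
        println(s"otherSession listener saw start of runId=${event.runId}")
      override def onQueryProgress(event: QueryProgressEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    })

    // Query started on the FIRST session; its events should not be delivered
    // to the listener above, but were before this change.
    val query = spark.readStream
      .format("rate")
      .load()
      .writeStream
      .format("console")
      .start()

    query.awaitTermination(5000)
    query.stop()
    spark.stop()
  }
}
```

With the fix in this PR, the listener above should no longer see events from
the query started on `spark`.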
In this PR, I solve it by making StreamingQueryListenerBus track active
queries (by their runIds) when a query posts its QueryStarted event to the bus.
This allows the rerouted events to be filtered using the tracked runIds.
Note that this set has to be maintained separately from
`StreamingQueryManager.activeQueries`, because a terminated query is removed
from `StreamingQueryManager.activeQueries` as soon as it is stopped, whereas
this listener bus must clear a query only after that query's termination event
has been posted, which happens asynchronously, well after the query has
terminated.
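For illustration only, the following self-contained sketch (not Spark's actual
StreamingQueryListenerBus code; all class and event names here are invented)
shows the runId-tracking idea: record a query's runId when its start event is
posted by this session, deliver rerouted events only for tracked runIds, and
forget a runId only after its termination event has been delivered.

```scala
import java.util.UUID
import scala.collection.mutable

// Event types invented for this sketch; they do not match Spark's internal
// StreamingQueryListener event classes.
sealed trait QueryEvent { def runId: UUID }
case class QueryStarted(runId: UUID) extends QueryEvent
case class QueryProgress(runId: UUID) extends QueryEvent
case class QueryTerminated(runId: UUID) extends QueryEvent

// Sketch of a per-session listener bus that tracks the runIds of queries
// started through it and filters rerouted events by those runIds.
class PerSessionBusSketch {
  private val activeQueryRunIds = mutable.HashSet[UUID]()

  // Called directly by a query in this session: remember its runId before
  // forwarding the event to the shared bus (forwarding itself not shown).
  def post(event: QueryEvent): Unit = {
    event match {
      case QueryStarted(runId) =>
        activeQueryRunIds.synchronized { activeQueryRunIds += runId }
      case _ => ()
    }
    // ... forward `event` to the shared LiveListenerBus here ...
  }

  // Called when the shared bus replays the event on its listener thread:
  // deliver only events whose runId belongs to this session, and forget the
  // runId only after the termination event has been delivered.
  def onReroutedEvent(event: QueryEvent, deliver: QueryEvent => Unit): Unit = {
    val isOurs = activeQueryRunIds.synchronized {
      activeQueryRunIds.contains(event.runId)
    }
    if (isOurs) {
      deliver(event)
      event match {
        case QueryTerminated(runId) =>
          activeQueryRunIds.synchronized { activeQueryRunIds -= runId }
        case _ => ()
      }
    }
  }
}
```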
Credit goes to @zsxwing for coming up with the initial idea.
## How was this patch tested?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tdas/spark SPARK-18758
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/16186.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #16186
----
commit 2c73f1e9965ea4ad28c9f7f0574540d614fa5de7
Author: Tathagata Das <[email protected]>
Date: 2016-12-07T03:19:01Z
Fixed bug
commit d3057d5cee64a0b0d308452511730770f4866bd7
Author: Tathagata Das <[email protected]>
Date: 2016-12-07T03:42:11Z
Simpler fix
----