GitHub user jmwdpk opened a pull request:
https://github.com/apache/spark/pull/20823
[SPARK-23674] Add Spark ML Listener for Tracking ML Pipeline Status
## What changes were proposed in this pull request?
To keep track of the status of a Spark ML pipeline, this PR adds the trait
MLListenEvent, which covers the events proposed in the
[JIRA](https://issues.apache.org/jira/browse/SPARK-23674), and the trait
MLListener, whose onEvent method is used to override doPostEvent in
ListenerBus so that events are posted to a specific listener.
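A minimal sketch of how these pieces could fit together is shown here. It is not the code from this PR: SimpleListenerBus is a simplified stand-in for Spark's ListenerBus (which is private to the spark package), and the event classes FitStart and FitEnd, as well as MLListenerBus, are hypothetical names chosen for illustration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event hierarchy; the concrete events in the PR may differ.
sealed trait MLListenEvent
final case class FitStart(stage: String) extends MLListenEvent
final case class FitEnd(stage: String) extends MLListenEvent

// A listener receives every event through a single onEvent callback.
trait MLListener {
  def onEvent(event: MLListenEvent): Unit
}

// Simplified stand-in for Spark's ListenerBus (assumption: only the
// addListener / postToAll / doPostEvent trio is modelled here).
trait SimpleListenerBus[L <: AnyRef, E] {
  private val listeners = ArrayBuffer.empty[L]

  def addListener(listener: L): Unit = listeners += listener

  // Deliver the event to every registered listener.
  def postToAll(event: E): Unit = listeners.foreach(doPostEvent(_, event))

  // Subclasses decide how one event reaches one listener.
  protected def doPostEvent(listener: L, event: E): Unit
}

// doPostEvent is overridden to forward the event to onEvent.
trait MLListenerBus extends SimpleListenerBus[MLListener, MLListenEvent] {
  override protected def doPostEvent(listener: MLListener, event: MLListenEvent): Unit =
    listener.onEvent(event)
}
```

Routing everything through one onEvent method keeps the listener interface small; a listener that cares about only some events can pattern match on the event type.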
In Pipeline.scala, PipelineStage now extends ListenerBus, so that the
related events can be posted to a specific listener by doPostEvent. All
pipeline-related events are posted to all registered listeners by postToAll.
In ReadWrite.scala, MLWriter now extends ListenerBus, so that the
save-related events can be posted to a specific listener.
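The mix-in pattern described for PipelineStage and MLWriter can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: EventBus stands in for Spark's ListenerBus, and ToyStage, ToyWriter, and the four event classes are hypothetical names.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event and listener types; names are illustrative only.
sealed trait MLEvent
final case class FitStart(uid: String) extends MLEvent
final case class FitEnd(uid: String) extends MLEvent
final case class SaveStart(path: String) extends MLEvent
final case class SaveEnd(path: String) extends MLEvent

trait Listener { def onEvent(event: MLEvent): Unit }

// Minimal stand-in for the bus behaviour that a stage or writer mixes in.
trait EventBus {
  private val listeners = ArrayBuffer.empty[Listener]
  def addListener(l: Listener): Unit = listeners += l
  protected def doPostEvent(l: Listener, e: MLEvent): Unit = l.onEvent(e)
  def postToAll(e: MLEvent): Unit = listeners.foreach(doPostEvent(_, e))
}

// A toy stage: fit posts a start and an end event around the real work.
class ToyStage(val uid: String) extends EventBus {
  def fit(data: Seq[Double]): Double = {
    postToAll(FitStart(uid))
    val mean = if (data.isEmpty) 0.0 else data.sum / data.size
    postToAll(FitEnd(uid))
    mean
  }
}

// A toy writer: save posts events around a pretend write.
class ToyWriter extends EventBus {
  def save(path: String): Unit = {
    postToAll(SaveStart(path))
    // The actual persistence logic would go here.
    postToAll(SaveEnd(path))
  }
}
```

Because the stage and the writer each own their listener list in this sketch, a listener registered on one object does not see events from another.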
## How was this patch tested?
To test the feature, a recorder was created as a mutable buffer to capture
the actual pipeline events. All events were added to the recorder by the
overridden onEvent method of a newly created MLListener, which was added to
the object (pipeline/newPipelineModel/pipelineWriter/pipelineModelWriter)
corresponding to the operation (fit/transform/save/save) associated with each
type of event. Finally, the captured events were compared with the expected
events specified in the tests.
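The recorder-based testing approach described above might look roughly like the following sketch. All names here (Event, Recorder, ToyModel, RecorderCheck) are hypothetical and the toy model holds its listeners directly rather than going through Spark's ListenerBus; the point is only to show the record-then-compare pattern.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical events for a transform operation.
sealed trait Event
case object TransformStart extends Event
case object TransformEnd extends Event

trait Listener { def onEvent(event: Event): Unit }

// The recorder is a listener that appends every event to a mutable buffer.
class Recorder extends Listener {
  val events: ArrayBuffer[Event] = ArrayBuffer.empty[Event]
  override def onEvent(event: Event): Unit = events += event
}

// Toy object under test: posts events around its transform operation.
class ToyModel {
  private val listeners = ArrayBuffer.empty[Listener]
  def addListener(l: Listener): Unit = listeners += l
  def transform(xs: Seq[Int]): Seq[Int] = {
    listeners.foreach(_.onEvent(TransformStart))
    val out = xs.map(_ * 2)
    listeners.foreach(_.onEvent(TransformEnd))
    out
  }
}

object RecorderCheck {
  // Returns true when the captured events match the expected sequence.
  def run(): Boolean = {
    val recorder = new Recorder
    val model = new ToyModel
    model.addListener(recorder)
    model.transform(Seq(1, 2, 3))
    val expected: List[Event] = List(TransformStart, TransformEnd)
    recorder.events.toList == expected
  }
}
```

Comparing the full ordered list, rather than just checking membership, also verifies that start and end events arrive in the expected order.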
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jmwdpk/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20823.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20823
----
commit 7097400b0453a0cd2ab3fa7128cae247797030de
Author: Ming Jiang <mjiang@...>
Date: 2018-03-14T16:56:21Z
added MLListener.scala with trait MLListenEvent to monitor the jira
proposed events, added postToAll in Pipeline.scala so that pipeline related
events were posted to all registered listeners, added pipelineJJobTracker test
case
commit f664b527cdf05a53766bb0bd3009a0cb15833f41
Author: Ming Jiang <mjiang@...>
Date: 2018-03-14T17:37:48Z
added test cases: pipeline model transform tracker, Pipeline read/write
tracker, PipelineModel read/write tracker
----