GitHub user jmwdpk opened a pull request:

    https://github.com/apache/spark/pull/20823

    [SPARK-23674] Add Spark ML Listener for Tracking ML Pipeline Status

    ## What changes were proposed in this pull request?
    
    To keep track of the status of a Spark ML pipeline, this change adds the 
trait MLListenEvent, which represents the events proposed in the 
[jira](https://issues.apache.org/jira/browse/SPARK-23674), and the trait 
MLListener, whose onEvent method is used to override doPostEvent in 
ListenerBus so that events are posted to a specific listener. 
    
    In Pipeline.scala, PipelineStage now extends ListenerBus, so that the 
related events can be posted to a specific listener by doPostEvent. All 
pipeline-related events are posted to all registered listeners by postToAll.
    
    In ReadWrite.scala, MLWriter now extends ListenerBus, so that the 
save-related events can be posted to a specific listener.
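
    The dispatch pattern described above can be sketched without Spark as 
follows. This is a minimal illustration, not the code in this pull request: 
SimpleListenerBus is a hypothetical stand-in for Spark's ListenerBus (it only 
mirrors the addListener/doPostEvent/postToAll names), and the FitStart/FitEnd 
events and DemoStage class are assumed names chosen for the example.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical event hierarchy; the concrete event names are assumptions.
sealed trait MLListenEvent
case class FitStart(stage: String) extends MLListenEvent
case class FitEnd(stage: String) extends MLListenEvent

// Listener with a single callback, as described in the PR text.
trait MLListener {
  def onEvent(event: MLListenEvent): Unit
}

// Simplified stand-in for Spark's ListenerBus (not the real trait).
trait SimpleListenerBus {
  private val listeners = ArrayBuffer.empty[MLListener]

  def addListener(listener: MLListener): Unit = listeners.append(listener)

  // Deliver one event to one specific listener via its onEvent callback.
  protected def doPostEvent(listener: MLListener, event: MLListenEvent): Unit =
    listener.onEvent(event)

  // Deliver one event to every registered listener, in registration order.
  def postToAll(event: MLListenEvent): Unit =
    listeners.foreach(l => doPostEvent(l, event))
}

// Hypothetical stage that posts events around its work.
class DemoStage(name: String) extends SimpleListenerBus {
  def fit(): Unit = {
    postToAll(FitStart(name))
    // ... training would happen here ...
    postToAll(FitEnd(name))
  }
}
```

    Keeping the per-listener delivery in doPostEvent and the fan-out in 
postToAll means a stage only has to decide when an event occurs, not who 
receives it.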
    
    ## How was this patch tested?
    
    To test the feature, a recorder was created as a mutable buffer to 
capture the actual pipeline events. Every event is added to the recorder by 
the overridden onEvent method of a newly created MLListener, which is added 
to the object (pipeline/newPipelineModel/pipelineWritter/pipelineModelWritter) 
corresponding to the operation (fit/transform/save/save) associated with each 
type of event. Finally, the captured events are compared with the expected 
events specified in the tests.
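
    The recorder approach can be illustrated with a self-contained sketch. 
It assumes a hypothetical DemoModel with its own listener list and two 
made-up events (TransformStart/TransformEnd); the actual test suite in this 
pull request uses Spark's Pipeline and PipelineModel classes instead.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical events for a transform operation (names are assumptions).
sealed trait MLListenEvent
case object TransformStart extends MLListenEvent
case object TransformEnd extends MLListenEvent

trait MLListener {
  def onEvent(event: MLListenEvent): Unit
}

// Hypothetical model that notifies its listeners around transform().
class DemoModel {
  private val listeners = ArrayBuffer.empty[MLListener]

  def addListener(listener: MLListener): Unit = listeners.append(listener)

  def transform(): Unit = {
    listeners.foreach(_.onEvent(TransformStart))
    // ... the actual transformation would happen here ...
    listeners.foreach(_.onEvent(TransformEnd))
  }
}

object RecorderDemo {
  // Runs the operation with a recording listener attached and returns
  // the events in the order they were observed.
  def run(): List[MLListenEvent] = {
    val recorder = ArrayBuffer.empty[MLListenEvent]
    val model = new DemoModel
    model.addListener(new MLListener {
      def onEvent(event: MLListenEvent): Unit = recorder.append(event)
    })
    model.transform()
    recorder.toList
  }
}
```

    A test then compares RecorderDemo.run() with the expected sequence, for 
example List(TransformStart, TransformEnd).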
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jmwdpk/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20823.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20823
    
----
commit 7097400b0453a0cd2ab3fa7128cae247797030de
Author: Ming Jiang <mjiang@...>
Date:   2018-03-14T16:56:21Z

    added MLListener.scala with trait MLListenEvent to monitor the jira 
proposed events, added postToAll in  Pipeline.scala so that pipeline related 
events were posted to all registered listeners, added pipelineJJobTracker test 
case

commit f664b527cdf05a53766bb0bd3009a0cb15833f41
Author: Ming Jiang <mjiang@...>
Date:   2018-03-14T17:37:48Z

    added test cases: pipeline model transform tracker, Pipeline read/write 
tracker, PipelineModel read/write tracker

----


---

