Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/23263
  
    My first impression is that it's a big change, which is reason for caution 
here.
    
    Visualizing a workflow is nice, but Spark's Pipelines are typically pretty 
straightforward and linear. I could imagine producing a nicer visualization 
than what you get from reading the Spark UI, although of course we already have 
some degree of history and data there.
    
    These are just the hooks, right? Someone would still have to implement something 
that consumes these events. I see some value in the API itself, but with no 
concrete implementation, does it add anything for Spark users out of the box?
    
    It seems like the history this generates would belong in the history 
server, although that already has a fairly specific purpose: storing granular 
history of events in Spark. Is that what someone would likely do? Or would 
someone likely have to run Atlas to use this? If that's a good example of the 
use case, and Atlas is really about lineage and governance, is that the thrust 
of this change: to help with something like model lineage and 
reproducibility?
    
    It's good that the API changes little, though it does change a bit.
    
    I think I mostly have questions right now.

