Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/23263

My first impression is that it's a big change, which is reason for caution here. Visualizing a workflow is nice, but Spark's Pipelines are typically pretty straightforward and linear. I could imagine producing a nicer visualization than what you get from reading the Spark UI, although of course we already have some degree of history and data there.

These are just the hooks, right? Someone would have to implement something to use these events. I see the value in the API to some degree, but with no concrete implementation, does it add anything for Spark users out of the box?

It seems like the history this generates would belong in the history server, although that already has a pretty particular purpose: storing granular history of events in Spark. Is that what someone would likely do? Or would someone likely have to run Atlas to use this? If that's a good example of the use case, and Atlas is really about lineage and governance, is the thrust of this change to help with model lineage and reproducibility?

It's good that the API changes little, though it does change a bit. I think I mostly have questions right now.