Github user LantaoJin commented on the issue:
https://github.com/apache/spark/pull/22353
> Although event log is in JSON format, it's mostly for internal usage, to
> be load by history server and used to build the Spark UI.
AFAIK, more and more projects replay the event log to analyze jobs
offline, especially in platform/infra teams at big companies. Dr-elephant
doesn't read the event log; instead, it queries the SHS for information, which
causes many problems such as compatibility and data accuracy issues. At eBay we
are building a system similar to Dr-elephant but much more powerful. One of the
use cases in this system is building data lineage and monitoring the
input/output paths and data sizes for each application. Unlike Apache Atlas,
which needs to attach a Spark listener to the Spark runtime, we chose to replay
the event log to build all the context we need. Before 2.3, we could get the
above information from the `metadata` field in the SQLExecutionStart event. Now
it has been removed, so I hope this PR can add it back. More broadly, this would
open up more uses for the event log instead of it only being used by the SHS.
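
To illustrate the replay approach, here is a minimal sketch in Python. It is not our actual system. It assumes the event log is one JSON object per line, that the SQL start event carries the class name shown in `SQL_START`, and that each plan node under `sparkPlanInfo` has `nodeName`, `children` and (before 2.3) `metadata` keys. The helper names and the sample records are hypothetical.

```python
import json

# Assumed value of the "Event" field for SQL execution start events.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def iter_plan_nodes(plan):
    """Yield a plan node and all of its descendants, depth first."""
    yield plan
    for child in plan.get("children", []):
        yield from iter_plan_nodes(child)


def collect_plan_metadata(lines):
    """Return (executionId, nodeName, metadata) for each plan node with metadata."""
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != SQL_START:
            continue  # skip events unrelated to SQL execution start
        for node in iter_plan_nodes(event.get("sparkPlanInfo", {})):
            metadata = node.get("metadata") or {}
            if metadata:
                found.append(
                    (event.get("executionId"), node.get("nodeName"), metadata)
                )
    return found


# Hypothetical sample records standing in for lines of a real event log.
sample_log = [
    json.dumps({"Event": "SparkListenerApplicationStart", "App Name": "demo"}),
    json.dumps({
        "Event": SQL_START,
        "executionId": 0,
        "sparkPlanInfo": {
            "nodeName": "Project",
            "metadata": {},
            "children": [{
                "nodeName": "FileScan parquet",
                "children": [],
                "metadata": {"Location": "hdfs://example/path", "Format": "Parquet"},
            }],
        },
    }),
]

for execution_id, node_name, metadata in collect_plan_metadata(sample_log):
    print(execution_id, node_name, metadata)
```

With the `metadata` field present, input paths and formats can be recovered offline like this without attaching a listener to the running application.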
---