[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...

LantaoJin Tue, 11 Sep 2018 03:10:42 -0700

Github user LantaoJin commented on the issue:

    https://github.com/apache/spark/pull/22353
  
    > Although event log is in JSON format, it's mostly for internal usage,
    > to be loaded by history server and used to build the Spark UI.

    AFAIK, more and more projects replay the event log to analyze jobs
    offline, especially in platform/infra teams at big companies. Dr-elephant
    doesn't read the event log; instead, it queries the SHS to get
    information, which causes many problems such as compatibility and data
    accuracy. At eBay we are building a system similar to Dr-elephant but
    much more powerful. One of the use cases in this system is building data
    lineage and monitoring the input/output paths and data size for each
    application. Unlike Apache Atlas, which needs to attach a Spark listener
    to the Spark runtime, we chose to replay the event log to build all the
    context we need. Before 2.3, we could get the above information from the
    `metadata` field in the SQLExecutionStart event. Now it has been removed,
    so I hope this PR can add it back. What's more, it would open up more
    possibilities for the event log beyond being used only by the SHS.
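
For illustration only, here is a minimal sketch of what replaying an event log for plan metadata could look like. It is not the system described above. It assumes an uncompressed event log with one JSON event per line, and that each plan node under `sparkPlanInfo` carries a `metadata` map, as it did before 2.3. The helper names and the sample events (node names, location, format) are made up for the example.

```python
import json

# Event name written to the event log for SparkListenerSQLExecutionStart.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def iter_plan_nodes(node):
    """Depth-first walk over a sparkPlanInfo tree (hypothetical helper)."""
    yield node
    for child in node.get("children", []):
        yield from iter_plan_nodes(child)


def collect_plan_metadata(lines):
    """Yield (executionId, nodeName, metadata) for plan nodes with metadata.

    `lines` is any iterable of strings, one JSON event per line. Nodes
    without a `metadata` field, or with an empty one, are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != SQL_START:
            continue
        for node in iter_plan_nodes(event.get("sparkPlanInfo", {})):
            metadata = node.get("metadata") or {}
            if metadata:
                yield event.get("executionId"), node.get("nodeName"), metadata


# Made-up sample events, only to show the shape of the input.
sample_log = [
    json.dumps({"Event": "SparkListenerApplicationStart", "App Name": "demo"}),
    json.dumps({
        "Event": SQL_START,
        "executionId": 0,
        "sparkPlanInfo": {
            "nodeName": "Project",
            "metadata": {},
            "children": [{
                "nodeName": "FileScan",
                "metadata": {"Location": "hypothetical://warehouse/t1",
                             "Format": "Parquet"},
                "children": [],
            }],
        },
    }),
]

for execution_id, node_name, metadata in collect_plan_metadata(sample_log):
    print(execution_id, node_name, metadata)
```

With a real log, the same function can be fed an open file handle instead of `sample_log`. On logs written by versions where `metadata` is absent, it simply yields nothing, which is the gap this PR is meant to close.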



