pan3793 opened a new pull request, #38357:
URL: https://github.com/apache/spark/pull/38357

   ### What changes were proposed in this pull request?
   
   Provide a flexible way on K8s to configure external log service links (patterns) 
and attributes for the Driver and Executors via env vars, on both the live 
Spark UI and the SHS.
   
   The full design doc is 
https://docs.google.com/document/d/1MfB39LD4B4Rp7MDRxZbMKMbdNSe6V6mBmMQ-gkCnM-0/edit?usp=sharing
   
   1. Expose generic attributes on K8s, for both Driver and Executor, which can 
be referenced in the log URL pattern and will be persisted into the event log. The 
proposed generic attributes are
     - APP_ID
     - KUBERNETES_POD_NAME
     - KUBERNETES_NAMESPACE
   
   2. Allow using env vars to add custom log URLs and attributes, for both 
Driver and Executor.
     - Driver log URL: env vars w/ prefix SPARK_DRIVER_LOG_URL_
     - Driver attribute: env vars w/ prefix SPARK_DRIVER_ATTRIBUTE_
     - Executor log URL: env vars w/ prefix SPARK_LOG_URL_
     - Executor attribute: env vars w/ prefix SPARK_EXECUTOR_ATTRIBUTE_
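
   As a sketch of how these env vars could be supplied at submit time (the 
`spark.kubernetes.driverEnv.*` and `spark.executorEnv.*` config prefixes already 
exist in Spark; the specific env-var names follow the prefixes above, and the 
log-service host, paths, and attribute values are made-up examples):

   ```shell
   spark-submit \
     --conf spark.kubernetes.driverEnv.SPARK_DRIVER_LOG_URL_STDOUT="https://logs.example.com/{{KUBERNETES_NAMESPACE}}/{{KUBERNETES_POD_NAME}}/stdout" \
     --conf spark.kubernetes.driverEnv.SPARK_DRIVER_ATTRIBUTE_CLUSTER_ID="cluster-1" \
     --conf spark.executorEnv.SPARK_LOG_URL_STDERR="https://logs.example.com/{{KUBERNETES_NAMESPACE}}/{{KUBERNETES_POD_NAME}}/stderr" \
     ...
   ```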
   
   3. Always perform log URL replacement for the Driver before sending 
SparkListenerApplicationStart to the LiveListenerBus, so that the Driver 
gets the same log URL replacement ability on the live UI as the Executor.
   
   4. Always perform log URL replacement for the Executor:
     - if spark.history.custom.executor.log.url is provided, use it as-is;
     - otherwise, use the value of the log URL itself as the pattern, in case the 
user-provided log URL refers to the attributes.
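
   The replacement in steps 3 and 4 can be illustrated with a minimal Python 
sketch (not Spark's actual implementation, which lives in Scala; the `{{ATTR}}` 
placeholder syntax matches the one used by spark.history.custom.executor.log.url, 
and the attribute values are made up):

   ```python
   import re

   def substitute_log_url(pattern: str, attributes: dict) -> str:
       """Replace {{ATTR}} placeholders in a log URL pattern with attribute values."""
       def repl(match):
           key = match.group(1)
           if key not in attributes:
               raise KeyError(f"unknown attribute in log URL pattern: {key}")
           return attributes[key]
       return re.sub(r"\{\{(\w+)\}\}", repl, pattern)

   # Hypothetical attributes as they would be exposed on K8s.
   attrs = {
       "APP_ID": "spark-abc123",
       "KUBERNETES_NAMESPACE": "team-a",
       "KUBERNETES_POD_NAME": "driver-pod-0",
   }
   pattern = "https://logs.example.com/{{KUBERNETES_NAMESPACE}}/{{KUBERNETES_POD_NAME}}/stdout"
   print(substitute_log_url(pattern, attrs))
   # -> https://logs.example.com/team-a/driver-pod-0/stdout
   ```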
   
   ### Why are the changes needed?
   
   Currently, there is no out-of-the-box log solution for Spark on K8s.
   
   For Spark on YARN, Spark provides stdout/stderr log links on the Spark UI 
for the Driver and each Executor, which redirect to the YARN log pages. But for 
resource managers that do not provide an out-of-the-box log service, such as 
K8s, Spark shows no log links on the Spark UI.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, users can add custom log links to the Spark UI via configuration in 
Spark on K8s.
   
   ### How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

