HeartSaVioR commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#issuecomment-446738674

@squito

> Or maybe I'm still not quite following, and there is some 3rd party piece here, outside of spark & yarn, which collects the logs and can serve them later on, whether or not the NM can serve the logs?

Exactly, and log collection can also happen while the app is running. For now I would say a 3rd party is involved here, but the Hadoop side is trying to leverage `clusterId`, which I guess means Hadoop is also going to gain multi-cluster awareness, so it's not impossible for Hadoop/YARN to include multi-cluster-aware centralized services in the future.

@vanzin

> because the user has to do that (well, the admin could put the values in the default Spark properties file)

In practice, the admin will put the value in the Spark properties file. I agree it doesn't sound good if end users can override it, but I'm not sure Spark can prevent that. Please let me know if there's a way for Spark to read a value only from the Spark properties file and not allow end users to override it at submit time. I'm not aware of one, and I'll use it if it exists.

> I'm not sure about what's the behavior while the app is running. If you go to the live UI, and click on the log link, where does that take you?

Centralized log services (whichever ones exist) will provide the logs at unique URLs, and Spark will always point to these URLs. Suppose the log service knows the status of the NM and the application; then the service can do everything that we are serving now. If the NM is live, the service could redirect/forward to the NM's log URL, or just serve the stored log file, which is continuously pulled from the NM. (In the latter case it may show a slightly outdated log, but that just depends on when the pull happens, which is a detail of the log service, not something Spark should worry about.) If the NM is not live, it will serve the stored log file pulled before the NM went offline.

Either way, I hope we don't end up dealing with a static URL, and that we provide some flexibility to 3rd parties and end users.
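
To make the live-vs-offline behavior concrete, here is a minimal sketch of the decision a centralized log service might make when Spark's log link is clicked. It is only an illustration: the function `resolve_log`, its parameters, and the `prefer_redirect` switch are hypothetical names, not part of Spark, YARN, or any existing log service.

```python
from typing import Optional, Tuple


def resolve_log(
    nm_is_live: bool,
    nm_log_url: str,
    stored_log: Optional[str],
    prefer_redirect: bool = True,
) -> Tuple[str, str]:
    """Decide how a (hypothetical) centralized log service answers a request.

    Returns (action, payload), where action is "redirect", "serve",
    or "not_found".
    """
    if nm_is_live and prefer_redirect:
        # NM is up: forward the caller to the NM's own log page.
        return ("redirect", nm_log_url)
    if stored_log is not None:
        # NM is down, or the service prefers its own copy: serve what was
        # pulled earlier. This copy may lag slightly behind the live log.
        return ("serve", stored_log)
    if nm_is_live:
        # Nothing has been pulled yet, but the NM can still serve the log.
        return ("redirect", nm_log_url)
    # NM is gone and nothing was collected before it went offline.
    return ("not_found", "")
```

Spark would not need to know which branch is taken; it would only ever link to the service's URL, and the service would decide between redirecting and serving its stored copy.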