gaborgsomogyi commented on pull request #28336: URL: https://github.com/apache/spark/pull/28336#issuecomment-619792253
I've tested it and works fine. The change is technically correct however the main question is whether an active `SparkContext` is a requirement to run a Spark application or not. Namely the main problem here is that the user code touches an external system (for example HDFS) which requires delegation token without an active `SparkContext`. Since `SparkContext` creation obtains delegation tokens synchronously it would potentially solve this problem. I tend to agree that this change is the best option in this case just want to hear other voices. My consideration is the following: This issue only comes when AM dies, YARN restarts it but the delegation tokens are not valid. It's easy to see that mainly streaming applications are effected. If we don't apply this change we have at least the following options: * Document clearly that an active `SparkContext` is required to interact with external systems where DT required => If one miss to read the doc then only weeks later realises that the streaming job stopped to work, so I don't think it's a real solution * Detect somehow the user code is trying to access external systems which require DT => well, it's maybe not possible but surely and overkill in my view ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
