gaborgsomogyi commented on pull request #28336:
URL: https://github.com/apache/spark/pull/28336#issuecomment-619792253


   I've tested it and it works fine. The change is technically correct; however, the main question is whether an active `SparkContext` is a requirement to run a Spark application or not.
   Namely, the main problem here is that the user code touches an external system (for example HDFS) which requires a delegation token, without an active `SparkContext`. Since `SparkContext` creation obtains delegation tokens synchronously, requiring it up front would potentially solve this problem.
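   To make the ordering issue concrete, here is a minimal sketch (assuming a kerberized YARN cluster; the object name, app name and HDFS path are placeholders, not taken from this PR): touching HDFS before any `SparkContext` exists relies on whatever tokens the (possibly restarted) AM already holds, while creating the `SparkContext` first obtains fresh delegation tokens synchronously.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.{SparkConf, SparkContext}

object DelegationTokenOrderingSketch {
  def main(args: Array[String]): Unit = {
    // If user code did this *before* any SparkContext existed, it would rely on
    // whatever tokens the (possibly restarted) YARN AM already holds; if those
    // delegation tokens have expired, the call fails:
    //   val fs = FileSystem.get(new org.apache.hadoop.conf.Configuration())
    //   fs.exists(new Path("hdfs:///user/example/input"))

    // Creating the SparkContext first obtains delegation tokens synchronously,
    // so subsequent driver-side HDFS access uses fresh tokens.
    val sc = new SparkContext(new SparkConf().setAppName("dt-ordering-sketch"))
    val fs = FileSystem.get(sc.hadoopConfiguration)
    val exists = fs.exists(new Path("hdfs:///user/example/input"))
    println(s"input path exists: $exists")
    sc.stop()
  }
}
```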
   
   I tend to agree that this change is the best option in this case; I just want to hear other voices. My consideration is the following: this issue only comes up when the AM dies and YARN restarts it, but the delegation tokens are no longer valid. It's easy to see that mainly streaming applications are affected. If we don't apply this change, we have at least the following options:
   * Document clearly that an active `SparkContext` is required to interact with external systems where a DT is required => if one misses reading the doc, they only realise weeks later that the streaming job has stopped working, so I don't think it's a real solution
   * Detect somehow that the user code is trying to access external systems which require a DT => well, it may not be possible, and it's surely overkill in my view
   

