shanyu opened a new pull request #25705: SPARK-29003: Spark history server 
startup hang due to deadlock
URL: https://github.com/apache/spark/pull/25705
 
 
   ### What changes were proposed in this pull request?
   This fixes JIRA:
   https://issues.apache.org/jira/browse/SPARK-29003
   
   The problem is that during Spark History Server startup, two things 
happen concurrently, and both call into 
java.nio.file.FileSystems.getDefault():
   1) starting the Jetty server
   2) starting ApplicationHistoryProvider (which reads files from HDFS)
   
   We should do these two things sequentially instead of in parallel.
   We introduce a start() method in ApplicationHistoryProvider (and its 
subclass FsHistoryProvider) and perform initialization inside start() 
instead of in the constructor.
   In HistoryServer, we explicitly call provider.start() after bind(), 
which starts the Jetty server.
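
The ordering described above can be sketched roughly as follows. This is a minimal illustration in Java, not the actual Spark source: the classes are simplified stand-ins that reuse the names from this PR, and the `events` list, the no-op default `start()`, and the `StartupOrderDemo` class are assumptions added only to make the call order observable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the provider base class named in this PR.
abstract class ApplicationHistoryProvider {
    // Assumed no-op default so providers without startup work need no changes.
    void start() {}
}

// Hypothetical stand-in for the file-system-backed provider.
class FsHistoryProvider extends ApplicationHistoryProvider {
    private final List<String> events;
    private boolean initialized = false;

    FsHistoryProvider(List<String> events) {
        // The constructor only stores state; it no longer triggers initialization.
        this.events = events;
    }

    @Override
    void start() {
        // Work that previously ran from the constructor (for example, reading
        // files from HDFS) would go here.
        events.add("provider.start");
        initialized = true;
    }

    boolean isInitialized() {
        return initialized;
    }
}

// Hypothetical stand-in for the server; bind() represents starting Jetty.
class HistoryServer {
    private final List<String> events;

    HistoryServer(List<String> events) {
        this.events = events;
    }

    void bind() {
        events.add("server.bind");
    }
}

class StartupOrderDemo {
    // Runs the two startup steps one after the other and returns the recorded order.
    static List<String> run() {
        List<String> events = new ArrayList<>();
        FsHistoryProvider provider = new FsHistoryProvider(events);
        HistoryServer server = new HistoryServer(events);

        server.bind();     // step 1: start the web server
        provider.start();  // step 2: only then initialize the provider
        return events;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [server.bind, provider.start]
    }
}
```

Because constructing the provider does no work in this sketch, the caller decides when initialization runs, so the two steps cannot overlap.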
   
   ### Why are the changes needed?
   Starting the Spark History Server occasionally hangs the process because 
of a deadlock among threads.
   
   ### Does this PR introduce any user-facing change?
   No.
   
   ### How was this patch tested?
   I stress tested this PR with a bash script that stopped and started the 
Spark History Server more than 1000 times, and it worked fine. Previously, I 
could complete fewer than 10 iterations of the stop/start loop before hitting 
the deadlock.
   

