Github user steveloughran commented on the issue:
https://github.com/apache/spark/pull/22186
This will eliminate a race condition between FS shutdown (in the Hadoop
shutdown manager) and the Hive callback. There's a risk today that the
filesystems will be closed before that event log close()/rename() is called, so
things don't get saved, and this can happen with any FS.
Registering the shutdown hook via the Spark APIs, with a priority higher than
that of the FS shutdown hook, guarantees that it will be called before the FS
shutdown. But it doesn't guarantee that the operation will complete within the
10s time limit hard-coded into Hadoop 2.8.x+ for any single shutdown hook. It
will work on HDFS except in the special cases of an HDFS NN lock or a GC pause.
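
To make the ordering concrete, here's a rough sketch using Hadoop's public `ShutdownHookManager` and `FileSystem.SHUTDOWN_HOOK_PRIORITY` rather than Spark's internal wrapper; `flushAndCloseEventLog()` is just a hypothetical stand-in for the event log close()/rename() work:

```scala
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.util.ShutdownHookManager

object EventLogShutdownSketch {

  // Hypothetical stand-in for the event log close() + rename() work.
  def flushAndCloseEventLog(): Unit = {
    // close the writer, then rename the in-progress file to its final name
  }

  def register(): Unit = {
    // Hooks run in decreasing priority order, so anything registered above
    // FileSystem.SHUTDOWN_HOOK_PRIORITY runs before the filesystems are closed.
    ShutdownHookManager.get().addShutdownHook(
      new Runnable { override def run(): Unit = flushAndCloseEventLog() },
      FileSystem.SHUTDOWN_HOOK_PRIORITY + 1)
  }
}
```

The ordering only buys you "runs first", not "runs to completion": each hook still gets cut off at the manager's per-hook timeout, which is where the next point comes in.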
The configurable shutdown delay of
[HADOOP-15679](https://issues.apache.org/jira/browse/HADOOP-15679) needs to go
in. I've increased the default timeout to 30s there for more forgiveness with
HDFS; for object stores with O(data) renames, people should configure it
with a timeout of minutes, or, if they want to turn it off altogether, hours.
I'm backporting HADOOP-15679 to all branches 2.8.x+, so all Hadoop versions
with that timeout will have it configurable and the default extended.
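
Once that's in, the knob to tune is (assuming the property name HADOOP-15679 adds) `hadoop.service.shutdown.timeout`, set in core-site.xml (e.g. "5m" for a slow object store), since the shutdown hook manager loads its own Configuration. A small check like this just reads back the effective value against the 30s default:

```scala
import java.util.concurrent.TimeUnit
import org.apache.hadoop.conf.Configuration

object ShutdownTimeoutCheck {
  def main(args: Array[String]): Unit = {
    // Set the property in core-site.xml; this only reports what is in effect.
    val conf = new Configuration()
    val seconds = conf.getTimeDuration(
      "hadoop.service.shutdown.timeout", 30L, TimeUnit.SECONDS)
    println(s"effective per-hook shutdown timeout: ${seconds}s")
  }
}
```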