gaborgsomogyi opened a new pull request #23764: [SPARK-26825][SS] Fix temp 
checkpoint creation in cluster mode when default filesystem is not local.
URL: https://github.com/apache/spark/pull/23764
 
 
   ## What changes were proposed in this pull request?
   
   There are situations where Spark creates a temporary checkpoint directory, for
example when the console sink is used. In such cases the current implementation
has `StreamingQueryManager` create the directory with `Utils.createTempDir` and
pass it to the appropriate `StreamExecution`. `StreamExecution` then does the
following:
   * Creates the directory again
   * Resolves the provided directory
   
   The problem arises during resolution. The path provided by
`StreamingQueryManager` has no `file://` scheme, so it is resolved against the
default filesystem. If HDFS is the default filesystem, for example, the path
switches from the local filesystem to HDFS.
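   The resolution problem can be illustrated with a minimal Python sketch. This is
not the Spark or Hadoop code: `qualify` is a hypothetical, simplified stand-in
for resolving a path against the default filesystem, and the `hdfs://` URI and
temp path below are made-up examples. A path without a scheme inherits the
default filesystem's scheme and authority, while an explicit `file://` path
stays local.

```python
from urllib.parse import urlparse


def qualify(path: str, default_fs: str) -> str:
    """Hypothetical, simplified path resolution against a default filesystem.

    A path with no scheme inherits the default filesystem's scheme and
    authority; a path that already has a scheme is returned unchanged.
    """
    parsed = urlparse(path)
    if parsed.scheme:
        # Already qualified, e.g. file:///tmp/... stays on the local filesystem
        return path
    fs = urlparse(default_fs)
    return f"{fs.scheme}://{fs.netloc}{parsed.path}"


default_fs = "hdfs://namenode:8020"  # hypothetical default filesystem
local_tmp = "/tmp/temporary-1234"    # hypothetical createTempDir result

print(qualify(local_tmp, default_fs))              # hdfs://namenode:8020/tmp/temporary-1234
print(qualify("file://" + local_tmp, default_fs))  # file:///tmp/temporary-1234
```

The scheme-less local path silently moves to HDFS, which is why adding an
explicit `file://` scheme keeps the temporary directory on the local filesystem.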
   
   In this PR I've made the following changes:
   * The directory is created only in `StreamExecution`
   * The `file://` scheme is added to the directory path
   * Since it was not obvious that the checkpoint directory had not been created
because of permission issues, I've added an exception that is thrown when the
checkpoint directory doesn't exist and cannot be created
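   The intended behaviour of these changes can be sketched in Python. This is an
illustration under assumptions, not the actual Scala change in Spark:
`ensure_checkpoint_dir` is a hypothetical helper that creates the directory in
one place, fails loudly if it is missing and cannot be created, and returns the
path with an explicit `file://` scheme.

```python
import tempfile
from pathlib import Path


def ensure_checkpoint_dir(path: Path) -> str:
    """Hypothetical helper: create the checkpoint directory once and return
    a file:// URI for it.

    Raises instead of continuing silently when the directory does not exist
    and cannot be created (for example, because of missing permissions).
    """
    if not path.is_dir():
        try:
            path.mkdir(parents=True)
        except OSError as e:
            raise RuntimeError(
                f"Cannot create checkpoint directory {path}: {e}") from e
    # Path.as_uri() adds an explicit file:// scheme, so later resolution
    # cannot fall back to the default (possibly non-local) filesystem
    return path.resolve().as_uri()


base = Path(tempfile.mkdtemp())
uri = ensure_checkpoint_dir(base / "checkpoint")
print(uri)  # a file:// URI under the system temp directory
```

Raising at creation time surfaces permission problems immediately, instead of
letting the query continue with a directory that was never created.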
   
   ## How was this patch tested?
   
   Existing unit tests, plus starting a query in client and cluster mode.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
