gaborgsomogyi opened a new pull request #23764: [SPARK-26825][SS] Fix temp checkpoint creation in cluster mode when default filesystem is not local

URL: https://github.com/apache/spark/pull/23764

## What changes were proposed in this pull request?

There are situations where Spark creates a temporary checkpoint directory. One example is when the console sink is used. In such cases, the current implementation has `StreamingQueryManager` create the directory with `Utils.createTempDir` and pass it to the corresponding `StreamExecution`. `StreamExecution` then does the following:

* Creates the directory again
* Resolves the provided directory

The problem arises during resolution. The path provided by `StreamingQueryManager` doesn't contain the `file://` scheme, so it is resolved against the default filesystem. For example, when HDFS is the default filesystem, a directory that was meant to be on the local filesystem is looked up on HDFS instead.

In this PR I've made the following changes:

* The directory is created only in `StreamExecution`
* The `file://` scheme is added to the directory path
* It was not obvious that the checkpoint directory had failed to be created because of permission issues, so an exception is now thrown when the checkpoint directory doesn't exist and cannot be created

## How was this patch tested?

Existing unit tests + started a query in client and cluster mode.
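The resolution problem and the fix described above can be sketched in a few lines of Scala. This is a minimal illustration under stated assumptions, using only `java.net.URI` and `java.io.File` rather than Hadoop: `qualify` is a hypothetical stand-in for how a scheme-less path picks up the default filesystem (Hadoop's `fs.defaultFS`), and `tempCheckpointUri` and `ensureCheckpointDir` are hypothetical helpers showing the two ideas of the fix (an explicit `file:` scheme and a single creation step that fails loudly). None of these names come from the PR itself.

```scala
import java.io.{File, IOException}
import java.net.URI
import java.util.UUID

// Hypothetical helpers for illustration only; not Spark's or Hadoop's API.
object CheckpointPathSketch {

  // Stand-in for default-filesystem resolution: a path that already has a
  // scheme is kept as-is, while a scheme-less absolute path inherits the
  // scheme and authority of `defaultFs` (e.g. hdfs://namenode:8020).
  def qualify(path: String, defaultFs: URI): URI = {
    val uri = new URI(path)
    if (uri.getScheme != null) uri
    else new URI(defaultFs.getScheme, defaultFs.getAuthority, uri.getPath, null, null)
  }

  // Fix 1: choose a unique local directory under java.io.tmpdir and return it
  // as an explicit file: URI. The directory is NOT created here.
  def tempCheckpointUri(): URI = {
    val base = new File(System.getProperty("java.io.tmpdir"))
    new File(base, s"temporary-${UUID.randomUUID()}").getAbsoluteFile.toURI
  }

  // Fix 2: create the directory in exactly one place and fail loudly
  // (e.g. on a permission problem) instead of continuing silently.
  def ensureCheckpointDir(uri: URI): File = {
    val dir = new File(uri)
    if (!dir.exists() && !dir.mkdirs()) {
      throw new IOException(s"Cannot create checkpoint directory $uri")
    }
    dir
  }
}
```

With a default filesystem of `hdfs://namenode:8020`, `qualify("/tmp/temporary-123", ...)` yields `hdfs://namenode:8020/tmp/temporary-123`, while the URI returned by `tempCheckpointUri` keeps its `file` scheme regardless of the default. Note that `File.toURI` produces the `file:/...` form rather than `file:///...`; both carry the local `file` scheme.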
