zsxwing commented on a change in pull request #23733: [SPARK-26824][SS]Fix the checkpoint location and _spark_metadata when it contains special chars
URL: https://github.com/apache/spark/pull/23733#discussion_r256112441
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSink.scala
 ##########
 @@ -37,23 +39,47 @@ object FileStreamSink extends Logging {
    * Returns true if there is a single path that has a metadata log indicating
    * which files should be read.
    */
 -  def hasMetadata(path: Seq[String], hadoopConf: Configuration): Boolean = {
 +  def hasMetadata(path: Seq[String], hadoopConf: Configuration, sqlConf: SQLConf): Boolean = {
     path match {
       case Seq(singlePath) =>
+        val hdfsPath = new Path(singlePath)
+        val fs = hdfsPath.getFileSystem(hadoopConf)
+        if (fs.isDirectory(hdfsPath)) {
+          val metadataPath = new Path(hdfsPath, metadataDir)
+          checkEscapedMetadataPath(fs, metadataPath, sqlConf)
+          fs.exists(metadataPath)
+        } else {
+          false
+        }
+      case _ => false
+    }
+  }
+
 +  def checkEscapedMetadataPath(fs: FileSystem, metadataPath: Path, sqlConf: SQLConf): Unit = {
 +    if (sqlConf.getConf(SQLConf.STREAMING_CHECKPOINT_ESCAPED_PATH_CHECK_ENABLED)
+        && StreamExecution.containsSpecialCharsInPath(metadataPath)) {
+      val legacyMetadataPath = new Path(metadataPath.toUri.toString)
+      val legacyMetadataPathExists =
         try {
-          val hdfsPath = new Path(singlePath)
-          val fs = hdfsPath.getFileSystem(hadoopConf)
-          if (fs.isDirectory(hdfsPath)) {
-            fs.exists(new Path(hdfsPath, metadataDir))
-          } else {
-            false
-          }
+          fs.exists(legacyMetadataPath)
         } catch {
           case NonFatal(e) =>
-            logWarning(s"Error while looking for metadata directory.")
 +            // We may not have access to this directory. Don't fail the query if that happens.
+            logWarning(e.getMessage, e)
 
 Review comment:
   That's correct.
   
   The reason I chose to ignore the error here is that we may not have permission to check the directory. For example, suppose my user name contains special chars such as `foo bar`, and I try to write into my home directory `/user/foo bar/a/b/c` in **Spark 3.0.0**. It's likely I don't have access to `/user/foo%20bar/a/b/c`, since that's in a different user's home directory. Because checking the directory should be best effort and should not impact any users who don't hit this path issue, I prefer to ignore the error for safety.
   
   This is different from the `try-catch` in `hasMetadata`, which checks a subdirectory of the current directory. In that case the directory is usually accessible, so an error there is probably a real issue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
