[GitHub] [spark] HeartSaVioR commented on a change in pull request #32702: [SPARK-35565][SS] Add config for ignoring metadata directory of FileStreamSink

GitBox Fri, 18 Jun 2021 14:15:17 -0700


HeartSaVioR commented on a change in pull request #32702:
URL: https://github.com/apache/spark/pull/32702#discussion_r654400586




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1568,6 +1568,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val FILESTREAM_SINK_METADATA_IGNORED =
+    buildConf("spark.sql.streaming.fileStreamSink.metadata.ignored")

Review comment:
       Personally `spark.sql.streaming.fileStreamSink.ignoreMetadata` sounds better. I couldn't get what `formatCheck` means intuitively. 

##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -360,12 +360,15 @@ case class DataSource(
         baseRelation
 
       // We are reading from the results of a streaming query. Load files from the metadata log
-      // instead of listing them using HDFS APIs.
+      // instead of listing them using HDFS APIs. Note that the config
+      // `spark.sql.streaming.fileStreamSink.metadata.ignored` can be enabled to ignore the
+      // metadata log.
       case (format: FileFormat, _)
-          if FileStreamSink.hasMetadata(
-            caseInsensitiveOptions.get("path").toSeq ++ paths,
-            newHadoopConfiguration(),
-            sparkSession.sessionState.conf) =>
+          if !sparkSession.sessionState.conf.fileStreamSinkMetadataIgnored &&

Review comment:
       Either is fine for me.
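The guard added to `DataSource.scala` can be illustrated with a small, Spark-free model. This is a minimal sketch under stated assumptions, not Spark's implementation: it assumes the sink's metadata log lives in a `_spark_metadata` directory under the output path, and it replaces the filesystem lookup in `FileStreamSink.hasMetadata` with a lookup in a set of existing directories. The object `FileStreamSinkReadDecision`, its `ReadPath` values, and the `choose` helper are hypothetical names used only for illustration. The model keeps the properties visible in the diff: the config flag is checked first, and only a single input path (the `path` option combined with `paths`) can be treated as sink output.

```scala
// Hypothetical, Spark-free model of the read-path decision in DataSource after this change.
// None of these names are Spark APIs; they only illustrate the boolean guard in the diff.
object FileStreamSinkReadDecision {
  // Assumed name of the directory holding the FileStreamSink metadata log.
  val MetadataDirName = "_spark_metadata"

  sealed trait ReadPath
  case object MetadataLog extends ReadPath // load the file list from the sink's metadata log
  case object FileListing extends ReadPath // list files as for any other directory

  // Simplified stand-in for FileStreamSink.hasMetadata: only a single input path
  // qualifies, and "exists" is a membership check instead of a filesystem call.
  def hasMetadata(paths: Seq[String], existingDirs: Set[String]): Boolean =
    paths match {
      case Seq(single) =>
        existingDirs.contains(single.stripSuffix("/") + "/" + MetadataDirName)
      case _ => false
    }

  // Mirrors `!metadataIgnored && hasMetadata(...)`: when the flag is set,
  // the && short-circuits and the metadata directory is never even checked.
  def choose(paths: Seq[String], existingDirs: Set[String], metadataIgnored: Boolean): ReadPath =
    if (!metadataIgnored && hasMetadata(paths, existingDirs)) MetadataLog
    else FileListing
}
```

With `existingDirs = Set("/out/_spark_metadata")`, reading `Seq("/out")` picks `MetadataLog` by default and `FileListing` once `metadataIgnored = true`; passing two paths always falls back to `FileListing`, matching the single-path restriction of the real check.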




