xuanyuanking commented on a change in pull request #32702:
URL: https://github.com/apache/spark/pull/32702#discussion_r654280374



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1568,6 +1568,16 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val FILESTREAM_SINK_METADATA_IGNORED =
+    buildConf("spark.sql.streaming.fileStreamSink.metadata.ignored")

Review comment:
       Following the [guideline for naming configurations](https://github.com/apache/spark/blob/c6109ba9181520359222fb032d989f266d3221d8/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala#L20-L47),
maybe the config could be named something like
`spark.sql.streaming.fileStreamSink.ignoreMetadata` or
`spark.sql.streaming.fileStreamSink.formatCheck.enabled`, or any other good
name :)

##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -360,12 +360,15 @@ case class DataSource(
         baseRelation
 
       // We are reading from the results of a streaming query. Load files from the metadata log
-      // instead of listing them using HDFS APIs.
+      // instead of listing them using HDFS APIs. Note that the config
+      // `spark.sql.streaming.fileStreamSink.metadata.ignored` can be enabled to ignore the
+      // metadata log.
       case (format: FileFormat, _)
-          if FileStreamSink.hasMetadata(
-            caseInsensitiveOptions.get("path").toSeq ++ paths,
-            newHadoopConfiguration(),
-            sparkSession.sessionState.conf) =>
+          if !sparkSession.sessionState.conf.fileStreamSinkMetadataIgnored &&

Review comment:
       Instead of checking the config on the caller side in three places, maybe
we can check the config directly in `FileStreamSink.hasMetadata`? The two
approaches should be equivalent, while the latter only changes a single code
segment.
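A minimal, self-contained Scala sketch of why the two placements are equivalent. In the diff above, `FileStreamSink.hasMetadata` already receives `sparkSession.sessionState.conf` as its third argument, so the flag is reachable inside the helper. `Conf`, `hasMetadataRaw`, `callerSideCheck` and the `Set`-based stand-in for the filesystem are hypothetical simplifications for illustration only, not Spark's actual `SQLConf`, `FileStreamSink` or Hadoop APIs.

```scala
// Hypothetical, simplified model of the two options discussed in the review.
// None of these names are Spark's real SQLConf / FileStreamSink / Hadoop APIs.
object MetadataCheckSketch {
  // Stand-in for the SQLConf flag proposed in the PR.
  final case class Conf(fileStreamSinkMetadataIgnored: Boolean)

  // Stand-in for the existing metadata-log lookup: true if any of the
  // paths has a metadata log (modelled here as membership in a set).
  def hasMetadataRaw(paths: Seq[String], withLog: Set[String]): Boolean =
    paths.exists(withLog.contains)

  // Option A (as in the PR): each caller guards the call with the flag.
  def callerSideCheck(paths: Seq[String], withLog: Set[String], conf: Conf): Boolean =
    !conf.fileStreamSinkMetadataIgnored && hasMetadataRaw(paths, withLog)

  // Option B (as suggested): the helper consults the flag itself, so the
  // call sites can keep calling hasMetadata unchanged.
  def hasMetadata(paths: Seq[String], withLog: Set[String], conf: Conf): Boolean =
    if (conf.fileStreamSinkMetadataIgnored) false
    else hasMetadataRaw(paths, withLog)
}
```

Because the guard is a short-circuit on the same flag in both versions, they agree on every input; keeping it inside the helper means any future caller of `hasMetadata` would respect the config without needing its own check.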




-- 
This is an automated message from the Apache Git Service.