robreeves commented on a change in pull request #35969:
URL: https://github.com/apache/spark/pull/35969#discussion_r837919464



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2714,6 +2714,14 @@ object SQLConf {
     .stringConf
     .createWithDefault("avro,csv,json,kafka,orc,parquet,text")
 
+  val ALLOW_EMPTY_SCHEMAS_FOR_WRITES =
+    buildConf("spark.sql.sources.file.allowEmptySchemaWrite")

Review comment:
      Other configurations use `spark.sql.sources.files` rather than the singular `file`. Can you update this one to be consistent?
   
   On a related note, why does this include `sources`? At a glance, `spark.sql.files.allowEmptySchemaWrite` seems to fully identify the configuration. If that is acceptable, this config should live with the rest of the `spark.sql.files.*` configs. Let me know if there is a defined naming convention I'm not aware of.
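   For concreteness, the suggested rename would look something like the sketch below. The `.doc` text and the `false` default are illustrative only, not taken from the PR:

```scala
// Hypothetical: the same conf relocated to the spark.sql.files.* namespace,
// next to the other file-based configs in SQLConf.
val ALLOW_EMPTY_SCHEMAS_FOR_WRITES =
  buildConf("spark.sql.files.allowEmptySchemaWrite")
    .doc("When true, file-based data sources are allowed to write out " +
      "a DataFrame whose schema is empty.")
    .booleanConf
    .createWithDefault(false)
```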

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
##########
@@ -154,6 +156,19 @@ class FileBasedDataSourceSuite extends QueryTest
     }
   }
 
+  val emptySchemaSupportedDataSources = Seq("orc", "csv", "json")
+  emptySchemaSupportedDataSources.foreach { format =>
+    val emptySchemaValidationConf = SQLConf.ALLOW_EMPTY_SCHEMAS_FOR_WRITES.key
+    test(s"SPARK-38651 allow writing empty schema files " +
+      s"using $format when ${emptySchemaValidationConf} is enabled") {
+      withSQLConf(emptySchemaValidationConf -> "true") {
+        withTempPath { outputPath =>
+          spark.emptyDataFrame.write.format(format).save(outputPath.toString)

Review comment:
       Is it possible to validate the file content for this test case?
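   One possible shape for that check, sketched under the assumption that each of these formats can read back what it wrote (CSV in particular may behave differently with zero columns):

```scala
// Sketch: read the written files back with the same format and assert the
// round-tripped result is empty, both in schema and in row count.
val readBack = spark.read.format(format).load(outputPath.toString)
assert(readBack.schema.isEmpty)
assert(readBack.count() === 0)
```

   Even just asserting that `outputPath` contains non-hidden data files would be stronger than only checking that the write does not throw.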

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
##########
@@ -553,8 +553,12 @@ case class DataSource(
         disallowWritingIntervals(data.schema.map(_.dataType), forbidAnsiIntervals = true)
         SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode)
       case format: FileFormat =>
+        val shouldAllowEmptySchema =
+          sparkSession.sessionState.conf.getConf(SQLConf.ALLOW_EMPTY_SCHEMAS_FOR_WRITES)
         disallowWritingIntervals(data.schema.map(_.dataType), forbidAnsiIntervals = false)
-        DataSource.validateSchema(data.schema)
+        if (!shouldAllowEmptySchema) {

Review comment:
      Should we only honor this config for formats that support it? I'm leaning towards keeping it simple and not checking that here, because a per-format check is more complicated to maintain as format support changes, but I wanted to call it out for discussion.
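   If we did want to gate on format support, a minimal sketch could look like the following. The helper name and the supported-format set are hypothetical, and keeping that set correct as formats evolve is exactly the maintenance cost mentioned above:

```scala
// Hypothetical helper: formats known to tolerate empty-schema writes.
val emptySchemaWriteFormats = Set("orc", "csv", "json")

// Validate unless the config is on AND the format supports empty schemas.
def shouldValidateSchema(format: String, allowEmptySchema: Boolean): Boolean =
  !(allowEmptySchema && emptySchemaWriteFormats.contains(format.toLowerCase))
```

   `DataSource.validateSchema` would then be called only when `shouldValidateSchema` returns true, instead of gating on the config alone.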




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
