[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20579 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r176232893 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -719,4 +720,27 @@ object DataSource extends Logging { } globPath } + + /** + * Called before writing into a FileFormat based data source to make sure the + * supplied schema is not empty. + * @param schema + */ + private def hasEmptySchema(schema: StructType): Unit = { + def hasEmptySchemaInternal(schema: StructType): Boolean = { --- End diff -- @cloud-fan I have gone ahead and changed the top-level function name to validateSchema and kept the internal function name as hasEmptySchema. Hopefully it makes sense now.
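Putting the two agreed names together, here is a sketch (not the exact code merged in the PR) of how a raising `validateSchema` entry point could wrap an internal boolean `hasEmptySchema`, including the nested-schema check requested earlier in the review; the recursion cases and the error message wording are illustrative assumptions:

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}

// Sketch: fail early if the write schema is empty anywhere, including inside
// nested structs, arrays, and maps. Not the exact PR implementation.
private def validateSchema(schema: StructType): Unit = {
  def hasEmptySchema(dataType: DataType): Boolean = dataType match {
    // A StructType is a Seq[StructField], so isEmpty detects a top-level
    // or nested empty struct; otherwise recurse into each field.
    case s: StructType => s.isEmpty || s.fields.exists(f => hasEmptySchema(f.dataType))
    case a: ArrayType  => hasEmptySchema(a.elementType)
    case m: MapType    => hasEmptySchema(m.keyType) || hasEmptySchema(m.valueType)
    case _             => false
  }
  if (hasEmptySchema(schema)) {
    throw new AnalysisException(
      "Data source does not support writing empty or nested empty schemas. " +
        "Please make sure the data schema has at least one or more column(s).")
  }
}
```

The split matches cloud-fan's naming rule: the `Unit`-returning validator raises, while the `Boolean`-returning helper only answers the question its name asks.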
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r176187422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -719,4 +720,27 @@ object DataSource extends Logging { } globPath } + + /** + * Called before writing into a FileFormat based data source to make sure the + * supplied schema is not empty. + * @param schema + */ + private def hasEmptySchema(schema: StructType): Unit = { + def hasEmptySchemaInternal(schema: StructType): Boolean = { --- End diff -- You are right, @cloud-fan. Given that we are raising the error from the function itself, should I rename it to validateSchema?
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r176163585 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -719,4 +720,27 @@ object DataSource extends Logging { } globPath } + + /** + * Called before writing into a FileFormat based data source to make sure the + * supplied schema is not empty. + * @param schema + */ + private def hasEmptySchema(schema: StructType): Unit = { + def hasEmptySchemaInternal(schema: StructType): Boolean = { --- End diff -- They should be `verifySchema` and `hasEmptySchema`, depending on their return type.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175952404 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -77,7 +77,6 @@ class ParquetFileFormat job: Job, options: Map[String, String], dataSchema: StructType): OutputWriterFactory = { - --- End diff -- @cloud-fan Will remove.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175952408 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -719,4 +720,27 @@ object DataSource extends Logging { } globPath } + + /** + * Called before writing into a FileFormat based data source to make sure the + * supplied schema is not empty. + * @param schema + */ + private def verifySchema(schema: StructType): Unit = { + def verifyInternal(schema: StructType): Boolean = { --- End diff -- @cloud-fan Will make the change.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175950101 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -77,7 +77,6 @@ class ParquetFileFormat job: Job, options: Map[String, String], dataSchema: StructType): OutputWriterFactory = { - --- End diff -- Unnecessary change.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175950061 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -719,4 +720,27 @@ object DataSource extends Logging { } globPath } + + /** + * Called before writing into a FileFormat based data source to make sure the + * supplied schema is not empty. + * @param schema + */ + private def verifySchema(schema: StructType): Unit = { + def verifyInternal(schema: StructType): Boolean = { --- End diff -- Better to call it `hasEmptySchema`.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175858198 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -546,6 +546,10 @@ case class DataSource( case dataSource: CreatableRelationProvider => SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode) case format: FileFormat => +if (DataSource.isBuiltInFileBasedDataSource(format) && data.schema.size == 0) { --- End diff -- Actually, we just need [this check](https://github.com/apache/spark/pull/20579/files#diff-ee26d4c4be21e92e92a02e9f16dbc285R71) here.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175854733 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -546,6 +546,10 @@ case class DataSource( case dataSource: CreatableRelationProvider => SaveIntoDataSourceCommand(data, dataSource, caseInsensitiveOptions, mode) case format: FileFormat => +if (DataSource.isBuiltInFileBasedDataSource(format) && data.schema.size == 0) { --- End diff -- We don't need this check. `FileFormat` is internal; we don't need to distinguish between built-in file formats and external ones.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175154988 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -542,6 +542,11 @@ case class DataSource( throw new AnalysisException("Cannot save interval data type into external storage.") } +if (data.schema.size == 0) { --- End diff -- @gatorsmile @cloud-fan OK, sounds reasonable to me. I will roll back the latest change in this PR, and we can discuss whether we want to introduce the behaviour change in a future JIRA/PR.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175118143 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -542,6 +542,11 @@ case class DataSource( throw new AnalysisException("Cannot save interval data type into external storage.") } +if (data.schema.size == 0) { --- End diff -- Currently, we are not blocking this, and I do not think we should introduce this behavior change. It is risky to block all the cases. Previously, I tried to block CREATE TABLE with an empty schema. Later, I hit a regression because some data sources use options/table properties to specify the schema... A general guideline here is to avoid behavior changes if possible. When we have to introduce a behavior change, we should make it configurable; at least, users can then revert it by using a flag.
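As an illustration of the "make it configurable" guideline above, a gating flag could be declared via `SQLConf.buildConf`; the conf name, doc text, and default below are invented for this sketch and are not part of the PR:

```scala
// Hypothetical legacy flag (name invented for illustration): lets users restore
// the old behavior of attempting to write a data frame with an empty schema.
val ALLOW_EMPTY_SCHEMA_WRITE = buildConf("spark.sql.legacy.allowEmptySchemaWrite")
  .doc("When true, file-based data sources are allowed to attempt writing " +
    "a data frame with an empty schema, matching the pre-change behavior.")
  .booleanConf
  .createWithDefault(false)

// The write-path check would then be gated on the flag, e.g.:
// if (!sparkSession.sessionState.conf.getConf(ALLOW_EMPTY_SCHEMA_WRITE)) {
//   validateSchema(data.schema)
// }
```

Defaulting the flag to `false` keeps the new, safer behavior while still giving users an escape hatch, which is exactly the trade-off gatorsmile describes.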
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175019613 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -542,6 +542,11 @@ case class DataSource( throw new AnalysisException("Cannot save interval data type into external storage.") } +if (data.schema.size == 0) { --- End diff -- @gatorsmile May I request you to quickly go through Wenchen's and Ryan's comments above? My understanding is that we want to consistently reject writing an empty schema for all the data sources. Please let me know.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r175013773 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -542,6 +542,11 @@ case class DataSource( throw new AnalysisException("Cannot save interval data type into external storage.") } +if (data.schema.size == 0) { --- End diff -- Is it required? This is a behavior change. Can we exclude it from this PR?
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r173625828 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet formats do not allow writing a data frame with an empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} + } +} + } + + // Formats excluding text and parquet allow writing empty data frames to files. + allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format => +test(s"SPARK-23372 writing empty dataframe and reading from it - $format") { + withTempPath { outputPath => + spark.emptyDataFrame.write.format(format).save(outputPath.toString) + intercept[AnalysisException] { +val df = spark.read.format(format).load(outputPath.toString) --- End diff -- Sorry if I misunderstood. The link is https://github.com/apache/spark/pull/20579#issuecomment-364994881. Is that the right link?
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167664375 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet formats do not allow writing a data frame with an empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} --- End diff -- Sure, @dongjoon-hyun.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167660095 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -68,6 +68,16 @@ class ParquetFileFormat override def toString: String = "Parquet" + private def verifySchema(schema: StructType): Unit = { +if (schema.size == 0) { + throw new AnalysisException( +s""" + |Parquet data source does not support writing empty groups. --- End diff -- @hvanhovell Thank you. I will change it to use "schema". I will check nested schemas as well.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167661193 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet formats do not allow writing a data frame with an empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} --- End diff -- Can we check the error message to make sure?
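A sketch of what asserting on the message could look like in this suite; `intercept` returns the caught exception, so the test can check a fragment of it. The asserted substring below is a guess and should match whatever the verification error actually says:

```scala
// Sketch: capture the AnalysisException and assert on a stable fragment
// of its message rather than the full text, which may change.
test("SPARK-23372 writing empty dataframe should produce AnalysisException - parquet") {
  withTempPath { outputPath =>
    val e = intercept[AnalysisException] {
      spark.emptyDataFrame.write.format("parquet").save(outputPath.toString)
    }
    assert(e.getMessage.contains("empty"))  // hypothetical fragment
  }
}
```

Asserting on a short, stable substring keeps the test robust if the wording of the error is later polished.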
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167659522 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet formats do not allow writing a data frame with an empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} + } +} + } + + // Formats excluding text and parquet allow writing empty data frames to files. + allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format => +test(s"SPARK-23372 writing empty dataframe and reading from it - $format") { + withTempPath { outputPath => + spark.emptyDataFrame.write.format(format).save(outputPath.toString) + intercept[AnalysisException] { +val df = spark.read.format(format).load(outputPath.toString) --- End diff -- @hvanhovell Actually, that's my question as well. Please see my [comment/question](https://github.com/apache/spark/pull/20579#issuecomment-364994881).
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167625448 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet formats do not allow writing a data frame with an empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} + } +} + } + + // Formats excluding text and parquet allow writing empty data frames to files. + allFileBasedDataSources.filterNot(p => p == "text" || p == "parquet").foreach { format => +test(s"SPARK-23372 writing empty dataframe and reading from it - $format") { + withTempPath { outputPath => + spark.emptyDataFrame.write.format(format).save(outputPath.toString) + intercept[AnalysisException] { +val df = spark.read.format(format).load(outputPath.toString) --- End diff -- This should pass, right?
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167623896 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -68,6 +68,16 @@ class ParquetFileFormat override def toString: String = "Parquet" + private def verifySchema(schema: StructType): Unit = { +if (schema.size == 0) { + throw new AnalysisException( +s""" + |Parquet data source does not support writing empty groups. --- End diff -- `group` is a Parquet term. Let's use `schema` instead? We should also check for nested empty schemas.
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167465371 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala --- @@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext { } } + // Text and Parquet formats do not allow writing a data frame with an empty schema. + Seq("parquet", "text").foreach { format => +test(s"SPARK-23372 writing empty dataframe should produce AnalysisException - $format") { + withTempPath { outputPath => +intercept[AnalysisException] { + spark.emptyDataFrame.write.format(format).save(outputPath.toString) +} + } +} + } + + // Formats excluding text and parquet allow writing empty data frames to files. --- End diff -- I have added this test to show the current behaviour of reading an empty data frame. For formats like ORC, JSON, and CSV, we succeed in writing an empty data frame but get an error while inferring the schema on read. Is this the right behaviour?
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167463979 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -68,6 +68,16 @@ class ParquetFileFormat override def toString: String = "Parquet" + private def verifySchema(schema: StructType): Unit = { +if (schema.size < 1) { --- End diff -- No particular reason :-). I can change that to "== 0".
Github user mannharleen commented on a diff in the pull request: https://github.com/apache/spark/pull/20579#discussion_r167463659 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -68,6 +68,16 @@ class ParquetFileFormat override def toString: String = "Parquet" + private def verifySchema(schema: StructType): Unit = { +if (schema.size < 1) { --- End diff -- Any particular reason for using "< 1" and not "== 0"? Just curious.
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/20579 [SPARK-23372][SQL] Writing empty struct in parquet fails during execution. It should fail earlier in the processing.

## What changes were proposed in this pull request?

Running spark.emptyDataFrame.write.format("parquet").mode("overwrite").save(path) results in:

```
org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: message spark_schema { }
  at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)
  at org.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)
  at org.apache.parquet.schema.MessageType.accept(MessageType.java:58)
  at org.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)
  at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:225)
  at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:342)
  at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:302)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
  at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:376)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:387)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:278)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:276)
  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:281)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:206)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:205)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:109)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.
```

This PR addresses a couple of things:
1) The above case now fails earlier in the processing, during the prep-write phase.
2) Writing an empty data frame in ORC succeeds but fails during read while inferring the schema. This issue is also addressed in this PR.

## How was this patch tested?

Unit tests added in FileBasedDataSourceSuite.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/dilipbiswal/spark spark-23372 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20579.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20579

commit 9f7a1705960250cf6a828787f0f12a9f28b608c5
Author: Dilip Biswal
Date: 2018-02-11T17:09:07Z
[SPARK-23372] Writing empty struct in parquet fails during execution. It should fail earlier in the processing