Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19199#discussion_r138271989
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala ---
@@ -109,6 +109,20 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
        }
      }
+    if (requiredSchema.length == 1 &&
+        requiredSchema.head.name == parsedOptions.columnNameOfCorruptRecord) {
+      throw new AnalysisException(
+        "Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the\n" +
+          "referenced columns only include the internal corrupt record column\n" +
+          s"(named ${parsedOptions.columnNameOfCorruptRecord} by default). For example:\n" +
--- End diff ---
Here `named ${parsedOptions.columnNameOfCorruptRecord}` reads oddly, because it is interpolated into the configured column name at runtime. We should replace it with either:
1. "named `_corrupt_record` by default", the same wording as the migration guide, or
2. "named by the config `spark.sql.columnNameOfCorruptRecord`".
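A minimal sketch of what suggestion 2 would look like, assuming the message text stays otherwise unchanged (the object and method names here are illustrative, not part of the actual patch):

```scala
// Hypothetical sketch: reference the config key in the error message
// instead of interpolating the configured column name at runtime.
object CorruptRecordErrorMessage {
  def message: String =
    "Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the\n" +
      "referenced columns only include the internal corrupt record column\n" +
      "(named by the config `spark.sql.columnNameOfCorruptRecord`). For example:"
}
```

With this variant the message is a plain string literal, so it stays accurate even when a user overrides the corrupt record column name.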
---