Github user jmchung commented on a diff in the pull request:
https://github.com/apache/spark/pull/19199#discussion_r138287636
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala ---
@@ -109,6 +109,20 @@ class CSVFileFormat extends TextBasedFileFormat with DataSourceRegister {
       }
     }
+    if (requiredSchema.length == 1 &&
+      requiredSchema.head.name == parsedOptions.columnNameOfCorruptRecord) {
+      throw new AnalysisException(
+        "Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the\n" +
+          "referenced columns only include the internal corrupt record column\n" +
+          s"(named ${parsedOptions.columnNameOfCorruptRecord} by default). For example:\n" +
--- End diff ---
Thanks @viirya. Should we also replace the weird part in `JsonFileFormat` with `_corrupt_record`?
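
For anyone following along, here is a minimal, hypothetical sketch of the query pattern this check rejects, plus the cached workaround that the new error message suggests. The file path `people.csv`, the schema, and the session boilerplate are illustrative and not taken from the PR:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

object CorruptRecordDemo { // hypothetical demo object, not part of the PR
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("corrupt-record-demo")
      .master("local[*]")
      .getOrCreate()

    // An explicit schema that includes the internal corrupt record column
    // (named _corrupt_record by default, configurable via the
    // columnNameOfCorruptRecord option).
    val schema = new StructType()
      .add("name", StringType)
      .add("age", IntegerType)
      .add("_corrupt_record", StringType)

    val df = spark.read.schema(schema).csv("people.csv") // illustrative path

    // Since Spark 2.3 this throws AnalysisException, because the query
    // references only the internal corrupt record column:
    // df.select("_corrupt_record").show()

    // The workaround the error message suggests: cache (or save) the parsed
    // results first, then run the same query against them.
    val cached = df.cache()
    cached.select("_corrupt_record").show()

    spark.stop()
  }
}
```

The same restriction applies when reading raw JSON with `spark.read.schema(schema).json(...)`, which is why the message says "raw JSON/CSV files".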