viirya commented on a change in pull request #29328:
URL: https://github.com/apache/spark/pull/29328#discussion_r469713195
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala
##########
@@ -155,7 +155,7 @@ object TextInputCSVDataSource extends CSVDataSource {
sparkSession,
paths = paths,
className = classOf[TextFileFormat].getName,
- options = options.parameters
+ options = options.parameters - "path"
Review comment:
Can you at least add a comment here? Otherwise I think later readers of the
code will be confused about why `path` is removed.
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
##########
@@ -245,15 +245,27 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
"read files of Hive data source directly.")
}
+ if ((extraOptions.contains("path") || extraOptions.contains("paths")) && paths.nonEmpty) {
+ throw new AnalysisException("There is a 'path' or 'paths' option set and load() is called " +
+ "with path parameters. Either remove the option or put it into the load() parameters.")
Review comment:
`Either remove the option or not put it into the load() parameters`?
##########
File path: docs/sql-migration-guide.md
##########
@@ -36,6 +36,8 @@ license: |
- In Spark 3.1, NULL elements of structures, arrays and maps are converted to "null" in casting them to strings. In Spark 3.0 or earlier, NULL elements are converted to empty strings. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.castComplexTypesToString.enabled` to `true`.
+ - In Spark 3.1, when loading a dataframe, `path` option cannot coexist with `load()`'s path parameters. For example, `spark.read.format("csv").option("path", "/tmp").load("/tmp2")` or `spark.read.option("path", "/tmp").csv("/tmp2")` will throw `org.apache.spark.sql.AnalysisException`. In Spark version 3.0 and below, `path` option is overwritten if one path parameter is passed to `load()`, or `path` option is added to the overall paths if multiple path parameters are passed to `load()`.
Review comment:
Should `paths` be mentioned too?
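
To make the rule under discussion concrete, here is a minimal sketch of the check as it reads in the `DataFrameReader` hunk above. It is not Spark code: `PathOptionCheck` and `validate` are hypothetical names, a plain `Map` stands in for the reader's options, and an `Either` stands in for throwing `AnalysisException`. How the accepted paths are combined in the success branch is also an assumption made only for illustration.

```scala
// Hypothetical stand-in for the check in this PR; it does not depend on Spark.
object PathOptionCheck {
  // Rejects the combination of a "path"/"paths" option with explicit path arguments,
  // mirroring the condition in the DataFrameReader diff.
  def validate(
      extraOptions: Map[String, String],
      paths: Seq[String]): Either[String, Seq[String]] = {
    val hasPathOption = extraOptions.contains("path") || extraOptions.contains("paths")
    if (hasPathOption && paths.nonEmpty) {
      Left("There is a 'path' or 'paths' option set and load() is called with path parameters.")
    } else {
      // Assumption for illustration: when no conflict exists, the "path" option
      // (if present) is simply appended to the explicit paths.
      Right(paths ++ extraOptions.get("path"))
    }
  }

  def main(args: Array[String]): Unit = {
    println(validate(Map("path" -> "/tmp"), Seq("/tmp2")))    // conflicting inputs
    println(validate(Map("path" -> "/tmp"), Seq.empty))       // option only
    println(validate(Map("header" -> "true"), Seq("/tmp2")))  // parameter only
  }
}
```

Because the condition tests both keys, a `paths` option combined with `load()` arguments is rejected in the same way as `path`, which is the case the migration note does not currently spell out.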