[GitHub] [spark] dongjoon-hyun opened a new pull request #29160: [SPARK-32364][SQL] `path` argument of DataFrame.load/save should override the existing options

GitBox Sun, 19 Jul 2020 23:05:03 -0700


dongjoon-hyun opened a new pull request #29160:
URL: https://github.com/apache/spark/pull/29160



   ### What changes were proposed in this pull request?
   
   This PR makes `load/save` and their variants 
`parquet/orc/csv/json/text/avro` always respect their `path` argument.
   
   ### Why are the changes needed?
   
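   As the following example shows, DataFrame's `option/options` have been 
non-deterministic for keys that differ only in case, because the options are 
stored in `extraOptions`, which is a `HashMap`. Here the `path` argument `5` is 
ignored and the value of one of the earlier `path` options (`1`) is used instead.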
   
   ```
   spark.read
     .option("paTh", "1")
     .option("PATH", "2")
     .option("Path", "3")
     .option("patH", "4")
     .parquet("5")
   ...
   org.apache.spark.sql.AnalysisException: Path does not exist: 
file:/Users/dongjoon/APACHE/spark-release/spark-3.0.0-bin-hadoop3.2/1;
   ```
   
   Although we have preserved this behavior so far, we had better make the direct 
argument `path` override the existing options.
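
The intended override can be sketched in plain Scala without Spark. This is only an illustration, not the actual change in this PR: `PathOverrideSketch`, `withExplicitPath`, and `lookup` are hypothetical names, and the mutable `HashMap` merely stands in for `extraOptions`. The idea is to drop every case variant of the `path` key before storing the explicit argument, so the result no longer depends on the map's iteration order.

```scala
import scala.collection.mutable

object PathOverrideSketch {
  // Hypothetical helper (not Spark's code): remove every key that equals
  // "path" ignoring case, then store the explicit argument under "path".
  def withExplicitPath(options: Map[String, String], path: String): Map[String, String] = {
    val withoutPath = options.filterNot { case (k, _) => k.equalsIgnoreCase("path") }
    withoutPath + ("path" -> path)
  }

  // Case-insensitive lookup: returns the first entry whose key matches.
  // With several case variants present, "first" depends on iteration order.
  def lookup(options: Map[String, String], key: String): Option[String] =
    options.collectFirst { case (k, v) if k.equalsIgnoreCase(key) => v }

  def main(args: Array[String]): Unit = {
    // Stand-in for `extraOptions`: four keys that differ only in case.
    val extraOptions = mutable.HashMap.empty[String, String]
    extraOptions += ("paTh" -> "1")
    extraOptions += ("PATH" -> "2")
    extraOptions += ("Path" -> "3")
    extraOptions += ("patH" -> "4")

    // After the override only one `path` entry remains, and it is the argument.
    val resolved = withExplicitPath(extraOptions.toMap, "5")
    println(lookup(resolved, "path"))                          // Some(5)
    println(resolved.keys.count(_.equalsIgnoreCase("path")))   // 1
  }
}
```

Removing the variants first, rather than only adding a new `path` entry, is what keeps the lookup deterministic in this sketch.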
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. However, this is a bug fix for a really misleading case.
   
   ### How was this patch tested?
   
   Pass Jenkins or GitHub Actions with the newly added test cases.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


