Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/20894

@HyukjinKwon
> I think we are fine to just document this like saying them to better use select or renaming it after the load

The problem occurs during loading. Could you please explain how `select` or renaming the columns could solve the issue I described above? Spark loads the data silently, and some partitions end up with wrong data. Please try this experiment:

_Create two files, 1.csv and 2.csv, in the same folder:_
```
$ cat 1.csv
temperature, depth
10.0, 5.0
```
```
$ cat 2.csv
depth, temperature
1234, 4.1
```

_Read the files with Spark:_
```
val data = spark.read.option("header", "true").csv("folder/*.csv")
data.select("temperature").show
```

As a user, I would expect either:
```
+-----------+
|temperature|
+-----------+
|       10.0|
|        4.1|
+-----------+
```
or an error, but not this output:
```
+-----------+
|temperature|
+-----------+
|       10.0|
|       1234|
+-----------+
```
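For comparison, the first expected output can be approximated by reading each file on its own, so that each file's header names its own columns, and then combining the results by column name. This is only a sketch of a possible workaround, not the change proposed in the pull request. It assumes Spark 2.3 or later (for `unionByName`), and the temporary folder, the helper name `readWithHeader`, and the whitespace options are illustrative choices added here to cope with the space after each comma in the sample files.

```scala
import java.nio.file.{Files, Path}
import org.apache.spark.sql.{DataFrame, SparkSession}

// In spark-shell a session named `spark` already exists; getOrCreate() reuses it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("csv-header-order")
  .getOrCreate()

// Recreate the two files from the experiment in a temporary folder (hypothetical location).
val folder: Path = Files.createTempDirectory("csv-header-order")
Files.write(folder.resolve("1.csv"), "temperature, depth\n10.0, 5.0\n".getBytes("UTF-8"))
Files.write(folder.resolve("2.csv"), "depth, temperature\n1234, 4.1\n".getBytes("UTF-8"))

// Read one file at a time so that each file's own header defines its column names.
// The whitespace options strip the space that follows each comma in the sample data.
def readWithHeader(path: String): DataFrame =
  spark.read
    .option("header", "true")
    .option("ignoreLeadingWhiteSpace", "true")
    .option("ignoreTrailingWhiteSpace", "true")
    .csv(path)

val paths = Seq("1.csv", "2.csv").map(name => folder.resolve(name).toString)

// unionByName matches columns by name rather than by position.
val aligned: DataFrame = paths.map(readWithHeader).reduce(_ unionByName _)

aligned.select("temperature").show()
```

Under these assumptions the `temperature` column should contain 10.0 and 4.1 rather than 10.0 and 1234. Reading files one by one gives up the convenience of the `folder/*.csv` glob and does not scale well to many files, so this does not address the point that the glob read gives wrong results without any warning.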