Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
@HyukjinKwon
> I think we are fine to just document this like saying them to better use
> select or renaming it after the load
The problem occurs during loading. Could you please explain how `select` or
renaming the columns could solve the issue I described above? Spark loads the
data silently, and some partitions end up with wrong data.
Please try this experiment:
_Create two files 1.csv and 2.csv in the same folder_
```
$ cat 1.csv
temperature, depth
10.0, 5.0
```
```
$ cat 2.csv
depth, temperature
1234, 4.1
```
_Read the files by Spark:_
```
val data = spark.read.option("header", "true").csv("folder/*.csv")
data.select("temperature").show
```
As a user, I would expect either:
```
+-----------+
|temperature|
+-----------+
|       10.0|
|        4.1|
+-----------+
```
or an error, but not this output:
```
+-----------+
|temperature|
+-----------+
|       10.0|
|       1234|
+-----------+
```
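Until the loader itself reports such a mismatch, one possible workaround is to compare the header lines of the files before handing them to Spark. The sketch below is only an illustration under stated assumptions: `HeaderCheck`, `header` and `headersByFile` are hypothetical names, not part of Spark and not the change proposed in this PR. It also assumes the files are on the local file system, are UTF-8 encoded, and use a plain comma as the delimiter with no quoting in the header.

```scala
import java.io.File
import scala.io.Source

// Hypothetical helper, not a Spark API: groups CSV files by their header line.
object HeaderCheck {
  // Read the first line of a file and split it into trimmed column names.
  def header(file: File): List[String] = {
    val src = Source.fromFile(file, "UTF-8")
    try {
      val lines = src.getLines()
      if (lines.hasNext) lines.next().split(",", -1).map(_.trim).toList
      else Nil
    } finally src.close()
  }

  // Map each distinct header to the sorted names of the files that use it.
  // More than one key means the files do not agree on column names or order.
  def headersByFile(folder: String): Map[List[String], List[String]] = {
    val files = Option(new File(folder).listFiles())
      .getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".csv"))
      .toList
    files
      .map(f => header(f) -> f.getName)
      .groupBy(_._1)
      .map { case (h, pairs) => h -> pairs.map(_._2).sorted }
  }

  def main(args: Array[String]): Unit = {
    val groups = headersByFile(args.headOption.getOrElse("folder"))
    if (groups.size > 1) {
      groups.foreach { case (h, names) =>
        println(s"${h.mkString(",")} <- ${names.mkString(", ")}")
      }
      // Fail loudly instead of loading mixed column orders.
      sys.error("CSV files have different headers; refusing to load them together")
    }
  }
}
```

For the two files in the experiment above, `headersByFile` would return two groups, one per column order, so the check fails before any data is read.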