Github user MaxGekk commented on the issue:
https://github.com/apache/spark/pull/20894
> Does this also fix actual use cases too?
Yes, it fixes a real problem:
- There are many small CSV files in one folder. All files have the same
schema and should have the same headers.
- Unfortunately, the columns in some CSV files are mixed up: the column names
are the same but their order is different. (The CSV files were produced by an
external system, and the customer has no control over the writing phase.)
- The schema of the input dataset is static and known, which is why it is
specified in advance. Files with a different column order are rare and could
be processed separately. What is expected from Spark is that it must not
produce incorrect results.
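
To make the scenario concrete, here is a minimal sketch in plain Python (not Spark, and not the code in this pull request) of the kind of check described above: compare each file's header row with the column order of the known schema and set aside the files that differ. The column names, file names, and helper functions are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical schema: the column names and their expected order.
EXPECTED_COLUMNS = ["id", "name", "price"]


def header_matches(csv_text, expected=EXPECTED_COLUMNS):
    """Return True if the first row of csv_text equals the expected names, in order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)  # None when the input is empty
    return header == list(expected)


def split_by_header(files, expected=EXPECTED_COLUMNS):
    """Split a {file name: CSV text} mapping into (matching, mismatched) name lists."""
    matching, mismatched = [], []
    for name, text in sorted(files.items()):
        if header_matches(text, expected):
            matching.append(name)
        else:
            mismatched.append(name)
    return matching, mismatched


# Hypothetical inputs: b.csv has the same column names in a different order.
files = {
    "a.csv": "id,name,price\n1,apple,0.5\n",
    "b.csv": "name,id,price\npear,2,0.7\n",
}
good, mismatched = split_by_header(files)
print("matching:", good)
print("mismatched:", mismatched)
```

Reading `b.csv` by position against the fixed schema would silently put names into the `id` column, which is the incorrect result the comment is concerned about. Flagging such files lets them be processed separately.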