[GitHub] spark issue #20894: [SPARK-23786][SQL] Checking column names of csv headers

MaxGekk Sun, 25 Mar 2018 01:24:07 -0700

Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/20894
  
    > Does this also fix actual use cases too?
    
    Yes, it fixes the real problem.
    
    - There are many small csv files in one folder. All files have the same 
schema and should have the same headers.
    
    - Unfortunately, the columns in some csv files are shuffled: the column names 
are the same but their order is different. (The csv files were produced by an 
external system, and the customer has no control over the writing phase.)
    
    - The schema of the input dataset is static and known, which is why it is 
specified in advance. Files with a different column order are rare and could be 
processed separately. What is expected from Spark is that it must not produce 
incorrect results.
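
    To illustrate the kind of check described above, here is a minimal sketch in 
plain Python. It is not the code from the pull request: the function name 
`check_header` and the exact, case-sensitive comparison are assumptions made for 
the example. It compares the header of one file against the column names of the 
user-specified schema, position by position, and fails instead of silently 
reading values into the wrong columns.

```python
import csv
import io


def check_header(expected_columns, header_line, source="<unknown>"):
    """Compare a CSV header line with the expected column names, in order.

    Hypothetical helper for illustration only. Raises ValueError on the
    first mismatch; returns the parsed header names when everything matches.
    """
    # Parse the header with the csv module so quoted names are handled.
    actual = next(csv.reader(io.StringIO(header_line)))

    if len(actual) != len(expected_columns):
        raise ValueError(
            f"{source}: header has {len(actual)} columns, "
            f"schema has {len(expected_columns)}"
        )

    # Same names in a different order must be rejected, so compare by position.
    for position, (expected, found) in enumerate(zip(expected_columns, actual)):
        if expected != found:
            raise ValueError(
                f"{source}: column {position} is '{found}', "
                f"expected '{expected}'"
            )
    return actual


schema_columns = ["id", "name", "price"]

# A file whose header matches the schema passes.
check_header(schema_columns, "id,name,price", source="good.csv")

# A file with the same names in a different order is reported, not misread.
try:
    check_header(schema_columns, "name,id,price", source="shuffled.csv")
except ValueError as err:
    print(err)
```

    Files that fail such a check could then be set aside and processed 
separately, as suggested above.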




---

