Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188542015
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -373,6 +373,12 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                            default value, ``false``.
        :param inferSchema: infers the input schema automatically from data. It requires one extra
                            pass over the data. If None is set, it uses the default value, ``false``.
+        :param enforceSchema: If it is set to ``true``, the specified or inferred schema will be
+                              forcibly applied to datasource files and headers in CSV files will be
+                              ignored. If the option is set to ``false``, the schema will be
+                              validated against headers in CSV files if the ``header`` option is set
+                              to ``true``. The validation is performed in column ordering aware and
+                              case sensitive manner. If None is set, ``true`` is used by default.
--- End diff --
https://github.com/apache/spark/pull/20894#discussion_r176949718 Do we
ignore case sensitivity?
Can you check `CSVDataSource.makeSafeHeader`?
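
For context, a minimal PySpark sketch of how the two settings of this option would be exercised from the reader API. The file path, column names, and data below are made up for illustration, and it assumes a build that includes this PR's `enforceSchema` option; whether the ``ID`` vs ``id`` mismatch is rejected when validation runs is exactly the case-sensitivity question above.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("enforceSchema-sketch").getOrCreate()

# Write a tiny CSV file with a header so the example is self-contained.
# (hypothetical path used only for this sketch)
with open("/tmp/people.csv", "w") as f:
    f.write("id,name\n1,Alice\n2,Bob\n")

# Schema whose first field differs from the CSV header only by case.
schema = StructType([
    StructField("ID", IntegerType()),
    StructField("name", StringType()),
])

# enforceSchema=True (documented default): the schema is applied as-is
# and the header row in the CSV file is ignored.
df_forced = spark.read.csv("/tmp/people.csv", schema=schema,
                           header=True, enforceSchema=True)

# enforceSchema=False: per the proposed docstring, the schema is validated
# against the CSV header because ``header`` is true; the case-sensitivity
# behavior of that check is what the review comment asks about.
df_checked = spark.read.csv("/tmp/people.csv", schema=schema,
                            header=True, enforceSchema=False)
```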
---