Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/20894#discussion_r188542015
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -373,6 +373,12 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
                            default value, ``false``.
        :param inferSchema: infers the input schema automatically from data. It requires one extra
                            pass over the data. If None is set, it uses the default value, ``false``.
+        :param enforceSchema: If it is set to ``true``, the specified or inferred schema will be
+                              forcibly applied to datasource files and headers in CSV files will be
+                              ignored. If the option is set to ``false``, the schema will be
+                              validated against headers in CSV files if the ``header`` option is set
+                              to ``true``. The validation is performed in column ordering aware and
+                              case sensitive manner. If None is set, ``true`` is used by default.
--- End diff --
https://github.com/apache/spark/pull/20894#discussion_r176949718 Do we
ignore case sensitivity?
Can you check `CSVDataSource.makeSafeHeader`?
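
For context, a minimal PySpark sketch of how the two settings of this option would be exercised from the reader API. The file path, column names, and data below are made up for illustration, and it assumes a build that includes this PR's `enforceSchema` option; whether the ``ID`` vs ``id`` mismatch is rejected when validation runs is exactly the case-sensitivity question above.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.appName("enforceSchema-sketch").getOrCreate()

# Write a tiny CSV file with a header so the example is self-contained.
# (hypothetical path used only for this sketch)
with open("/tmp/people.csv", "w") as f:
    f.write("id,name\n1,Alice\n2,Bob\n")

# Schema whose first field differs from the CSV header only by case.
schema = StructType([
    StructField("ID", IntegerType()),
    StructField("name", StringType()),
])

# enforceSchema=True (documented default): the schema is applied as-is
# and the header row in the CSV file is ignored.
df_forced = spark.read.csv("/tmp/people.csv", schema=schema,
                           header=True, enforceSchema=True)

# enforceSchema=False: per the proposed docstring, the schema is validated
# against the CSV header because ``header`` is true; the case-sensitivity
# behavior of that check is what the review comment asks about.
df_checked = spark.read.csv("/tmp/people.csv", schema=schema,
                            header=True, enforceSchema=False)
```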
---