Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20666#discussion_r170845040
--- Diff: python/pyspark/sql/readwriter.py ---
@@ -393,13 +395,16 @@ def csv(self, path, schema=None, sep=None, encoding=None, quote=None, escape=Non
        :param mode: allows a mode for dealing with corrupt records during parsing. If None is
                     set, it uses the default value, ``PERMISSIVE``.

-                * ``PERMISSIVE`` : sets other fields to ``null`` when it meets a corrupted \
-                  record, and puts the malformed string into a field configured by \
-                  ``columnNameOfCorruptRecord``. To keep corrupt records, an user can set \
-                  a string type field named ``columnNameOfCorruptRecord`` in an \
-                  user-defined schema. If a schema does not have the field, it drops corrupt \
-                  records during parsing. When a length of parsed CSV tokens is shorter than \
-                  an expected length of a schema, it sets `null` for extra fields.
+                * ``PERMISSIVE`` : when it meets a corrupted record, puts the malformed string \
+                  into a field configured by ``columnNameOfCorruptRecord``, and sets other \
+                  fields to ``null``. To keep corrupt records, an user can set a string type \
--- End diff --
Ok. Added.
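For readers of the archive, a minimal plain-Python sketch of the ``PERMISSIVE`` semantics described in the diff (this is an illustration, not Spark's implementation; the helper name ``parse_permissive`` and the ``_corrupt_record`` default column name are assumptions for the example):

```python
import csv
import io

def parse_permissive(text, schema, corrupt_col="_corrupt_record"):
    """Mimic the documented PERMISSIVE CSV mode: on a corrupted record,
    put the malformed string into the corrupt-record column and set the
    other fields to None; when a row has fewer tokens than the schema
    expects, set None for the extra fields."""
    rows = []
    n = len(schema)
    for line in text.splitlines():
        tokens = next(csv.reader(io.StringIO(line)))
        if len(tokens) > n:
            # Corrupted record: other fields become None, the raw
            # malformed string goes into the corrupt-record column.
            row = {name: None for name in schema}
            row[corrupt_col] = line
        else:
            # Fewer parsed tokens than the schema: pad with None.
            row = dict(zip(schema, tokens + [None] * (n - len(tokens))))
            row[corrupt_col] = None
        rows.append(row)
    return rows

rows = parse_permissive("1,a\n2,b,EXTRA\n3", ["id", "name"])
# rows[1] is treated as corrupted; rows[2] gets None for "name".
```

In PySpark itself, the same behavior is selected via the parameters the diff documents, e.g. ``spark.read.csv(path, schema=schema, mode="PERMISSIVE", columnNameOfCorruptRecord="_corrupt_record")``, where the corrupt-record field must be declared as a string column in the user-defined schema.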