Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20666#discussion_r170418454
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
    @@ -550,12 +552,14 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
        * <li>`mode` (default `PERMISSIVE`): allows a mode for dealing with corrupt records
        *    during parsing. It supports the following case-insensitive modes.
        *   <ul>
    -   *     <li>`PERMISSIVE` : sets other fields to `null` when it meets a corrupted record, and puts
    -   *     the malformed string into a field configured by `columnNameOfCorruptRecord`. To keep
    +   *     <li>`PERMISSIVE` : when it meets a corrupted record, puts the malformed string into a
    +   *     field configured by `columnNameOfCorruptRecord`, and sets other fields to `null`. To keep
        *     corrupt records, an user can set a string type field named `columnNameOfCorruptRecord`
        *     in an user-defined schema. If a schema does not have the field, it drops corrupt records
    -   *     during parsing. When a length of parsed CSV tokens is shorter than an expected length
    -   *     of a schema, it sets `null` for extra fields.</li>
    +   *     during parsing. It supports partial result for the records just with less or more tokens
    --- End diff --
    
    Yes. Will update accordingly.
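    
    For context, a minimal sketch of the PERMISSIVE behavior described in the doc above (the file path, schema, and `_corrupt_record` field name here are illustrative, not taken from the PR):
    
        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.types._
    
        val spark = SparkSession.builder().appName("permissive-csv").getOrCreate()
    
        // User-defined schema that includes a string field for corrupt records.
        val schema = StructType(Seq(
          StructField("id", IntegerType, nullable = true),
          StructField("name", StringType, nullable = true),
          StructField("_corrupt_record", StringType, nullable = true)
        ))
    
        // Malformed rows keep the raw line in `_corrupt_record` and the other
        // fields are set to null; records with fewer or more tokens than the
        // schema yield partial results rather than being dropped.
        val df = spark.read
          .option("mode", "PERMISSIVE")
          .option("columnNameOfCorruptRecord", "_corrupt_record")
          .schema(schema)
          .csv("/path/to/data.csv")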


---
