Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22237#discussion_r214302343
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1897,6 +1897,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
       - In version 2.3 and earlier, a CSV row is considered malformed if at least one column value in the row is malformed; the CSV parser drops such rows in the DROPMALFORMED mode or raises an error in the FAILFAST mode. Since Spark 2.4, a CSV row is considered malformed only when it contains malformed column values requested from the CSV data source; other values can be ignored. For example, suppose a CSV file contains the "id,name" header and one row "1234". In Spark 2.4, selecting the id column yields a row with the single column value 1234, but in Spark 2.3 and earlier the result is empty in the DROPMALFORMED mode. To restore the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`.
       - Since Spark 2.4, file listing for statistics computation is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `false`.
       - Since Spark 2.4, metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during statistics computation.
    +  - Since Spark 2.4, the from_json function supports two modes: PERMISSIVE and FAILFAST. The mode can be set via the `mode` option, and the default mode is now PERMISSIVE. In previous versions, the behavior of from_json conformed to neither PERMISSIVE nor FAILFAST, especially in the processing of malformed JSON records.
    --- End diff ---
    
    nit: from_json -> `` `from_json` ``.

