[GitHub] spark pull request #18865: [SPARK-21610][SQL] Corrupt records are not handle...

viirya Fri, 08 Sep 2017 01:56:08 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18865#discussion_r137741357
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1542,6 +1542,10 @@ options.
     
     # Migration Guide
     
    +## Upgrading From Spark SQL 2.2 to 2.3
    +
    +  - The queries which select only `spark.sql.columnNameOfCorruptRecord` column are disallowed now. Notice that the queries which have only the column after column pruning (e.g. filtering on the column followed by a counting operation) are also disallowed. If you want to select only the corrupt records, you should cache or save the Dataset and DataFrame before running such queries.
    +
    --- End diff --
    
    nit: cache or save the underlying Dataset and DataFrame ...
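
For illustration, the workaround described in the migration note might look like the sketch below. It is not taken from the pull request. The object name, the temporary input file, and the schema are all hypothetical, and it assumes the corrupt record column keeps its default name `_corrupt_record`, which is the default value of `spark.sql.columnNameOfCorruptRecord`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Hypothetical example object; none of these names come from the PR.
object CorruptRecordSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("corrupt-record-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: one well-formed JSON line and one malformed line.
    val path = Files.createTempFile("records", ".json")
    Files.write(path, "{\"a\": 1}\n{\"a\": \n".getBytes(StandardCharsets.UTF_8))

    // The corrupt record column has to be declared in the user-supplied schema.
    // "_corrupt_record" is assumed to be the configured column name.
    val schema = new StructType()
      .add("a", IntegerType)
      .add("_corrupt_record", StringType)

    // Per the migration note, a query that references only the corrupt record
    // column directly on the file source is disallowed, for example:
    //   spark.read.schema(schema).json(path.toString)
    //     .filter($"_corrupt_record".isNotNull).count()

    // Cache the underlying parsed Dataset first, then run the same query on it.
    val parsed = spark.read.schema(schema).json(path.toString).cache()
    val corruptCount = parsed.filter($"_corrupt_record".isNotNull).count()
    println(s"corrupt records: $corruptCount")

    spark.stop()
  }
}
```

Saving the parsed result (for example to Parquet) and reading it back would be the other option the note mentions. Caching is shown here only because it is the shorter of the two.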


