GitHub user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19321#discussion_r140684015

--- Diff: docs/sql-programming-guide.md ---
@@ -1553,6 +1553,7 @@ options.

 ## Upgrading From Spark SQL 2.2 to 2.3

 - Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named `_corrupt_record` by default). For example, `spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()` and `spark.read.schema(schema).json(file).select("_corrupt_record").show()`. Instead, you can cache or save the parsed results and then send the same query. For example, `val df = spark.read.schema(schema).json(file).cache()` and then `df.filter($"_corrupt_record".isNotNull).count()`.
+ - The `percentile_approx` function previously accepted only double-type input and produced double-type results. It now supports date, timestamp, and all numeric types as input. The result type is also changed to match the input type, which is more reasonable for percentiles.

--- End diff --

This is not right, is it? Before this PR we already supported numeric types: we automatically cast them to `Double`, right?
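The point under discussion is whether a percentile's result type should match its input type. A minimal Python sketch (hypothetical, not Spark's `percentile_approx` implementation) of the exact nearest-rank percentile illustrates why matching is natural: the result is always an element of the input, so it carries the input's type with no cast to double.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 1) by the nearest-rank method.

    The result is an element of `values`, so its type matches the input's
    element type (int stays int, date stays date, etc.).
    """
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered)))  # 1-based rank of the percentile
    return ordered[k - 1]

ints = [1, 2, 3, 4, 5]
median = nearest_rank_percentile(ints, 0.5)
print(median, type(median).__name__)  # an int from the list, not a float
```

By contrast, an implementation that first casts every input to double (as the reviewer says the pre-PR `percentile_approx` did for numeric types) would return `3.0` here rather than `3`.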