GitHub user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19321#discussion_r140684015

--- Diff: docs/sql-programming-guide.md ---
@@ -1553,6 +1553,7 @@ options.

 ## Upgrading From Spark SQL 2.2 to 2.3

 - Since Spark 2.3, the queries from raw JSON/CSV files are disallowed when the referenced columns only include the internal corrupt record column (named `_corrupt_record` by default). For example, `spark.read.schema(schema).json(file).filter($"_corrupt_record".isNotNull).count()` and `spark.read.schema(schema).json(file).select("_corrupt_record").show()`. Instead, you can cache or save the parsed results and then send the same query. For example, `val df = spark.read.schema(schema).json(file).cache()` and then `df.filter($"_corrupt_record".isNotNull).count()`.
+ - The `percentile_approx` function previously accepted only double-type input and produced double-type results. It now supports date, timestamp, and all numeric types as input. The result type is also changed to match the input type, which is more reasonable for percentiles.

--- End diff --

This is not right, is it? Before this PR we already supported numeric types: we automatically cast them to `Double`, right?
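The point under discussion is whether a percentile's result type should match its input type. A minimal Python sketch (hypothetical, not Spark's `percentile_approx` implementation) of the exact nearest-rank percentile illustrates why matching is natural: the result is always an element of the input, so it carries the input's type with no cast to double.

```python
import math

def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 1) by the nearest-rank method.

    The result is an element of `values`, so its type matches the input's
    element type (int stays int, date stays date, etc.).
    """
    ordered = sorted(values)
    k = max(1, math.ceil(p * len(ordered)))  # 1-based rank of the percentile
    return ordered[k - 1]

ints = [1, 2, 3, 4, 5]
median = nearest_rank_percentile(ints, 0.5)
print(median, type(median).__name__)  # an int from the list, not a float
```

By contrast, an implementation that first casts every input to double (as the reviewer says the pre-PR `percentile_approx` did for numeric types) would return `3.0` here rather than `3`.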