[GitHub] [spark] HyukjinKwon commented on a change in pull request #29835: [SPARK-32306][SQL][DOCS] Clarify the result of `percentile_approx()`

GitBox Tue, 22 Sep 2020 21:08:52 -0700


HyukjinKwon commented on a change in pull request #29835:
URL: https://github.com/apache/spark/pull/29835#discussion_r492613049




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
##########
@@ -49,11 +49,13 @@ import org.apache.spark.sql.types._
  */
 @ExpressionDescription(
   usage = """
-    _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
-      column `col` at the given percentage. The value of percentage must be between 0.0
-      and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
-      controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
-      better accuracy, `1.0/accuracy` is the relative error of the approximation.
+    _FUNC_(col, percentage [, accuracy]) - Returns the approximate `percentile` of the numeric
+      column `col` which is the smallest value in the ordered `col` values (sorted from least to
+      greatest) such that no more than `percentage` of `col` values is less than the value
+      or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy`
+      parameter (default: 10000) is a positive numeric literal which controls approximation accuracy
+      at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is
+      the relative error of the approximation.

Review comment:
       Shall we update Scala, Python and R functions too?
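A minimal sketch of the exact quantity that `percentile_approx()` estimates may help when checking the new wording. It assumes the nearest-rank reading of the usage text (the smallest sorted value such that at least `percentage` of the values are less than or equal to it). The object and method names are hypothetical, and this is not Spark's implementation, which approximates the answer with a quantile summary instead of sorting every value.

```scala
// Hypothetical helper, not part of Spark: computes the exact percentile that
// percentile_approx() approximates, assuming the nearest-rank definition.
object ExactPercentileSketch {

  /** Returns the smallest value v in `values` such that at least
    * `percentage` of the values are less than or equal to v.
    */
  def nearestRank(values: Seq[Double], percentage: Double): Double = {
    require(values.nonEmpty, "values must not be empty")
    require(percentage >= 0.0 && percentage <= 1.0,
      "percentage must be between 0.0 and 1.0")
    val sorted = values.sorted
    // 1-based rank of the answer; a rank of 0 (percentage = 0.0) is clamped
    // to 1 so the minimum value is returned.
    val rank = math.max(1, math.ceil(percentage * sorted.length).toInt)
    sorted(rank - 1)
  }

  def main(args: Array[String]): Unit = {
    val col = (1 to 10).map(_.toDouble)
    println(nearestRank(col, 0.5))  // 5.0
    println(nearestRank(col, 0.25)) // 3.0
    println(nearestRank(col, 1.0))  // 10.0
  }
}
```

Under this reading, `percentage = 0.0` yields the minimum and `percentage = 1.0` yields the maximum of `col`.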

Review comment:
       Yeah looks good
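The `accuracy` parameter in the usage text can be illustrated the same way. The sketch below assumes that `1.0/accuracy` bounds the rank error of the returned value as a fraction of the row count, which is the usual guarantee of Greenwald-Khanna style quantile summaries; the helper is hypothetical and not a Spark API. It reports the range of ranks the approximate answer may come from.

```scala
// Hypothetical helper, not a Spark API. Assumes the rank of the approximate
// answer lies within relativeError * rowCount of the target rank, where
// relativeError = 1.0 / accuracy.
object AccuracyBoundSketch {

  /** Returns the (lowest, highest) 1-based rank the answer may have. */
  def rankBounds(rowCount: Long, percentage: Double, accuracy: Int): (Long, Long) = {
    require(rowCount > 0, "rowCount must be positive")
    require(accuracy > 0, "accuracy must be positive")
    require(percentage >= 0.0 && percentage <= 1.0,
      "percentage must be between 0.0 and 1.0")
    // Target rank under the nearest-rank definition, clamped to at least 1.
    val targetRank = math.max(1L, math.ceil(percentage * rowCount).toLong)
    // floor(rowCount / accuracy), i.e. floor(relativeError * rowCount),
    // computed with integer division to avoid floating-point rounding.
    val slack = rowCount / accuracy
    (math.max(1L, targetRank - slack), math.min(rowCount, targetRank + slack))
  }

  def main(args: Array[String]): Unit = {
    // Default accuracy of 10000 over one million rows.
    println(rankBounds(1000000L, 0.5, 10000)) // (499900,500100)
    // A lower accuracy of 100 over the same rows.
    println(rankBounds(1000000L, 0.5, 100))   // (490000,510000)
  }
}
```

Under this assumption, the default accuracy of 10000 keeps the answer over one million rows within about 100 ranks of the exact median, while an accuracy of 100 uses less memory but widens that window to about 10000 ranks.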




