HyukjinKwon commented on a change in pull request #29835:
URL: https://github.com/apache/spark/pull/29835#discussion_r492613049
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
##########
@@ -49,11 +49,13 @@ import org.apache.spark.sql.types._
*/
@ExpressionDescription(
usage = """
- _FUNC_(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
- column `col` at the given percentage. The value of percentage must be between 0.0
- and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
- controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
- better accuracy, `1.0/accuracy` is the relative error of the approximation.
+ _FUNC_(col, percentage [, accuracy]) - Returns the approximate `percentile` of the numeric
+ column `col` which is the smallest value in the ordered `col` values (sorted from least to
+ greatest) such that no more than `percentage` of `col` values is less than the value
+ or equal to that value. The value of percentage must be between 0.0 and 1.0. The `accuracy`
+ parameter (default: 10000) is a positive numeric literal which controls approximation accuracy
+ at the cost of memory. Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is
+ the relative error of the approximation.
Review comment:
Shall we update Scala, Python and R functions too?
Review comment:
Yeah looks good
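
For reference, here is a minimal plain-Python sketch of one way to read the definition in the new docstring. It is not Spark's implementation: `exact_percentile` and `relative_error` are hypothetical helper names, and the nearest-rank rule used here is an assumption about what "smallest value in the ordered `col` values" means for an exact (non-approximate) computation. The only part taken directly from the docstring is that the relative error is `1.0/accuracy` with a default `accuracy` of 10000.

```python
import math


def exact_percentile(values, percentage):
    """Nearest-rank percentile over an in-memory list (illustrative only).

    Assumption: the percentile is the smallest value in the sorted data
    whose 1-based rank is at least percentage * len(values).
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)  # sorted from least to greatest
    if not ordered:
        raise ValueError("values must not be empty")
    # Clamp to rank 1 so that percentage == 0.0 returns the minimum.
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


def relative_error(accuracy=10000):
    """Relative error quoted in the docstring: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return 1.0 / accuracy
```

Under these assumptions, `exact_percentile(list(range(1, 11)), 0.5)` picks the fifth of ten sorted values, and doubling `accuracy` halves the value returned by `relative_error`.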