GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/15513
[WIP][SPARK-17963][SQL][Documentation] Add examples (extend) in each function and improve documentation with arguments
## What changes were proposed in this pull request?
This PR proposes changes to the documentation of SQL functions.
The changes include:
- Re-indenting the documentation
- Adding descriptions of the arguments
- Adding examples to `extended` usage where a function takes multiple arguments or expects a specific format (e.g. XML/JSON)
For example, the documentation was updated as follows:
**Before**
- `approx_count_distinct`
```sql
Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
with relativeSD, the maximum estimation error allowed.
Extended Usage:
No example for approx_count_distinct.
```
- `percentile_approx`
```sql
Usage:
percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
column `col` at the given percentage. The value of percentage must be between 0.0
and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
better accuracy, `1.0/accuracy` is the relative error of the approximation.
percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
percentile array of column `col` at the given percentage array. Each value of the
percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
a positive integer literal which controls approximation accuracy at the cost of memory.
Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
the approximation.
Extended Usage:
No example for percentile_approx.
```
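The `relativeSD` argument of `approx_count_distinct` trades memory for precision. The Python sketch below is a rough illustration only. It implements plain HyperLogLog rather than the HyperLogLog++ variant Spark uses, so there is no sparse encoding and no empirical bias correction. It sizes its registers from the textbook standard error of about 1.04/sqrt(m). The helper names `precision_for` and `hll_count_distinct`, the SHA-1 based hash, and this sizing rule are all assumptions made for the example, not Spark's code.

```python
import hashlib
import math
from typing import Iterable


def precision_for(relative_sd: float) -> int:
    # Assumed sizing rule: textbook HLL error is ~1.04 / sqrt(m), so solve for m
    # and round up to a power of two (m = 2 ** p), with an assumed minimum p = 4.
    if relative_sd <= 0:
        raise ValueError("relativeSD must be positive")
    m = (1.04 / relative_sd) ** 2
    return max(4, math.ceil(math.log2(m)))


def _hash64(value: object) -> int:
    # Stable 64-bit hash (first 8 bytes of SHA-1) so estimates are reproducible.
    digest = hashlib.sha1(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hll_count_distinct(values: Iterable[object], relative_sd: float = 0.05) -> float:
    """Plain HyperLogLog estimate (hypothetical helper, not Spark's HLL++)."""
    p = precision_for(relative_sd)
    m = 1 << p
    tail_bits = 64 - p
    registers = [0] * m
    for value in values:
        h = _hash64(value)
        index = h >> tail_bits                     # top p bits pick a register
        tail = h & ((1 << tail_bits) - 1)          # remaining 64 - p bits
        rank = tail_bits - tail.bit_length() + 1   # 1-based position of leftmost 1-bit
        registers[index] = max(registers[index], rank)
    # Bias-correction constant from the original HLL paper (valid for m >= 128).
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw
```

With the default `relative_sd=0.05`, the sketch uses 2**9 = 512 registers. Halving the error to 0.025 raises that to 2**11 = 2048 registers, four times the memory. This illustrates the trade-off that the `relativeSD` description refers to.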
**After**
- `approx_count_distinct`
```sql
Usage:
approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
Arguments:
expr - any type expression that represents data to collect the first.
approx_count_distinct(expr, relativeSD) - Returns the estimated cardinality by HyperLogLog++
with relativeSD, the maximum estimation error allowed.
Arguments:
expr - any type expression that represents data to collect the first.
relativeSD - any numeric type or any nonnumeric type literal that can be implicitly
converted to double type, that represents maximum estimation error allowed
(default = 0.05).
Extended Usage: No example for approx_count_distinct.
```
- `percentile_approx`
```sql
Usage:
percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
column `col` at the given percentage. The value of percentage must be between 0.0
and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
better accuracy, `1.0/accuracy` is the relative error of the approximation.
Arguments:
col - any numeric type or any nonnumeric type expression that can be implicitly
converted to double type.
percentage - any numeric type or any nonnumeric type literal that can be
implicitly converted to double type.
accuracy - any numeric type or any nonnumeric type literal that can be implicitly
converted to int type.
percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
percentile array of column `col` at the given percentage array. Each value of the
percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
a positive integer literal which controls approximation accuracy at the cost of memory.
Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
the approximation.
Arguments:
col - any numeric type or any nonnumeric type expression that can be implicitly
converted to double type.
array(...) - an array that contains any numeric type literal that can be implicitly
converted to double type.
accuracy - any numeric type or any nonnumeric type literal that can be implicitly
converted to int type.
Extended Usage:
> SELECT percentile_approx(10.0, 0.5, 100);
10.0
> SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
[10.0,10.0,10.0]
```
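The `accuracy` description says that `1.0/accuracy` is the relative error. One way to read that is as a bound on rank distance; this reading is an assumption, not something taken from Spark's implementation. With n rows and the default accuracy of 10000, the returned value may sit up to n/10000 positions away from the exact percentile in sorted order. The Python sketch below computes the exact percentile that `percentile_approx` approximates. Like the two signatures above, it accepts either a single percentage or a list of them. The names `exact_percentile` and `max_rank_error` are hypothetical, and so is the choice of nearest-rank as the percentile definition.

```python
import math
from typing import List, Optional, Sequence, Union

Percentage = Union[float, Sequence[float]]


def _nearest_rank(sorted_col: List[float], p: float) -> float:
    # Mirrors the documented constraint on each percentage value.
    if not 0.0 <= p <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    rank = max(1, math.ceil(p * len(sorted_col)))  # 1-based nearest rank
    return sorted_col[rank - 1]


def exact_percentile(col: Sequence[float], percentage: Percentage):
    """Exact nearest-rank percentile (hypothetical helper, not Spark's algorithm)."""
    sorted_col = sorted(float(x) for x in col)  # values converted to double
    if not sorted_col:
        return None  # assumed to mirror SQL NULL for empty input
    if isinstance(percentage, (int, float)):
        return _nearest_rank(sorted_col, float(percentage))
    # Array form: one result per requested percentage, in the same order.
    return [_nearest_rank(sorted_col, float(p)) for p in percentage]


def max_rank_error(n: int, accuracy: int = 10000) -> float:
    """Rank distance allowed by a relative error of 1.0 / accuracy (assumed reading)."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return n / accuracy
```

Consider the single-row input from the extended examples: `exact_percentile([10.0], [0.5, 0.4, 0.1])` returns `[10.0, 10.0, 10.0]`, which matches the output shown above. Spark itself does not sort the whole column. It keeps a compact quantile summary, which is why a higher `accuracy` costs more memory.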
## How was this patch tested?
N/A
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-17963
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15513.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15513
----
commit 2059374537496c9f81512b643e3ec084e43e2594
Author: hyukjinkwon
Date: 2016-10-17T12:07:52Z
Add examples (extend) in each function and improve documentation with
arguments
----