GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/15677
[SPARK-17963][SQL][Documentation] Add examples (extend) in each expression
and improve documentation with arguments
## What changes were proposed in this pull request?
This PR proposes to change the documentation for functions. Please refer
to the discussion in https://github.com/apache/spark/pull/15513
The changes include
- Re-indent the documentation
- Add examples/arguments in `extended` where a function takes multiple arguments or
an argument in a specific format (e.g. XML/JSON).
For example, the documentation was updated as follows:
### Functions with single-line usage
**Before**
- `pow`
```sql
Usage: pow(x1, x2) - Raise x1 to the power of x2.
Extended Usage:
> SELECT pow(2, 3);
8.0
```
- `current_timestamp`
```sql
Usage: current_timestamp() - Returns the current timestamp at the start
of query evaluation.
Extended Usage:
No example for current_timestamp.
```
**After**
- `pow`
```sql
Usage: pow(expr1, expr2) - Raise expr1 to the power of expr2.
Extended Usage:
Arguments:
expr1 - a numeric expression.
expr2 - a numeric expression.
Examples:
> SELECT pow(2, 3);
8.0
```
- `current_timestamp`
```sql
Usage: current_timestamp() - Returns the current timestamp at the start
of query evaluation.
Extended Usage:
No example/argument for current_timestamp.
```
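The layout in the "After" output can be sketched in a few lines. The following is a minimal Python sketch under stated assumptions, not Spark's implementation: `render_description` and its parameters are hypothetical names, and the indentation widths are assumptions because the original indentation is not visible in this message.

```python
from typing import Optional, Sequence, Tuple


def render_description(
    name: str,
    usage: str,
    arguments: Optional[Sequence[Tuple[str, str]]] = None,
    examples: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Render usage plus an extended section (hypothetical helper, illustration only)."""
    lines = [f"Usage: {usage}", "Extended Usage:"]
    if not arguments and not examples:
        # Fallback when a function documents neither arguments nor examples.
        lines.append(f"  No example/argument for {name}.")
        return "\n".join(lines)
    if arguments:
        lines.append("  Arguments:")
        lines.extend(f"    {arg} - {desc}" for arg, desc in arguments)
    if examples:
        lines.append("  Examples:")
        for query, result in examples:
            lines.append(f"    > {query}")
            lines.append(f"     {result}")
    return "\n".join(lines)


print(render_description(
    "pow",
    "pow(expr1, expr2) - Raise expr1 to the power of expr2.",
    arguments=[("expr1", "a numeric expression."), ("expr2", "a numeric expression.")],
    examples=[("SELECT pow(2, 3);", "8.0")],
))
```

Calling the helper with no arguments and no examples, as for `current_timestamp`, produces the single fallback line instead of the two subsections.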
### Functions with (already) multi-line usage
**Before**
- `approx_count_distinct`
```sql
Usage: approx_count_distinct(expr) - Returns the estimated cardinality
by HyperLogLog++.
approx_count_distinct(expr, relativeSD=0.05) - Returns the
estimated cardinality by HyperLogLog++
with relativeSD, the maximum estimation error allowed.
Extended Usage:
No example for approx_count_distinct.
```
- `percentile_approx`
```sql
Usage:
percentile_approx(col, percentage [, accuracy]) - Returns the
approximate percentile value of numeric
column `col` at the given percentage. The value of percentage
must be between 0.0
and 1.0. The `accuracy` parameter (default: 10000) is a positive
integer literal which
controls approximation accuracy at the cost of memory. Higher
value of `accuracy` yields
better accuracy, `1.0/accuracy` is the relative error of the
approximation.
percentile_approx(col, array(percentage1 [, percentage2]...) [,
accuracy]) - Returns the approximate
percentile array of column `col` at the given percentage array.
Each value of the
percentage array must be between 0.0 and 1.0. The `accuracy`
parameter (default: 10000) is
a positive integer literal which controls approximation accuracy
at the cost of memory.
Higher value of `accuracy` yields better accuracy, `1.0/accuracy`
is the relative error of
the approximation.
Extended Usage:
No example for percentile_approx.
```
**After**
- `approx_count_distinct`
```sql
Usage:
approx_count_distinct(expr[, relativeSD]) - Returns the estimated
cardinality by HyperLogLog++.
relativeSD defines the maximum estimation error allowed.
Extended Usage:
Arguments:
expr - an expression of any type that represents data to count.
relativeSD - a numeric literal that defines the maximum
estimation error allowed.
```
- `percentile_approx`
```sql
Usage:
percentile_approx(col, percentage [, accuracy]) - Returns the
approximate percentile value of numeric
column `col` at the given percentage. The value of `percentage`
must be between 0.0
and 1.0. The `accuracy` parameter (default: 10000) is a positive
integer literal which
controls approximation accuracy at the cost of memory. Higher
value of `accuracy` yields
better accuracy, `1.0/accuracy` is the relative error of the
approximation.
When `percentage` is an array, each value of the percentage array
must be between 0.0 and 1.0.
Extended Usage:
Arguments:
col - a numeric expression.
percentage - a numeric literal or an array literal of numeric
type that defines the
percentile. For example, 0.5 means 50-percentile.
accuracy - a numeric literal.
Examples:
> SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
[10.0,10.0,10.0]
> SELECT percentile_approx(10.0, 0.5, 100);
10.0
```
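The constraints stated in the `percentile_approx` usage text are simple to check by hand. Here is a minimal Python sketch that assumes only what that text says: `relative_error` and `validate_percentage` are hypothetical helper names rather than Spark functions, and treating the 0.0 and 1.0 bounds as inclusive is an assumption.

```python
from typing import Sequence, Union

Number = Union[int, float]


def relative_error(accuracy: int = 10000) -> float:
    """Relative error implied by `accuracy`, per the usage text: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


def validate_percentage(percentage: Union[Number, Sequence[Number]]) -> None:
    """Accept a single percentage or an array; every value must lie in [0.0, 1.0]."""
    values = percentage if isinstance(percentage, (list, tuple)) else [percentage]
    for value in values:
        # Bounds are treated as inclusive here (an assumption).
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"percentage {value} is outside [0.0, 1.0]")


print(relative_error())      # default accuracy of 10000
print(relative_error(100))   # accuracy used in the examples
validate_percentage(0.5)
validate_percentage([0.5, 0.4, 0.1])
```

A larger `accuracy` gives a smaller relative error, which is the memory-for-precision trade-off the usage text describes.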
## How was this patch tested?
Manually tested
**When there are multiple examples**
```sql
spark-sql> describe function extended reflect;
Function: reflect
Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Usage: reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with
reflection.
Extended Usage:
Arguments:
class - a string literal that represents a fully-qualified class name.
method - a string literal that represents a method name.
arg - a boolean, string or numeric expression except decimal that
represents an argument for
the method.
Examples:
> SELECT reflect('java.util.UUID', 'randomUUID');
c33fb387-8500-4bfa-81d2-6e0e3e930df2
> SELECT reflect('java.util.UUID', 'fromString',
'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
```
**When `Usage` is on a single line**
```sql
spark-sql> describe function extended min;
Function: min
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Min
Usage: min(expr) - Returns the minimum value of `expr`.
Extended Usage:
Arguments:
expr - an expression of any type.
```
**When `Usage` already spans multiple lines**
```sql
spark-sql> describe function extended percentile_approx;
Function: percentile_approx
Class:
org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Usage:
percentile_approx(col, percentage [, accuracy]) - Returns the
approximate percentile value of numeric
column `col` at the given percentage. The value of `percentage` must
be between 0.0
and 1.0. The `accuracy` parameter (default: 10000) is a positive
integer literal which
controls approximation accuracy at the cost of memory. Higher value
of `accuracy` yields
better accuracy, `1.0/accuracy` is the relative error of the
approximation.
When `percentage` is an array, each value of the percentage array
must be between 0.0 and 1.0.
Extended Usage:
Arguments:
col - a numeric expression.
percentage - a numeric literal or an array literal of numeric type
that defines the
percentile. For example, 0.5 means 50-percentile.
accuracy - a numeric literal.
Examples:
> SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
[10.0,10.0,10.0]
> SELECT percentile_approx(10.0, 0.5, 100);
10.0
```
**When example/argument is missing**
```sql
spark-sql> describe function extended rank;
Function: rank
Class: org.apache.spark.sql.catalyst.expressions.Rank
Usage:
rank() - Computes the rank of a value in a group of values. The result
is one plus the number
of rows preceding or equal to the current row in the ordering of the
partition. The values
will produce gaps in the sequence.
Extended Usage:
No example/argument for rank.
```
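The "gaps" mentioned in the `rank` usage text are easiest to see on a small input. This is a minimal Python sketch of the usual SQL `RANK` behavior, assuming tied rows share a rank equal to one plus the number of rows strictly before them. `rank_values` is a hypothetical name, and this is not Spark's implementation.

```python
from typing import List, Sequence


def rank_values(ordered: Sequence[int]) -> List[int]:
    """Rank each value of an already-ordered partition, leaving gaps after ties."""
    ranks: List[int] = []
    for position, value in enumerate(ordered):
        if position > 0 and value == ordered[position - 1]:
            # Tied with the previous row: share its rank.
            ranks.append(ranks[-1])
        else:
            # One plus the number of rows that come before this one.
            ranks.append(position + 1)
    return ranks


print(rank_values([10, 20, 20, 30]))  # [1, 2, 2, 4]
```

Rank 3 is skipped because two rows tie at rank 2, so the next distinct value lands on rank 4.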
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-17963-1
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15677.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15677
----
commit 5f70b1ddc26c51d0c2aef34b58fe98f8220ffc0a
Author: hyukjinkwon <[email protected]>
Date: 2016-10-17T12:07:52Z
Add examples (extend) in each function and improve documentation with
arguments
commit 77abafa5741c5c4c706f4e310f5a63c39811471b
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T13:02:49Z
aggregate OK
commit 8cd6d80e2c5232253444c59c363010c7a2c4aa69
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T13:24:36Z
xml OK
commit 2614572e0b04130d662f34b4926e6acaf866c704
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T14:14:54Z
arithmetic OK
commit 29d0262ef0b1c3d828b2301d50f2d3d1f37ac961
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T14:59:06Z
bitwiseExpressions OK and double-check others
commit 710c68eda59c0fdb299feccd5125f75d38063176
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T15:09:46Z
CallMethodViaReflection OK
commit 1d44ec38d5c68ea2853d8bda9713f633c984cdd0
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T15:15:01Z
Cast OK
commit 78b1fc75e9c9d931f5ff130109cf2f0d9ce2547b
Author: hyukjinkwon <[email protected]>
Date: 2016-10-18T16:20:07Z
collectionOperations OK
commit e2062afdd9579fbec08ac073a6021736078d7ad2
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T10:15:18Z
complexTypeCreator OK and double check others
commit e9672db417cbf800a92665942a8c096221724cbe
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T12:59:30Z
conditionalExpressions OK
commit 9baa847729cd6f7686c1aa476cc1de0548f9ac2f
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T14:09:38Z
datetimeExpressions OK
commit fff85f6bd645c17d91ce96ac73dfd99d957909d3
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T14:27:42Z
generators OK
commit 24627750226aef3a0e34886b3516496b4e7bb456
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T14:29:32Z
InputFileName OK
commit 10892fb676b6e70e4eb5039a8581f29edae59226
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T14:41:16Z
jsonExpressions OK
commit 45e7f99a9252d66a3a87d17136206e098cff6eea
Author: hyukjinkwon <[email protected]>
Date: 2016-10-19T16:01:14Z
mathExpressions OK, double check others and fix scala style
commit 1d69e40a89c1242c283e820ece0e9fdf4b52c7cb
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T14:14:49Z
misc OK, double-check others and ignore a test in SQLQuerySuite for now
commit 9b24e7879d6d75e913dd71ee3e32292483da5fb5
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T14:18:47Z
MonotonicallyIncreasingID OK
commit ed1c83936bdf78cbab35c50307c2a8afa6c586a3
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T14:43:17Z
nullExpressions OK
commit 15351863fecf7765ac97289b40c2b578ae4db7e1
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T14:59:35Z
predicates OK
commit 8efee7e07ca4bcce654fc947c0fb513b7f361555
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T15:11:47Z
randomExpressions OK
commit a111d2a63965eda60c6ba6f297a84c9e0a8f85f1
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T15:24:48Z
regexpExpressions OK
commit a8ddcc2010b06c19fa76a5bfcc64e490ad58f5b3
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T15:26:18Z
SparkPartitionID OK
commit 99b565879c86b1e8a89da8224a7f4183f0b10b1d
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T16:38:39Z
tringExpressions OK
commit a29472eeb0fbbdccdd8affd82d7e5706623114c0
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T16:51:30Z
windowExpressions OK
commit 73ccda0d89e71249adc25d4b04f70501849f1fd9
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T17:17:04Z
conditionalExpressions OK, double-check others and fix tests
commit d927bff266b978c26db98fd64ff346ca248f7ec4
Author: hyukjinkwon <[email protected]>
Date: 2016-10-20T17:32:31Z
double-check
commit 91d2ab5174273623819a6b18f0d2d557c54603f7
Author: hyukjinkwon <[email protected]>
Date: 2016-10-21T01:11:15Z
Fix tests in SQLQuerySuite and DDLSuite first
commit 7841860bf50fba8b31da774574ad9818ee678d85
Author: hyukjinkwon <[email protected]>
Date: 2016-10-21T01:13:35Z
Take out a space after `Extended Usage:`.
commit 01eecfe44c5edc3db0d25f43a7ae8f80ca07ac61
Author: hyukjinkwon <[email protected]>
Date: 2016-10-21T01:18:46Z
Consistent spacing in Examples
commit 1979d920afcac5794131b7605b8a170f837b461e
Author: hyukjinkwon <[email protected]>
Date: 2016-10-21T09:20:30Z
Remove repeated _FUNC_, consolidate usages and simplify the arguments
----