[3/3] spark git commit: [SPARK-17963][SQL][DOCUMENTATION] Add examples (extend) in each expression and improve documentation

2016-11-02 Thread lixiao
[SPARK-17963][SQL][DOCUMENTATION] Add examples (extend) in each expression and 
improve documentation

## What changes were proposed in this pull request?

This PR proposes to change the documentation for functions. Please refer to the 
discussion in https://github.com/apache/spark/pull/15513

The changes include:
- Re-indenting the documentation
- Adding examples/arguments in `extended` where a function takes multiple arguments or 
arguments in a specific format (e.g. XML/JSON).

For example, the documentation was updated as follows:

### Functions with single-line usage

**Before**
- `pow`

  ``` sql
  Usage: pow(x1, x2) - Raise x1 to the power of x2.
  Extended Usage:
  > SELECT pow(2, 3);
   8.0
  ```
- `current_timestamp`

  ``` sql
  Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
  Extended Usage:
  No example for current_timestamp.
  ```

**After**
- `pow`

  ``` sql
  Usage: pow(expr1, expr2) - Raises `expr1` to the power of `expr2`.
  Extended Usage:
  Examples:
    > SELECT pow(2, 3);
     8.0
  ```

- `current_timestamp`

  ``` sql
  Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
  Extended Usage:
  No example/argument for current_timestamp.
  ```

### Functions with (already) multi-line usage

**Before**
- `approx_count_distinct`

  ``` sql
  Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
  approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
  with relativeSD, the maximum estimation error allowed.

  Extended Usage:
  No example for approx_count_distinct.
  ```
- `percentile_approx`

  ``` sql
  Usage:
  percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
  column `col` at the given percentage. The value of percentage must be between 0.0
  and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
  controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
  better accuracy, `1.0/accuracy` is the relative error of the approximation.

  percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
  percentile array of column `col` at the given percentage array. Each value of the
  percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
  a positive integer literal which controls approximation accuracy at the cost of memory.
  Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
  the approximation.

  Extended Usage:
  No example for percentile_approx.
  ```

**After**
- `approx_count_distinct`

  ``` sql
  Usage:
  approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++.
  `relativeSD` defines the maximum estimation error allowed.

  Extended Usage:
  No example/argument for approx_count_distinct.
  ```

- `percentile_approx`

  ``` sql
  Usage:
  percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
  column `col` at the given percentage. The value of percentage must be between 0.0
  and 1.0. The `accuracy` parameter (default: 10000) is a positive numeric literal which
  controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
  better accuracy, `1.0/accuracy` is the relative error of the approximation.
  When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
  In this case, returns the approximate percentile array of column `col` at the given
  percentage array.

  Extended Usage:
  Examples:
    > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
     [10.0,10.0,10.0]
    > SELECT percentile_approx(10.0, 0.5, 100);
     10.0
  ```
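
The scalar-versus-array behaviour and the `1.0/accuracy` bound described in the `percentile_approx` usage text can be illustrated with a small Python sketch. This is not Spark's implementation: it uses an exact nearest-rank percentile as a stand-in for the approximation, and the names `relative_error` and `exact_percentile` are hypothetical helpers introduced only for this illustration.

```python
import math
from typing import List, Sequence, Union


def relative_error(accuracy: int) -> float:
    """Relative error stated in the usage text: 1.0 / accuracy."""
    if accuracy <= 0:
        raise ValueError("accuracy must be positive")
    return 1.0 / accuracy


def exact_percentile(
    values: Sequence[float],
    percentage: Union[float, Sequence[float]],
) -> Union[float, List[float]]:
    """Exact nearest-rank percentile, used here only as a stand-in.

    A scalar `percentage` yields a scalar; a list or tuple yields a list,
    mirroring the two result shapes described in the usage text.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)

    def one(p: float) -> float:
        # Each percentage must lie between 0.0 and 1.0, as documented.
        if not 0.0 <= p <= 1.0:
            raise ValueError("percentage must be between 0.0 and 1.0")
        rank = max(1, math.ceil(p * len(ordered)))
        return ordered[rank - 1]

    if isinstance(percentage, (list, tuple)):
        return [one(p) for p in percentage]
    return one(percentage)
```

With a single input value of `10.0`, both call shapes return that value, which matches the two documented examples; `relative_error(100)` corresponds to the `accuracy` argument of `100` used there.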

## How was this patch tested?

Manually tested

**When there are multiple examples**

``` sql
spark-sql> describe function extended reflect;
Function: reflect
Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
Usage: reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
Extended Usage:
Examples:
  > SELECT reflect('java.util.UUID', 'randomUUID');
   c33fb387-8500-4bfa-81d2-6e0e3e930df2
  > SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
   a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
```

**When `Usage` is on a single line**

``` sql
spark-sql> describe function extended min;
Function: min
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Min
Usage: min(expr) - Returns the minimum value of `expr`.
Extended Usage:
No example/argument for min.
```
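
The layout of the output above, including the fallback line printed when a function has no extended text, can be modelled with a short Python sketch. It only mimics the format shown in these transcripts; `describe_function_extended` is a hypothetical helper, not part of Spark, and the assumption that an empty `extended` string triggers the fallback is inferred from the `min` output.

```python
from typing import Optional


def describe_function_extended(
    name: str,
    class_name: str,
    usage: str,
    extended: Optional[str] = None,
) -> str:
    """Render text in the layout shown by `describe function extended`.

    Assumption: when no extended text is given, the fallback line
    "No example/argument for <name>." is printed instead.
    """
    lines = [
        f"Function: {name}",
        f"Class: {class_name}",
        f"Usage: {usage}",
        "Extended Usage:",
        extended if extended else f"No example/argument for {name}.",
    ]
    return "\n".join(lines)


# Model of the `min` transcript above.
min_output = describe_function_extended(
    "min",
    "org.apache.spark.sql.catalyst.expressions.aggregate.Min",
    "min(expr) - Returns the minimum value of `expr`.",
)
```

A model like this could serve as the expected value in an automated comparison, complementing the manual checks described in this section.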

**When `Usage` already spans multiple lines**

``` sql
spark-sql> describe function extended 
```
