[GitHub] spark pull request #15513: [WIP][SPARK-17963][SQL][Documentation] Add exampl...

HyukjinKwon Mon, 17 Oct 2016 06:26:50 -0700

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/15513


    [WIP][SPARK-17963][SQL][Documentation] Add examples (extend) in each function and improve documentation with arguments

    ## What changes were proposed in this pull request?
    
    This PR proposes to change the documentation for functions.
    
    The changes include:
    
     - Re-indenting the documentation
     - Documenting the arguments
     - Adding examples in `extended` where a function takes multiple arguments or an argument in a specific format (e.g. XML/JSON).
    
    For example, the documentation was updated as below:
    
    **Before**
    
      - `approx_count_distinct`
    
        ```sql
    Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
        approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
          with relativeSD, the maximum estimation error allowed.
    
    Extended Usage:
    No example for approx_count_distinct.
    ```
    
      - `percentile_approx`
    
        ```sql
    Usage:
          percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
          column `col` at the given percentage. The value of percentage must be between 0.0
          and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
          controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
          better accuracy, `1.0/accuracy` is the relative error of the approximation.
    
          percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
          percentile array of column `col` at the given percentage array. Each value of the
          percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
          a positive integer literal which controls approximation accuracy at the cost of memory.
          Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
          the approximation.
    
    Extended Usage:
    No example for percentile_approx.
    ```
    
    **After**
    
      - `approx_count_distinct`
    
        ```sql
    Usage:
          approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
    
            Arguments:
              expr - any type expression that represents data to collect the first.
    
          approx_count_distinct(expr, relativeSD) - Returns the estimated cardinality by HyperLogLog++
            with relativeSD, the maximum estimation error allowed.
    
            Arguments:
              expr - any type expression that represents data to collect the first.
              relativeSD - any numeric type or any nonnumeric type literal that can be implicitly
                converted to double type, that represents maximum estimation error allowed
                (default = 0.05).
    
    Extended Usage: No example for approx_count_distinct.
    ```
    
      - `percentile_approx`
    
        ```sql
    Usage:
          percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
            column `col` at the given percentage. The value of percentage must be between 0.0
            and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
            controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
            better accuracy, `1.0/accuracy` is the relative error of the approximation.
    
            Arguments:
              col - any numeric type or any nonnumeric type expression that can be implicitly
                converted to double type.
              percentage - any numeric type or any nonnumeric type literal that can be
                implicitly converted to double type.
              accuracy - any numeric type or any nonnumeric type literal that can be implicitly
                converted to int type.
    
          percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
            percentile array of column `col` at the given percentage array. Each value of the
            percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
            a positive integer literal which controls approximation accuracy at the cost of memory.
            Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
            the approximation.
    
            Arguments:
              col - any numeric type or any nonnumeric type expression that can be implicitly
                converted to double type.
              array(...) - an array that contains any numeric type literal that can be implicitly
                converted to double type.
              accuracy - any numeric type or any nonnumeric type literal that can be implicitly
                converted to int type.
    
    Extended Usage:
          > SELECT percentile_approx(10.0, 0.5, 100);
           10.0
    
          > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
           [10.0,10.0,10.0]
    ```
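
To make the `1.0/accuracy` relative error in the `percentile_approx` text above more concrete, here is a minimal Python sketch. It is not part of the patch and does not call Spark. It assumes the relative error is measured in rank (as a fraction of the row count), which the documentation does not state explicitly; the helper name `rank_bounds` is hypothetical.

```python
import math


def rank_bounds(n_rows, percentage, accuracy=10000):
    """Return the (lowest, highest) 1-based rank an approximate percentile may land on.

    Assumption (not stated in the documentation): the relative error
    1.0/accuracy is a fraction of the number of rows, i.e. an error in rank.
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")

    # Rank of the exact percentile, and the number of ranks the
    # approximation may be off by under the assumption above.
    target = math.ceil(percentage * n_rows)
    tolerance = math.ceil(n_rows / accuracy)

    # Clamp to the valid rank range [1, n_rows].
    lowest = max(1, target - tolerance)
    highest = min(n_rows, target + tolerance)
    return lowest, highest


if __name__ == "__main__":
    print(rank_bounds(1000, 0.5, 100))   # (490, 510)
    print(rank_bounds(1_000_000, 0.5))   # (499900, 500100), default accuracy
    print(rank_bounds(1, 0.5, 100))      # (1, 1)
```

Under this reading, a larger `accuracy` narrows the band of acceptable ranks. The single-row case collapses to rank 1, which is consistent with the `10.0` results in the examples above, where the input is a single value.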
    
    ## How was this patch tested?
    
    N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17963

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15513.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15513
    
----
commit 2059374537496c9f81512b643e3ec084e43e2594
Author: hyukjinkwon <[email protected]>
Date:   2016-10-17T12:07:52Z

    Add examples (extend) in each function and improve documentation with arguments

----


