[GitHub] spark pull request #15677: [SPARK-17963][SQL][Documentation] Add examples (e...

HyukjinKwon Fri, 28 Oct 2016 17:37:17 -0700

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/15677


    [SPARK-17963][SQL][Documentation] Add examples (extend) in each expression and improve documentation with arguments

    ## What changes were proposed in this pull request?
    
    This PR proposes to change the documentation for functions. Please refer to the discussion in https://github.com/apache/spark/pull/15513
    
    The changes include
    
     - Re-indent the documentation
     - Add examples/arguments in `extended` where there are multiple arguments or the arguments have a specific format (e.g. XML/JSON).
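The target layout can be sketched in Python. There is a usage line, followed by an `Extended Usage:` section with `Arguments:` and `Examples:` subsections. When neither subsection is documented, the section falls back to `No example/argument for ...`. The helper `render_usage` and its parameters are hypothetical and exist only for illustration. In Spark itself the text comes from each expression's `@ExpressionDescription` annotation, where `_FUNC_` is a placeholder replaced by the function name.

```python
def render_usage(name, usage, arguments=(), examples=()):
    """Render a function description in the Usage / Extended Usage layout.

    Hypothetical helper for illustration only. `arguments` is a sequence of
    (argument, description) pairs and `examples` a sequence of
    (query, result) pairs. Every `_FUNC_` placeholder is replaced by `name`.
    """
    lines = ["Usage: " + usage, "Extended Usage:"]
    if not arguments and not examples:
        # Fallback text when nothing extra is documented.
        lines.append("    No example/argument for _FUNC_.")
    if arguments:
        lines.append("    Arguments:")
        for arg, description in arguments:
            lines.append("      %s - %s" % (arg, description))
    if arguments and examples:
        lines.append("")  # blank line between the two subsections
    if examples:
        lines.append("    Examples:")
        for query, result in examples:
            lines.append("      > " + query)
            lines.append("       " + result)
    return "\n".join(lines).replace("_FUNC_", name)


print(render_usage(
    "pow",
    "_FUNC_(expr1, expr2) - Raise expr1 to the power of expr2.",
    arguments=[("expr1", "a numeric expression."),
               ("expr2", "a numeric expression.")],
    examples=[("SELECT _FUNC_(2, 3);", "8.0")],
))
print(render_usage(
    "current_timestamp",
    "_FUNC_() - Returns the current timestamp at the start of query evaluation.",
))
```

Keeping `_FUNC_` in the templates means the name is written once and substituted at render time. This is the same idea as removing repeated `_FUNC_` occurrences in the last commit.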
    
    For example, the documentation was updated as below:
    
    ### Functions with single-line usage
    
    **Before**
      - `pow`
    
        ```sql
        Usage: pow(x1, x2) - Raise x1 to the power of x2.
        Extended Usage:
        > SELECT pow(2, 3);
         8.0
        ```
    
      - `current_timestamp`
    
        ```sql
        Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
        Extended Usage:
        No example for current_timestamp.
        ```
    
    **After**
    
      - `pow`
    
        ```sql
        Usage: pow(expr1, expr2) - Raise expr1 to the power of expr2.
        Extended Usage:
            Arguments:
              expr1 - a numeric expression.
              expr2 - a numeric expression.
    
            Examples:
              > SELECT pow(2, 3);
               8.0
        ```
    
      - `current_timestamp`
    
        ```sql
        Usage: current_timestamp() - Returns the current timestamp at the start of query evaluation.
        Extended Usage:
            No example/argument for current_timestamp.
        ```
    
    
    ### Functions with (already) multi-line usage
    
    **Before**
    
      - `approx_count_distinct`
    
        ```sql
        Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
            approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
              with relativeSD, the maximum estimation error allowed.
    
        Extended Usage:
        No example for approx_count_distinct.
        ```
    
      - `percentile_approx`
    
        ```sql
        Usage:
              percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
              column `col` at the given percentage. The value of percentage must be between 0.0
              and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
              controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
              better accuracy, `1.0/accuracy` is the relative error of the approximation.
    
              percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
              percentile array of column `col` at the given percentage array. Each value of the
              percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
              a positive integer literal which controls approximation accuracy at the cost of memory.
              Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
              the approximation.
    
        Extended Usage:
        No example for percentile_approx.
        ```
    
    **After**
    
      - `approx_count_distinct`
    
        ```sql
        Usage:
            approx_count_distinct(expr[, relativeSD]) - Returns the estimated cardinality by HyperLogLog++.
              relativeSD defines the maximum estimation error allowed.
    
        Extended Usage:
            Arguments:
              expr - an expression of any type that represents data to count.
              relativeSD - a numeric literal that defines the maximum estimation error allowed.
        ```
    
      - `percentile_approx`
    
        ```sql
        Usage:
            percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
              column `col` at the given percentage. The value of `percentage` must be between 0.0
              and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
              controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
              better accuracy, `1.0/accuracy` is the relative error of the approximation.
              When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
    
        Extended Usage:
            Arguments:
              col - a numeric expression.
              percentage - a numeric literal or an array literal of numeric type that defines the
                percentile. For example, 0.5 means 50-percentile.
              accuracy - a numeric literal.
    
            Examples:
              > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
               [10.0,10.0,10.0]
              > SELECT percentile_approx(10.0, 0.5, 100);
               10.0
        ```
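In `approx_count_distinct` above, `relativeSD` trades memory for accuracy. The sketch below is a rough illustration only. It assumes the textbook HyperLogLog standard error of about `1.04 / sqrt(m)` for `m` registers. Spark's HyperLogLog++ implementation uses its own constant, so its exact register count may differ. The sketch estimates how many registers a given `relativeSD` calls for, and `hll_registers` is a hypothetical helper.

```python
import math


def hll_registers(relative_sd):
    """Estimate HyperLogLog precision p and register count m = 2**p.

    Hypothetical helper. Assumes the textbook standard error 1.04 / sqrt(m);
    an approximation for illustration, not Spark's exact HyperLogLog++ formula.
    """
    if relative_sd <= 0:
        raise ValueError("relativeSD must be positive")
    # Solve 1.04 / sqrt(m) <= relative_sd for m, rounded up to a power of two.
    p = math.ceil(2 * math.log2(1.04 / relative_sd))
    m = 2 ** p
    return p, m, 1.04 / math.sqrt(m)


for rsd in (0.05, 0.01):  # 0.05 is the documented default relativeSD
    p, m, err = hll_registers(rsd)
    print("relativeSD=%s -> p=%d, m=%d, expected error ~%.4f" % (rsd, p, m, err))
```

Under this assumption, the default `relativeSD=0.05` needs 512 registers (p = 9), with an expected error of about 0.046. Tightening it to 0.01 needs 16384 registers (p = 14).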
    
    ## How was this patch tested?
    
    Manually tested
    
    **When there are multiple examples**
    
    ```sql
    spark-sql> describe function extended reflect;
    Function: reflect
    Class: org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection
    Usage: reflect(class, method[, arg1[, arg2 ..]]) - Calls a method with reflection.
    Extended Usage:
        Arguments:
          class - a string literal that represents a fully-qualified class name.
          method - a string literal that represents a method name.
          arg - a boolean, string or numeric expression except decimal that represents an argument for
            the method.
    
        Examples:
          > SELECT reflect('java.util.UUID', 'randomUUID');
           c33fb387-8500-4bfa-81d2-6e0e3e930df2
          > SELECT reflect('java.util.UUID', 'fromString', 'a5cf6c42-0c85-418f-af6c-3e4e5b1328f2');
           a5cf6c42-0c85-418f-af6c-3e4e5b1328f2
    ```
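`reflect` resolves a class and a method by name at run time and then invokes the method with the given arguments. Python has no JVM reflection, but the same idea can be sketched as a rough analogy with `importlib.import_module` and `getattr`. This is not how Spark's `CallMethodViaReflection` is implemented. The helper name `call_via_reflection` is hypothetical, and it returns `str(...)` of the result to mirror the string results in the transcript above.

```python
import importlib


def call_via_reflection(module_name, function_name, *args):
    """Resolve module_name.function_name at run time, call it, return a string.

    Hypothetical helper: a Python analogy of calling a method via reflection.
    """
    module = importlib.import_module(module_name)  # like loading a class by name
    func = getattr(module, function_name)          # like looking up a method by name
    if not callable(func):
        raise TypeError("%s.%s is not callable" % (module_name, function_name))
    return str(func(*args))


print(call_via_reflection("uuid", "uuid4"))  # random UUID string
print(call_via_reflection("uuid", "UUID", "a5cf6c42-0c85-418f-af6c-3e4e5b1328f2"))
print(call_via_reflection("math", "pow", 2, 3))
```

As with the SQL examples, a random UUID call returns a different value on each run, while parsing a fixed UUID string round-trips it unchanged.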
    
    **When `Usage` is a single line**
    
    ```sql
    spark-sql> describe function extended min;
    Function: min
    Class: org.apache.spark.sql.catalyst.expressions.aggregate.Min
    Usage: min(expr) - Returns the minimum value of `expr`.
    Extended Usage:
        Arguments:
          expr - an expression of any type.
    ```
    
    **When `Usage` already spans multiple lines**
    
    ```sql
    spark-sql> describe function extended percentile_approx;
    Function: percentile_approx
    Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
    Usage:
        percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
          column `col` at the given percentage. The value of `percentage` must be between 0.0
          and 1.0. The `accuracy` parameter (default: 10000) is a positive integer literal which
          controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
          better accuracy, `1.0/accuracy` is the relative error of the approximation.
          When `percentage` is an array, each value of the percentage array must be between 0.0 and 1.0.
    
    Extended Usage:
        Arguments:
          col - a numeric expression.
          percentage - a numeric literal or an array literal of numeric type that defines the
            percentile. For example, 0.5 means 50-percentile.
          accuracy - a numeric literal.
    
        Examples:
          > SELECT percentile_approx(10.0, array(0.5, 0.4, 0.1), 100);
           [10.0,10.0,10.0]
          > SELECT percentile_approx(10.0, 0.5, 100);
           10.0
    ```
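Two points in the `percentile_approx` output above are easy to check by hand. First, `accuracy` maps directly to a relative error of `1.0/accuracy`. Second, the examples return `10.0` for every percentage because the input column holds a single value. The Python sketch below illustrates both with an exact nearest-rank percentile. It is not Spark's approximation algorithm, and `nearest_rank_percentile` and `relative_error` are hypothetical helpers.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(values, percentage):
    """Exact nearest-rank percentile (hypothetical helper, not Spark's algorithm).

    Returns the smallest value v such that at least `percentage` of the
    values are less than or equal to v.
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0.0 and 1.0")
    ordered = sorted(values)
    # Parse via str() so 0.7 means exactly 7/10, avoiding float rounding in the rank.
    exact = Fraction(str(percentage))
    # 1-based rank; percentage 0.0 maps to the first (smallest) value.
    rank = max(1, math.ceil(exact * len(ordered)))
    return ordered[rank - 1]


def relative_error(accuracy=10000):
    """Relative error bound implied by `accuracy`, i.e. 1.0 / accuracy (hypothetical helper)."""
    if accuracy <= 0:
        raise ValueError("accuracy must be a positive integer")
    return 1.0 / accuracy


# A single-valued column yields the same value for every percentage.
print([nearest_rank_percentile([10.0], p) for p in (0.5, 0.4, 0.1)])
print(nearest_rank_percentile([1, 2, 3, 4, 5], 0.5))
print(relative_error(), relative_error(100))
```

With the default `accuracy` of 10000 the relative error is 0.0001. The `accuracy` of 100 used in the examples allows 0.01.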
    
    **When examples/arguments are missing**
    
    ```sql
    spark-sql> describe function extended rank;
    Function: rank
    Class: org.apache.spark.sql.catalyst.expressions.Rank
    Usage:
        rank() - Computes the rank of a value in a group of values. The result is one plus the number
          of rows preceding or equal to the current row in the ordering of the partition. The values
          will produce gaps in the sequence.
    
    Extended Usage:
        No example/argument for rank.
    ```
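The gaps mentioned in the `rank` usage come from ties. Tied rows share a rank, and the next distinct value skips ahead by the number of tied rows. The Python sketch below shows this for a single partition ordered ascending. `rank_with_gaps` is a hypothetical helper that follows standard SQL `RANK()` semantics and is not Spark code.

```python
def rank_with_gaps(values):
    """Return (value, rank) pairs with SQL RANK()-style ranks, ascending order.

    Hypothetical helper: tied values share a rank, and the next distinct
    value gets its 1-based position, which leaves a gap after a tie.
    """
    ordered = sorted(values)
    ranks = []
    for i, v in enumerate(ordered):
        if i > 0 and v == ordered[i - 1]:
            ranks.append(ranks[-1])  # tie: reuse the previous rank
        else:
            ranks.append(i + 1)      # 1-based position in the ordering
    return list(zip(ordered, ranks))


print(rank_with_gaps([20, 10, 30, 20]))
```

Here the two rows with value 20 share rank 2, so the next row gets rank 4 and rank 3 is skipped.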


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-17963-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15677
    
----
commit 5f70b1ddc26c51d0c2aef34b58fe98f8220ffc0a
Author: hyukjinkwon <[email protected]>
Date:   2016-10-17T12:07:52Z

    Add examples (extend) in each function and improve documentation with arguments

commit 77abafa5741c5c4c706f4e310f5a63c39811471b
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T13:02:49Z

    aggregate OK

commit 8cd6d80e2c5232253444c59c363010c7a2c4aa69
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T13:24:36Z

    xml OK

commit 2614572e0b04130d662f34b4926e6acaf866c704
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T14:14:54Z

    arithmetic OK

commit 29d0262ef0b1c3d828b2301d50f2d3d1f37ac961
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T14:59:06Z

    bitwiseExpressions OK and double-check others

commit 710c68eda59c0fdb299feccd5125f75d38063176
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T15:09:46Z

    CallMethodViaReflection OK

commit 1d44ec38d5c68ea2853d8bda9713f633c984cdd0
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T15:15:01Z

    Cast OK

commit 78b1fc75e9c9d931f5ff130109cf2f0d9ce2547b
Author: hyukjinkwon <[email protected]>
Date:   2016-10-18T16:20:07Z

    collectionOperations OK

commit e2062afdd9579fbec08ac073a6021736078d7ad2
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T10:15:18Z

    complexTypeCreator OK and double check others

commit e9672db417cbf800a92665942a8c096221724cbe
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T12:59:30Z

    conditionalExpressions OK

commit 9baa847729cd6f7686c1aa476cc1de0548f9ac2f
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T14:09:38Z

    datetimeExpressions OK

commit fff85f6bd645c17d91ce96ac73dfd99d957909d3
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T14:27:42Z

    generators OK

commit 24627750226aef3a0e34886b3516496b4e7bb456
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T14:29:32Z

    InputFileName OK

commit 10892fb676b6e70e4eb5039a8581f29edae59226
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T14:41:16Z

    jsonExpressions OK

commit 45e7f99a9252d66a3a87d17136206e098cff6eea
Author: hyukjinkwon <[email protected]>
Date:   2016-10-19T16:01:14Z

    mathExpressions OK, double check others and fix scala style

commit 1d69e40a89c1242c283e820ece0e9fdf4b52c7cb
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T14:14:49Z

    misc OK, double-check others and ignore a test in SQLQuerySuite for now

commit 9b24e7879d6d75e913dd71ee3e32292483da5fb5
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T14:18:47Z

    MonotonicallyIncreasingID OK

commit ed1c83936bdf78cbab35c50307c2a8afa6c586a3
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T14:43:17Z

    nullExpressions OK

commit 15351863fecf7765ac97289b40c2b578ae4db7e1
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T14:59:35Z

    predicates OK

commit 8efee7e07ca4bcce654fc947c0fb513b7f361555
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T15:11:47Z

    randomExpressions OK

commit a111d2a63965eda60c6ba6f297a84c9e0a8f85f1
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T15:24:48Z

    regexpExpressions OK

commit a8ddcc2010b06c19fa76a5bfcc64e490ad58f5b3
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T15:26:18Z

    SparkPartitionID OK

commit 99b565879c86b1e8a89da8224a7f4183f0b10b1d
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T16:38:39Z

    stringExpressions OK

commit a29472eeb0fbbdccdd8affd82d7e5706623114c0
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T16:51:30Z

    windowExpressions OK

commit 73ccda0d89e71249adc25d4b04f70501849f1fd9
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T17:17:04Z

    conditionalExpressions OK, double-check others and fix tests

commit d927bff266b978c26db98fd64ff346ca248f7ec4
Author: hyukjinkwon <[email protected]>
Date:   2016-10-20T17:32:31Z

    double-check

commit 91d2ab5174273623819a6b18f0d2d557c54603f7
Author: hyukjinkwon <[email protected]>
Date:   2016-10-21T01:11:15Z

    Fix tests in SQLQuerySuite and DDLSuite first

commit 7841860bf50fba8b31da774574ad9818ee678d85
Author: hyukjinkwon <[email protected]>
Date:   2016-10-21T01:13:35Z

    Take out a space after `Extended Usage:`.

commit 01eecfe44c5edc3db0d25f43a7ae8f80ca07ac61
Author: hyukjinkwon <[email protected]>
Date:   2016-10-21T01:18:46Z

    Consistent spacing in Examples

commit 1979d920afcac5794131b7605b8a170f837b461e
Author: hyukjinkwon <[email protected]>
Date:   2016-10-21T09:20:30Z

    Remove repeated _FUNC_, consolidate usages and simplify the arguments

----

