[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617070#comment-15617070
 ] 

Apache Spark commented on SPARK-17963:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/15677

> Add examples (extend) in each function and improve documentation with 
> arguments
> ---
>
> Key: SPARK-17963
> URL: https://issues.apache.org/jira/browse/SPARK-17963
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Reporter: Hyukjin Kwon
>
> Currently, function documentation seems inconsistent and mostly lacks 
> examples ({{extend}}).
> For example, some functions have bad indentation, as below:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED approx_count_distinct;
> Function: approx_count_distinct
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
> Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
>     approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
>       with relativeSD, the maximum estimation error allowed.
> Extended Usage:
> No example for approx_count_distinct.
> {code}
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED count;
> Function: count
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.Count
> Usage: count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
>     count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
>     count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
> Extended Usage:
> No example for count.
> {code}
> whereas some are nicely formatted:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED percentile_approx;
> Function: percentile_approx
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
> Usage:
>   percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
>   column `col` at the given percentage. The value of percentage must be between 0.0
>   and 1.0. The `accuracy` parameter (default: 1) is a positive integer literal which
>   controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
>   better accuracy, `1.0/accuracy` is the relative error of the approximation.
>   percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
>   percentile array of column `col` at the given percentage array. Each value of the
>   percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is
>    a positive integer literal which controls approximation accuracy at the cost of memory.
>    Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
>    the approximation.
> Extended Usage:
> No example for percentile_approx.
> {code}
> Also, spacing is inconsistent in several places, for example, 
> {{_FUNC_(a,b)}} and {{_FUNC_(a, b)}} (note the spacing between arguments).
> It'd be nicer if most of them had a good example with possible argument 
> types.
> The suggested format for multi-line usage is as below:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED rand;
> Function: rand
> Class: org.apache.spark.sql.catalyst.expressions.Rand
> Usage:
>   rand() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
>     seed is given randomly.
>   rand(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
>     seed should be an integer/long/NULL literal.
> Extended Usage:
> > SELECT rand();
>  0.9629742951434543
> > SELECT rand(0);
>  0.8446490682263027
> > SELECT rand(NULL);
>  0.8446490682263027
> {code}
> For single line usage:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED date_add;
> Function: date_add
> Class: org.apache.spark.sql.catalyst.expressions.DateAdd
> Usage: date_add(start_date, num_days) - Returns the date that is num_days after start_date.
> Extended Usage:
> > SELECT date_add('2016-07-30', 1);
>  '2016-07-31'
> {code}
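
The two suggested layouts quoted above (multi-line and single-line usage) can be sketched as a small formatter. This is only an illustrative sketch in Python, not how Spark produces the output of DESCRIBE FUNCTION EXTENDED: the helper names `normalize_args` and `render_description`, their parameters, and the exact indentation widths are assumptions made for the example.

```python
import re


def normalize_args(signature):
    """Put exactly one space after each comma, e.g. `f(a,b)` -> `f(a, b)`."""
    return re.sub(r"\s*,\s*", ", ", signature)


def render_description(name, class_name, usages, examples):
    """Render a function description in the layout suggested in the issue.

    usages:   list of (signature, summary) pairs (hypothetical input shape)
    examples: list of (query, result) pairs (hypothetical input shape)
    """
    lines = ["Function: " + name, "Class: " + class_name]
    if len(usages) == 1:
        # Single-line usage: everything stays on the "Usage:" line.
        sig, summary = usages[0]
        lines.append("Usage: {} - {}".format(normalize_args(sig), summary))
    else:
        # Multi-line usage: "Usage:" alone, then one indented entry per signature.
        lines.append("Usage:")
        for sig, summary in usages:
            lines.append("  {} - {}".format(normalize_args(sig), summary))
    lines.append("Extended Usage:")
    if not examples:
        lines.append("No example for {}.".format(name))
    for query, result in examples:
        lines.append("> " + query)
        lines.append(" " + result)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_description(
        "date_add",
        "org.apache.spark.sql.catalyst.expressions.DateAdd",
        [("date_add(start_date,num_days)",
          "Returns the date that is num_days after start_date.")],
        [("SELECT date_add('2016-07-30', 1);", "'2016-07-31'")],
    ))
```

Normalizing the signature in one place is what keeps {{_FUNC_(a,b)}} and {{_FUNC_(a, b)}} from diverging; the choice between the single-line and multi-line layout depends only on how many signatures a function has.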



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582259#comment-15582259
 ] 

Apache Spark commented on SPARK-17963:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/15513




[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581072#comment-15581072
 ] 

Hyukjin Kwon commented on SPARK-17963:
--

Thanks. Then, I will work on this.




[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581022#comment-15581022
 ] 

Reynold Xin commented on SPARK-17963:
-

It's definitely useful to do, but I don't think we need to have examples for 
every function (e.g. count).





[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments

2016-10-16 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580942#comment-15580942
 ] 

Hyukjin Kwon commented on SPARK-17963:
--

Hi [~rxin] and [~srowen], I guess the PR would be pretty big, so I would like 
you both to confirm this first. Do you think it'd be sensible?

> Add examples (extend) in each function and improve documentation with 
> arguments
> ---
>
> Key: SPARK-17963
> URL: https://issues.apache.org/jira/browse/SPARK-17963
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Reporter: Hyukjin Kwon
>
> Currently, function documentation seems inconsistent and mostly lacks 
> examples ({{extend}}).
> For example, some functions have bad indentation, as below:
> {code}
> spark-sql> DESCRIBE FUNCTION last;
> Function: last
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
> Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
>     last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
>       If isIgnoreNull is true, returns only non-null values.
> {code}
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED count;
> Function: count
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.Count
> Usage: count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
>     count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
>     count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
> Extended Usage:
> No example for count.
> {code}
> whereas some are nicely formatted:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED percentile_approx;
> Function: percentile_approx
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
> Usage:
>   percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
>   column `col` at the given percentage. The value of percentage must be between 0.0
>   and 1.0. The `accuracy` parameter (default: 1) is a positive integer literal which
>   controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
>   better accuracy, `1.0/accuracy` is the relative error of the approximation.
>   percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
>   percentile array of column `col` at the given percentage array. Each value of the
>   percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is
>    a positive integer literal which controls approximation accuracy at the cost of memory.
>    Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
>    the approximation.
> Extended Usage:
> No example for percentile_approx.
> {code}
> Also, spacing is inconsistent in several places, for example, 
> {{_FUNC_(a,b)}} and {{_FUNC_(a, b)}} (note the spacing between arguments).
> It'd be nicer if most of them had a good example with possible argument 
> types.
> The suggested format for multi-line usage is as below:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED rand;
> Function: rand
> Class: org.apache.spark.sql.catalyst.expressions.Rand
> Usage:
>   rand() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
>     seed is given randomly.
>   rand(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
>     seed should be an integer/long/NULL literal.
> Extended Usage:
> > SELECT rand();
>  0.9629742951434543
> > SELECT rand(0);
>  0.8446490682263027
> > SELECT rand(NULL);
>  0.8446490682263027
> {code}
> For single line usage:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED date_add;
> Function: date_add
> Class: org.apache.spark.sql.catalyst.expressions.DateAdd
> Usage: date_add(start_date, num_days) - Returns the date that is num_days after start_date.
> Extended Usage:
> > SELECT date_add('2016-07-30', 1);
>  '2016-07-31'
> {code}


