[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617070#comment-15617070 ]

Apache Spark commented on SPARK-17963:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/15677

> Add examples (extend) in each function and improve documentation with arguments
> ---
>
> Key: SPARK-17963
> URL: https://issues.apache.org/jira/browse/SPARK-17963
> Project: Spark
> Issue Type: Documentation
> Components: SQL
> Reporter: Hyukjin Kwon
>
> Currently, function documentation is inconsistent and mostly lacks examples ({{Extended Usage}}).
> For example, some functions have bad indentation, as below:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED approx_count_distinct;
> Function: approx_count_distinct
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.HyperLogLogPlusPlus
> Usage: approx_count_distinct(expr) - Returns the estimated cardinality by HyperLogLog++.
>     approx_count_distinct(expr, relativeSD=0.05) - Returns the estimated cardinality by HyperLogLog++
>     with relativeSD, the maximum estimation error allowed.
> Extended Usage:
> No example for approx_count_distinct.
> {code}
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED count;
> Function: count
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.Count
> Usage: count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
>     count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
>     count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
> Extended Usage:
> No example for count.
> {code}
> whereas some do have a pretty one:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED percentile_approx;
> Function: percentile_approx
> Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
> Usage:
>     percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
>       column `col` at the given percentage. The value of percentage must be between 0.0
>       and 1.0. The `accuracy` parameter (default: 1) is a positive integer literal which
>       controls approximation accuracy at the cost of memory. Higher value of `accuracy` yields
>       better accuracy, `1.0/accuracy` is the relative error of the approximation.
>     percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
>       percentile array of column `col` at the given percentage array. Each value of the
>       percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 1) is
>       a positive integer literal which controls approximation accuracy at the cost of memory.
>       Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
>       the approximation.
> Extended Usage:
> No example for percentile_approx.
> {code}
> Also, there is some inconsistent spacing, for example, {{_FUNC_(a,b)}} and {{_FUNC_(a, b)}} (note the spacing between arguments).
> It'd be nicer if most of them had a good example with possible argument types.
> The suggested format is as below for multiple-line usage:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED rand;
> Function: rand
> Class: org.apache.spark.sql.catalyst.expressions.Rand
> Usage:
>     rand() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
>       seed is given randomly.
>     rand(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
>       seed should be an integer/long/NULL literal.
> Extended Usage:
>     > SELECT rand();
>      0.9629742951434543
>     > SELECT rand(0);
>      0.8446490682263027
>     > SELECT rand(NULL);
>      0.8446490682263027
> {code}
> For single-line usage:
> {code}
> spark-sql> DESCRIBE FUNCTION EXTENDED date_add;
> Function: date_add
> Class: org.apache.spark.sql.catalyst.expressions.DateAdd
> Usage: date_add(start_date, num_days) - Returns the date that is num_days after start_date.
> Extended Usage:
>     > SELECT date_add('2016-07-30', 1);
>      '2016-07-31'
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
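[Editor's note: the inconsistent argument spacing called out in the description ({{_FUNC_(a,b)}} vs. {{_FUNC_(a, b)}}) is mechanical enough to sweep for with a small script. The sketch below is purely illustrative and not part of the issue or any Spark PR; the function name and the assumption that usage strings always use the {{_FUNC_}} placeholder are hypothetical.]

{code}
import re

def normalize_arg_spacing(usage: str) -> str:
    """Normalize comma spacing inside _FUNC_(...) argument lists,
    e.g. '_FUNC_(a,b)' -> '_FUNC_(a, b)'.  (Illustrative sketch only.)"""
    def fix(match):
        args = match.group(1)
        # Collapse any spacing around commas to exactly ', '.
        return "_FUNC_(" + re.sub(r"\s*,\s*", ", ", args) + ")"
    # Only rewrite inside the parenthesized argument list of _FUNC_.
    return re.sub(r"_FUNC_\(([^)]*)\)", fix, usage)

print(normalize_arg_spacing("_FUNC_(a,b) - Returns a plus b."))
# _FUNC_(a, b) - Returns a plus b.
{code}

A script like this could flag (or rewrite) usage strings in the expression docstrings before they reach DESCRIBE FUNCTION output.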
[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15582259#comment-15582259 ]

Apache Spark commented on SPARK-17963:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/15513
[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581072#comment-15581072 ]

Hyukjin Kwon commented on SPARK-17963:
--

Thanks. Then, I will work on this.
[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581022#comment-15581022 ]

Reynold Xin commented on SPARK-17963:
--

It's definitely useful to do, but I don't think we need to have examples for every function (e.g. count).
[jira] [Commented] (SPARK-17963) Add examples (extend) in each function and improve documentation with arguments
[ https://issues.apache.org/jira/browse/SPARK-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580942#comment-15580942 ]

Hyukjin Kwon commented on SPARK-17963:
--

Hi [~rxin] and [~srowen], I guess the PR would be pretty big, so I would like you both to confirm this first. Do you think it'd be sensible?