Hyukjin Kwon created SPARK-17963:
------------------------------------

             Summary: Add examples (extend) in each function and improve documentation with arguments
                 Key: SPARK-17963
                 URL: https://issues.apache.org/jira/browse/SPARK-17963
             Project: Spark
          Issue Type: Documentation
          Components: SQL
            Reporter: Hyukjin Kwon

Currently, function documentation is inconsistent and rarely includes examples ({{extend}}).

For example, some functions have bad indentation, as below:

{code}
spark-sql> DESCRIBE FUNCTION last;
Function: last
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Last
Usage: last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.
last(expr,isIgnoreNull=false) - Returns the last value of `child` for a group of rows.
If isIgnoreNull is true, returns only non-null values.
{code}

{code}
spark-sql> DESCRIBE FUNCTION EXTENDED count;
Function: count
Class: org.apache.spark.sql.catalyst.expressions.aggregate.Count
Usage: count(*) - Returns the total number of retrieved rows, including rows containing NULL values.
count(expr) - Returns the number of rows for which the supplied expression is non-NULL.
count(DISTINCT expr[, expr...]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL.
Extended Usage:
No example for count.
{code}

whereas others are formatted well:

{code}
spark-sql> DESCRIBE FUNCTION EXTENDED percentile_approx;
Function: percentile_approx
Class: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile
Usage:
  percentile_approx(col, percentage [, accuracy]) - Returns the approximate percentile value of numeric
    column `col` at the given percentage. The value of percentage must be between 0.0 and 1.0.
    The `accuracy` parameter (default: 10000) is a positive integer literal which controls
    approximation accuracy at the cost of memory. Higher value of `accuracy` yields better
    accuracy, `1.0/accuracy` is the relative error of the approximation.

  percentile_approx(col, array(percentage1 [, percentage2]...) [, accuracy]) - Returns the approximate
    percentile array of column `col` at the given percentage array. Each value of the
    percentage array must be between 0.0 and 1.0. The `accuracy` parameter (default: 10000) is
    a positive integer literal which controls approximation accuracy at the cost of memory.
    Higher value of `accuracy` yields better accuracy, `1.0/accuracy` is the relative error of
    the approximation.
Extended Usage:
No example for percentile_approx.
{code}

Also, spacing is inconsistent in several places, for example {{_FUNC_(a,b)}} versus {{_FUNC_(a, b)}} (note the spacing between arguments).

It would be nicer if most of them had a good example showing the possible argument types.

The suggested format for multi-line usage is as below:

{code}
spark-sql> DESCRIBE FUNCTION EXTENDED rand;
Function: rand
Class: org.apache.spark.sql.catalyst.expressions.Rand
Usage:
  rand() - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
    seed is given randomly.

  rand(seed) - Returns a random column with i.i.d. uniformly distributed values in [0, 1].
    seed should be an integer/long/NULL literal.
Extended Usage:
  > SELECT rand();
   0.9629742951434543
  > SELECT rand(0);
   0.8446490682263027
  > SELECT rand(NULL);
   0.8446490682263027
{code}

For single-line usage:

{code}
spark-sql> DESCRIBE FUNCTION EXTENDED date_add;
Function: date_add
Class: org.apache.spark.sql.catalyst.expressions.DateAdd
Usage: date_add(start_date, num_days) - Returns the date that is num_days after start_date.
Extended Usage:
  > SELECT date_add('2016-07-30', 1);
   '2016-07-31'
{code}
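
As a rough illustration of the argument-spacing inconsistency mentioned above ({{_FUNC_(a,b)}} versus {{_FUNC_(a, b)}}), here is a minimal Python sketch that flags usage lines whose leading signature does not put exactly one space after each comma. It is not part of Spark; the helper name `has_consistent_arg_spacing` and the "one space after each comma" rule are assumptions made only for this example, and it inspects only the text up to the first closing parenthesis.

```python
import re

# Hypothetical helper, not part of Spark: matches the leading signature of a
# usage line, e.g. `last(expr,isIgnoreNull)`, up to the first ')'.
SIGNATURE = re.compile(r"^\s*(\w+)\(([^)]*)\)")


def has_consistent_arg_spacing(usage_line: str) -> bool:
    """Return True if every comma in the leading signature is followed by
    exactly one space and then a non-space character."""
    match = SIGNATURE.match(usage_line)
    if match is None:
        return True  # no signature found, so there is nothing to check
    args = match.group(2)
    # Flag any comma that is NOT followed by a single space plus a non-space.
    return re.search(r",(?! \S)", args) is None


if __name__ == "__main__":
    usages = [
        "last(expr,isIgnoreNull) - Returns the last value of `child` for a group of rows.",
        "date_add(start_date, num_days) - Returns the date that is num_days after start_date.",
    ]
    for line in usages:
        status = "ok" if has_consistent_arg_spacing(line) else "inconsistent spacing"
        print(f"{status}: {line}")
```

A check along these lines could be run over all usage strings to list the functions that need their spacing fixed, although the actual cleanup would still be done by hand in each expression's description.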