maropu commented on a change in pull request #28120: [SPARK-31349][SQL][DOCS]
Document built-in aggregate functions in SQL Reference
URL: https://github.com/apache/spark/pull/28120#discussion_r405897913
##########
File path: docs/sql-ref-functions-builtin-aggregate.md
##########
@@ -19,4 +19,657 @@ license: |
limitations under the License.
---
-Aggregate functions
\ No newline at end of file
+Spark SQL provides build-in aggregate functions defined in the dataset API and
SQL interface. Aggregate functions
+operate on a group of rows and return a single aggregated value.
+
+<table class="table">
+ <thead>
+ <tr><th style="width:25%">Function</th><th>Argument
Type(s)</th><th>Description</th></tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><b>{any | some | bool_or}</b>(<i>expression</i>)</td>
+ <td>boolean</td>
+ <td>Returns true if at least one value is true.</td>
+ </tr>
+ <tr>
+ <td><b>approx_count_distinct</b>(<i>expression[, relativeSD]</i>)</td>
+ <td>(long, double)</td>
+ <td>`relativeSD` is the maximum estimation error allowed. Returns the
estimated cardinality by HyperLogLog++.</td>
+ </tr>
+ <tr>
+ <td><b>{avg | mean}</b>(<i>expression</i>)</td>
+ <td>short, float, byte, decimal, double, int, long or string</td>
+ <td>Returns the average of values in the input expression.</td>
+ </tr>
+ <tr>
+ <td><b>{bool_and | every}</b>(<i>expression</i>)</td>
+ <td>boolean</td>
+ <td>Returns true if all values are true.</td>
+ </tr>
+ <tr>
+ <td><b>collect_list</b>(<i>expression</i>)</td>
+ <td>any</td>
+ <td>Collects and returns a list of non-unique elements. The function is
non-deterministic because the order of collected results depends on the order
of the rows which may be non-deterministic after a shuffle.</td>
+ </tr>
+ <tr>
+ <td><b>collect_set</b>(<i>expression</i>)</td>
+ <td>any</td>
+ <td>Collects and returns a set of unique elements. The function is
non-deterministic because the order of collected results depends on the order
of the rows which may be non-deterministic after a shuffle.</td>
+ </tr>
+ <tr>
+ <td><b>corr</b>(<i>expression1, expression2</i>)</td>
+ <td>(double, double)</td>
+ <td>Returns Pearson coefficient of correlation between a set of number
pairs.</td>
+ </tr>
+ <tr>
+ <td><b>count</b>([<b>DISTINCT</b>] <i>*</i>)</td>
+ <td>none</td>
+ <td>If specified <code>DISTINCT</code>, returns the total number of
retrieved rows are unique and not null; Otherwise, returns the total number of
retrieved rows, including rows containing null.</td>
Review comment:
nit: `; Otherwise` -> `; otherwise`?