[GitHub] [spark] huaxingao commented on a change in pull request #28220: [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section

GitBox Fri, 17 Apr 2020 16:56:14 -0700

huaxingao commented on a change in pull request #28220: 
[SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section
URL: https://github.com/apache/spark/pull/28220#discussion_r410518875


 ##########
 File path: docs/sql-ref-syntax-qry-window.md
 ##########
 @@ -19,4 +19,188 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+Window functions operate on a group of rows, referred to as a window, and 
calculate a return value for each row based on the group of rows. Window 
functions are useful for processing tasks such as calculating a moving average, 
computing a cumulative statistic, or accessing the value of rows given the 
relative position of the current row.
+
+### Syntax
+
+{% highlight sql %}
+window_function OVER
+( [  { PARTITION | DISTRIBUTE } BY partition_col_name = partition_col_val ( [ 
, ... ] ) ]
+  { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , 
... ]
+  [ window_frame ] )
+{% endhighlight %}
+
+### Parameters
+
+<dl>
+  <dt><code><em>window_function</em></code></dt>
+  <dd>
+    <ul>
+      <li>Ranking Functions</li>
+      <br>
+      <b>Syntax:</b>
+        <code>
+          RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER
+        </code>
+    </ul>
+    <ul>
+      <li>Analytic Functions</li>
+      <br>
+      <b>Syntax:</b>
+        <code>
+          CUME_DIST | LAG | LEAD
+        </code>
+    </ul>
+    <ul>
+      <li>Aggregate Functions</li>
+      <br>
+      <b>Syntax:</b>
+        <code>
+          MAX | MIN | COUNT | SUM | AVG | ...
+        </code>
+        <br>
+        Please refer to the <a href="api/sql/">Built-in Functions</a> document 
for a complete list of Spark aggregate functions.
+    </ul>
+  </dd>
+</dl>
+<dl>
+  <dt><code><em>window_frame</em></code></dt>
+  <dd>
+    Specifies which row to start the window on and where to end it.<br>
+    <b>Syntax:</b><br>
+      <code>{ RANGE | ROWS } { frame_start | BETWEEN frame_start AND frame_end 
}</code><br>
+      If frame_end is omitted it defaults to CURRENT ROW.<br><br>
+      <ul>
+      <code>frame_start</code> and <code>frame_end</code> have the following 
syntax<br>
+      <b>Syntax:</b><br>
+        <code>
+          UNBOUNDED PRECEDING | offset PRECEDING | CURRENT ROW | offset 
FOLLOWING | UNBOUNDED FOLLOWING
+        </code><br>
+        <code>offset:</code>specifies the <code>offset</code> from the 
position of the current row.
+      </ul>
+  </dd>
+</dl>
+
+### Examples
+
+{% highlight sql %}
+CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT);
+
+INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35);
+INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38);
+INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28);
+INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33);
+INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33);
+INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28);
+INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38);
+INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23);
+INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25);
+
+SELECT * FROM employees;
+  +-----+-----------+------+-----+
+  | name|       dept|salary|  age|
+  +-----+-----------+------+-----+
+  |Chloe|Engineering| 23000|   25|
+  | Fred|Engineering| 21000|   28|
+  | Paul|Engineering| 29000|   23|
+  |Helen|  Marketing| 29000|   40|
+  |  Tom|Engineering| 23000|   33|
+  | Jane|  Marketing| 29000|   28|
+  | Jeff|  Marketing| 35000|   38|
+  | Evan|      Sales| 32000|   38|
+  | Lisa|      Sales| 10000|   35|
+  | Alex|      Sales| 30000|   33|
+  +-----+-----------+------+-----+
+
+SELECT name, dept, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank 
FROM employees;
+  +-----+-----------+------+----+
+  | name|       dept|salary|rank|
+  +-----+-----------+------+----+
+  | Lisa|      Sales| 10000|   1|
+  | Alex|      Sales| 30000|   2|
+  | Evan|      Sales| 32000|   3|
+  | Fred|Engineering| 21000|   1|
+  |  Tom|Engineering| 23000|   2|
+  |Chloe|Engineering| 23000|   2|
+  | Paul|Engineering| 29000|   4|
+  |Helen|  Marketing| 29000|   1|
+  | Jane|  Marketing| 29000|   1|
+  | Jeff|  Marketing| 35000|   3|
+  +-----+-----------+------+----+
+
+SELECT name, dept, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary ROWS 
BETWEEN
+    UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank FROM employees;
+  +-----+-----------+------+----------+
+  | name|       dept|salary|dense_rank|
+  +-----+-----------+------+----------+
+  | Lisa|      Sales| 10000|         1|
+  | Alex|      Sales| 30000|         2|
+  | Evan|      Sales| 32000|         3|
+  | Fred|Engineering| 21000|         1|
+  |  Tom|Engineering| 23000|         2|
+  |Chloe|Engineering| 23000|         2|
+  | Paul|Engineering| 29000|         3|
+  |Helen|  Marketing| 29000|         1|
+  | Jane|  Marketing| 29000|         1|
+  | Jeff|  Marketing| 35000|         2|
+  +-----+-----------+------+----------+
+
+SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age
+    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cume_dist FROM 
employees;
 
 Review comment:
   I will fix this

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] huaxingao commented on a change in pull request #28220: [SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section

Reply via email to