Re: [DOCS] Window function docs

Bruce Momjian Thu, 06 Apr 2017 18:15:17 -0700

On Thu, Apr  6, 2017 at 03:53:24PM -0700, David G. Johnston wrote:
> On Thu, Apr 6, 2017 at 2:49 PM, Bruce Momjian <[email protected]> wrote:
> 
>     As part of writing a talk about window functions, I have done some
>     cleanups of the window function docs, attached, that I would like to
>     apply to head.
> 
> 
> +1 Overall.  Some minor technical items noted below.  The only conceptual item
> is whether (and, if so, how) to deal with partitions vs. frames.
> 
> -    But unlike regular aggregate functions, use of a window function does 
> not
> +    But unlike normal aggregate functions, use of a window function does not
> 
> How broadly did you look for usage of normal versus regular?  The window
> function doc page uses "normal" once, vis-a-vis "ordered-set", I haven't 
> looked
> further.  Note on that page the intro paragraph doesn't mention


I did a grep looking for 'regular.*agg'.

> hypothetical-set aggregates as being distinct from ordered-set aggregates yet
> the tables and the docs here do.

Do you have any suggested text for that?

> https://www.postgresql.org/docs/current/static/functions-aggregate.html
> 
> +    treated as a window function and computed across the entire partition.)
> 
> Would introducing the concept of frame here trade precision for comprehension?

I modified this to say "window frame" as part of the patch for Tom.

> -    <firstterm>window frame</>.  Many (but not all) window functions act only
> +    <firstterm>window frame</>.  Some window functions act only
> 
> This is a semantic change as to whether the dominate behavior is to act on
> frames or partitions.  IIUC very few window functions, when presented with a
> ORDER BY'd OVER and thus a limited frame, will still act on the entire
> partition instead of the just the frame they are provided.  In particular 
> don't

If you look at slide 79 here you can get a good overview of what
operates on frames vs. partitions:

        http://momjian.us/main/writings/pgsql/window.pdf

> all normal aggregates (and user-defined ones) operate on frames?  The 
> cume_dist
> function noted later is one of the few exceptions since it effectively 
> operates
> on both - the frame for the numerator and the partition for the denominator.

Actually, cume_dist does not operate on frames because if you specify
ORDER BY x ROWS CURRENT ROW, cume_dist still operates on the peer and
lesser rows.  So, the frame clause is ignored by cume_dist --- in a way
it has RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW wired into the
window function.

> +   aggregate function (i.e. not ordered-set or hypothetical-set aggregates)
> 
> I, and others, subscribe to the believe that both i.e., and e.g., should have
> trailing commas...
> 
> http://english.stackexchange.com/questions/6894/
> should-there-be-a-comma-after-i-e

Done.

> +   functions (includes <function>cume_dist</>) are defined so that they
> 
> s/b "including <function>cume_dist</>"

Done.

> +   give the same answer for any two peer rows.
> 
> for all peer rows?

Much better, done.

> +   assuming the current row is does not exist in the partition.
> 
> remove "is"

Fixed in Tom's patch.

Updated patch attached.  Thanks for the review.

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 1e45511..7563131 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -328,8 +328,8 @@ COMMIT;
     A <firstterm>window function</> performs a calculation across a set of
     table rows that are somehow related to the current row.  This is comparable
     to the type of calculation that can be done with an aggregate function.
-    But unlike regular aggregate functions, use of a window function does not
-    cause rows to become grouped into a single output row &mdash; the
+    However, window functions do not cause rows to become grouped into a single
+    output row like non-window aggregate calls would.  Instead, the
     rows retain their separate identities.  Behind the scenes, the window
     function is able to access more than just the current row of the query
     result.
@@ -363,20 +363,19 @@ SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM emps
     <structname>empsalary</>, and there is one output row for each row in the
     table.  The fourth column represents an average taken across all the table
     rows that have the same <structfield>depname</> value as the current row.
-    (This actually is the same function as the regular <function>avg</>
-    aggregate function, but the <literal>OVER</> clause causes it to be
-    treated as a window function and computed across an appropriate set of
-    rows.)
+    (This actually is the same function as the non-window <function>avg</>
+    aggregate, but the <literal>OVER</> clause causes it to be
+    treated as a window function and computed across the window frame.)
    </para>
 
    <para>
     A window function call always contains an <literal>OVER</> clause
     directly following the window function's name and argument(s).  This is what
-    syntactically distinguishes it from a regular function or aggregate
-    function.  The <literal>OVER</> clause determines exactly how the
+    syntactically distinguishes it from a normal function or non-window
+    aggregate.  The <literal>OVER</> clause determines exactly how the
     rows of the query are split up for processing by the window function.
-    The <literal>PARTITION BY</> list within <literal>OVER</> specifies
-    dividing the rows into groups, or partitions, that share the same
+    The <literal>PARTITION BY</> clause within <literal>OVER</>
+    divides the rows into groups, or partitions, that share the same
     values of the <literal>PARTITION BY</> expression(s).  For each row,
     the window function is computed across the rows that fall into the
     same partition as the current row.
@@ -411,8 +410,8 @@ FROM empsalary;
 </screen>
 
     As shown here, the <function>rank</> function produces a numerical rank
-    within the current row's partition for each distinct <literal>ORDER BY</>
-    value, in the order defined by the <literal>ORDER BY</> clause.
+    for each distinct <literal>ORDER BY</> value in the current row's
+    partition, using the order defined by the <literal>ORDER BY</> clause.
     <function>rank</> needs no explicit parameter, because its behavior
     is entirely determined by the <literal>OVER</> clause.
    </para>
@@ -424,20 +423,20 @@ FROM empsalary;
     if any.  For example, a row removed because it does not meet the
     <literal>WHERE</> condition is not seen by any window function.
     A query can contain multiple window functions that slice up the data
-    in different ways by means of different <literal>OVER</> clauses, but
+    in different ways using different <literal>OVER</> clauses, but
     they all act on the same collection of rows defined by this virtual table.
    </para>
 
    <para>
     We already saw that <literal>ORDER BY</> can be omitted if the ordering
     of rows is not important.  It is also possible to omit <literal>PARTITION
-    BY</>, in which case there is just one partition containing all the rows.
+    BY</>, in which case there is a single partition containing all rows.
    </para>
 
    <para>
     There is another important concept associated with window functions:
     for each row, there is a set of rows within its partition called its
-    <firstterm>window frame</>.  Many (but not all) window functions act only
+    <firstterm>window frame</>.  Some window functions act only
     on the rows of the window frame, rather than of the whole partition.
     By default, if <literal>ORDER BY</> is supplied then the frame consists of
     all rows from the start of the partition up through the current row, plus
@@ -515,7 +514,7 @@ SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
     elsewhere, such as in <literal>GROUP BY</>, <literal>HAVING</>
     and <literal>WHERE</literal> clauses.  This is because they logically
     execute after the processing of those clauses.  Also, window functions
-    execute after regular aggregate functions.  This means it is valid to
+    execute after non-window aggregate functions.  This means it is valid to
     include an aggregate function call in the arguments of a window function,
     but not vice versa.
    </para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e3d852c..d87e74a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -14413,17 +14413,18 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
   <para>
    The built-in window functions are listed in
    <xref linkend="functions-window-table">.  Note that these functions
-   <emphasis>must</> be invoked using window function syntax; that is an
+   <emphasis>must</> be invoked using window function syntax, i.e., an
    <literal>OVER</> clause is required.
   </para>
 
   <para>
    In addition to these functions, any built-in or user-defined normal
-   aggregate function (but not ordered-set or hypothetical-set aggregates)
+   aggregate function (i.e., not ordered-set or hypothetical-set aggregates)
    can be used as a window function; see
    <xref linkend="functions-aggregate"> for a list of the built-in aggregates.
    Aggregate functions act as window functions only when an <literal>OVER</>
-   clause follows the call; otherwise they act as regular aggregates.
+   clause follows the call; otherwise they act as non-window aggregates
+   and return a single row for the entire set.
   </para>
 
   <table id="functions-window-table">
@@ -14488,7 +14489,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
       <entry>
        <type>double precision</type>
       </entry>
-      <entry>relative rank of the current row: (<function>rank</> - 1) / (total rows - 1)</entry>
+      <entry>relative rank of the current row: (<function>rank</> - 1) / (total partition rows - 1)</entry>
      </row>
 
      <row>
@@ -14501,7 +14502,7 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
       <entry>
        <type>double precision</type>
       </entry>
-      <entry>relative rank of the current row: (number of rows preceding or peer with current row) / (total rows)</entry>
+      <entry>cumulative distribution: (number of partition rows preceding or peer with current row) / total partition rows</entry>
      </row>
 
      <row>
@@ -14634,9 +14635,10 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    All of the functions listed in
    <xref linkend="functions-window-table"> depend on the sort ordering
    specified by the <literal>ORDER BY</> clause of the associated window
-   definition.  Rows that are not distinct in the <literal>ORDER BY</>
-   ordering are said to be <firstterm>peers</>; the four ranking functions
-   are defined so that they give the same answer for any two peer rows.
+   definition.  Rows that are not distinct when considering only the
+   <literal>ORDER BY</> columns are said to be <firstterm>peers</>.
+   The four ranking functions (including <function>cume_dist</>) are
+   defined so that they give the same answer for all peer rows.
   </para>
 
   <para>
@@ -14679,6 +14681,14 @@ SELECT xmlagg(x) FROM (SELECT x FROM test ORDER BY y DESC) AS tab;
    </para>
   </note>
 
+  <para>
+   <function>cume_dist</> computes the fraction of partition rows that
+   are less than or equal to the current row and its peers, while
+   <function>percent_rank</> computes the fraction of partition rows that
+   are less than the current row, assuming the current row does not exist
+   in the partition.
+  </para>
+
  </sect1>
 
  <sect1 id="functions-subquery">
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 40f722c..4746ae5 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -1779,10 +1779,10 @@ FROM generate_series(1,10) AS s(i);
    <para>
     A <firstterm>window function call</firstterm> represents the application
     of an aggregate-like function over some portion of the rows selected
-    by a query.  Unlike regular aggregate function calls, this is not tied
+    by a query.  Unlike non-window aggregate calls, this is not tied
     to grouping of the selected rows into a single output row &mdash; each
     row remains separate in the query output.  However the window function
-    is able to scan all the rows that would be part of the current row's
+    has access to all the rows that would be part of the current row's
     group according to the grouping specification (<literal>PARTITION BY</>
     list) of the window function call.
     The syntax of a window function call is one of the following:
@@ -1831,20 +1831,20 @@ UNBOUNDED FOLLOWING
     named window in the <literal>WINDOW</literal> clause; see the
     <xref linkend="sql-select"> reference page for details.  It's worth
     pointing out that <literal>OVER wname</> is not exactly equivalent to
-    <literal>OVER (wname)</>; the latter implies copying and modifying the
+    <literal>OVER (wname ...)</>; the latter implies copying and modifying the
     window definition, and will be rejected if the referenced window
     specification includes a frame clause.
    </para>
 
    <para>
-    The <literal>PARTITION BY</> option groups the rows of the query into
+    The <literal>PARTITION BY</> clause groups the rows of the query into
     <firstterm>partitions</>, which are processed separately by the window
     function.  <literal>PARTITION BY</> works similarly to a query-level
     <literal>GROUP BY</> clause, except that its expressions are always just
     expressions and cannot be output-column names or numbers.
     Without <literal>PARTITION BY</>, all rows produced by the query are
     treated as a single partition.
-    The <literal>ORDER BY</> option determines the order in which the rows
+    The <literal>ORDER BY</> clause determines the order in which the rows
     of a partition are processed by the window function.  It works similarly
     to a query-level <literal>ORDER BY</> clause, but likewise cannot use
     output-column names or numbers.  Without <literal>ORDER BY</>, rows are
@@ -1922,8 +1922,8 @@ UNBOUNDED FOLLOWING
     The built-in window functions are described in <xref
     linkend="functions-window-table">.  Other window functions can be added by
     the user.  Also, any built-in or user-defined normal aggregate function
-    can be used as a window function.  Ordered-set aggregates presently
-    cannot be used as window functions, however.
+    can be used as a window function.  However, ordered-set aggregates
+    cannot presently be used as window functions.
    </para>
 
    <para>

-- 
Sent via pgsql-docs mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-docs

Re: [DOCS] Window function docs

Reply via email to