[GitHub] [spark] maropu commented on a change in pull request #27216: [SPARK-28588][DOC] Document SELECT statement in SQL Reference (Main page)

GitBox Thu, 16 Jan 2020 21:49:59 -0800

maropu commented on a change in pull request #27216: [SPARK-28588][DOC] 
Document SELECT statement in SQL Reference (Main page)
URL: https://github.com/apache/spark/pull/27216#discussion_r367778428


 ##########
 File path: docs/sql-ref-syntax-qry-select.md
 ##########
 @@ -18,8 +18,132 @@ license: |
   See the License for the specific language governing permissions and
   limitations under the License.
 ---
+Spark supports the `SELECT` statement and conforms to the ANSI SQL standard. Queries are
+used to retrieve result sets from one or more tables. The following section
+describes the overall query syntax, and the sub-sections cover the different constructs
+of a query along with examples.
 
-Spark SQL is a Apache Spark's module for working with structured data.
-This guide is a reference for Structured Query Language (SQL) for Apache 
-Spark. This document describes the SQL constructs supported by Spark in detail
-along with usage examples when applicable.
+### Syntax
+{% highlight sql %}
+[WITH with_query [, ...]]
+SELECT [hints, ...] [ALL|DISTINCT] named_expression[, named_expression, ...]
+  FROM from_item [, from_item, ...]
+  [WHERE boolean_expression]
+  [GROUP BY expression [, ...] ]
+  [HAVING boolean_expression]
+  [ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
+  [SORT  BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
+  [CLUSTER BY expression [, ...] ]
+  [DISTRIBUTE BY expression [, ...] ]
+  [{ UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
+  [WINDOW named_window [, named_window, ...] ]
+  [LIMIT {ALL | expression}]
+{% endhighlight %}
+
+### Parameters
+<dl>
+  <dt><code><em>with_query</em></code></dt>
+  <dd>
+    Specifies the common table expressions (CTEs) before the main <code>SELECT</code> query block.
+    These table expressions can be referenced later in the main query. This is useful to abstract
+    out repeated subquery blocks in the main query and to improve its readability.
+  </dd>
+  <dt><code><em>hints</em></code></dt>
+  <dd>
+    Hints can be specified to help the Spark optimizer make better planning decisions. Currently, Spark supports hints
+    that influence the selection of join strategies and the repartitioning of the data. For a detailed explanation, please
+    refer to.
+  </dd>
+  <dt><code><em>ALL</em></code></dt>
+  <dd>
+    Selects all matching rows from the relation. This is the default.
+  </dd>
+  <dt><code><em>DISTINCT</em></code></dt>
+  <dd>
+    Selects all matching rows from the relation after removing duplicates from the results.
+  </dd>
+  <dt><code><em>named_expression</em></code></dt>
+  <dd>
+    An expression with an assigned name. In general, it denotes a column expression.<br><br>
+    <b>Syntax:</b>
+      <code>
+        expression [AS] [alias]
+      </code>
+  </dd>
+  <dt><code><em>from_item</em></code></dt>
+  <dd>
+    Specifies a source of input for the query. It can be one of the following.
+    <ol>
+      <li>Table relation</li>
+      <li>Join relation</li>
+      <li>Table valued function</li>
+      <li>Inlined table</li>
+      <li>Subquery</li>    
+    </ol>
+  </dd>
+  <dt><code><em>WHERE</em></code></dt>
+  <dd>
+    Filters the result of the FROM clause based on the supplied predicates.
+  </dd>
+  <dt><code><em>GROUP BY</em></code></dt>
+  <dd>
+    Specifies the expressions that are used to group the rows. This is used in conjunction with aggregate functions
+    (MIN, MAX, COUNT, SUM, AVG) to group rows based on the grouping expressions.
+  </dd>
+  <dt><code><em>HAVING</em></code></dt>
+  <dd>
+    Specifies the predicates by which the rows produced by GROUP BY are filtered. The HAVING clause is used to
+    filter rows after the grouping is performed.
+  </dd>
+  <dt><code><em>ORDER BY</em></code></dt>
+  <dd>
+    Specifies an ordering of the rows of the complete result set of the query. The output rows are ordered
 
 Review comment:
   How about writing the default behaviour here (e.g., direction and null order)?
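For reference, the defaults the comment refers to could be illustrated along these lines. This is a sketch assuming Spark SQL's documented `ORDER BY` behaviour: rows are sorted in ascending order when no direction is given, and NULLs sort first for `ASC` and last for `DESC` unless `NULLS FIRST` or `NULLS LAST` is specified. The inline table `t(x)` is hypothetical and used only for illustration:

```sql
-- Hypothetical inline table t with one INT column x that contains a NULL.
-- No direction given: ASC is used, and NULLs sort first.
SELECT x FROM VALUES (3), (NULL), (1) AS t(x) ORDER BY x;
-- NULL, 1, 3

-- DESC without a null ordering: NULLs sort last.
SELECT x FROM VALUES (3), (NULL), (1) AS t(x) ORDER BY x DESC;
-- 3, 1, NULL

-- An explicit null ordering overrides the default for that direction.
SELECT x FROM VALUES (3), (NULL), (1) AS t(x) ORDER BY x DESC NULLS FIRST;
-- NULL, 3, 1
```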

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 