maropu commented on a change in pull request #27489: [SPARK-30703][SQL][DOCS] Add a document for the ANSI mode
URL: https://github.com/apache/spark/pull/27489#discussion_r388643951
 
 

 ##########
 File path: docs/sql-ref-ansi-compliance.md
 ##########
 @@ -19,6 +19,127 @@ license: |
   limitations under the License.
 ---
 
+Spark SQL has two options to comply with the SQL standard: `spark.sql.ansi.enabled` and `spark.sql.storeAssignmentPolicy` (see the table below for details).
+When `spark.sql.ansi.enabled` is set to `true`, Spark SQL follows the standard in basic behaviours (e.g., arithmetic operations, type conversion, and SQL parsing).
+Moreover, Spark SQL has an independent option to control implicit casting behaviours when inserting rows into a table.
+These casting behaviours are defined as store assignment rules in the standard.
+When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules.
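For illustration, a minimal sketch of setting both options for a single session, assuming a running `SparkSession` named `spark` (e.g., in `spark-shell`); the property names and values come from the table below:

```scala
// Sketch: enable ANSI mode and the ANSI store assignment policy for this session.
// Assumes an existing SparkSession called `spark`.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.conf.set("spark.sql.storeAssignmentPolicy", "ANSI")

// Equivalent SQL statements:
spark.sql("SET spark.sql.ansi.enabled=true")
spark.sql("SET spark.sql.storeAssignmentPolicy=ANSI")
```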
+
+<table class="table">
+<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+<tr>
+  <td><code>spark.sql.ansi.enabled</code></td>
+  <td>false</td>
+  <td>
+    When true, Spark tries to conform to the ANSI SQL specification:
+    1. Spark will throw a runtime exception if an overflow occurs in any operation on an integral/decimal field.
+    2. Spark will forbid using the reserved keywords of ANSI SQL as identifiers in the SQL parser.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.sql.storeAssignmentPolicy</code></td>
+  <td>ANSI</td>
+  <td>
+    When inserting a value into a column with a different data type, Spark performs type coercion.
+    Currently, we support three policies for the type coercion rules: ANSI, legacy, and strict.
+    With the ANSI policy, Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL.
+    It disallows certain unreasonable type conversions, such as converting string to int or double to boolean.
+    With the legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is very loose, e.g. converting string to int or double to boolean is allowed.
+    It is also the only behavior in Spark 2.x and it is compatible with Hive.
+    With the strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion, e.g. converting double to int or decimal to double is not allowed.
+  </td>
+</tr>
+</table>
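As a rough illustration of the store assignment policies, consider a hypothetical table `t` with a single `INT` column; the table name is an assumption, and the outcomes follow the rules described in the table above:

```scala
// Sketch: a hypothetical table with an INT column.
spark.sql("CREATE TABLE t (i INT) USING parquet")

// With spark.sql.storeAssignmentPolicy=ANSI (the default), string-to-int coercion
// is disallowed by the store assignment rules, so this insert is rejected.
spark.sql("INSERT INTO t VALUES ('1')")

// With the legacy policy, the same statement is accepted because any valid Cast
// is allowed (the Spark 2.x behavior).
spark.sql("SET spark.sql.storeAssignmentPolicy=LEGACY")
spark.sql("INSERT INTO t VALUES ('1')")
```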
+
+The following subsections present behaviour changes in arithmetic operations, type conversions, and SQL parsing when ANSI mode is enabled.
+
+### Arithmetic Operations
+
+In Spark SQL, arithmetic operations performed on numeric types (with the exception of decimal) are not checked for overflow by default.
+This means that if an operation overflows, the result is the same as what the same operation returns in a Java/Scala program (e.g., if the sum of two integers is higher than the maximum representable value, the result is a negative number).
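For example, a minimal sketch of the wrap-around behaviour described above, assuming a `spark-shell` session (exact output formatting may differ):

```scala
// Default (spark.sql.ansi.enabled=false): integer addition wraps around on
// overflow, just as it would in a Java/Scala program.
spark.sql("SELECT 2147483647 + 1").show()
// result: -2147483648

// With ANSI mode enabled, the same operation throws a runtime exception instead.
spark.sql("SET spark.sql.ansi.enabled=true")
spark.sql("SELECT 2147483647 + 1").show()
// throws java.lang.ArithmeticException
```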
 
 Review comment:
   https://github.com/apache/spark/pull/27819
