cloud-fan commented on a change in pull request #29837:
URL: https://github.com/apache/spark/pull/29837#discussion_r517131842



##########
File path: docs/sql-ref-datatypes.md
##########
@@ -314,3 +314,206 @@ SELECT COUNT(*), c2 FROM test GROUP BY c2;
 |        3| Infinity|
 +---------+---------+
 ```
+
+### Type Conversion
+
+In general, an expression can contain different data types and type conversion 
is the transformation of some data types into others in order to resolve type 
mismatches. 
+Spark supports both implicit conversions by type coercion and explicit 
conversions by explicit casting and store assignment casting.
+
+#### Type Coercion in Operations between Different Types 
+
+Type Coercion refers to the automatic or implicit conversion of values from 
one type to another when you need to to resolve type mismatches.
+The following matrix shows the resulting type to which they are implicitly 
converted to resolve an expression involving different data types.
+
+**Numeric Expressions**:
+
+|               |ByteType   |ShortType  |IntegerType |LongType   |FloatType    
         |DoubleType            |DecimalType                  |
+|---------------|-----------|-----------|------------|-----------|----------------------|----------------------|-----------------------------|
+|**ByteType**   |--         |ShortType  |IntegerType |LongType   |FloatType    
         |DoubleType            |DecimalType(3,0)<sup>1</sup> |
+|**ShortType**  |ShortType  |--         |IntegerType |LongType   |FloatType    
         |DoubleType            |DecimalType(5,0)<sup>1</sup> |
+|**IntegerType**|IntegerType|IntegerType|--          |LongType   |FloatType    
         |DoubleType            |DecimalType(10,0)<sup>1</sup>|
+|**LongType**   |LongType   |LongType   |LongType    |--         |FloatType    
         |DoubleType            |DecimalType(20,0)<sup>1</sup>|
+|**FloatType**  |FloatType  |FloatType  |FloatType   |FloatType  |--           
         |DoubleType            |DoubleType                   |
+|**DoubleType** |DoubleType |DoubleType |DoubleType  |DoubleType |DoubleType   
         |--                    |DoubleType                   |
+|**DecimalType**|DecimalType|DecimalType|DecimalType 
|DecimalType|DoubleType<sup>2</sup>|DoubleType<sup>2</sup>|--                   
        |
+
+**Note 1**: DecimalType(precision,scale)   
+**Note 2**: In these cases DecimalType can lose precision, there is no common 
type for decimal and double because double's range is larger than decimal, and 
yet decimal is more precise than double so when we cast Decimaltype into 
DobleType it could lose precision.
+
+**StringType Behavior**  
+* Arithmetic Expressions: When we have an arithmetic expression with one 
operand of type StringType, both operands will be implicitly casted to 
DoubleType.
+
+    |               |ByteType   |ShortType  |IntegerType |LongType   
|FloatType    |DoubleType  |
+    
|---------------|-----------|-----------|------------|-----------|-------------|------------|
+    |**StringType** |DoubleType |DoubleType |DoubleType  |DoubleType 
|DoubleType   |DoubleType  |
+
+* Comparison: When we have a comparison expression with an operand of type 
StringType, the operand StringType will be casted implicitly according to the 
following table.
+
+    |               |ByteType   |ShortType  |IntegerType |LongType   
|FloatType    |DoubleType  |DecimalType |DateType             |TimestampType    
         |
+    
|---------------|-----------|-----------|------------|-----------|-------------|------------|------------|---------------------|--------------------------|
+    |**StringType** |ByteType   |ShortType  |IntegerType |LongType   
|FloatType    |DoubleType  |DoubleType  |DateType<sup>1</sup> 
|TimestampType<sup>1</sup> |
+
+    **Note 1**: If `spark.sql.legacy.typeCoercion.datetimeToString` is true, 
DateType and TimestampType will be casted to StringType
+    
+* in, except, intersect, union, array: If the list of values has a StringType 
element, all the elements will be casted to StringType.
+ 
+* concat, concat_ws, array_join: All elements will be casted to StringType.
+
+* map_concat: If the list of key has a StringType element, all the keys will 
be casted to StringType. The same goes for the values.
+
+* if, when: If any of the results has StringType, all the results will be 
casted to StringType.
+
+**Time Expressions**:
+
+|                  |DateType     |TimestampType |
+|------------------|-------------|--------------|
+|**DateType**      |--           |TimestampType |
+|**TimestampType** |TimestampType|--            |
+
+
+**Possible implicit conversions**:
+
+|                  |ByteType  |ShortType |IntegerType |LongType |FloatType 
|DoubleType |DecimalType|StringType |BinaryType |BooleanType |TimestampType 
|DateType|
+|------------------|----------|----------|------------|---------|----------|-----------|-----------|-----------|-----------|------------|--------------|--------|
+|**ByteType**      |--        |X         |X           |X        |X         |X  
        |X          |X          |           |            |              |       
 |
+|**ShortType**     |X         |--        |X           |X        |X         |X  
        |X          |X          |           |            |              |       
 |
+|**IntegerType**   |X         |X         |--          |X        |X         |X  
        |X          |X          |           |            |              |       
 |
+|**LongType**      |X         |X         |X           |--       |X         |X  
        |X          |X          |           |            |              |       
 |
+|**FloatType**     |X         |X         |X           |X        |--        |X  
        |X          |X          |           |            |              |       
 |
+|**DoubleType**    |X         |X         |X           |X        |X         |-- 
        |X          |X          |           |            |              |       
 |
+|**DecimalType**   |X         |X         |X           |X        |X         |X  
        |--         |X          |           |            |              |       
 |
+|**StringType**    |X         |X         |X           |X        |X         |X  
        |X          |--         |X          |X           |X             |X      
 |
+|**BinaryType**    |          |          |            |         |          |   
        |           |X          |--         |            |              |       
 |
+|**BooleanType**   |          |          |            |         |          |   
        |           |X          |           |--          |              |       
 |
+|**TimestampType** |          |          |            |         |          |   
        |           |X          |           |            |--            |X      
 |
+|**DateType**      |          |          |            |         |          |   
        |           |X          |           |            |X             |--     
 |
+
+#### Type Coercion Examples
+
+```sql
+DESCRIBE TABLE numericTable;
++-------------+---------+-------+
+|col_name     |data_type|comment|
++-------------+---------+-------+
+|integerColumn|int      |null   |
+|doubleColumn |double   |null   |
++-------------+---------+-------+
+
+DESCRIBE SELECT integerColumn + doubleColumn as result FROM numericTable;
++--------+---------+-------+
+|col_name|data_type|comment|
++--------+---------+-------+
+|  result|   double|   null|
++--------+---------+-------+
+
+```
+
+```sql
+DESCRIBE SELECT MONTHS_BETWEEN(CAST('2020-10-10' AS Date),CAST('2020-08-13' AS 
timestamp))
+
++------------------------------------------------------------------------------------------------+---------+-------+
+|col_name                                                                      
                  |data_type|comment|
++------------------------------------------------------------------------------------------------+---------+-------+
+|months_between(CAST(CAST(2020-10-10 AS DATE) AS TIMESTAMP), CAST(2020-08-13 
AS TIMESTAMP), true)|double   |null   |
++------------------------------------------------------------------------------------------------+---------+-------+
+
+```
+
+```sql
+DESCRIBE SELECT 1 + '2'
+
++---------------------------------------+---------+-------+
+|col_name                               |data_type|comment|
++---------------------------------------+---------+-------+
+|(CAST(1 AS DOUBLE) + CAST(2 AS DOUBLE))|double   |null   |
++---------------------------------------+---------+-------+
+
+```
+
+```sql
+DESCRIBE SELECT 1 = '2'
+
++--------------------+---------+-------+
+|col_name            |data_type|comment|
++--------------------+---------+-------+
+|(1 = CAST(2 AS INT))|boolean  |null   |
++--------------------+---------+-------+
+
+```
+
+```sql
+DESCRIBE SELECT 1 IN ('2', 3)
+
++-------------------------------------------------------------+---------+-------+
+|col_name                                                     
|data_type|comment|
++-------------------------------------------------------------+---------+-------+
+|(CAST(1 AS STRING) IN (CAST(2 AS STRING), CAST(3 AS STRING)))|boolean  |null  
 |
++-------------------------------------------------------------+---------+-------+
+
+```
+
+
+
+#### Explicit Casting and Store Assignment Casting

Review comment:
       Ansi mode is off by default, and Store Assignment is using ANSI policy 
by default. That said, these two have different behaviors by default.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to