Github user xuanyuanking commented on a diff in the pull request:
https://github.com/apache/spark/pull/22746#discussion_r226011439
--- Diff: docs/sql-reference.md ---
@@ -0,0 +1,641 @@
+---
+layout: global
+title: Reference
+displayTitle: Reference
+---
+
+* Table of contents
+{:toc}
+
+## Data Types
+
+Spark SQL and DataFrames support the following data types:
+
+* Numeric types
+ - `ByteType`: Represents 1-byte signed integer numbers.
+ The range of numbers is from `-128` to `127`.
+ - `ShortType`: Represents 2-byte signed integer numbers.
+ The range of numbers is from `-32768` to `32767`.
+ - `IntegerType`: Represents 4-byte signed integer numbers.
+ The range of numbers is from `-2147483648` to `2147483647`.
+ - `LongType`: Represents 8-byte signed integer numbers.
+ The range of numbers is from `-9223372036854775808` to
`9223372036854775807`.
+ - `FloatType`: Represents 4-byte single-precision floating point numbers.
+ - `DoubleType`: Represents 8-byte double-precision floating point
numbers.
+ - `DecimalType`: Represents arbitrary-precision signed decimal numbers.
Backed internally by `java.math.BigDecimal`. A `BigDecimal` consists of an
arbitrary precision integer unscaled value and a 32-bit integer scale.
+* String type
+ - `StringType`: Represents character string values.
+* Binary type
+ - `BinaryType`: Represents byte sequence values.
+* Boolean type
+ - `BooleanType`: Represents boolean values.
+* Datetime type
+ - `TimestampType`: Represents values comprising values of fields year,
month, day,
+ hour, minute, and second.
+ - `DateType`: Represents values comprising values of fields year, month,
day.
+* Complex types
+ - `ArrayType(elementType, containsNull)`: Represents values comprising a
sequence of
+ elements with the type of `elementType`. `containsNull` is used to
indicate if
+ elements in a `ArrayType` value can have `null` values.
+ - `MapType(keyType, valueType, valueContainsNull)`:
+ Represents values comprising a set of key-value pairs. The data type of
keys are
+ described by `keyType` and the data type of values are described by
`valueType`.
+ For a `MapType` value, keys are not allowed to have `null` values.
`valueContainsNull`
+ is used to indicate if values of a `MapType` value can have `null`
values.
+ - `StructType(fields)`: Represents values with the structure described by
+ a sequence of `StructField`s (`fields`).
+ * `StructField(name, dataType, nullable)`: Represents a field in a
`StructType`.
+ The name of a field is indicated by `name`. The data type of a field
is indicated
+ by `dataType`. `nullable` is used to indicate if values of this fields
can have
+ `null` values.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+All data types of Spark SQL are located in the package
`org.apache.spark.sql.types`.
+You can access them by doing
+
+{% include_example data_types
scala/org/apache/spark/examples/sql/SparkSQLExample.scala %}
+
+<table class="table">
+<tr>
+ <th style="width:20%">Data type</th>
+ <th style="width:40%">Value type in Scala</th>
+ <th>API to access or create a data type</th></tr>
+<tr>
+ <td> <b>ByteType</b> </td>
+ <td> Byte </td>
+ <td>
+ ByteType
+ </td>
+</tr>
+<tr>
+ <td> <b>ShortType</b> </td>
+ <td> Short </td>
+ <td>
+ ShortType
+ </td>
+</tr>
+<tr>
+ <td> <b>IntegerType</b> </td>
+ <td> Int </td>
+ <td>
+ IntegerType
+ </td>
+</tr>
+<tr>
+ <td> <b>LongType</b> </td>
+ <td> Long </td>
+ <td>
+ LongType
+ </td>
+</tr>
+<tr>
+ <td> <b>FloatType</b> </td>
+ <td> Float </td>
+ <td>
+ FloatType
+ </td>
+</tr>
+<tr>
+ <td> <b>DoubleType</b> </td>
+ <td> Double </td>
+ <td>
+ DoubleType
+ </td>
+</tr>
+<tr>
+ <td> <b>DecimalType</b> </td>
+ <td> java.math.BigDecimal </td>
+ <td>
+ DecimalType
+ </td>
+</tr>
+<tr>
+ <td> <b>StringType</b> </td>
+ <td> String </td>
+ <td>
+ StringType
+ </td>
+</tr>
+<tr>
+ <td> <b>BinaryType</b> </td>
+ <td> Array[Byte] </td>
+ <td>
+ BinaryType
+ </td>
+</tr>
+<tr>
+ <td> <b>BooleanType</b> </td>
+ <td> Boolean </td>
+ <td>
+ BooleanType
+ </td>
+</tr>
+<tr>
+ <td> <b>TimestampType</b> </td>
+ <td> java.sql.Timestamp </td>
+ <td>
+ TimestampType
+ </td>
+</tr>
+<tr>
+ <td> <b>DateType</b> </td>
+ <td> java.sql.Date </td>
+ <td>
+ DateType
+ </td>
+</tr>
+<tr>
+ <td> <b>ArrayType</b> </td>
+ <td> scala.collection.Seq </td>
+ <td>
+ ArrayType(<i>elementType</i>, [<i>containsNull</i>])<br />
+ <b>Note:</b> The default value of <i>containsNull</i> is <i>true</i>.
+ </td>
+</tr>
+<tr>
+ <td> <b>MapType</b> </td>
+ <td> scala.collection.Map </td>
+ <td>
+ MapType(<i>keyType</i>, <i>valueType</i>, [<i>valueContainsNull</i>])<br
/>
+ <b>Note:</b> The default value of <i>valueContainsNull</i> is
<i>true</i>.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructType</b> </td>
+ <td> org.apache.spark.sql.Row </td>
+ <td>
+ StructType(<i>fields</i>)<br />
+ <b>Note:</b> <i>fields</i> is a Seq of StructFields. Also, two fields
with the same
+ name are not allowed.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructField</b> </td>
+ <td> The value type in Scala of the data type of this field
+ (For example, Int for a StructField with the data type IntegerType) </td>
+ <td>
+ StructField(<i>name</i>, <i>dataType</i>, [<i>nullable</i>])<br />
+ <b>Note:</b> The default value of <i>nullable</i> is <i>true</i>.
+ </td>
+</tr>
+</table>
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+All data types of Spark SQL are located in the package of
+`org.apache.spark.sql.types`. To access or create a data type,
+please use factory methods provided in
+`org.apache.spark.sql.types.DataTypes`.
+
+<table class="table">
+<tr>
+ <th style="width:20%">Data type</th>
+ <th style="width:40%">Value type in Java</th>
+ <th>API to access or create a data type</th></tr>
+<tr>
+ <td> <b>ByteType</b> </td>
+ <td> byte or Byte </td>
+ <td>
+ DataTypes.ByteType
+ </td>
+</tr>
+<tr>
+ <td> <b>ShortType</b> </td>
+ <td> short or Short </td>
+ <td>
+ DataTypes.ShortType
+ </td>
+</tr>
+<tr>
+ <td> <b>IntegerType</b> </td>
+ <td> int or Integer </td>
+ <td>
+ DataTypes.IntegerType
+ </td>
+</tr>
+<tr>
+ <td> <b>LongType</b> </td>
+ <td> long or Long </td>
+ <td>
+ DataTypes.LongType
+ </td>
+</tr>
+<tr>
+ <td> <b>FloatType</b> </td>
+ <td> float or Float </td>
+ <td>
+ DataTypes.FloatType
+ </td>
+</tr>
+<tr>
+ <td> <b>DoubleType</b> </td>
+ <td> double or Double </td>
+ <td>
+ DataTypes.DoubleType
+ </td>
+</tr>
+<tr>
+ <td> <b>DecimalType</b> </td>
+ <td> java.math.BigDecimal </td>
+ <td>
+ DataTypes.createDecimalType()<br />
+ DataTypes.createDecimalType(<i>precision</i>, <i>scale</i>).
+ </td>
+</tr>
+<tr>
+ <td> <b>StringType</b> </td>
+ <td> String </td>
+ <td>
+ DataTypes.StringType
+ </td>
+</tr>
+<tr>
+ <td> <b>BinaryType</b> </td>
+ <td> byte[] </td>
+ <td>
+ DataTypes.BinaryType
+ </td>
+</tr>
+<tr>
+ <td> <b>BooleanType</b> </td>
+ <td> boolean or Boolean </td>
+ <td>
+ DataTypes.BooleanType
+ </td>
+</tr>
+<tr>
+ <td> <b>TimestampType</b> </td>
+ <td> java.sql.Timestamp </td>
+ <td>
+ DataTypes.TimestampType
+ </td>
+</tr>
+<tr>
+ <td> <b>DateType</b> </td>
+ <td> java.sql.Date </td>
+ <td>
+ DataTypes.DateType
+ </td>
+</tr>
+<tr>
+ <td> <b>ArrayType</b> </td>
+ <td> java.util.List </td>
+ <td>
+ DataTypes.createArrayType(<i>elementType</i>)<br />
+ <b>Note:</b> The value of <i>containsNull</i> will be <i>true</i><br />
+ DataTypes.createArrayType(<i>elementType</i>, <i>containsNull</i>).
+ </td>
+</tr>
+<tr>
+ <td> <b>MapType</b> </td>
+ <td> java.util.Map </td>
+ <td>
+ DataTypes.createMapType(<i>keyType</i>, <i>valueType</i>)<br />
+ <b>Note:</b> The value of <i>valueContainsNull</i> will be
<i>true</i>.<br />
+ DataTypes.createMapType(<i>keyType</i>, <i>valueType</i>,
<i>valueContainsNull</i>)<br />
+ </td>
+</tr>
+<tr>
+ <td> <b>StructType</b> </td>
+ <td> org.apache.spark.sql.Row </td>
+ <td>
+ DataTypes.createStructType(<i>fields</i>)<br />
+ <b>Note:</b> <i>fields</i> is a List or an array of StructFields.
+ Also, two fields with the same name are not allowed.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructField</b> </td>
+ <td> The value type in Java of the data type of this field
+ (For example, int for a StructField with the data type IntegerType) </td>
+ <td>
+ DataTypes.createStructField(<i>name</i>, <i>dataType</i>,
<i>nullable</i>)
+ </td>
+</tr>
+</table>
+
+</div>
+
+<div data-lang="python" markdown="1">
+
+All data types of Spark SQL are located in the package of
`pyspark.sql.types`.
+You can access them by doing
+{% highlight python %}
+from pyspark.sql.types import *
+{% endhighlight %}
+
+<table class="table">
+<tr>
+ <th style="width:20%">Data type</th>
+ <th style="width:40%">Value type in Python</th>
+ <th>API to access or create a data type</th></tr>
+<tr>
+ <td> <b>ByteType</b> </td>
+ <td>
+ int or long <br />
+ <b>Note:</b> Numbers will be converted to 1-byte signed integer numbers
at runtime.
+ Please make sure that numbers are within the range of -128 to 127.
+ </td>
+ <td>
+ ByteType()
+ </td>
+</tr>
+<tr>
+ <td> <b>ShortType</b> </td>
+ <td>
+ int or long <br />
+ <b>Note:</b> Numbers will be converted to 2-byte signed integer numbers
at runtime.
+ Please make sure that numbers are within the range of -32768 to 32767.
+ </td>
+ <td>
+ ShortType()
+ </td>
+</tr>
+<tr>
+ <td> <b>IntegerType</b> </td>
+ <td> int or long </td>
+ <td>
+ IntegerType()
+ </td>
+</tr>
+<tr>
+ <td> <b>LongType</b> </td>
+ <td>
+ long <br />
+ <b>Note:</b> Numbers will be converted to 8-byte signed integer numbers
at runtime.
+ Please make sure that numbers are within the range of
+ -9223372036854775808 to 9223372036854775807.
+ Otherwise, please convert data to decimal.Decimal and use DecimalType.
+ </td>
+ <td>
+ LongType()
+ </td>
+</tr>
+<tr>
+ <td> <b>FloatType</b> </td>
+ <td>
+ float <br />
+ <b>Note:</b> Numbers will be converted to 4-byte single-precision
floating
+ point numbers at runtime.
+ </td>
+ <td>
+ FloatType()
+ </td>
+</tr>
+<tr>
+ <td> <b>DoubleType</b> </td>
+ <td> float </td>
+ <td>
+ DoubleType()
+ </td>
+</tr>
+<tr>
+ <td> <b>DecimalType</b> </td>
+ <td> decimal.Decimal </td>
+ <td>
+ DecimalType()
+ </td>
+</tr>
+<tr>
+ <td> <b>StringType</b> </td>
+ <td> string </td>
+ <td>
+ StringType()
+ </td>
+</tr>
+<tr>
+ <td> <b>BinaryType</b> </td>
+ <td> bytearray </td>
+ <td>
+ BinaryType()
+ </td>
+</tr>
+<tr>
+ <td> <b>BooleanType</b> </td>
+ <td> bool </td>
+ <td>
+ BooleanType()
+ </td>
+</tr>
+<tr>
+ <td> <b>TimestampType</b> </td>
+ <td> datetime.datetime </td>
+ <td>
+ TimestampType()
+ </td>
+</tr>
+<tr>
+ <td> <b>DateType</b> </td>
+ <td> datetime.date </td>
+ <td>
+ DateType()
+ </td>
+</tr>
+<tr>
+ <td> <b>ArrayType</b> </td>
+ <td> list, tuple, or array </td>
+ <td>
+ ArrayType(<i>elementType</i>, [<i>containsNull</i>])<br />
+ <b>Note:</b> The default value of <i>containsNull</i> is <i>True</i>.
+ </td>
+</tr>
+<tr>
+ <td> <b>MapType</b> </td>
+ <td> dict </td>
+ <td>
+ MapType(<i>keyType</i>, <i>valueType</i>, [<i>valueContainsNull</i>])<br
/>
+ <b>Note:</b> The default value of <i>valueContainsNull</i> is
<i>True</i>.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructType</b> </td>
+ <td> list or tuple </td>
+ <td>
+ StructType(<i>fields</i>)<br />
+ <b>Note:</b> <i>fields</i> is a Seq of StructFields. Also, two fields
with the same
+ name are not allowed.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructField</b> </td>
+ <td> The value type in Python of the data type of this field
+ (For example, Int for a StructField with the data type IntegerType) </td>
+ <td>
+ StructField(<i>name</i>, <i>dataType</i>, [<i>nullable</i>])<br />
+ <b>Note:</b> The default value of <i>nullable</i> is <i>True</i>.
+ </td>
+</tr>
+</table>
+
+</div>
+
+<div data-lang="r" markdown="1">
+
+<table class="table">
+<tr>
+ <th style="width:20%">Data type</th>
+ <th style="width:40%">Value type in R</th>
+ <th>API to access or create a data type</th></tr>
+<tr>
+ <td> <b>ByteType</b> </td>
+ <td>
+ integer <br />
+ <b>Note:</b> Numbers will be converted to 1-byte signed integer numbers
at runtime.
+ Please make sure that numbers are within the range of -128 to 127.
+ </td>
+ <td>
+ "byte"
+ </td>
+</tr>
+<tr>
+ <td> <b>ShortType</b> </td>
+ <td>
+ integer <br />
+ <b>Note:</b> Numbers will be converted to 2-byte signed integer numbers
at runtime.
+ Please make sure that numbers are within the range of -32768 to 32767.
+ </td>
+ <td>
+ "short"
+ </td>
+</tr>
+<tr>
+ <td> <b>IntegerType</b> </td>
+ <td> integer </td>
+ <td>
+ "integer"
+ </td>
+</tr>
+<tr>
+ <td> <b>LongType</b> </td>
+ <td>
+ integer <br />
+ <b>Note:</b> Numbers will be converted to 8-byte signed integer numbers
at runtime.
+ Please make sure that numbers are within the range of
+ -9223372036854775808 to 9223372036854775807.
+ Otherwise, please convert data to decimal.Decimal and use DecimalType.
+ </td>
+ <td>
+ "long"
+ </td>
+</tr>
+<tr>
+ <td> <b>FloatType</b> </td>
+ <td>
+ numeric <br />
+ <b>Note:</b> Numbers will be converted to 4-byte single-precision
floating
+ point numbers at runtime.
+ </td>
+ <td>
+ "float"
+ </td>
+</tr>
+<tr>
+ <td> <b>DoubleType</b> </td>
+ <td> numeric </td>
+ <td>
+ "double"
+ </td>
+</tr>
+<tr>
+ <td> <b>DecimalType</b> </td>
+ <td> Not supported </td>
+ <td>
+ Not supported
+ </td>
+</tr>
+<tr>
+ <td> <b>StringType</b> </td>
+ <td> character </td>
+ <td>
+ "string"
+ </td>
+</tr>
+<tr>
+ <td> <b>BinaryType</b> </td>
+ <td> raw </td>
+ <td>
+ "binary"
+ </td>
+</tr>
+<tr>
+ <td> <b>BooleanType</b> </td>
+ <td> logical </td>
+ <td>
+ "bool"
+ </td>
+</tr>
+<tr>
+ <td> <b>TimestampType</b> </td>
+ <td> POSIXct </td>
+ <td>
+ "timestamp"
+ </td>
+</tr>
+<tr>
+ <td> <b>DateType</b> </td>
+ <td> Date </td>
+ <td>
+ "date"
+ </td>
+</tr>
+<tr>
+ <td> <b>ArrayType</b> </td>
+ <td> vector or list </td>
+ <td>
+ list(type="array", elementType=<i>elementType</i>,
containsNull=[<i>containsNull</i>])<br />
+ <b>Note:</b> The default value of <i>containsNull</i> is <i>TRUE</i>.
+ </td>
+</tr>
+<tr>
+ <td> <b>MapType</b> </td>
+ <td> environment </td>
+ <td>
+ list(type="map", keyType=<i>keyType</i>, valueType=<i>valueType</i>,
valueContainsNull=[<i>valueContainsNull</i>])<br />
+ <b>Note:</b> The default value of <i>valueContainsNull</i> is
<i>TRUE</i>.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructType</b> </td>
+ <td> named list</td>
+ <td>
+ list(type="struct", fields=<i>fields</i>)<br />
+ <b>Note:</b> <i>fields</i> is a Seq of StructFields. Also, two fields
with the same
+ name are not allowed.
+ </td>
+</tr>
+<tr>
+ <td> <b>StructField</b> </td>
+ <td> The value type in R of the data type of this field
+ (For example, integer for a StructField with the data type IntegerType)
</td>
+ <td>
+ list(name=<i>name</i>, type=<i>dataType</i>,
nullable=[<i>nullable</i>])<br />
+ <b>Note:</b> The default value of <i>nullable</i> is <i>TRUE</i>.
+ </td>
+</tr>
+</table>
+
+</div>
+
+</div>
+
+## NaN Semantics
+
+There is specially handling for not-a-number (NaN) when dealing with
`float` or `double` types that
+does not exactly match standard floating point semantics.
+Specifically:
+
+ - NaN = NaN returns true.
+ - In aggregations, all NaN values are grouped together.
+ - NaN is treated as a normal value in join keys.
+ - NaN values go last when in ascending order, larger than any other
numeric value.
+
+ ## Arithmetic operations
--- End diff --
ah thanks! Fix in b3fc39d.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]