maropu commented on a change in pull request #32037:
URL: https://github.com/apache/spark/pull/32037#discussion_r609157680
########## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ##########
@@ -21,6 +21,49 @@ import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 
+
+/**
+ * Base trait for TPC-DS related tests.
+ *
+ * Datatype mapping for TPC-DS and Spark SQL, see more at:
+ * http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.9.0.pdf
+ *
+ * |---------------|---------------|
+ * | TPC-DS        | Spark SQL     |
+ * |---------------|---------------|
+ * | Identifier    | INT           |
+ * |---------------|---------------|
+ * | Integer       | INT           |
+ * |---------------|---------------|
+ * | Decimal(d, f) | Decimal(d, f) |
+ * |---------------|---------------|
+ * | Char(N)       | Char(N)       |
+ * |---------------|---------------|
+ * | Varchar(N)    | Varchar(N)    |
+ * |---------------|---------------|
+ * | Date          | Date          |
+ * |---------------|---------------|
+ *
+ *
+ * Remarks:
+ * The TPC-DS spec requires benchmark implementer may employ any internal representation or SQL
+ * datatype that meets the following requirements:
+ * 1. Identifier means that the column shall be able to hold any key value generated for that
+ * column.
+ * 2. Integer means that the column shall be able to exactly represent integer values (i.e.,
+ * values in increments of 1) in the range of [-2<sup>63</sup>, 2<sup>63</sup>-1]
+ * 3. Decimal(d, f) means that the column shall be able to represent decimal values up to and
+ * including d digits,of which f shall occur to the right of the decimal place; the values can be
+ * either represented exactly or interpreted to be in this range.
+ * 4. Char(N) means that the column shall be able to hold any string of characters of a fixed
+ * length of N.
+ * 5. Varchar(N) means that the column shall be able to hold any string of characters of a
+ * variable length with a maximum length of N. Columns defined as "varchar(N)" may optionally
+ * be implemented as "char(N)".
+ * 6. Date means that the column shall be able to express any calendar day
+ * between January 1, 1900 and December 31, 2199.
+ *

Review comment:
nit: remove this blank line.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
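As a rough sketch of the datatype mapping in the doc comment above, the following plain Scala (no Spark dependency) encodes the table and the Date range from remark 6. Everything here is hypothetical and for illustration only: the `TpcdsTypes` object, the `Tpc*` case classes, and the helpers `toSparkSql`, `inRequiredDateRange` and `columnsDdl` are not part of the PR or of Spark, and the Spark SQL side is represented only as DDL type strings.

```scala
import java.time.LocalDate

// Hypothetical helper object; names are illustrative, not taken from the PR or from Spark.
object TpcdsTypes {
  // The six TPC-DS abstract datatypes listed in the doc comment.
  sealed trait TpcType
  case object TpcIdentifier extends TpcType
  case object TpcInteger extends TpcType
  final case class TpcDecimal(d: Int, f: Int) extends TpcType
  final case class TpcChar(n: Int) extends TpcType
  final case class TpcVarchar(n: Int) extends TpcType
  case object TpcDate extends TpcType

  // Spark SQL DDL type string for each TPC-DS type, following the mapping table.
  def toSparkSql(t: TpcType): String = t match {
    case TpcIdentifier    => "INT"
    case TpcInteger       => "INT"
    case TpcDecimal(d, f) => s"DECIMAL($d,$f)"
    case TpcChar(n)       => s"CHAR($n)"
    case TpcVarchar(n)    => s"VARCHAR($n)"
    case TpcDate          => "DATE"
  }

  // Remark 6: a Date column must cover 1900-01-01 through 2199-12-31, inclusive.
  private val MinDate: LocalDate = LocalDate.of(1900, 1, 1)
  private val MaxDate: LocalDate = LocalDate.of(2199, 12, 31)

  def inRequiredDateRange(day: LocalDate): Boolean =
    !day.isBefore(MinDate) && !day.isAfter(MaxDate)

  // Joins (column name, TPC-DS type) pairs into a column list for a CREATE TABLE statement.
  def columnsDdl(cols: Seq[(String, TpcType)]): String =
    cols.map { case (name, t) => s"$name ${toSparkSql(t)}" }.mkString(", ")
}
```

For example, `TpcdsTypes.columnsDdl(Seq("r_reason_sk" -> TpcdsTypes.TpcIdentifier, "r_reason_id" -> TpcdsTypes.TpcChar(16)))` returns `"r_reason_sk INT, r_reason_id CHAR(16)"` (column names borrowed from the TPC-DS `reason` table for illustration). A version built on Spark's own `org.apache.spark.sql.types` classes would be closer to how a test suite consumes the mapping, but it needs Spark on the classpath.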
