maropu commented on a change in pull request #32037:
URL: https://github.com/apache/spark/pull/32037#discussion_r609157429
########## File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala ##########

```scala
@@ -21,6 +21,49 @@
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
+
+/**
+ * Base trait for TPC-DS related tests.
+ *
+ * Datatype mapping for TPC-DS and Spark SQL, see more at:
+ * http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.9.0.pdf
+ *
+ * |---------------|---------------|
+ * |    TPC-DS     |   Spark SQL   |
+ * |---------------|---------------|
+ * |  Identifier   |      INT      |
+ * |---------------|---------------|
+ * |    Integer    |      INT      |
```

Review comment:

> One thing that seems clear is that we should replace the bigint type, which is now used in web_returns and store_returns, with the int type. Another thing that might need further discussion is: shall we use bigint for all the integer columns in the TPC-DS Data Definition to meet 2.2.2.1 b)?

The statement in the spec below implicitly suggests that `Integer` should be `bigint`:

```
b) Integer means that the column shall be able to exactly represent integer
   values (i.e., values in increments of 1) in the range of at least
   (−2^(n−1)) to (2^(n−1) − 1), where n is 64.
```
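To make the range argument concrete: with n = 64, the spec's range is exactly that of a 64-bit signed integer, i.e. Spark SQL's BIGINT (Scala `Long`), while a 32-bit INT (Scala `Int`) only covers n = 32. A minimal, self-contained sketch (the object name is illustrative, not part of the PR):

```scala
// Sketch: the spec's range (−2^(n−1)) to (2^(n−1) − 1) with n = 64 matches
// a 64-bit signed integer (Spark SQL BIGINT / Scala Long), not a 32-bit INT.
object SpecRangeCheck {
  def main(args: Array[String]): Unit = {
    val n = 64
    // BigInt computes 2^(n−1) without overflow.
    val lo = -(BigInt(2).pow(n - 1))     // −2^63
    val hi = BigInt(2).pow(n - 1) - 1    //  2^63 − 1
    println(lo == BigInt(Long.MinValue)) // true: BIGINT exactly covers the range
    println(hi == BigInt(Long.MaxValue)) // true
    println(lo >= BigInt(Int.MinValue))  // false: a 32-bit INT is too narrow
  }
}
```

So reading 2.2.2.1 b) literally, every `Integer` column would need BIGINT to "exactly represent" the full range, which is the tension the comment raises.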
