maropu commented on a change in pull request #32037:
URL: https://github.com/apache/spark/pull/32037#discussion_r609157429



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/TPCDSBase.scala
##########
@@ -21,6 +21,49 @@ import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.test.SharedSparkSession
 
+
+/**
+ * Base trait for TPC-DS related tests.
+ *
+ * Datatype mapping for TPC-DS and Spark SQL, see more at:
+ *   http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.9.0.pdf
+ *
+ *    |---------------|---------------|
+ *    |    TPC-DS     |  Spark  SQL   |
+ *    |---------------|---------------|
+ *    |  Identifier   |      INT      |
+ *    |---------------|---------------|
+ *    |    Integer    |      INT      |

Review comment:
       > One thing that seems clear is that we should replace the bigint type, which is currently used in web_returns and store_returns, with the int type.
   Another thing that might need further discussion is: shall we use bigint for all the integer columns in the TPC-DS Data Definition to meet 2.2.2.1 b)?
   
   Doesn't the statement in the spec below implicitly suggest that `Integer` should be bigint?
   ```
   b) Integer means that the column shall be able to exactly represent integer values (i.e., values in
   increments of 1) in the range of at least (−2^(n−1)) to (2^(n−1) − 1), where n is 64.
   ```
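   
   For what it's worth, here is a minimal, self-contained Scala sketch (not part of this PR) of the range arithmetic behind that clause: Spark SQL's INT is a 32-bit signed integer, while the spec's "n is 64" wording describes the 64-bit signed range that only BIGINT covers. The `wr_order_number` column name below is just an illustration taken from the TPC-DS schema, not from this diff.
   ```scala
   // A rough sketch (not part of this PR) comparing the ranges behind 2.2.2.1 b).
   // Spark SQL's INT maps to a 32-bit signed integer and BIGINT to a 64-bit signed
   // integer, so only BIGINT covers (−2^(n−1)) to (2^(n−1) − 1) with n = 64.
   object IntegerRangeSketch {
     def main(args: Array[String]): Unit = {
       println(s"INT range:    [${Int.MinValue}, ${Int.MaxValue}]")    // [-2^31, 2^31 - 1]
       println(s"BIGINT range: [${Long.MinValue}, ${Long.MaxValue}]")  // [-2^63, 2^63 - 1]
   
       // Hypothetical DDL variants for one of the returns-table key columns; the
       // column name comes from the TPC-DS schema, not from this diff.
       val strictReading = "wr_order_number BIGINT"  // satisfies the literal "n is 64" wording
       val proposedHere  = "wr_order_number INT"     // the narrower mapping discussed above
       println(s"Candidate DDL: '$proposedHere' vs '$strictReading'")
     }
   }
   ```
   In practice, whether INT is safe presumably comes down to whether the generated key values at the tested scale factors stay below 2^31 − 1.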




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


