Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14855#discussion_r76694730
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -791,11 +791,22 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableCommand
      }
    }
+
+  // These table properties should not be included in the output statement of SHOW CREATE TABLE
+  val excludedTableProperties = Set(
+    // The following are hive-generated statistics fields
+    "COLUMN_STATS_ACCURATE",
+    "numFiles",
+    "numPartitions",
+    "numRows",
+    "rawDataSize",
+    "totalSize"
+  )
+
   private def showHiveTableProperties(metadata: CatalogTable, builder: StringBuilder): Unit = {
     if (metadata.properties.nonEmpty) {
       val filteredProps = metadata.properties.filterNot {
-        // Skips "EXTERNAL" property for external tables
-        case (key, _) => key == "EXTERNAL" && metadata.tableType == EXTERNAL
+        // Skips all the stats info (See the JIRA: HIVE-13792)
--- End diff --
Yeah, I agree with this general rule.
When we support translations, we need to be very careful about including these statistics in the SHOW CREATE TABLE DDL. Hive does not include them in SHOW CREATE TABLE either, as shown in their JIRA: https://issues.apache.org/jira/browse/HIVE-13792. If we allow users to provide statistics when creating tables, should we mark them as inaccurate, like what Hive does now?
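To make the intent concrete, here is a rough sketch of how the filter could use the new `excludedTableProperties` set (this assumes it sits inside `ShowCreateTableCommand` where `CatalogTable` is in scope; the TBLPROPERTIES rendering is only illustrative, and whether the existing EXTERNAL check stays alongside it is cut off in the diff above):
```scala
// Rough sketch: drop the hive-generated statistics properties before
// rendering TBLPROPERTIES in the SHOW CREATE TABLE output.
private def showHiveTableProperties(metadata: CatalogTable, builder: StringBuilder): Unit = {
  val filteredProps = metadata.properties.filterNot {
    // Skip any key listed in excludedTableProperties (the stats fields above)
    case (key, _) => excludedTableProperties.contains(key)
  }
  if (filteredProps.nonEmpty) {
    val props = filteredProps.map { case (key, value) => s"  '$key' = '$value'" }
    builder ++= props.mkString("TBLPROPERTIES (\n", ",\n", "\n)\n")
  }
}
```
Keeping the excluded keys in a named set also makes it easy to extend later if Hive starts generating more statistics properties.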
BTW, should we merge this into 2.1 before we support the translation? Spark 2.0 currently has this bug. Let me know what I should do next.
Thanks!