Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/14855#discussion_r76694730
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -791,11 +791,22 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableCommand
      }
    }
+
+  // These table properties should not be included in the output statement of SHOW CREATE TABLE
+  val excludedTableProperties = Set(
+    // The following are hive-generated statistics fields
+    "COLUMN_STATS_ACCURATE",
+    "numFiles",
+    "numPartitions",
+    "numRows",
+    "rawDataSize",
+    "totalSize"
+  )
+
   private def showHiveTableProperties(metadata: CatalogTable, builder: StringBuilder): Unit = {
     if (metadata.properties.nonEmpty) {
       val filteredProps = metadata.properties.filterNot {
-        // Skips "EXTERNAL" property for external tables
-        case (key, _) => key == "EXTERNAL" && metadata.tableType == EXTERNAL
+        // Skips all the stats info (See the JIRA: HIVE-13792)
--- End diff --
Yeah, I agree with this general rule.
When we support translations, we need to be very careful about including these statistics in the SHOW CREATE TABLE DDL. Hive does not include them in SHOW CREATE TABLE either, as shown in their JIRA: https://issues.apache.org/jira/browse/HIVE-13792. If we allow users to provide statistics when creating tables, should we mark them as inaccurate, like what Hive does now?
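To make the intent concrete, here is a rough sketch of how the filter could use the new `excludedTableProperties` set (this assumes it sits inside `ShowCreateTableCommand` where `CatalogTable` is in scope; the TBLPROPERTIES rendering is only illustrative, and whether the existing EXTERNAL check stays alongside it is cut off in the diff above):
```scala
// Rough sketch: drop the hive-generated statistics properties before
// rendering TBLPROPERTIES in the SHOW CREATE TABLE output.
private def showHiveTableProperties(metadata: CatalogTable, builder: StringBuilder): Unit = {
  val filteredProps = metadata.properties.filterNot {
    // Skip any key listed in excludedTableProperties (the stats fields above)
    case (key, _) => excludedTableProperties.contains(key)
  }
  if (filteredProps.nonEmpty) {
    val props = filteredProps.map { case (key, value) => s"  '$key' = '$value'" }
    builder ++= props.mkString("TBLPROPERTIES (\n", ",\n", "\n)\n")
  }
}
```
Keeping the excluded keys in a named set also makes it easy to extend later if Hive starts generating more statistics properties.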
BTW, should we merge this into 2.1 before we support the translation? Spark 2.0 currently has this bug. Let me know what I should do next.
Thanks!