Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19479#discussion_r149851930
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1024,21 +1024,36 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           stats: CatalogStatistics,
           schema: StructType): Map[String, String] = {
     
    -    var statsProperties: Map[String, String] =
    -      Map(STATISTICS_TOTAL_SIZE -> stats.sizeInBytes.toString())
    +    val statsProperties = new mutable.HashMap[String, String]()
    +    statsProperties += STATISTICS_TOTAL_SIZE -> stats.sizeInBytes.toString()
         if (stats.rowCount.isDefined) {
           statsProperties += STATISTICS_NUM_ROWS -> stats.rowCount.get.toString()
         }
     
    +    // In Hive metastore, the length of value in table properties cannot be larger than 4000.
    +    // We need to split the key-value pair into multiple key-value properties if the length of
    +    // value exceeds this threshold.
    +    val threshold = conf.get(SCHEMA_STRING_LENGTH_THRESHOLD)
    --- End diff --
    
    do we still need this hack? I don't think the histogram string can hit this limitation. Creating too many buckets is nonsense.


---
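
For readers following the thread, here is a minimal sketch of the value-splitting idea the diff comment describes: a property value longer than the metastore limit is stored as several numbered part entries plus a part count. The object name `PropSplitSketch`, the `key.part.N`/`key.numParts` key layout, and the hard-coded 4000 threshold are illustrative assumptions, not Spark's actual implementation.

    // Illustrative sketch only: names, key layout, and the 4000 threshold are
    // assumptions for this example, not the actual HiveExternalCatalog code.
    object PropSplitSketch {
      val threshold = 4000  // assumed metastore limit on property value length

      // Split a long value into key.part.N entries plus a key.numParts count;
      // short values are stored under the original key unchanged.
      def splitLargeProp(key: String, value: String): Map[String, String] = {
        if (value.length <= threshold) {
          Map(key -> value)
        } else {
          val parts = value.grouped(threshold).toSeq
          val indexed = parts.zipWithIndex.map { case (part, i) => s"$key.part.$i" -> part }
          (indexed :+ (s"$key.numParts" -> parts.length.toString)).toMap
        }
      }

      // Reassemble a value previously written by splitLargeProp.
      def readLargeProp(props: Map[String, String], key: String): Option[String] = {
        props.get(key).orElse {
          props.get(s"$key.numParts").map { n =>
            (0 until n.toInt).map(i => props(s"$key.part.$i")).mkString
          }
        }
      }
    }

For example, a 9000-character value would be written as two 4000-character parts, a 1000-character part, and a numParts entry of 3, each small enough for the metastore to accept.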
