Github user budde commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16944#discussion_r102553568
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
    @@ -181,7 +186,8 @@ case class CatalogTable(
         viewText: Option[String] = None,
         comment: Option[String] = None,
         unsupportedFeatures: Seq[String] = Seq.empty,
    -    tracksPartitionsInCatalog: Boolean = false) {
    +    tracksPartitionsInCatalog: Boolean = false,
    +    schemaPreservesCase: Boolean = true) {
    --- End diff --
    
    I considered taking this approach but I think adding this as a parameter is 
more explicit and less flaky. I share your concern that adding more and more 
parameters to CatalogTable could make this less-usable, especially since params 
like ```schemaPreservesCase``` really only matter when dealing with Hive tables.
    
    However, I don't think dumping more and more parameters into 
```properties``` is a great solution either. As you've pointed out, we would 
need to filter out the properties only used internally by Spark before writing 
them to the catalog. HiveExternalCatalog already filters out Spark SQL-specific 
properties from the CatalogTable returned by HiveClient. Adding additional 
internal properties would put us in a place where properties contains:
    
    - Actual properties key/value pairs returned from the Hive metastore table.
    - Spark SQL-specific properties that are stored in the Hive metastore table 
but filtered out by HiveExternalCatalog when used by Spark internally. These 
properties must be restored before writing back.
    - Spark SQL internal-only properties that are added after reading the table 
from the metastore and must be removed before writing it.
    
    Which isn't even to mention that we'll have to be serializing/deserializing 
this value to and from a (String, String) pair just to pass information between 
```HiveExternalCatalog``` and ```HiveMetastoreCatalog```.
    
    I think that if CatalogTable ends up with too many datasource-specific 
internal parameters then maybe it makes more sense to introduce a new Map 
element, e.g. ```internalProperties```, so these don't get mixed in with the 
table properties.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to