Github user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r102553568
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
---
@@ -181,7 +186,8 @@ case class CatalogTable(
viewText: Option[String] = None,
comment: Option[String] = None,
unsupportedFeatures: Seq[String] = Seq.empty,
- tracksPartitionsInCatalog: Boolean = false) {
+ tracksPartitionsInCatalog: Boolean = false,
+ schemaPreservesCase: Boolean = true) {
--- End diff --
I considered taking this approach but I think adding this as a parameter is
more explicit and less flaky. I share your concern that adding more and more
parameters to CatalogTable could make this less-usable, especially since params
like ```schemaPreservesCase``` really only matter when dealing with Hive tables.
However, I don't think dumping more and more parameters into
```properties``` is a great solution either. As you've pointed out, we would
need to filter out the properties only used internally by Spark before writing
them to the catalog. HiveExternalCatalog already filters out Spark SQL-specific
properties from the CatalogTable returned by HiveClient. Adding additional
internal properties would put us in a place where properties contains:
- Actual properties key/value pairs returned from the Hive metastore table.
- Spark SQL-specific properties that are stored in the Hive metastore table
but filtered out by HiveExternalCatalog when used by Spark internally. These
properties must be restored before writing back.
- Spark SQL internal-only properties that are added after reading the table
from the metastore and must be removed before writing it.
Which isn't even to mention that we'll have to be serializing/deserializing
this value to and from a (String, String) pair just to pass information between
```HiveExternalCatalog``` and ```HiveMetastoreCatalog```.
I think that if CatalogTable ends up with too many datasource-specific
internal parameters then maybe it makes more sense to introduce a new Map
element, e.g. ```internalProperties```, so these don't get mixed in with the
table properties.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]