GitHub user budde commented on a diff in the pull request:
https://github.com/apache/spark/pull/16944#discussion_r104249293
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -217,6 +235,62 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
     result.copy(expectedOutputAttributes = Some(relation.output))
   }
+  private def inferIfNeeded(
+      relation: CatalogRelation,
+      options: Map[String, String],
+      fallbackSchema: StructType,
+      fileFormat: FileFormat,
+      fileIndexOpt: Option[FileIndex] = None): (StructType, CatalogTable) = {
+    val inferenceMode = sparkSession.sessionState.conf.caseSensitiveInferenceMode
+    val shouldInfer = (inferenceMode != NEVER_INFER) && !relation.tableMeta.schemaPreservesCase
+    val tableName = relation.tableMeta.identifier.table
+    if (shouldInfer) {
+      logInfo(s"Inferring case-sensitive schema for table $tableName (inference mode: " +
+        s"$inferenceMode)")
+      val fileIndex = fileIndexOpt.getOrElse {
+        val rootPath = new Path(new URI(relation.tableMeta.location))
+        new InMemoryFileIndex(sparkSession, Seq(rootPath), options, None)
+      }
+
+      val inferredSchema = {
+        val schema = fileFormat.inferSchema(
+          sparkSession,
+          options,
+          fileIndex.listFiles(Nil).flatMap(_.files))
+        fileFormat match {
+          case _: ParquetFileFormat =>
+            schema.map(ParquetFileFormat.mergeMetastoreParquetSchema(relation.tableMeta.schema, _))
--- End diff ---
This was used by the previous schema inference code. I think the only
reason it would still be needed is if the metastore schema contains a
nullable field that isn't actually present in the underlying Parquet data.
See #5214 for more details.
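
To illustrate that case, here is a minimal, self-contained sketch of the kind
of merge being discussed. This is not Spark's actual
`ParquetFileFormat.mergeMetastoreParquetSchema` implementation; the object and
method names below are invented for illustration, and the logic is reduced to
the two cases mentioned above (adopting case-sensitive names from the Parquet
footer, and keeping metastore-only nullable fields):

```scala
import org.apache.spark.sql.types._

// Toy stand-in for the merge step: take case-sensitive field names from the
// Parquet footer schema, but keep metastore fields that are missing from the
// data as long as they are nullable.
object MergeSketch {
  def merge(metastoreSchema: StructType, parquetSchema: StructType): StructType = {
    // The metastore lower-cases identifiers, so match fields case-insensitively.
    val parquetByLowerName = parquetSchema.fields.map(f => f.name.toLowerCase -> f).toMap
    StructType(metastoreSchema.fields.map { msField =>
      parquetByLowerName.get(msField.name.toLowerCase) match {
        // Field exists in the data: adopt the case-sensitive Parquet name.
        case Some(pqField) => msField.copy(name = pqField.name)
        // Field absent from the data but nullable: keep it, it reads as null.
        case None if msField.nullable => msField
        case None =>
          throw new RuntimeException(
            s"Non-nullable field ${msField.name} is missing from the Parquet data")
      }
    })
  }

  def main(args: Array[String]): Unit = {
    val metastore = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("extracol", StringType, nullable = true))) // not in any file
    val parquet = StructType(Seq(
      StructField("ID", LongType, nullable = false)))        // case-sensitive footer name
    // Keeps both ID (name taken from the footer) and extracol (metastore-only, nullable).
    println(merge(metastore, parquet))
  }
}
```

If inference took the footer schema as-is, `extracol` would silently disappear
from the table, which is the situation where the merge is still needed.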
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]