[GitHub] [spark] shardulm94 commented on a change in pull request #34254: [SPARK-36905] Fix reading hive views without explicit column names

GitBox Wed, 13 Oct 2021 11:53:54 -0700


shardulm94 commented on a change in pull request #34254:
URL: https://github.com/apache/spark/pull/34254#discussion_r727359054




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##########
@@ -898,14 +905,24 @@ class SessionCatalog(
     val nameToCurrentOrdinal = scala.collection.mutable.HashMap.empty[String, Int]
     val viewDDL = buildViewDDL(metadata, isTempView)
 
-    val projectList = viewColumnNames.zip(metadata.schema).map { case (name, field) =>
-      val normalizedName = normalizeColName(name)
-      val count = nameToCounts(normalizedName)
-      val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
-      nameToCurrentOrdinal(normalizedName) = ordinal + 1
-      val col = GetViewColumnByNameAndOrdinal(
-        metadata.identifier.toString, name, ordinal, count, viewDDL)
-      Alias(UpCast(col, field.dataType), field.name)(explicitMetadata = Some(field.metadata))
+    val projectList = if (!isHiveCreatedView(metadata)) {
+      viewColumnNames.zip(metadata.schema).map { case (name, field) =>
+        val normalizedName = normalizeColName(name)
+        val count = nameToCounts(normalizedName)
+        val ordinal = nameToCurrentOrdinal.getOrElse(normalizedName, 0)
+        nameToCurrentOrdinal(normalizedName) = ordinal + 1
+        val col = GetViewColumnByNameAndOrdinal(
+          metadata.identifier.toString, name, ordinal, count, viewDDL)
+        Alias(UpCast(col, field.dataType), field.name)(explicitMetadata = Some(field.metadata))
+      }
+    } else {
+      // For view created by hive, the parsed view plan may have different output columns with
+      // the schema stored in metadata. For example: `CREATE VIEW v AS SELECT 1 FROM t`
+      // the schema in metadata will be `_c0` while the parsed view plan has column named `1`
+      metadata.schema.zipWithIndex.map { case (field, index) =>
+        val col = GetColumnByOrdinal(index, field.dataType)

Review comment:
       I see that the `dataType` parameter of `GetColumnByOrdinal` is ignored by the [Analyzer](https://github.com/apache/spark/blob/1af7072fc27193f65caeb630f8229c8be89b57d3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L1774). Should we keep the `UpCast` like in the branch above?
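
       A minimal sketch of the shape this suggestion points at, for illustration only. The types below (`Expr`, `ColumnByOrdinal`, `UpCastExpr`, `AliasExpr`, `Field`) are hypothetical stand-ins, not Spark's Catalyst classes, and data types are modelled as plain strings. The sketch only shows that a column resolved by position can still be wrapped in an up-cast to the type stored in the view schema, as the non-Hive branch does.

```scala
// Hypothetical stand-in types for illustration; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class ColumnByOrdinal(ordinal: Int, dataType: String) extends Expr
final case class UpCastExpr(child: Expr, dataType: String) extends Expr
final case class AliasExpr(child: Expr, name: String) extends Expr
final case class Field(name: String, dataType: String)

object HiveViewProjection {
  // Resolve each output column by position, then wrap it in an up-cast to the
  // type recorded in the view schema before aliasing it to the schema name.
  def projectByOrdinal(schema: Seq[Field]): Seq[AliasExpr] =
    schema.zipWithIndex.map { case (field, index) =>
      val col = ColumnByOrdinal(index, field.dataType)
      AliasExpr(UpCastExpr(col, field.dataType), field.name)
    }
}
```

       In the real code the wrapped expression would presumably mirror the existing `Alias(UpCast(col, field.dataType), field.name)(...)` call from the branch above; whether that is the right fix is for the PR author to confirm.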

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
##########
@@ -898,14 +905,24 @@ class SessionCatalog(
     val nameToCurrentOrdinal = scala.collection.mutable.HashMap.empty[String, Int]

Review comment:
       [These lines](https://github.com/apache/spark/blob/1af7072fc27193f65caeb630f8229c8be89b57d3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L873-L899) are not used if `isHiveCreatedView(metadata)` is true. Should these lines be moved into the `if-else` branch for non-Hive views?
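
       A rough sketch of the restructuring asked about here, using a simplified toy model rather than the actual `SessionCatalog` code. It assumes the linked lines build the per-name bookkeeping (`nameToCounts`, `nameToCurrentOrdinal`) that only the name-based branch reads. The `project` function, its string output, and the lowercase normalization are all hypothetical.

```scala
import scala.collection.mutable

object ViewProjection {
  // Toy model: returns a textual description of each projected column.
  def project(
      viewColumnNames: Seq[String],
      schemaNames: Seq[String],
      hiveCreated: Boolean): Seq[String] =
    if (!hiveCreated) {
      // Name-based bookkeeping lives only in the branch that needs it.
      // Lowercasing stands in for the real column-name normalization (an assumption).
      val nameToCounts: Map[String, Int] =
        viewColumnNames.groupBy(_.toLowerCase).map { case (k, v) => k -> v.length }
      val nameToCurrentOrdinal = mutable.HashMap.empty[String, Int]
      viewColumnNames.zip(schemaNames).map { case (name, fieldName) =>
        val normalized = name.toLowerCase
        val count = nameToCounts(normalized)
        val ordinal = nameToCurrentOrdinal.getOrElse(normalized, 0)
        nameToCurrentOrdinal(normalized) = ordinal + 1
        s"$name#$ordinal/$count AS $fieldName"
      }
    } else {
      // Hive-created views are resolved purely by position, so no name maps are built.
      schemaNames.zipWithIndex.map { case (fieldName, index) => s"#$index AS $fieldName" }
    }
}
```

       With this layout the Hive path never allocates or populates the name maps, which is the point of the question; the real change would depend on what else the linked lines compute.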



