[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

via GitHub Mon, 27 Mar 2023 20:52:43 -0700


cloud-fan commented on code in PR #40300:
URL: https://github.com/apache/spark/pull/40300#discussion_r1149997056



##########
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryBaseTable.scala:
##########
@@ -297,8 +297,14 @@ abstract class InMemoryBaseTable(
       InMemoryBatchScan(data.map(_.asInstanceOf[InputPartition]), schema, 
tableSchema)
 
     override def pruneColumns(requiredSchema: StructType): Unit = {
-      val schemaNames = metadataColumnNames ++ tableSchema.map(_.name)
-      schema = StructType(requiredSchema.filter(f => 
schemaNames.contains(f.name)))
+      // The required schema could contain conflict-renamed metadata columns, 
so we need to match
+      // them by their logical (original) names, not their current names.
+      val schemaNames = tableSchema.map(_.name).toSet
+      val prunedFields = requiredSchema.filter {

Review Comment:
   > ... simply won't gain the ability to rename metadata columns on conflict.
   
   is it true? The rename happens at the Spark side, when Spark append metadata 
columns to the normal output columns.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a diff in pull request #40300: [SPARK-42683] Automatically rename conflicting metadata columns

Reply via email to