vladimirg-db commented on code in PR #47484:
URL: https://github.com/apache/spark/pull/47484#discussion_r1694957395


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala:
##########
@@ -248,10 +249,14 @@ case class PreprocessTableCreation(catalog: SessionCatalog) extends Rule[Logical
         DDLUtils.checkTableColumns(tableDesc.copy(schema = analyzedQuery.schema))
 
         val output = analyzedQuery.output
+
+        val outputByName = HashMap(output.map(o => o.name -> o): _*)
         val partitionAttrs = normalizedTable.partitionColumnNames.map { partCol =>
-          output.find(_.name == partCol).get
+          outputByName(partCol)
         }
-        val newOutput = output.filterNot(partitionAttrs.contains) ++ partitionAttrs
+        val partitionAttrsSet = HashSet(partitionAttrs: _*)

Review Comment:
   Do you think it would be more correct to compare the expression IDs? The 
risk is that we would be changing the behaviour here, since the old code used 
`Seq.contains`, which relies on `==` between `AttributeReference`s.
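
   To make the trade-off concrete, here is a minimal sketch rather than the PR's
   code. `Attr` is a hypothetical stand-in for `AttributeReference`, and it
   assumes that equality covers more fields than the expression ID alone. The
   immutable `HashSet` is also an assumption, since the import is not visible in
   this hunk. The names `reorderWithSeq`, `reorderWithSet` and
   `reorderWithExprId` are invented for illustration.

```scala
import scala.collection.immutable.HashSet

// Hypothetical stand-in for AttributeReference: structural equality over all fields.
final case class Attr(name: String, exprId: Long, nullable: Boolean)

object PartitionReorderSketch {
  // Old behaviour: Seq.contains compares elements with ==, O(n) per lookup.
  def reorderWithSeq(output: Seq[Attr], partitionAttrs: Seq[Attr]): Seq[Attr] =
    output.filterNot(partitionAttrs.contains) ++ partitionAttrs

  // New behaviour: HashSet uses hashCode plus ==, so it matches the Seq version
  // whenever equals and hashCode are consistent (true for this case class).
  def reorderWithSet(output: Seq[Attr], partitionAttrs: Seq[Attr]): Seq[Attr] = {
    val partitionAttrsSet = HashSet(partitionAttrs: _*)
    output.filterNot(partitionAttrsSet.contains) ++ partitionAttrs
  }

  // Alternative raised in the review: match on the expression ID only.
  def reorderWithExprId(output: Seq[Attr], partitionAttrs: Seq[Attr]): Seq[Attr] = {
    val partitionIds = partitionAttrs.map(_.exprId).toSet
    output.filterNot(a => partitionIds.contains(a.exprId)) ++ partitionAttrs
  }
}
```

   When the partition attributes are taken from `output` itself, as in this
   rule, all three variants should give the same result. They diverge only if
   an attribute with the same ID but, say, different nullability were passed
   in: the ID-based version would drop the original, while the `==`-based
   versions would keep both copies.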



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala:
##########
@@ -263,12 +268,14 @@ case class PreprocessTableCreation(catalog: SessionCatalog) extends Rule[Logical
         DDLUtils.checkTableColumns(tableDesc)
         val normalizedTable = normalizeCatalogTable(tableDesc.schema, tableDesc)
 
+        val normalizedSchemaByName = HashMap(normalizedTable.schema.map(s => s.name -> s): _*)
         val partitionSchema = normalizedTable.partitionColumnNames.map { partCol =>
-          normalizedTable.schema.find(_.name == partCol).get
+          normalizedSchemaByName(partCol)
         }
-
-        val reorderedSchema =
-          StructType(normalizedTable.schema.filterNot(partitionSchema.contains) ++ partitionSchema)
+        val partitionSchemaSet = HashSet(partitionSchema: _*)

Review Comment:
   Yes



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

