[GitHub] [spark] Ngone51 commented on a change in pull request #32590: [SPARK-35445][SQL] Reduce the execution time of DeduplicateRelations

GitBox Wed, 19 May 2021 08:34:01 -0700


Ngone51 commented on a change in pull request #32590:
URL: https://github.com/apache/spark/pull/32590#discussion_r635358225




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala
##########
@@ -17,16 +17,30 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
-import scala.collection.mutable.ArrayBuffer
+import scala.collection.mutable
 
 import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeMap, AttributeSet, NamedExpression, SubqueryExpression}
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules.Rule
 import org.apache.spark.sql.catalyst.trees.AlwaysProcess
 
+/**
+ * A help class that used to detect duplicate relations fast in `DeduplicateRelations`
+ */
+case class ReferenceEqualPlanWrapper(plan: LogicalPlan) {
+  private val _hashCode = System.identityHashCode(plan)

Review comment:
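We only need to know whether the two plans are the same instance.
`System.identityHashCode` returns the hash that the default `Object.hashCode`
would return for the object, whether or not its class overrides `hashCode`.
(It is often described as the object's memory address, but the JVM does not
guarantee that.) It is faster than `plan.hashCode()`, which takes more details
of the plan itself into account.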
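
To make the point concrete, here is a minimal, self-contained sketch of a reference-equality wrapper. It is not the code from this PR: `Node` is a hypothetical stand-in for `LogicalPlan`, and `RefEqWrapper` is an illustrative name. The assumption is only that the wrapped class has structural `equals`/`hashCode`, as Scala case classes do.

```scala
import scala.collection.mutable

// Hypothetical stand-in for LogicalPlan: a case class, so equals/hashCode
// are structural and depend on every field (including children).
final case class Node(name: String, children: Seq[Node])

// Illustrative wrapper: two wrappers are equal only if they wrap the
// very same instance.
final class RefEqWrapper(val node: Node) {
  // Computed once from object identity; independent of the node's fields.
  private val _hashCode = System.identityHashCode(node)

  override def hashCode: Int = _hashCode

  override def equals(other: Any): Boolean = other match {
    case w: RefEqWrapper => node eq w.node // reference comparison
    case _               => false
  }
}

object RefEqDemo {
  def main(args: Array[String]): Unit = {
    val a = Node("t", Nil)
    val b = Node("t", Nil) // structurally equal to a, but a distinct instance

    val seen = mutable.HashSet.empty[RefEqWrapper]
    println(a == b)                        // structural equality of the case class
    println(seen.add(new RefEqWrapper(a))) // first time this instance is seen
    println(seen.add(new RefEqWrapper(a))) // same instance again
    println(seen.add(new RefEqWrapper(b))) // equal content, different instance
  }
}
```

Two distinct instances can still share an identity hash, so `equals` must fall back to `eq`; the hash only narrows the lookup.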



