Github user chenghao-intel commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2114#discussion_r16722823
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
    @@ -38,7 +38,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] {
        * Unlike `equals`, `id` can be used to differentiate distinct but 
structurally
        * identical branches of a tree.
        */
    -  val id = TreeNode.nextId()
    +  @transient lazy val id = TreeNode.nextId()
    --- End diff --
    
    Yes, you're right, we need to think about the id usage, but currently it is 
the workaround for performance. I noticed that the aggregation performance is 
not so good because large number of `AggregateFunction` objects were created 
during the execution on slave, you know the `AggregationFunction` is a sub 
class of `TreeNode`, actually the id generation here is the bottleneck in 
multithreading env(because of the memory barrier in a multi-core system).
    
    On the other hand, I don't think we have the logic to call the `eq` of an 
expression object during the execution time on slaves directly or indirectly. 
Those calls are supposed to be done in master(for example in logical plan 
analysis & optimization).
    
    So I am OK to merge this PR for the quick fix or rewrite the code for 
getting ride of the id entirely.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to