dbaliafroozeh opened a new pull request #29598:
URL: https://github.com/apache/spark/pull/29598


   ### What changes were proposed in this pull request?
   This PR changes `AttributeSet` and `ExpressionSet` to maintain the insertion 
order of the elements. More specifically, we:
   - change the underlying data structure of `AttributeSet` from `HashSet` to 
`LinkedHashSet` to maintain the insertion order.
   - `ExpressionSet` already uses a list to keep track of the expressions. 
However, since it extends Scala's `immutable.Set`, operations such as `map` 
and `flatMap` are delegated to `immutable.Set` itself. This means that the 
result of these operations is no longer an instance of `ExpressionSet`; rather, 
it is an implementation picked by the parent class. We therefore remove this 
inheritance from `immutable.Set` and implement the needed methods directly. 
`ExpressionSet` has very specific semantics, and it does not make sense for it 
to extend `immutable.Set` anyway.
   - We change `PlanStabilitySuite` to not sort the attributes, so that it can 
catch changes in the order of expressions across runs.
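
   The two structural changes above can be illustrated with a minimal sketch. 
It is not the code in this PR: `Attr` and `OrderedAttrSet` are hypothetical 
stand-ins invented for illustration. The wrapper is backed by a 
`mutable.LinkedHashSet` so iteration follows insertion order, and it does not 
extend `immutable.Set`, so `map` returns the wrapper type itself.

```scala
import scala.collection.mutable

object InsertionOrderSketch {
  // Hypothetical stand-in for an attribute; not Spark's Attribute class.
  final case class Attr(name: String, exprId: Long)

  // Hypothetical wrapper that keeps elements in insertion order.
  // It deliberately does not extend immutable.Set, so transformations
  // such as map are implemented here and return OrderedAttrSet.
  final class OrderedAttrSet private (
      private val underlying: mutable.LinkedHashSet[Attr]) {

    // Returns a new set with `a` appended (no-op if already present).
    def +(a: Attr): OrderedAttrSet = {
      val copy = mutable.LinkedHashSet.empty[Attr]
      copy ++= underlying
      copy += a
      new OrderedAttrSet(copy)
    }

    def contains(a: Attr): Boolean = underlying.contains(a)

    // Applies f to each element in insertion order; duplicates collapse.
    def map(f: Attr => Attr): OrderedAttrSet = {
      val out = mutable.LinkedHashSet.empty[Attr]
      underlying.foreach(a => out += f(a))
      new OrderedAttrSet(out)
    }

    // Elements in insertion order.
    def toSeq: Seq[Attr] = underlying.toList
  }

  object OrderedAttrSet {
    def apply(attrs: Attr*): OrderedAttrSet = {
      val s = mutable.LinkedHashSet.empty[Attr]
      attrs.foreach(s += _)
      new OrderedAttrSet(s)
    }
  }
}
```

   With this shape, `OrderedAttrSet(...).map(f)` is still an `OrderedAttrSet`, 
and `toSeq` lists elements in the order they were added.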
   
   
   ### Why are the changes needed?
   Expression identity is based on the `ExprId`, which is an auto-incremented 
number. This means that the same query can yield a query plan with different 
expression ids in different runs. `AttributeSet` and `ExpressionSet` internally 
use a `HashSet` as the underlying data structure, and therefore cannot 
guarantee a fixed iteration order across runs. This can be problematic when we 
want to check for plan changes between runs.
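
   As a rough illustration of this problem (a sketch with hypothetical names, 
not Spark code), the snippet below simulates two runs in which the same three 
attributes receive different ids. Because the id participates in the hash, the 
iteration order of a `HashSet` is not guaranteed to match between the runs, 
whereas a `LinkedHashSet` always yields the insertion order.

```scala
import scala.collection.mutable

object ExprIdOrderSketch {
  // Hypothetical stand-in for an attribute; the id is part of equals/hashCode.
  final case class Attr(name: String, exprId: Long)

  // Simulates one run: the same attributes, with ids starting at a
  // run-specific offset (mimicking an auto-incremented counter).
  def attrsForRun(firstId: Long): Seq[Attr] =
    Seq("key", "value", "ts").zipWithIndex.map { case (n, i) =>
      Attr(n, firstId + i)
    }

  // Iteration order depends on hash codes, hence on the ids.
  def hashOrder(attrs: Seq[Attr]): Seq[String] = {
    val s = mutable.HashSet.empty[Attr]
    attrs.foreach(s += _)
    s.toList.map(_.name)
  }

  // Iteration order is the insertion order, independent of the ids.
  def linkedOrder(attrs: Seq[Attr]): Seq[String] = {
    val s = mutable.LinkedHashSet.empty[Attr]
    attrs.foreach(s += _)
    s.toList.map(_.name)
  }
}
```

   Comparing `hashOrder(attrsForRun(1))` with `hashOrder(attrsForRun(1000))` 
may or may not show a difference for a given input, which is exactly why the 
order cannot be relied upon; `linkedOrder` returns the same sequence for both.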
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Passes `PlanStabilitySuite` after regenerating the golden files.
   


