Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22364#discussion_r216298504
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala
---
@@ -39,10 +41,15 @@ object AttributeSet {
/** Constructs a new [[AttributeSet]] given a sequence of [[Expression
Expressions]]. */
def apply(baseSet: Iterable[Expression]): AttributeSet = {
- new AttributeSet(
- baseSet
- .flatMap(_.references)
- .map(new AttributeEquals(_)).toSet)
--- End diff --
ok, so there are 2 reasons why the current one is better than the proposed:
- in your proposal we are calling `_.references` also in cases when it is
not needed. This causes unneeded operations (basically
https://github.com/apache/spark/pull/22364/files#diff-75576f0ec7f9d8b5032000245217d233R40
is called for each attribute... see also `Attribute.references`);
- the current one uses a mutable Set, but this is not a big deal, it can
be changed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]