[GitHub] spark pull request #22364: [SPARK-25379][SQL] Improve AttributeSet and Colum...

mgaido91 Mon, 10 Sep 2018 05:15:24 -0700

Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22364#discussion_r216298504
  
    --- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeSet.scala
 ---
    @@ -39,10 +41,15 @@ object AttributeSet {
     
       /** Constructs a new [[AttributeSet]] given a sequence of [[Expression 
Expressions]]. */
       def apply(baseSet: Iterable[Expression]): AttributeSet = {
    -    new AttributeSet(
    -      baseSet
    -        .flatMap(_.references)
    -        .map(new AttributeEquals(_)).toSet)
    --- End diff --
    
    ok, so there are 2 reasons why the current one is better than the proposed:
    
     - in your proposal we are calling `_.references` also in cases when it is 
not needed. This causes unneeded operations (basically 
https://github.com/apache/spark/pull/22364/files#diff-75576f0ec7f9d8b5032000245217d233R40
 is called for each attribute... see also `Attribute.references`);
     - the current one uses a mutable Set, but this is not a big deal, it can 
be changed.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #22364: [SPARK-25379][SQL] Improve AttributeSet and Colum...

Reply via email to