Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3633#discussion_r21565061
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/CoalescedRDD.scala ---
    @@ -55,9 +54,11 @@ private[spark] case class CoalescedRDDPartition(
        * @return locality of this coalesced partition between 0 and 1
        */
       def localFraction: Double = {
    -    val loc = parents.count(p =>
    -      rdd.context.getPreferredLocs(rdd, p.index).map(tl => 
tl.host).contains(preferredLocation))
    -
    +    val loc = parents.count { p =>
    --- End diff --
    
    Maybe this is outside the scope of what you want to do in this PR, but this 
statement seems a little dense and hard to read, so it might be nice to 
refactor it a bit.
    
    Here's what we have now, as of this PR:
    
    ```scala
        val loc = parents.count { p =>
          preferredLocation.exists { l =>
            rdd.context.getPreferredLocs(rdd, p.index).map(_.host).contains(l)
          }
        }
    ```
    
    The idea here seems to be something like "calculate the number of parents 
whose partitions can be computed at the given `preferredLocation`.  I think it 
might be clearer / more efficient to hoist the 
`rdd.context.getPreferredLocs(rdd, p.index).map(_host)` out of the inner 
`exists` loop:
    
    ```scala
        val loc = parents.count { p =>
          val parentPreferredLocations = rdd.context.getPreferredLocs(rdd, 
p.index).map(_.host)
          preferredLocation.exists(parentPreferredLocations.contains)
        }
    ```
    
    This seems clearer to me, since I think that the last line reads more 
clearly as "true if one of the parent's preferred locations is 
`preferredLocation`."
    
    It might also be nice to rename `loc` to something more descriptive, such 
as `localParents`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to