Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6397#discussion_r31150154
  
    --- Diff: 
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
    @@ -123,43 +130,30 @@ private[spark] class ExternalSorter[K, V, C](
       // grow internal data structures by growing + copying every time the 
number of objects doubles.
       private val serializerBatchSize = 
conf.getLong("spark.shuffle.spill.batchSize", 10000)
     
    -  private def getPartition(key: K): Int = {
    -    if (shouldPartition) partitioner.get.getPartition(key) else 0
    +  // The idea here is to reduce minimize the visibility of variables like 
`kvChunkSize` and
    +  // `useSerializedPairBuffer` without recomputing them on each call to 
`newBuffer()`.
    +  private def newBuffer: () => WritablePartitionedPairCollection[K, C] 
with SizeTracker = {
    --- End diff --
    
    I'll just roll this back to a simpler version that does not attempt to hide 
those variables inside of the function, since this type of excessively clever 
code does not belong in the shuffle.
    
    Sent from my phone
    
    > On May 27, 2015, at 8:05 AM, Shixiong Zhu <[email protected]> 
wrote:
    > 
    > In 
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala:
    > 
    > > @@ -123,43 +130,30 @@ private[spark] class ExternalSorter[K, V, C](
    > >    // grow internal data structures by growing + copying every time the 
number of objects doubles.
    > >    private val serializerBatchSize = 
conf.getLong("spark.shuffle.spill.batchSize", 10000)
    > >  
    > > -  private def getPartition(key: K): Int = {
    > > -    if (shouldPartition) partitioner.get.getPartition(key) else 0
    > > +  // The idea here is to reduce minimize the visibility of variables 
like `kvChunkSize` and
    > > +  // `useSerializedPairBuffer` without recomputing them on each call 
to `newBuffer()`.
    > > +  private def newBuffer: () => WritablePartitionedPairCollection[K, C] 
with SizeTracker = {
    > newBuffer should be val, or it will recompute each time.
    > 
    > —
    > Reply to this email directly or view it on GitHub.
    > 



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to