Github user bomeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21638#discussion_r215022562
  
    --- Diff: core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
    @@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
       def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
         val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
         val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
    -    val defaultParallelism = sc.defaultParallelism
    +    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
    --- End diff ---
    
    From the code, you can see this calculation is only an intermediate result, and the method does not return a value. Checking the split size does not make sense for this test case because it depends on multiple variables, and defaultParallelism is just one of them.
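    
    To illustrate the point, here is a minimal standalone sketch (not the actual Spark method) of how a split size can be derived from several inputs in the style of setMinPartitions; the object name, parameter names, and example values are hypothetical, chosen only to show that defaultParallelism is one factor among several:
    
    ```scala
    // Hypothetical sketch: the final split size is shaped by the configured
    // maximum, the per-file open cost, and the parallelism, not by any one value.
    object SplitSizeSketch {
      def maxSplitSize(
          defaultMaxSplitBytes: Long, // cf. spark.files.maxPartitionBytes
          openCostInBytes: Long,      // cf. spark.files.openCostInBytes
          defaultParallelism: Int,    // here: max(sc.defaultParallelism, minPartitions)
          fileSizes: Seq[Long]): Long = {
        // Total work: file bytes plus a per-file open cost.
        val totalBytes = fileSizes.map(_ + openCostInBytes).sum
        // Bytes each core would process if the work were spread evenly.
        val bytesPerCore = totalBytes / defaultParallelism
        // Bound the result by the configured maximum and the open cost.
        Math.min(defaultMaxSplitBytes, Math.max(openCostInBytes, bytesPerCore))
      }
    
      def main(args: Array[String]): Unit = {
        // Example: four 64 MB files, default-like config values, parallelism of 8.
        val size = maxSplitSize(
          defaultMaxSplitBytes = 128L * 1024 * 1024,
          openCostInBytes = 4L * 1024 * 1024,
          defaultParallelism = 8,
          fileSizes = Seq.fill(4)(64L * 1024 * 1024))
        println(s"maxSplitSize = $size bytes")
      }
    }
    ```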


---
