Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21638#discussion_r197988075
--- Diff:
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -45,7 +45,8 @@ private[spark] abstract class StreamFileInputFormat[T]
* which is set through setMaxSplitSize
*/
def setMinPartitions(sc: SparkContext, context: JobContext,
minPartitions: Int) {
- val defaultMaxSplitBytes =
sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
+ val defaultMaxSplitBytes = Math.max(
+ sc.getConf.get(config.FILES_MAX_PARTITION_BYTES), minPartitions)
val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
val defaultParallelism = sc.defaultParallelism
--- End diff --
hmm, shouldn't `minPartitions` be used like this?
```scala
val defaultParallelism = Math.max(sc.defaultParallelism, if (minPartitions
== 0) 1 else minPartitions)
```
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]