Github user jiangxb1987 commented on a diff in the pull request:
https://github.com/apache/spark/pull/21638#discussion_r202737100
--- Diff:
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
def setMinPartitions(sc: SparkContext, context: JobContext,
minPartitions: Int) {
val defaultMaxSplitBytes =
sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
- val defaultParallelism = sc.defaultParallelism
+ val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
--- End diff --
If `sc.defaultParallelism` < 2, and `minParititions` is not set in
`BinaryFileRDD`, then previously `defaultParallelism` shall be the same as
`sc.defaultParallelism`, and after the change it will be `2`. Have you already
consider this case and feel it's right behavior change to make?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]