Hi folks, puzzled by something pretty simple:
I have a standalone cluster with default parallelism of 2, spark-shell
running with 2 cores
sc.textFile("README.md").partitions.size returns 2 (this makes sense)
sc.textFile("README.md").coalesce(100,true).partitions.size returns 100,
also makes sense
but
sc.textFile("README.md", 100).partitions.size
gives 102 -- I was expecting this to be equivalent to the last statement
(i.e. to result in 100 partitions).
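From poking around, my guess is that the second argument is only a minPartitions hint that gets passed down to Hadoop's FileInputFormat.getSplits, which can emit a couple of extra splits when the file size doesn't divide evenly. Here's a minimal sketch of that split arithmetic as I understand it -- the 3673-byte file size is made up for illustration, and the constants only mirror what I believe Hadoop does, not the actual Hadoop code:

```scala
// Rough sketch of the FileInputFormat.getSplits arithmetic (my reading of it,
// not the real Hadoop source). File size below is hypothetical.
object SplitSketch {
  val SPLIT_SLOP = 1.1 // a split may run up to 10% over splitSize before closing

  def computeSplits(totalSize: Long, minPartitions: Int,
                    blockSize: Long = 32L * 1024 * 1024): Int = {
    val goalSize  = totalSize / math.max(1, minPartitions)
    val splitSize = math.max(1L, math.min(goalSize, blockSize))
    var remaining = totalSize
    var splits    = 0
    while (remaining.toDouble / splitSize > SPLIT_SLOP) {
      splits += 1
      remaining -= splitSize
    }
    if (remaining > 0) splits += 1 // leftover tail becomes one more split
    splits
  }

  def main(args: Array[String]): Unit = {
    // With a (made-up) 3673-byte file, asking for 100 partitions gives
    // goalSize = 36 bytes, and the remainder pushes the count past 100.
    println(computeSplits(3673L, 100)) // prints 102
  }
}
```

So if something like this is what's happening, the count is a lower bound rather than an exact target, which would explain 102 -- but I'd appreciate confirmation.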
I'd appreciate it if someone could enlighten me as to why I end up with 102.
This is on Spark 1.2.
thanks