While I'm not sure why you're seeing an increase in partitions with such a small data file, it's worth noting that the second parameter to textFile is the *minimum* number of partitions, so there's no guarantee you'll get exactly that number.
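To make the behavior less mysterious: textFile hands the hint to Hadoop's FileInputFormat, whose getSplits() divides the file into chunks of roughly totalSize/numSplits bytes, allowing the last chunk to be up to 10% oversized (SPLIT_SLOP = 1.1). Below is a minimal Python sketch of that arithmetic (not Spark's actual code), assuming your test.txt is 52 bytes, which is what the two quoted lines plus newlines add up to. It reproduces the counts you observed:

```python
# Illustrative model of Hadoop FileInputFormat.getSplits() for a single file.
# block_size and min_size defaults are assumptions for a local file, not
# values read from any real configuration.

def num_hadoop_splits(total_size, min_partitions,
                      block_size=128 * 1024 * 1024, min_size=1):
    SPLIT_SLOP = 1.1  # the last chunk may be up to 10% larger than split_size
    goal_size = total_size // max(min_partitions, 1)
    split_size = max(min_size, min(goal_size, block_size))
    splits = 0
    remaining = total_size
    # Carve off full-sized splits while more than one "sloppy" split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining != 0:
        splits += 1  # final split takes whatever is left
    return splits

print(num_hadoop_splits(52, 5))   # 6, matching res40 in your transcript
print(num_hadoop_splits(52, 9))   # 11, matching res44
print(num_hadoop_splits(52, 11))  # 13, matching res46
```

Note the integer division: 52 // 5 = 10, and 52 bytes at 10 bytes per split yields six splits, not five. Each split is then snapped to line boundaries by the record reader, and since the file has only two lines, most of those extra partitions end up empty.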
--
Michael Mior
mm...@apache.org

2017-06-01 6:28 GMT-04:00 Vikash Pareek <vikash.par...@infoobjects.com>:
> Hi,
>
> I am creating an RDD from a text file by specifying the number of
> partitions, but it gives me a different number of partitions than the
> one specified.
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 0)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[72] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res47: Int = 1
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 1)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[50] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res36: Int = 1
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 2)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[52] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res37: Int = 2
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 3)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[54] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res38: Int = 3
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 4)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[56] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res39: Int = 4
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 5)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[58] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res40: Int = 6
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 6)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[60] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res41: Int = 7
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 7)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[62] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res42: Int = 8
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 8)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[64] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res43: Int = 9
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 9)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[66] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res44: Int = 11
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 10)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[68] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res45: Int = 11
>
> scala> val people = sc.textFile("file:///home/pvikash/data/test.txt", 11)
> people: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[70] at textFile at <console>:27
>
> scala> people.getNumPartitions
> res46: Int = 13
>
> Contents of the file /home/pvikash/data/test.txt:
> "
> This is a test file.
> Will be used for rdd partition
> "
>
> I am trying to understand why the number of partitions changes here,
> and why Spark creates empty partitions when the data is small enough to
> fit into a single partition.
>
> Any explanation would be appreciated.
>
> --Vikash
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Number-Of-Partitions-in-RDD-tp28730.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>