Hi Rares,

The number of partitions is controlled by the HDFS input format, and one file may have multiple partitions if it consists of multiple blocks. In your case, I think there is one file with 2 splits.
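To make the "one file with 2 splits" guess concrete, here is a rough sketch of how the old-API Hadoop FileInputFormat sizes its splits, as far as I understand it. It is a simplified model, not the actual Hadoop code. The byte sizes (40 and 20), the object name SplitSketch, and the 128 MB block size are made up for illustration, since I don't know the real sizes of your files. It also assumes sc.textFile asked for at least 2 partitions, which is its default when the default parallelism is 2 or more, and that _SUCCESS is skipped as a hidden file, which matches the "Total input paths to process : 2" line in your log.

```scala
// Simplified model of split sizing in the old-API Hadoop FileInputFormat.
// Not the real implementation: it only covers non-empty, splittable files.
object SplitSketch {
  // A trailing chunk up to 10% larger than splitSize is kept as one split.
  val SplitSlop = 1.1

  /** Number of splits per file, given file sizes in bytes and the requested minimum partitions. */
  def splitsPerFile(
      fileSizes: Seq[Long],
      minPartitions: Int,
      blockSize: Long = 128L * 1024 * 1024, // hypothetical block size
      minSize: Long = 1L): Seq[Int] = {
    val totalSize = fileSizes.sum
    // Target bytes per split: total input divided by the requested partition count.
    val goalSize = totalSize / math.max(minPartitions, 1)
    val splitSize = math.max(minSize, math.min(goalSize, blockSize))

    fileSizes.map { size =>
      var remaining = size
      var count = 0
      // Carve off full-size splits while the remainder is clearly larger than one split.
      while (remaining.toDouble / splitSize > SplitSlop) {
        count += 1
        remaining -= splitSize
      }
      // Whatever is left over becomes the final split of this file.
      if (remaining > 0) count += 1
      count
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sizes: part-00000 = 40 bytes, part-00001 = 20 bytes.
    val counts = splitsPerFile(Seq(40L, 20L), minPartitions = 2)
    println(s"splits per file: $counts, total: ${counts.sum}")
  }
}
```

With these made-up sizes the goal size is 30 bytes, so the 40-byte file is cut into two splits and the 20-byte file stays whole, which gives three partitions in total. Under the same assumptions, passing a smaller minimum, for example sc.textFile(dir, 1), should give one split per file.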
Thanks.

Zhan Zhang

On Mar 27, 2015, at 3:12 PM, Rares Vernica <rvern...@gmail.com> wrote:

Hello,

I am using the Spark shell in Scala on the localhost. I am using sc.textFile to read a directory. The directory looks like this (generated by another Spark script):

    part-00000
    part-00001
    _SUCCESS

The part-00000 file has four short lines of text, while part-00001 has two short lines of text. The _SUCCESS file is empty.

When I check the number of partitions on the RDD I get:

    scala> foo.partitions.length
    15/03/27 14:57:31 INFO FileInputFormat: Total input paths to process : 2
    res68: Int = 3

I wonder why the two input files generate three partitions. Does Spark check the number of lines in each file and try to generate three balanced partitions?

Thanks!
Rares