Hi,

What's "<all other stuff>"? What master URL do you use?

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Tue, Jul 26, 2016 at 2:18 AM, Mail.com <[email protected]> wrote:
> Hi All,
>
> I have a directory which has 12 files. I want to read the entire file so I am
> reading it as wholeTextFiles(dirpath, numPartitions).
>
> I run spark-submit as <all other stuff> --num-executors 12 --executor-cores 1
> and numPartitions 12.
>
> However, when I run the job I see that the stage which reads the directory
> has only 8 tasks. So some tasks read more than one file and take twice the
> time.
>
> What can I do so that the files are read by 12 tasks, i.e., one file per task?
>
> Thanks,
> Pradeep
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: [email protected]
>
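
The second argument of wholeTextFiles is minPartitions, which Spark documents as a suggestion and not an exact count, so the read stage can end up with fewer tasks than files. One possible workaround for the question above, sketched here in PySpark, is to list the files yourself and parallelize the list with one slice per file, so each task reads one file. This is only a sketch and nobody in the thread proposed it. The helper names (list_files, read_whole_file, one_file_per_task) are made up for illustration, and the code assumes the directory is on a filesystem every executor can open with plain Python I/O (local mode or a shared mount). For HDFS the read function would need a Hadoop client instead.

```python
import os


def list_files(dirpath):
    """Return the regular files directly under dirpath, sorted by name."""
    return sorted(
        os.path.join(dirpath, name)
        for name in os.listdir(dirpath)
        if os.path.isfile(os.path.join(dirpath, name))
    )


def read_whole_file(path):
    """Return (path, content), a pair similar to what wholeTextFiles yields."""
    with open(path, encoding="utf-8") as handle:
        return path, handle.read()


def one_file_per_task(sc, dirpath):
    """Build an RDD of (path, content) pairs with one slice per file.

    sc is an existing SparkContext (assumed to be created elsewhere).
    numSlices is set to the number of paths, so each partition should
    hold exactly one path and each task should read exactly one file.
    """
    paths = list_files(dirpath)
    # max(1, ...) avoids asking for zero slices on an empty directory.
    return sc.parallelize(paths, max(1, len(paths))).map(read_whole_file)
```

With the 12 files from the question, one_file_per_task(sc, dirpath).getNumPartitions() should report 12, and the read then runs as 12 tasks provided 12 executor cores are actually available.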
