Hi all,

I'm trying to run a simple Spark job with spark-shell. All I want to do is count the number of lines in a file. I start the spark-shell with the default arguments, i.e. just ./bin/spark-shell.
I load the text file with sc.textFile("path") and then call count on my data. When I do this, my data is always split into 52 partitions. I don't understand why, since I'm running on a local machine with 8 cores and sc.defaultParallelism gives me 8. Even if I load the file with sc.textFile("path", 8), I still get data.partitions.size = 52. I'm using Spark 1.1.1.

Any ideas?

Cheers,
Jao
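
P.S. In case it helps, here is roughly the shell session I'm describing (the file path is just a placeholder, not my real one):

  // spark-shell started with no extra options: ./bin/spark-shell
  val data = sc.textFile("path/to/file.txt")        // placeholder path
  data.count()                                      // the count itself works
  data.partitions.size                              // returns 52, I expected 8

  sc.defaultParallelism                             // returns 8

  // passing a minimum number of partitions doesn't change anything
  val data8 = sc.textFile("path/to/file.txt", 8)
  data8.partitions.size                             // still 52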