When I run some of the Apache Spark examples in the spark-shell or as a
job, I cannot achieve full core utilization on a single machine.
For example:

val textColumn = sc.textFile("/home/someuser/verylargefile.txt").cache()

val distinctWordCount = textColumn.flatMap(line => line.split('\0'))
                                  .map(word => (word, 1))
                                  .reduceByKey(_ + _)
                                  .count()

When running this script, I mostly see only one or two active cores on my
8-core machine. Isn't Spark supposed to parallelize this?

The job takes about 15 seconds, and most of my cores sit idle the whole time.

How can I configure Spark to utilize all cores?
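For reference, these are the knobs I understand to control local parallelism; this is only a sketch of what I have gathered from the docs, and the specific core/partition counts are assumptions about my setup, not a confirmed fix:

```scala
// Launch the shell with all local cores available to the scheduler:
//   spark-shell --master "local[*]"

// Hint at more input partitions when reading the file
// (minPartitions is a lower-bound hint, not a guarantee):
val textColumn = sc.textFile("/home/someuser/verylargefile.txt",
                             minPartitions = 8).cache()

// Or explicitly repartition an existing RDD before the heavy stages:
val repartitioned = textColumn.repartition(8)
```

My understanding is that a single large text file can otherwise come in as very few partitions, which would explain only one or two tasks (and therefore cores) running at once.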
