(By the way, you can use wordRDD.countByValue instead of the map and
reduceByKey. It won't make a difference to your issue but is more
compact.)
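A minimal sketch of that equivalence (assuming a local SparkContext; the data is illustrative, not from the original thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountByValueSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("countByValueSketch"))

    val wordRDD = sc.parallelize(Seq("to", "be", "or", "not", "to", "be"))

    // map + reduceByKey: distributed aggregation; the result stays an RDD
    // until you collect it.
    val viaReduce = wordRDD.map(w => (w, 1L)).reduceByKey(_ + _).collectAsMap()

    // countByValue: the same counts in one call, returned as a local Map
    // on the driver. Fine when the key space is small (e.g. word lengths).
    val viaCount = wordRDD.countByValue()

    assert(viaReduce == viaCount)
    sc.stop()
  }
}
```

Note that `countByValue` ships the whole result to the driver, so it suits a small key space like word lengths but not an unbounded one.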
As you say, the problem is the very limited range of keys (word
lengths). I wonder if you could use sortBy instead of the map and
sortByKey pair?
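If I read the suggestion right, the `map` + `sortByKey` pair collapses into a single `sortBy` on the count field. A minimal sketch (variable names and data are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortBySketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("sortBySketch"))

    // Pretend these are (word, count) pairs from a word count.
    val counts = sc.parallelize(Seq(("spark", 10L), ("rdd", 3L), ("scala", 7L)))

    // Instead of counts.map(_.swap).sortByKey(ascending = false),
    // sort by the count component directly:
    val byFreq = counts.sortBy(_._2, ascending = false).collect()

    byFreq.foreach(println)
    sc.stop()
  }
}
```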
For example, consider a word count over a large body of text (on the
order of 100 GB). The word frequencies are clearly skewed, so the word
count yields long-tailed data; the single most frequent word probably
accounts for more than a tenth of all occurrences.
The word count code:
```scala
val allWordLineSplited: RDD[String] = // create RDD
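// A hypothetical continuation of the snippet above (not from the
// original post): count each word, then look at the largest counts.
val wordCount: RDD[(String, Long)] =
  allWordLineSplited.map(w => (w, 1L)).reduceByKey(_ + _)

// Or, more compactly, the countByValue form mentioned earlier,
// which returns a local Map on the driver:
// val wordCountMap = allWordLineSplited.countByValue()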