The cache() method just marks the RDD for caching and returns the RDD itself, so you can write something like this:
// assuming a case class like:
case class Person(id: Int, name: String)

val person = sc.textFile("hdfs://namenode_host:8020/user/person.txt")
  .map(_.split(","))
  .map(p => Person(p(0).trim.toInt, p(1)))
val cached = person.cache()
cached.count()
When you rerun count on cached you will see that the data is read from the cache and the job finishes much faster.
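If you want to verify that caching actually took effect, here is a minimal sketch (using the same "cached" value as above; the timing code is only illustrative):

// StorageLevel.NONE means the RDD is not persisted;
// after cache() it should report MEMORY_ONLY
println(cached.getStorageLevel)

// the first count materializes and caches the data,
// so the second count should be noticeably faster
val t0 = System.nanoTime; cached.count(); val t1 = System.nanoTime
cached.count(); val t2 = System.nanoTime
println(s"first: ${(t1 - t0) / 1e9}s, second: ${(t2 - t1) / 1e9}s")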
Hello everyone,
I'm using SparkSQL and would like to understand how I can determine the
right value for the "spark.sql.shuffle.partitions" parameter. For example,
if I'm joining two RDDs where the first has 10 partitions and the second
has 60, how big should this parameter be?
Thank you,
Yuri
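For reference, a minimal sketch of how this parameter can be set programmatically (assuming a SQLContext named sqlContext built from an existing SparkContext sc; 200 is just Spark's default, not a recommendation):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// spark.sql.shuffle.partitions controls the number of partitions
// used when Spark SQL shuffles data for joins and aggregations
sqlContext.setConf("spark.sql.shuffle.partitions", "200")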