Hi all!
I'm running PageRank on GraphX, and I find that some tasks on one machine
can take 5~6 times longer than the others, while the rest are perfectly
balanced (around 1 second to finish).
Since the time for a stage (iteration) is determined by its slowest
task, this hurts performance.
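To see why a single straggler dominates, here is a small sketch (plain Python, no Spark required; the task timings are made up for illustration) of the point above: a stage's wall-clock time is the maximum of its tasks' times, not their average.

```python
# Toy model of a Spark stage: the stage finishes only when its
# slowest task does, so one straggler sets the whole stage time.

def stage_time(task_times):
    """A stage's wall-clock time is bounded below by its slowest task."""
    return max(task_times)

# Hypothetical timings: seven balanced ~1 s tasks and one 6 s straggler.
task_times = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 6.0]

balanced = stage_time(task_times[:-1])  # stage time without the straggler
skewed = stage_time(task_times)         # stage time with the straggler

print(f"balanced: {balanced:.2f}s, skewed: {skewed:.2f}s")
```

One slow task makes the iteration roughly six times slower even though seven of eight tasks finished in about a second.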
I use unpersist(), even when the RDD was not persisted before.
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/memory-size-for-caching-RDD-tp8256p8579.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
--
Also, if I am not mistaken, this data is automatically removed after your
run, so be sure to check it while your program is running.
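As a hedged illustration of that behavior (plain Python, not Spark itself): scratch space like Spark's spark.local.dir is typically cleaned up when the run ends, so its contents are only visible mid-run. The directory name and file name below are made up for the demo.

```python
# Demonstrates ephemeral scratch space: files exist during the "run"
# and are gone once it finishes, so you must inspect them mid-run.
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="spark-local-demo-") as scratch:
    # While the "run" is in progress, shuffle-like files exist here.
    path = os.path.join(scratch, "shuffle_0_0_0.data")
    with open(path, "wb") as f:
        f.write(b"intermediate data")
    exists_during_run = os.path.exists(path)

# After the context exits (the "run" finishes), everything is gone.
exists_after_run = os.path.exists(path)
print(exists_during_run, exists_after_run)
```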
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/spark-local-dir-and-spark-worker-dir-not-used-tp8529p8578.html
As I've mentioned before, I am currently writing my master's thesis on
storage and memory usage in Spark. Specifically, I am looking at the
different fractions of memory.
I was able to find 3 memory regions, but that seems to leave some memory
unaccounted for:
1. spark.shuffle.memoryFraction: 20%
2. sp
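The list above is truncated, but assuming the legacy (pre-unified) Spark 1.x memory model, where spark.shuffle.memoryFraction defaults to 0.2 and spark.storage.memoryFraction to 0.6, here is a quick sketch of how those fractions carve up an executor heap and how much remains unaccounted for (the heap size is hypothetical):

```python
# Hedged sketch of the legacy (pre-Spark-1.6) memory model.
# Assumed defaults: shuffle fraction 0.2, storage fraction 0.6;
# whatever is left over is "other" user/JVM memory.

heap_bytes = 4 * 1024**3  # hypothetical 4 GiB executor heap

shuffle_fraction = 0.2    # spark.shuffle.memoryFraction (default)
storage_fraction = 0.6    # spark.storage.memoryFraction (default)

shuffle_bytes = heap_bytes * shuffle_fraction
storage_bytes = heap_bytes * storage_fraction
other_bytes = heap_bytes - shuffle_bytes - storage_bytes

print(f"shuffle: {shuffle_bytes / 1024**3:.1f} GiB")
print(f"storage: {storage_bytes / 1024**3:.1f} GiB")
print(f"other:   {other_bytes / 1024**3:.1f} GiB")
```

Under these assumed defaults, about 20% of the heap falls outside the named regions, which may be the unaccounted-for portion you are seeing.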