PageRank execution imbalance, might hurt performance by 6x

2014-09-27 Thread Larry Xiao
Hi all! I'm running PageRank on GraphX, and I find that some tasks on one machine can take 5~6 times longer than the others, while the others are perfectly balanced (around 1 second to finish). Since the time for a stage (iteration) is determined by its slowest task, the performance is undesirable. I
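To illustrate the point about the slowest task, here is a minimal sketch in plain Python (no Spark involved). The task durations are hypothetical numbers chosen to match the description above (about 1 second for balanced tasks, about 6 seconds for the straggler), and it assumes all tasks of the stage run in parallel in a single wave.

```python
from statistics import median

# Hypothetical per-task durations (seconds) for one PageRank iteration:
# three balanced tasks and one straggler. These values are assumptions,
# not measurements from the original report.
task_seconds = [1.0, 1.0, 1.0, 6.0]

# With all tasks running in parallel, the stage ends when the slowest task ends.
stage_seconds = max(task_seconds)

# Duration of a typical (balanced) task, used as the baseline.
typical_seconds = median(task_seconds)

# How much longer the stage takes than it would if every task were typical.
slowdown = stage_seconds / typical_seconds

print(f"stage time: {stage_seconds}s, slowdown vs. balanced: {slowdown}x")
```

Under these assumptions a single straggler sets the time for the whole stage, which is where a figure like 6x would come from.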

Re: memory size for caching RDD

2014-09-27 Thread Tom Hubregtsen
Use unpersist(), even when not persisted before. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/memory-size-for-caching-RDD-tp8256p8579.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. -

RE: spark.local.dir and spark.worker.dir not used

2014-09-27 Thread Tom Hubregtsen
Also, if I am not mistaken, this data is automatically removed after your run. Be sure to check it while running your program. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/spark-local-dir-and-spark-worker-dir-not-used-tp8529p8578.html

Spark memory regions

2014-09-27 Thread Tom Hubregtsen
As I mentioned before, I am currently writing my master's thesis on storage and memory usage in Spark. Specifically, I am looking at the different fractions of memory: I was able to find 3 memory regions, but they seem to leave some memory unaccounted for: 1. spark.shuffle.memoryFraction: 20% 2. sp
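As a rough illustration of how a fraction like this maps to bytes, here is a small sketch in plain Python. The 4 GiB heap size and the helper name `region_bytes` are hypothetical; only the 20% figure for spark.shuffle.memoryFraction comes from the message above, and any additional safety margin Spark may apply is ignored.

```python
def region_bytes(heap_bytes: int, percent: int) -> int:
    """Size of a memory region given as an integer percentage of the heap."""
    return heap_bytes * percent // 100

# Hypothetical executor heap size (assumption, not from the original message).
heap_bytes = 4 * 1024**3  # 4 GiB

# spark.shuffle.memoryFraction as quoted in the message: 20%.
shuffle_percent = 20

shuffle_bytes = region_bytes(heap_bytes, shuffle_percent)

# Everything not covered by this one region; the other regions from the
# message are truncated in the source, so they are not modeled here.
remaining_bytes = heap_bytes - shuffle_bytes

print(f"shuffle region: {shuffle_bytes} bytes, remaining: {remaining_bytes} bytes")
```

Extending the same subtraction to the other configured fractions would show how much of the heap is left unaccounted for.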