I wonder if there is any tool for tuning Spark (workers and master).
I have 6 workers (192 GB RAM, 32 CPU cores each) with 2 masters, and I see
only a small difference between Hadoop MapReduce and Spark.
I tested word count on a 50 GB file. During the tests Spark hung on 2 nodes for
a few minutes with this message:

14/10/26 21:38:52 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[8] at saveAsTextFile at NativeMethodAccessorImpl.java:-2)
14/10/26 21:38:52 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/26 21:38:52 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:41437
14/10/26 21:38:52 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 5942 bytes
14/10/26 21:38:52 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:34546
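
One possible reading of the log above, offered as a rough sketch rather than a diagnosis: the stage that runs saveAsTextFile was submitted with only 2 tasks, so at most 2 cores in the cluster can work on it at once. The arithmetic below compares that with the hardware described in this post. The helper name, the 2-tasks-per-core multiplier (the lower end of the 2-3 tasks per core suggested in the Spark tuning guide), and the 128 MB split size are assumptions for illustration, not values taken from this cluster's configuration.

```python
import math


def parallelism_report(workers, cores_per_worker, input_gb, tasks_in_stage,
                       tasks_per_core=2, split_mb=128):
    """Compare a stage's task count with what the cluster could run in parallel.

    tasks_per_core and split_mb are assumed defaults, not measured values.
    """
    total_cores = workers * cores_per_worker
    return {
        # Cores available across all workers.
        "total_cores": total_cores,
        # Cores with nothing to do while this stage runs.
        "idle_cores": max(total_cores - tasks_in_stage, 0),
        # Partition count that would give every core a few tasks.
        "target_partitions": total_cores * tasks_per_core,
        # Number of input splits if the file is read in split_mb chunks.
        "input_splits": math.ceil(input_gb * 1024 / split_mb),
    }


# Figures from the post: 6 workers, 32 cores each, 50 GB input, 2 tasks in Stage 0.
report = parallelism_report(workers=6, cores_per_worker=32,
                            input_gb=50, tasks_in_stage=2)
for name, value in report.items():
    print(f"{name}: {value}")
```

If this reading is right, raising the partition count of the shuffle would be the first thing to try, for example by passing numPartitions to reduceByKey or by setting spark.default.parallelism. Whether that explains the pause seen here would still need to be checked in the Spark web UI.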

Best regards,

Morbious



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-optimization-tp17290.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
