I wonder if there is any tool for tuning Spark (workers and master). I have 6 workers (192 GB RAM and 32 CPU cores each) with 2 masters, and I see only a small difference between Hadoop MapReduce and Spark. I tested word count on a 50 GB file. During the tests Spark hung on 2 nodes for a few minutes with this message:
14/10/26 21:38:52 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[8] at saveAsTextFile at NativeMethodAccessorImpl.java:-2)
14/10/26 21:38:52 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/10/26 21:38:52 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:41437
14/10/26 21:38:52 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 5942 bytes
14/10/26 21:38:52 INFO spark.MapOutputTrackerMasterActor: Asked to send map output locations for shuffle 0 to [email protected]:34546

Best regards,
Morbious
