Hi Tim, any way I can provide more info on this?
On Thu, Oct 1, 2015 at 4:21 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:

> Not sure what you mean by that, I shared the data which I see in the Spark UI. Can you point me to a location where I can precisely get the data you need?
>
> When I run the job in fine grained mode, I see tons of tasks created and destroyed under a mesos "framework". I have about 80k Spark tasks, which I think translate directly to independent mesos tasks.
>
> https://dl.dropboxusercontent.com/u/2432670/Screen%20Shot%202015-10-01%20at%204.14.34%20PM.png
>
> When I run the job in coarse grained mode, I just see 1-4 tasks with 1-4 executors (it varies with what mesos allocates). And these mesos tasks try to complete the 80k Spark tasks and eventually run out of memory (see the stack trace in the gist shared above).
>
> On Thu, Oct 1, 2015 at 4:07 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Hi Utkarsh,
>>
>> I replied earlier asking what your task assignment looks like in fine vs coarse grain mode?
>>
>> Tim
>>
>> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>
>>> Bumping it up, it's not really a blocking issue. But fine grain mode eats up an uncertain number of resources in mesos and launches tons of tasks, so I would prefer using the coarse grained mode if only it didn't run out of memory.
>>>
>>> Thanks,
>>> -Utkarsh
>>>
>>> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> 1. spark.mesos.coarse:false (fine grain mode)
>>>> This is the data dump for config and executors assigned:
>>>> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>>>>
>>>> 2. spark.mesos.coarse:true (coarse grain mode)
>>>> Dump for coarse mode:
>>>> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>>>>
>>>> As you can see, exactly the same code works fine in fine grained mode and goes out of memory in coarse grained mode.
>>>> First an executor was lost and then the driver went out of memory. So I am trying to understand what is different in fine grained vs coarse mode, other than the allocation of multiple mesos tasks vs 1 mesos task. Clearly Spark is not managing memory in the same way.
>>>>
>>>> Thanks,
>>>> -Utkarsh
>>>>
>>>> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Hi Utkarsh,
>>>>>
>>>>> What is your job placement like when you run fine grain mode? You said coarse grain mode only ran with one node, right?
>>>>>
>>>>> And when the job is running, could you open the Spark web UI and get stats about the heap size and other Java settings?
>>>>>
>>>>> Tim
>>>>>
>>>>> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>>>>
>>>>>> Bumping this one up, any suggestions on the stack trace? spark.mesos.coarse=true is not working and the driver crashed with the error.
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>>>>>
>>>>>>> I missed doing a reply-all.
>>>>>>>
>>>>>>> Tim,
>>>>>>>
>>>>>>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false works (sorry, there was a typo in my last email, I meant "when I do "spark.mesos.coarse=false", the job works like a charm.").
>>>>>>>
>>>>>>> I get this exception with spark.mesos.coarse = true:
>>>>>>>
>>>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>>>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>>>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>>>>>         at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
>>>>>>>         at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)
>>>>>>>
>>>>>>> 15/09/22 20:18:17 INFO SparkContext: Invoking stop() from shutdown hook
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on some-ip-here:37706 in memory (size: 1964.0 B, free: 2.8 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on mesos-slave10 in memory (size: 1964.0 B, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on some-ip-here:37706 in memory (size: 17.2 KB, free: 2.8 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave105 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave1 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave9 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave3 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO SparkUI: Stopped Spark web UI at http://some-ip-here:4040
>>>>>>> 15/09/22 20:18:17 INFO DAGScheduler: Stopping DAGScheduler
>>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Shutting down all executors
>>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
>>>>>>> I0922 20:18:17.794598   171 sched.cpp:1591] Asked to stop the driver
>>>>>>> I0922 20:18:17.794739   143 sched.cpp:835] Stopping framework '20150803-224832-1577534986-5050-1614-0016'
>>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
>>>>>>> 15/09/22 20:18:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>>>>>>> 15/09/22 20:18:17 INFO Utils: path = /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252/blockmgr-0e0e1a1c-894e-4e79-beac-ead0dff43166, already present as root for deletion.
>>>>>>> 15/09/22 20:18:17 INFO MemoryStore: MemoryStore cleared
>>>>>>> 15/09/22 20:18:17 INFO BlockManager: BlockManager stopped
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
>>>>>>> 15/09/22 20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>>>>>>> 15/09/22 20:18:17 INFO SparkContext: Successfully stopped SparkContext
>>>>>>> 15/09/22 20:18:17 INFO Utils: Shutdown hook called
>>>>>>> 15/09/22 20:18:17 INFO Utils: Deleting directory /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252
>>>>>>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>>>>>>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>>>>>>>
>>>>>>> On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <t...@mesosphere.io> wrote:
>>>>>>>
>>>>>>>> Hi Utkarsh,
>>>>>>>>
>>>>>>>> Just to be sure, you originally set coarse to false but then to true? Or is it the other way around?
>>>>>>>>
>>>>>>>> Also, what's the exception/stack trace when the driver crashed?
>>>>>>>>
>>>>>>>> Coarse grain mode pre-starts all the Spark executor backends, so it has the least overhead compared to fine grain. There is no single answer for which mode you should use, otherwise we would have removed one of those modes; it depends on your use case.
>>>>>>>>
>>>>>>>> There are quite a few factors that can cause huge GC pauses, but I don't think your GC pauses will go away if you switch to standalone.
>>>>>>>>
>>>>>>>> Tim
>>>>>>>>
>>>>>>>> On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar <utkarsh2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am running Spark 1.4.1 on mesos.
>>>>>>>>>
>>>>>>>>> The Spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of size 100, 100, 7 and 1 respectively. Let's call it productRDD.
>>>>>>>>>
>>>>>>>>> Creation of "aRdd" needs data pulled from multiple data sources, merging it and creating a tuple of JavaRdd; finally aRdd looks something like this: JavaRDD<Tuple4<A1, A2>>
>>>>>>>>> bRdd, cRdd and dRdd are just List<> of values.
>>>>>>>>>
>>>>>>>>> Then I apply a transformation on productRDD and finally call "saveAsTextFile" to save the result of my transformation.
>>>>>>>>>
>>>>>>>>> Problem:
>>>>>>>>> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine but the driver crashes while doing the cartesian, but when I do "spark.mesos.coarse=true", the job works like a charm. I am running Spark on mesos.
>>>>>>>>>
>>>>>>>>> Comments:
>>>>>>>>> So I wanted to understand what role "spark.mesos.coarse=true" plays in terms of memory and compute performance. My findings look counterintuitive since:
>>>>>>>>>
>>>>>>>>> 1. "spark.mesos.coarse=true" just runs on 1 mesos task, so there should be an overhead of spinning up mesos tasks which should impact the performance.
>>>>>>>>> 2.
>>>>>>>>> What config for "spark.mesos.coarse" is recommended for running Spark on mesos? Or is there no best answer and it depends on the use case?
>>>>>>>>> 3. Also, by setting "spark.mesos.coarse=true", I notice that I get huge GC pauses even with a small dataset but a long running job (but this can be a separate discussion).
>>>>>>>>>
>>>>>>>>> Let me know if I am missing something obvious, we are learning Spark tuning as we move forward :)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> -Utkarsh

--
Thanks,
-Utkarsh
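For what it's worth, the OutOfMemoryError in the trace above is thrown from CartesianRDD.getPartitions, i.e. while the driver materializes the partition list of the chained cartesian: the partition count of a.cartesian(b) is the product of the two inputs' partition counts, so chaining cartesian multiplies them. A rough back-of-the-envelope sketch (plain Python, not Spark; the assumption that each small RDD contributes one partition per element is mine, based only on the sizes quoted in the thread):

```python
# Hypothetical sketch: why chained cartesian() calls multiply task counts.
# Spark's CartesianRDD has numPartitions(a) * numPartitions(b) partitions,
# so a.cartesian(b).cartesian(c).cartesian(d) multiplies all four counts.

def cartesian_partitions(*partition_counts):
    """Partition count of rdd1.cartesian(rdd2)...cartesian(rddN)."""
    total = 1
    for n in partition_counts:
        total *= n
    return total

# If the RDDs of size 100, 100, 7 and 1 each got one partition per element
# (an assumption), the product RDD would have 100 * 100 * 7 * 1 = 70,000
# partitions -- the same order of magnitude as the ~80k Spark tasks seen
# in the Mesos UI in fine grained mode.
print(cartesian_partitions(100, 100, 7, 1))  # 70000
```

Under that assumption each Spark task maps to one fine-grained mesos task, matching the flood of tasks in the mesos UI. Calling coalesce() on the product RDD is one common way to cut the partition count, though the thread does not say whether that was tried here.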