Hi Tim,

Any way I can provide more info on this?

On Thu, Oct 1, 2015 at 4:21 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
wrote:

> Not sure what you mean by that; I shared the data I see in the Spark UI.
> Can you point me to a location where I can get precisely the data you need?
>
> When I run the job in fine grained mode, I see tons of tasks created and
> destroyed under a mesos "framework". I have about 80k spark tasks, which I
> think translate directly to independent mesos tasks.
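>
> For reference, this is roughly how I'm reading that task count off an RDD
> (one spark task per partition; names and values below are placeholders, not
> my real code):
>
> import java.util.Arrays;
> import org.apache.spark.SparkConf;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
>
> public class TaskCountCheck {
>     public static void main(String[] args) {
>         JavaSparkContext sc = new JavaSparkContext(
>                 new SparkConf().setAppName("task-count-check").setMaster("local[2]"));
>         // Placeholder RDD standing in for the real one.
>         JavaRDD<Integer> productRdd = sc.parallelize(Arrays.asList(1, 2, 3), 4);
>         // One spark task per partition; in fine grain mode each launches as a mesos task.
>         System.out.println("partitions (and hence tasks): " + productRdd.partitions().size());
>         sc.stop();
>     }
> }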
>
> https://dl.dropboxusercontent.com/u/2432670/Screen%20Shot%202015-10-01%20at%204.14.34%20PM.png
>
> When I run the job in coarse grained mode, I just see 1-4 tasks with 1-4
> executors (it varies based on what mesos allocates). These mesos tasks try
> to complete the 80k spark tasks and eventually run out of memory (see the
> stack trace in the gist shared above).
>
>
> On Thu, Oct 1, 2015 at 4:07 PM, Tim Chen <t...@mesosphere.io> wrote:
>
>> Hi Utkarsh,
>>
>> I replied earlier asking: what does your task assignment look like in fine
>> vs coarse grain mode?
>>
>> Tim
>>
>> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>> wrote:
>>
>>> Bumping it up; it's not really a blocking issue.
>>> But fine grain mode eats up an uncertain amount of resources in mesos and
>>> launches tons of tasks, so I would prefer the coarse grained mode if
>>> only it didn't run out of memory.
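>>>
>>> For reference, here is roughly how I'm toggling the mode (a sketch, not
>>> our exact config; the app name is a placeholder):
>>>
>>> import org.apache.spark.SparkConf;
>>> import org.apache.spark.api.java.JavaSparkContext;
>>>
>>> public class CoarseModeConfig {
>>>     public static void main(String[] args) {
>>>         SparkConf conf = new SparkConf()
>>>                 .setAppName("my-job")                // placeholder name
>>>                 .set("spark.mesos.coarse", "true");  // "false" = fine grain
>>>         // Master comes from spark-submit (e.g. --master mesos://host:5050).
>>>         // Note the driver heap can't be raised here once the driver JVM is
>>>         // up; it has to come from spark-submit, e.g. --driver-memory 4g.
>>>         JavaSparkContext sc = new JavaSparkContext(conf);
>>>         sc.stop();
>>>     }
>>> }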
>>>
>>> Thanks,
>>> -Utkarsh
>>>
>>> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar <utkarsh2...@gmail.com>
>>> wrote:
>>>
>>>> Hi Tim,
>>>>
>>>> 1. spark.mesos.coarse:false (fine grain mode)
>>>> This is the data dump for config and executors assigned:
>>>> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>>>>
>>>> 2. spark.mesos.coarse:true (coarse grain mode)
>>>> Dump for coarse mode:
>>>> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>>>>
>>>> As you can see, exactly the same code works fine in fine grained mode
>>>> but goes out of memory in coarse grained mode. First an executor was
>>>> lost, and then the driver went out of memory.
>>>> So I am trying to understand what is different between fine grained and
>>>> coarse grained mode, other than allocating multiple mesos tasks vs 1
>>>> mesos task. Clearly spark is not managing memory in the same way.
>>>>
>>>> Thanks,
>>>> -Utkarsh
>>>>
>>>>
>>>> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen <t...@mesosphere.io> wrote:
>>>>
>>>>> Hi Utkarsh,
>>>>>
>>>>> What does your job placement look like when you run in fine grain mode?
>>>>> You said coarse grain mode only ran on one node, right?
>>>>>
>>>>> And when the job is running could you open the Spark webui and get
>>>>> stats about the heap size and other java settings?
>>>>>
>>>>> Tim
>>>>>
>>>>> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <
>>>>> utkarsh2...@gmail.com> wrote:
>>>>>
>>>>>> Bumping this one up: any suggestions on the stack trace?
>>>>>> spark.mesos.coarse=true is not working, and the driver crashed with
>>>>>> the error below.
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <
>>>>>> utkarsh2...@gmail.com> wrote:
>>>>>>
>>>>>>> Missed doing a reply-all.
>>>>>>>
>>>>>>> Tim,
>>>>>>>
>>>>>>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse =
>>>>>>> false works (sorry, there was a typo in my last email; I meant "when I
>>>>>>> do "spark.mesos.coarse=false", the job works like a charm").
>>>>>>>
>>>>>>> I get this exception with spark.mesos.coarse = true:
>>>>>>>
>>>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>>>>>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>>>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>>>>>         at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>>>>>         at scala.Option.getOrElse(Option.scala:120)
>>>>>>>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>>>>>         at org.apache.spark.ShuffleDependency.<init>(Dependency.scala:82)
>>>>>>>         at org.apache.spark.rdd.ShuffledRDD.getDependencies(ShuffledRDD.scala:78)
>>>>>>> 15/09/22 20:18:17 INFO SparkContext: Invoking stop() from shutdown hook
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on some-ip-here:37706 in memory (size: 1964.0 B, free: 2.8 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_2_piece0 on mesos-slave10 in memory (size: 1964.0 B, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on some-ip-here:37706 in memory (size: 17.2 KB, free: 2.8 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave105 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave1 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave9 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerInfo: Removed broadcast_1_piece0 on mesos-slave3 in memory (size: 17.2 KB, free: 5.2 GB)
>>>>>>> 15/09/22 20:18:17 INFO SparkUI: Stopped Spark web UI at http://some-ip-here:4040
>>>>>>> 15/09/22 20:18:17 INFO DAGScheduler: Stopping DAGScheduler
>>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Shutting down all executors
>>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: Asking each executor to shut down
>>>>>>> I0922 20:18:17.794598 171 sched.cpp:1591] Asked to stop the driver
>>>>>>> I0922 20:18:17.794739 143 sched.cpp:835] Stopping framework '20150803-224832-1577534986-5050-1614-0016'
>>>>>>> 15/09/22 20:18:17 INFO CoarseMesosSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
>>>>>>> 15/09/22 20:18:17 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
>>>>>>> 15/09/22 20:18:17 INFO Utils: path = /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252/blockmgr-0e0e1a1c-894e-4e79-beac-ead0dff43166, already present as root for deletion.
>>>>>>> 15/09/22 20:18:17 INFO MemoryStore: MemoryStore cleared
>>>>>>> 15/09/22 20:18:17 INFO BlockManager: BlockManager stopped
>>>>>>> 15/09/22 20:18:17 INFO BlockManagerMaster: BlockManagerMaster stopped
>>>>>>> 15/09/22 20:18:17 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
>>>>>>> 15/09/22 20:18:17 INFO SparkContext: Successfully stopped SparkContext
>>>>>>> 15/09/22 20:18:17 INFO Utils: Shutdown hook called
>>>>>>> 15/09/22 20:18:17 INFO Utils: Deleting directory /tmp/spark-98801318-9c49-473b-bf2f-07ea42187252
>>>>>>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>>>>>>> 15/09/22 20:18:17 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 22, 2015 at 1:26 AM, Tim Chen <t...@mesosphere.io> wrote:
>>>>>>>
>>>>>>>> Hi Utkarsh,
>>>>>>>>
>>>>>>>> Just to be sure: you originally set coarse to false and then changed
>>>>>>>> it to true? Or is it the other way around?
>>>>>>>>
>>>>>>>> Also what's the exception/stack trace when the driver crashed?
>>>>>>>>
>>>>>>>> Coarse grain mode pre-starts all the Spark executor backends, so it
>>>>>>>> has the least overhead compared to fine grain. There is no single
>>>>>>>> answer for which mode you should use, otherwise we would have removed
>>>>>>>> one of those modes; it depends on your use case.
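>>>>>>>>
>>>>>>>> As a sketch of what I mean (assuming defaults otherwise): in coarse
>>>>>>>> grain mode you can cap how many cores Spark grabs with
>>>>>>>> spark.cores.max, otherwise it tries to acquire everything offered:
>>>>>>>>
>>>>>>>> import org.apache.spark.SparkConf;
>>>>>>>>
>>>>>>>> SparkConf conf = new SparkConf()
>>>>>>>>         .set("spark.mesos.coarse", "true")
>>>>>>>>         .set("spark.cores.max", "8");  // cap total cores acquired in coarse mode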
>>>>>>>>
>>>>>>>> There are quite a few factors that can cause huge GC pauses, but I
>>>>>>>> don't think your GC pauses would go away if you switched to standalone.
>>>>>>>>
>>>>>>>> Tim
>>>>>>>>
>>>>>>>> On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar <
>>>>>>>> utkarsh2...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am running Spark 1.4.1 on mesos.
>>>>>>>>>
>>>>>>>>> The spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd,
>>>>>>>>> dRdd) of sizes 100, 100, 7 and 1 respectively. Let's call it
>>>>>>>>> productRDD.
>>>>>>>>>
>>>>>>>>> Creation of "aRdd" needs data pull from multiple data sources,
>>>>>>>>> merging it and creating a tuple of JavaRdd, finally aRDD looks 
>>>>>>>>> something
>>>>>>>>> like this: JavaRDD<Tuple4<A1, A2>>
>>>>>>>>> bRdd, cRdd and dRdds are just List<> of values.
>>>>>>>>>
>>>>>>>>> I then apply a transformation on productRDD and finally call
>>>>>>>>> "saveAsTextFile" to save the result of the transformation.
>>>>>>>>>
>>>>>>>>> Problem:
>>>>>>>>> With "spark.mesos.coarse=true", creation of "aRdd" works fine but the
>>>>>>>>> driver crashes while doing the cartesian, whereas when I do
>>>>>>>>> "spark.mesos.coarse=true", the job works like a charm. I am running
>>>>>>>>> spark on mesos.
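>>>>>>>>>
>>>>>>>>> Since the crash happens during the cartesian: a cartesian's partition
>>>>>>>>> count is the product of its parents' partition counts, so chaining
>>>>>>>>> cartesians multiplies the number of partition objects the driver has
>>>>>>>>> to materialize. Purely hypothetical numbers, just to show the
>>>>>>>>> multiplication (the real partition counts aren't in this thread):
>>>>>>>>>
>>>>>>>>> // Hypothetical partition counts for aRdd/bRdd/cRdd/dRdd.
>>>>>>>>> long aParts = 800, bParts = 100, cParts = 7, dParts = 1;
>>>>>>>>> long productParts = aParts * bParts * cParts * dParts;
>>>>>>>>> System.out.println(productParts); // 560000 partition objects on the driver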
>>>>>>>>>
>>>>>>>>> Comments:
>>>>>>>>> So I wanted to understand what role "spark.mesos.coarse=true" plays
>>>>>>>>> in terms of memory and compute performance. My findings look
>>>>>>>>> counterintuitive, since:
>>>>>>>>>
>>>>>>>>>    1. "spark.mesos.coarse=true" just runs on 1 mesos task, so
>>>>>>>>>    there should be an overhead of spinning up mesos tasks which 
>>>>>>>>> should impact
>>>>>>>>>    the performance.
>>>>>>>>>    2. What config for "spark.mesos.coarse" recommended for
>>>>>>>>>    running spark on mesos? Or there is no best answer and it depends 
>>>>>>>>> on
>>>>>>>>>    usecase?
>>>>>>>>>    3. Also by setting "spark.mesos.coarse=true", I notice that I
>>>>>>>>>    get huge GC pauses even with small dataset but a long running job 
>>>>>>>>> (but this
>>>>>>>>>    can be a separate discussion).
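>>>>>>>>>
>>>>>>>>> For (3), a sketch of how one could surface GC details in the logs
>>>>>>>>> (standard JVM flags, nothing specific to our setup):
>>>>>>>>>
>>>>>>>>> import org.apache.spark.SparkConf;
>>>>>>>>>
>>>>>>>>> SparkConf conf = new SparkConf()
>>>>>>>>>         .set("spark.executor.extraJavaOptions",
>>>>>>>>>              "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps");
>>>>>>>>> // Driver-side options have to go through spark-submit instead, e.g.:
>>>>>>>>> //   spark-submit --driver-java-options "-XX:+PrintGCDetails" ...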
>>>>>>>>>
>>>>>>>>> Let me know if I am missing something obvious; we are learning
>>>>>>>>> spark tuning as we move forward :)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Thanks,
>>>>>>>>> -Utkarsh
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks,
>>>>>>> -Utkarsh
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> -Utkarsh
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> -Utkarsh
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> -Utkarsh
>>>
>>
>>
>
>
> --
> Thanks,
> -Utkarsh
>



-- 
Thanks,
-Utkarsh
