Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-09 Thread Utkarsh Sengar
Hi Tim,

Any way I can provide more info on this?

On Thu, Oct 1, 2015 at 4:21 PM, Utkarsh Sengar 
wrote:

> Not sure what you mean by that; I shared the data I see in the Spark UI.
> Can you point me to a location where I can get precisely the data you need?
>
> When I run the job in fine grained mode, I see tons of tasks created and
> destroyed under a mesos "framework". I have about 80k spark tasks, which I
> think translate directly to independent mesos tasks.
>
> https://dl.dropboxusercontent.com/u/2432670/Screen%20Shot%202015-10-01%20at%204.14.34%20PM.png
>
> When I run the job in coarse grained mode, I just see 1-4 tasks with 1-4
> executors (it varies based on what mesos allocates). These mesos tasks try
> to complete the 80k spark tasks and eventually run out of memory (see the
> stack trace in the gist shared above).
>
>
> On Thu, Oct 1, 2015 at 4:07 PM, Tim Chen  wrote:
>
>> Hi Utkarsh,
>>
>> I replied earlier asking what your task assignment looks like in fine vs
>> coarse grain mode?
>>
>> Tim
>>
>> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar 
>> wrote:
>>
>>> Bumping it up; it's not really a blocking issue.
>>> But fine grain mode eats up an uncertain number of resources in mesos and
>>> launches tons of tasks, so I would prefer the coarse grained mode if only
>>> it didn't run out of memory.
>>>
>>> Thanks,
>>> -Utkarsh
>>>
>>> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar 
>>> wrote:
>>>
 Hi Tim,

 1. spark.mesos.coarse:false (fine grain mode)
 This is the data dump for config and executors assigned:
 https://gist.github.com/utkarsh2012/6401d5526feccab14687

 2. spark.mesos.coarse:true (coarse grain mode)
 Dump for coarse mode:
 https://gist.github.com/utkarsh2012/918cf6f8ed5945627188

 As you can see, exactly the same code works fine in fine grained mode but
 goes out of memory in coarse grained mode. First an executor was lost and
 then the driver went out of memory.
 So I am trying to understand what is different about fine grained vs
 coarse mode, other than allocating multiple mesos tasks vs 1 mesos task.
 Clearly spark is not managing memory the same way.

 Thanks,
 -Utkarsh


 On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen  wrote:

> Hi Utkarsh,
>
> What is your job placement like when you run fine grain mode? You said
> coarse grain mode only ran with one node right?
>
> And when the job is running could you open the Spark webui and get
> stats about the heap size and other java settings?
>
> Tim
>
> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar <
> utkarsh2...@gmail.com> wrote:
>
>> Bumping this one up, any suggestions on the stacktrace?
>> spark.mesos.coarse=true is not working and the driver crashed with
>> the error.
>>
>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar <
>> utkarsh2...@gmail.com> wrote:
>>
>>> I missed doing a reply-all.
>>>
>>> Tim,
>>>
>>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse =
>>> false works (sorry there was a typo in my last email, I meant "when I do
>>> "spark.mesos.coarse=false", the job works like a charm. ").
>>>
>>> I get this exception with spark.mesos.coarse = true:
>>>
>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-01 Thread Utkarsh Sengar
Not sure what you mean by that; I shared the data I see in the Spark UI.
Can you point me to a location where I can get precisely the data you need?

When I run the job in fine grained mode, I see tons of tasks created and
destroyed under a mesos "framework". I have about 80k spark tasks, which I
think translate directly to independent mesos tasks.
https://dl.dropboxusercontent.com/u/2432670/Screen%20Shot%202015-10-01%20at%204.14.34%20PM.png

When I run the job in coarse grained mode, I just see 1-4 tasks with 1-4
executors (it varies based on what mesos allocates). These mesos tasks try
to complete the 80k spark tasks and eventually run out of memory (see the
stack trace in the gist shared above).


On Thu, Oct 1, 2015 at 4:07 PM, Tim Chen  wrote:

> Hi Utkarsh,
>
> I replied earlier asking what your task assignment looks like in fine vs
> coarse grain mode?
>
> Tim
>
> On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar 
> wrote:
>
>> Bumping it up; it's not really a blocking issue.
>> But fine grain mode eats up an uncertain number of resources in mesos and
>> launches tons of tasks, so I would prefer the coarse grained mode if only
>> it didn't run out of memory.
>>
>> Thanks,
>> -Utkarsh
>>
>> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar 
>> wrote:
>>
>>> Hi Tim,
>>>
>>> 1. spark.mesos.coarse:false (fine grain mode)
>>> This is the data dump for config and executors assigned:
>>> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>>>
>>> 2. spark.mesos.coarse:true (coarse grain mode)
>>> Dump for coarse mode:
>>> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>>>
>>> As you can see, exactly the same code works fine in fine grained mode but
>>> goes out of memory in coarse grained mode. First an executor was lost and
>>> then the driver went out of memory.
>>> So I am trying to understand what is different about fine grained vs coarse
>>> mode, other than allocating multiple mesos tasks vs 1 mesos task. Clearly
>>> spark is not managing memory the same way.
>>>
>>> Thanks,
>>> -Utkarsh
>>>
>>>
>>> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen  wrote:
>>>
 Hi Utkarsh,

 What is your job placement like when you run fine grain mode? You said
 coarse grain mode only ran with one node right?

 And when the job is running could you open the Spark webui and get
 stats about the heap size and other java settings?

 Tim

 On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar >>> > wrote:

> Bumping this one up, any suggestions on the stacktrace?
> spark.mesos.coarse=true is not working and the driver crashed with the
> error.
>
> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar  > wrote:
>
>> I missed doing a reply-all.
>>
>> Tim,
>>
>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
>> works (sorry there was a typo in my last email, I meant "when I do
>> "spark.mesos.coarse=false", the job works like a charm. ").
>>
>> I get this exception with spark.mesos.coarse = true:
>>
>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>> at scala.Option.getOrElse(Option.scala:120)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-01 Thread Tim Chen
Hi Utkarsh,

I replied earlier asking what your task assignment looks like in fine vs
coarse grain mode?

Tim

On Thu, Oct 1, 2015 at 4:05 PM, Utkarsh Sengar 
wrote:

> Bumping it up; it's not really a blocking issue.
> But fine grain mode eats up an uncertain number of resources in mesos and
> launches tons of tasks, so I would prefer the coarse grained mode if only
> it didn't run out of memory.
>
> Thanks,
> -Utkarsh
>
> On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar 
> wrote:
>
>> Hi Tim,
>>
>> 1. spark.mesos.coarse:false (fine grain mode)
>> This is the data dump for config and executors assigned:
>> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>>
>> 2. spark.mesos.coarse:true (coarse grain mode)
>> Dump for coarse mode:
>> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>>
>> As you can see, exactly the same code works fine in fine grained mode but
>> goes out of memory in coarse grained mode. First an executor was lost and
>> then the driver went out of memory.
>> So I am trying to understand what is different about fine grained vs coarse
>> mode, other than allocating multiple mesos tasks vs 1 mesos task. Clearly
>> spark is not managing memory the same way.
>>
>> Thanks,
>> -Utkarsh
>>
>>
>> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen  wrote:
>>
>>> Hi Utkarsh,
>>>
>>> What is your job placement like when you run fine grain mode? You said
>>> coarse grain mode only ran with one node right?
>>>
>>> And when the job is running could you open the Spark webui and get stats
>>> about the heap size and other java settings?
>>>
>>> Tim
>>>
>>> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar 
>>> wrote:
>>>
 Bumping this one up, any suggestions on the stacktrace?
 spark.mesos.coarse=true is not working and the driver crashed with the
 error.

 On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar 
 wrote:

> I missed doing a reply-all.
>
> Tim,
>
> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
> works (sorry there was a typo in my last email, I meant "when I do
> "spark.mesos.coarse=false", the job works like a charm. ").
>
> I get this exception with spark.mesos.coarse = true:
>
> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-10-01 Thread Utkarsh Sengar
Bumping it up; it's not really a blocking issue.
But fine grain mode eats up an uncertain number of resources in mesos and
launches tons of tasks, so I would prefer the coarse grained mode if only
it didn't run out of memory.

Thanks,
-Utkarsh

On Mon, Sep 28, 2015 at 2:24 PM, Utkarsh Sengar 
wrote:

> Hi Tim,
>
> 1. spark.mesos.coarse:false (fine grain mode)
> This is the data dump for config and executors assigned:
> https://gist.github.com/utkarsh2012/6401d5526feccab14687
>
> 2. spark.mesos.coarse:true (coarse grain mode)
> Dump for coarse mode:
> https://gist.github.com/utkarsh2012/918cf6f8ed5945627188
>
> As you can see, exactly the same code works fine in fine grained mode but
> goes out of memory in coarse grained mode. First an executor was lost and
> then the driver went out of memory.
> So I am trying to understand what is different about fine grained vs coarse
> mode, other than allocating multiple mesos tasks vs 1 mesos task. Clearly
> spark is not managing memory the same way.
>
> Thanks,
> -Utkarsh
>
>
> On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen  wrote:
>
>> Hi Utkarsh,
>>
>> What is your job placement like when you run fine grain mode? You said
>> coarse grain mode only ran with one node right?
>>
>> And when the job is running could you open the Spark webui and get stats
>> about the heap size and other java settings?
>>
>> Tim
>>
>> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar 
>> wrote:
>>
>>> Bumping this one up, any suggestions on the stacktrace?
>>> spark.mesos.coarse=true is not working and the driver crashed with the
>>> error.
>>>
>>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar 
>>> wrote:
>>>
 I missed doing a reply-all.

 Tim,

 spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
 works (sorry there was a typo in my last email, I meant "when I do
 "spark.mesos.coarse=false", the job works like a charm. ").

 I get this exception with spark.mesos.coarse = true:

 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null

 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
 at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-28 Thread Utkarsh Sengar
Hi Tim,

1. spark.mesos.coarse:false (fine grain mode)
This is the data dump for config and executors assigned:
https://gist.github.com/utkarsh2012/6401d5526feccab14687

2. spark.mesos.coarse:true (coarse grain mode)
Dump for coarse mode:
https://gist.github.com/utkarsh2012/918cf6f8ed5945627188

As you can see, exactly the same code works fine in fine grained mode but
goes out of memory in coarse grained mode. First an executor was lost and
then the driver went out of memory.
So I am trying to understand what is different about fine grained vs coarse
mode, other than allocating multiple mesos tasks vs 1 mesos task. Clearly
spark is not managing memory the same way.

Thanks,
-Utkarsh


On Fri, Sep 25, 2015 at 9:17 AM, Tim Chen  wrote:

> Hi Utkarsh,
>
> What is your job placement like when you run fine grain mode? You said
> coarse grain mode only ran with one node right?
>
> And when the job is running could you open the Spark webui and get stats
> about the heap size and other java settings?
>
> Tim
>
> On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar 
> wrote:
>
>> Bumping this one up, any suggestions on the stacktrace?
>> spark.mesos.coarse=true is not working and the driver crashed with the
>> error.
>>
>> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar 
>> wrote:
>>
>>> I missed doing a reply-all.
>>>
>>> Tim,
>>>
>>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
>>> works (sorry there was a typo in my last email, I meant "when I do
>>> "spark.mesos.coarse=false", the job works like a charm. ").
>>>
>>> I get this exception with spark.mesos.coarse = true:
>>>
>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>> at scala.Option.getOrElse(Option.scala:120)
>>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>> at scala.Option.getOrElse(Option.scala:120)
>>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-25 Thread Tim Chen
Hi Utkarsh,

What is your job placement like when you run fine grain mode? You said
coarse grain mode only ran with one node right?

And when the job is running could you open the Spark webui and get stats
about the heap size and other java settings?

Tim

On Thu, Sep 24, 2015 at 10:56 PM, Utkarsh Sengar 
wrote:

> Bumping this one up, any suggestions on the stacktrace?
> spark.mesos.coarse=true is not working and the driver crashed with the
> error.
>
> On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar 
> wrote:
>
>> I missed doing a reply-all.
>>
>> Tim,
>>
>> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
>> works (sorry there was a typo in my last email, I meant "when I do
>> "spark.mesos.coarse=false", the job works like a charm. ").
>>
>> I get this exception with spark.mesos.coarse = true:
>>
>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
>> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>> at scala.Option.getOrElse(Option.scala:120)
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>> at scala.Option.getOrElse(Option.scala:120)
>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-24 Thread Utkarsh Sengar
Bumping this one up, any suggestions on the stacktrace?
spark.mesos.coarse=true is not working and the driver crashed with the
error.

On Wed, Sep 23, 2015 at 3:29 PM, Utkarsh Sengar 
wrote:

> I missed doing a reply-all.
>
> Tim,
>
> spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false
> works (sorry there was a typo in my last email, I meant "when I do
> "spark.mesos.coarse=false", the job works like a charm. ").
>
> I get this exception with spark.mesos.coarse = true:
>
> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
> 15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)

Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-23 Thread Utkarsh Sengar
I missed doing a reply-all.

Tim,

spark.mesos.coarse = true doesn't work and spark.mesos.coarse = false works
(sorry there was a typo in my last email, I meant "when I do
"spark.mesos.coarse=false", the job works like a charm. ").

I get this exception with spark.mesos.coarse = true:

15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af4bf26750ad38a444d7cf"}, max= { "_id" : "55af5a61e8a42806f47546c1"}
15/09/22 20:18:05 INFO MongoCollectionSplitter: Created split: min={ "_id" : "55af5a61e8a42806f47546c1"}, max= null

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.CartesianRDD.getPartitions(CartesianRDD.scala:60)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)


Re: spark.mesos.coarse impacts memory performance on mesos

2015-09-22 Thread Tim Chen
Hi Utkarsh,

Just to be sure: you originally set coarse to false and then switched it to
true? Or is it the other way around?

Also what's the exception/stack trace when the driver crashed?

Coarse grain mode pre-starts all the Spark executor backends, so it has the
least overhead compared to fine grain. There is no single answer for which
mode you should use (otherwise we would have removed one of them); it
depends on your use case.
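
For reference, a minimal sketch (not taken from this thread) of how the mode
is toggled when the SparkContext is created; the app name and Mesos master
URL below are placeholders:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class CoarseToggleSketch {
  public static void main(String[] args) {
    // "true" = coarse grain mode (executor backends started up front and held
    // for the app's lifetime); "false" = fine grain mode (each Spark task runs
    // as its own Mesos task).
    SparkConf conf = new SparkConf()
        .setAppName("coarse-toggle-sketch")          // placeholder app name
        .setMaster("mesos://zk://host:2181/mesos")   // placeholder Mesos master URL
        .set("spark.mesos.coarse", "true");
    JavaSparkContext sc = new JavaSparkContext(conf);
    // ... job definition goes here ...
    sc.stop();
  }
}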

There are quite a few factors that can cause huge GC pauses, but I don't
think your GC pauses will go away if you switch to standalone.

Tim

On Mon, Sep 21, 2015 at 5:18 PM, Utkarsh Sengar 
wrote:

> I am running Spark 1.4.1 on mesos.
>
> The spark job does a "cartesian" of 4 RDDs (aRdd, bRdd, cRdd, dRdd) of
> size 100, 100, 7 and 1 respectively. Let's call the result productRDD.
>
> Creating "aRdd" requires pulling data from multiple data sources, merging
> it, and creating a tuple of JavaRdd; in the end aRDD looks something like this:
> JavaRDD>
> bRdd, cRdd and dRdd are just List<> of values.
>
> Then I apply a transformation on productRDD and finally call "saveAsTextFile"
> to save the result of that transformation.
>
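For illustration, a rough sketch of the pipeline described above, not the
actual job: only the RDD names and sizes come from the email, while the
element values, the transformation, and the output path are placeholders,
and the real multi-source construction of aRdd is elided.

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class ProductRddSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("product-rdd-sketch")        // placeholder app name
        .set("spark.mesos.coarse", "false");     // the mode that works for this job
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Stand-ins for the four RDDs; the real aRdd is assembled from several
    // data sources, which is elided here.
    JavaRDD<Long> aRdd = sc.parallelize(range(100));
    JavaRDD<Long> bRdd = sc.parallelize(range(100));
    JavaRDD<Long> cRdd = sc.parallelize(range(7));
    JavaRDD<Long> dRdd = sc.parallelize(range(1));

    // productRDD: the nested cartesian described above.
    JavaPairRDD<Tuple2<Tuple2<Long, Long>, Long>, Long> productRdd =
        aRdd.cartesian(bRdd).cartesian(cRdd).cartesian(dRdd);

    // Placeholder transformation and save, as described above.
    productRdd.map(Object::toString).saveAsTextFile("/tmp/product-output");

    sc.stop();
  }

  private static List<Long> range(int n) {
    List<Long> xs = new ArrayList<Long>(n);
    for (long i = 0; i < n; i++) xs.add(i);
    return xs;
  }
}

The nesting matters because each CartesianRDD exposes one partition per pair
of parent partitions, so partition counts multiply at every cartesian;
CartesianRDD.getPartitions is the frame at the top of the OOM stack traces
earlier in this thread.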
> Problem:
> By setting "spark.mesos.coarse=true", creation of "aRdd" works fine but
> driver crashes while doing the cartesian but when I do
> "spark.mesos.coarse=true", the job works like a charm. I am running spark
> on mesos.
>
> Comments:
> So I wanted to understand what role "spark.mesos.coarse=true" plays in
> terms of memory and compute performance. My findings look counterintuitive
> since:
>
> 1. "spark.mesos.coarse=true" runs just 1 mesos task, so it is fine grain
>    mode that should pay the overhead of spinning up many mesos tasks, which
>    should hurt its performance.
> 2. What setting of "spark.mesos.coarse" is recommended for running spark
>    on mesos? Or is there no best answer and it depends on the use case?
> 3. Also, with "spark.mesos.coarse=true" I notice huge GC pauses even with a
>    small dataset but a long-running job (this can be a separate discussion;
>    see the heap-settings sketch after this list).
>
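A sketch of the heap-related settings touched on in this thread; the values
are placeholders rather than recommendations from anyone here. Note that the
driver heap usually has to be set at submit time (e.g. spark-submit
--driver-memory), because the driver JVM is already running by the time a
SparkConf is built.

import org.apache.spark.SparkConf;

public class MemoryKnobsSketch {
  public static SparkConf build() {
    return new SparkConf()
        .setAppName("memory-knobs-sketch")      // placeholder app name
        .set("spark.mesos.coarse", "true")
        .set("spark.executor.memory", "4g")     // heap of each coarse-mode executor (placeholder value)
        .set("spark.cores.max", "8");           // cap on total cores claimed from Mesos (placeholder value)
  }
}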
> Let me know if I am missing something obvious, we are learning spark
> tuning as we move forward :)
>
> --
> Thanks,
> -Utkarsh
>