Spark Mesos Architecture and Integration Details

2016-12-09 Thread Chawla,Sumit
Hi All

I am a new member of this group; sorry if this question has already been
asked. I am looking for some information regarding the Spark Mesos integration:

1.  How does Mesos schedule and launch Spark executors? (Any pointer to
the code would be helpful.)

2.  How does Mesos front the Spark administration capabilities? Are there
any URLs through which Mesos proxies requests for the Spark master? I am
particularly interested in how to access the Spark master admin and
monitoring URLs when running on Mesos.

3.  Any other good documentation/architecture references to get started on
Mesos and Spark integration and internals?


Regards
Sumit Chawla


Re: Quota

2016-12-09 Thread Vijay
The dispatcher needs 1 CPU and 1 GB of memory.

Regards,
Vijay

Sent from my iPhone

> On Dec 9, 2016, at 4:51 PM, Vinod Kone  wrote:
> 
> And how many resources does spark need?


Re: Quota

2016-12-09 Thread Vinod Kone
And how many resources does Spark need?



Re: Quota

2016-12-09 Thread Vijay Srinivasaraghavan
Here is the slave state info. I see marathon is registered as "slave_public" 
role and is configured with "default_accepted_resource_roles" as "*"
"slaves":[  
  {  
 "id":"69356344-e2c4-453d-baaf-22df4a4cc430-S0",
 "pid":"slave(1)@xxx.xxx.xxx.100:5051",
 "hostname":"xxx.xxx.xxx.100",
 "registered_time":1481267726.19244,
 "resources":{  
"disk":12099.0,
"mem":14863.0,
"gpus":0.0,
"cpus":4.0,
"ports":"[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 
8182-32000]"
 },
 "used_resources":{  
"disk":0.0,
"mem":0.0,
"gpus":0.0,
"cpus":0.0
 },
 "offered_resources":{  
"disk":0.0,
"mem":0.0,
"gpus":0.0,
"cpus":0.0
 },
 "reserved_resources":{  

 },
 "unreserved_resources":{  
"disk":12099.0,
"mem":14863.0,
"gpus":0.0,
"cpus":4.0,
"ports":"[1025-2180, 2182-3887, 3889-5049, 5052-8079, 8082-8180, 
8182-32000]"
 },
 "attributes":{  

 },
 "active":true,
 "version":"1.0.1"
  }
   ],
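As a quick sanity check, the state blob above can be inspected programmatically. Below is a sketch with the relevant fields from the posted state inlined as a Python dict (in practice `state` would come from parsing the master's or agent's `/state` endpoint):

```python
# Sketch: summing free (unreserved, unused) resources from the state above.
state = {
    "slaves": [{
        "id": "69356344-e2c4-453d-baaf-22df4a4cc430-S0",
        "unreserved_resources": {"cpus": 4.0, "mem": 14863.0, "disk": 12099.0},
        "used_resources": {"cpus": 0.0, "mem": 0.0, "disk": 0.0},
        "reserved_resources": {},
    }]
}

agent = state["slaves"][0]
free_cpus = agent["unreserved_resources"]["cpus"] - agent["used_resources"]["cpus"]
free_mem = agent["unreserved_resources"]["mem"] - agent["used_resources"]["mem"]

# Nothing is reserved or in use, so the whole agent should be offerable.
assert (free_cpus, free_mem) == (4.0, 14863.0)
assert agent["reserved_resources"] == {}
```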
 
Regards
Vijay

Re: Quota

2016-12-09 Thread Vinod Kone
How many resources does the agent register with the master? How many
resources does the Spark task need?

I'm guessing Marathon is not registered with the "test" role, so it is only
getting unreserved resources, which are not enough for the Spark task?



Re: Multi-agent machine

2016-12-09 Thread Charles Allen
Ok, thanks!

On Fri, Dec 9, 2016 at 2:32 PM Benjamin Mahler  wrote:

> Maintenance should work in this case, it will just be applied to all
> agents on the machine.


Quota

2016-12-09 Thread Vijay Srinivasaraghavan
I have a standalone DCOS setup (single-node Vagrant VM running a DCOS
v.1.9-dev build + Mesos 1.0.1 + Marathon 1.3.0). Both master and agent are
running on the same VM.

Resources: 4 CPU, 16 GB memory, 20 GB disk

I have created a quota using the new V1 API, which creates a role "test" with
resource constraints of 0.5 CPU and 1 GB memory.
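For reference, the quota described here corresponds to a JSON request POSTed to the master's `/quota` endpoint. A sketch of the payload follows; the exact master host and the assumption that `mem` is in MB are illustrative:

```python
import json

# Sketch of the quota request body for role "test" (0.5 CPU, 1 GB memory),
# as POSTed to http://<master>:5050/quota. Host and units are assumptions.
quota_request = {
    "role": "test",
    "guarantee": [
        {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.5}},
        {"name": "mem",  "type": "SCALAR", "scalar": {"value": 1024.0}},
    ],
}

body = json.dumps(quota_request)
# Round-trip check: the payload encodes the role and both guarantees.
decoded = json.loads(body)
assert decoded["role"] == "test"
assert {r["name"] for r in decoded["guarantee"]} == {"cpus", "mem"}
```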
When I try to deploy the Spark package, Marathon receives the request, but
the task stays in "waiting" state since it did not receive any offers from
the master, though I don't see any resource constraints from the hardware
perspective.

However, when I deleted the quota, Marathon was able to move forward with
the deployment and Spark was deployed and running. I could see from the
Mesos master logs that it had sent an offer to the Marathon framework.

To debug the issue, I tried creating the quota again, but this time did not
provide any CPU or memory (0 cpu and 0 mem). After this, when I deploy Spark
from the DCOS UI, I can see Marathon getting an offer from the master and
deploying Spark without needing to delete the quota.

Did anyone notice similar behavior?

Regards
Vijay

Re: Multi-agent machine

2016-12-09 Thread Benjamin Mahler
Maintenance should work in this case, it will just be applied to all agents
on the machine.



Re: Multi-agent machine

2016-12-09 Thread Charles Allen
Thanks for the insight.

I take that to mean the maintenance primitives might not work right for
multi-agent machines? aka, I can't do maintenance on one agent but not the
others?




Re: Multi-agent machine

2016-12-09 Thread Jie Yu
Charles,

It should be possible. Here are the global 'objects' that might conflict:
1) cgroups (you can use a different cgroup root)
2) work_dir and runtime_dir (you can set them to be different between agents)
3) network (e.g., iptables; if you use host networking, this should not be a
problem. Otherwise, you might need to configure your network isolator
properly)

But we haven't tested this. Another potential issue is code that relies on
the hostname of the agent (MachineID in the maintenance primitives?)

- Jie
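The list above maps to concrete per-agent flag sets. Here is a sketch; the paths, ports, and ZooKeeper URL are hypothetical examples, while the flag names themselves (`--work_dir`, `--runtime_dir`, `--cgroups_root`, `--port`) are standard mesos-agent options:

```python
# Sketch: distinct flag sets for two mesos-agent processes on one machine.
# Paths, ports, and the master URL below are hypothetical examples.
def agent_flags(n):
    """Build the command line for the n-th agent on this machine."""
    return [
        "mesos-agent",
        "--master=zk://master.example.com:2181/mesos",  # same master for both
        f"--port={5051 + n}",                           # unique libprocess port
        f"--work_dir=/var/lib/mesos/agent{n}",          # unique work_dir
        f"--runtime_dir=/var/run/mesos/agent{n}",       # unique runtime_dir
        f"--cgroups_root=mesos_agent{n}",               # unique cgroup root
    ]

cmd0, cmd1 = agent_flags(0), agent_flags(1)
# Every per-agent flag must differ; the master flag must match.
assert cmd0[1] == cmd1[1]
assert all(a != b for a, b in zip(cmd0[2:], cmd1[2:]))
```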



Multi-agent machine

2016-12-09 Thread Charles Allen
Is it possible to setup a machine such that multiple mesos agents are
running on the same machine and registering with the same master?

For example, with different cgroup roots or different default working
directory.


Re: Duplicate task IDs

2016-12-09 Thread Joris Van Remoortere
Hey Neil,

I concur that using duplicate task IDs is bad practice and asking for
trouble.

Could you please clarify *why* you want to use a hashmap? Is your goal to
remove duplicate task IDs or is this just a side-effect and you have a
different reason (e.g. performance) for using a hashmap?

I'm wondering why a multi-hashmap is not sufficient. This would be clear if
you were explicitly *trying* to get rid of duplicates of course :-)

Thanks,
Joris

—
*Joris Van Remoortere*
Mesosphere

On Fri, Dec 9, 2016 at 7:08 AM, Neil Conway  wrote:

> Folks,
>
> The master stores a cache of metadata about recently completed tasks;
> for example, this information can be accessed via the "/tasks" HTTP
> endpoint or the "GET_TASKS" call in the new Operator API.
>
> The master currently stores this metadata using a list; this means
> that duplicate task IDs are permitted. We're considering [1] changing
> this to use a hashmap instead. Using a hashmap would mean that
> duplicate task IDs would be discarded: if two completed tasks have the
> same task ID, only the metadata for the most recently completed task
> would be retained by the master.
>
> If this behavior change would cause problems for your framework or
> other software that relies on Mesos, please let me know.
>
> (Note that if you do have two completed tasks with the same ID, you'd
> need an unambiguous way to tell them apart. As a recommendation, I
> would strongly encourage framework authors to never reuse task IDs.)
>
> Neil
>
> [1] https://reviews.apache.org/r/54179/
>
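The proposed change boils down to list semantics versus map semantics for the completed-task cache. A toy model (plain Python, not Mesos code) makes the difference concrete:

```python
# Toy model of the master's completed-task cache (not actual Mesos code).
completed_list = []   # current behavior: a list keeps every entry
completed_map = {}    # proposed behavior: a hashmap keyed by task ID

def complete(task_id, meta):
    completed_list.append((task_id, meta))
    completed_map[task_id] = meta  # a reused ID overwrites older metadata

complete("task-1", {"state": "TASK_FINISHED"})
complete("task-1", {"state": "TASK_FAILED"})   # reused task ID

assert len(completed_list) == 2                               # list keeps both
assert completed_map == {"task-1": {"state": "TASK_FAILED"}}  # map keeps latest
```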


Command healthcheck failed but status KILLED

2016-12-09 Thread Tomek Janiszewski
Hi

What is the desired behavior when a command health check fails? On Mesos
1.0.2, when a health check fails, the task ends up in state KILLED instead of
FAILED, with a reason specifying that it was killed due to a failing health
check.

Thanks
Tomek
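For context, a command health check is attached to a task roughly as follows. This is a sketch in JSON-style Python; the field names follow the Mesos HealthCheck protobuf, and the values are illustrative. After `consecutive_failures` is exceeded, the executor kills the task, which is consistent with the KILLED state observed above:

```python
# Sketch of a command health check definition, expressed as the JSON-style
# dict that maps onto the Mesos HealthCheck message. Values are examples.
health_check = {
    "command": {"value": "curl -f http://localhost:8080/health"},
    "delay_seconds": 15.0,          # wait before the first check
    "interval_seconds": 10.0,       # time between checks
    "timeout_seconds": 5.0,         # per-check timeout
    "consecutive_failures": 3,      # failures before the task is killed
    "grace_period_seconds": 30.0,   # ignore failures right after launch
}

# The executor, not the scheduler, performs the kill after repeated failures.
assert health_check["consecutive_failures"] == 3
```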