[jira] [Updated] (MESOS-1826) Improve logging for when master cannot connect to slaves

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-1826:
--
Shepherd:   (was: Adam B)

> Improve logging for when master cannot connect to slaves
> 
>
> Key: MESOS-1826
> URL: https://issues.apache.org/jira/browse/MESOS-1826
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 0.20.0
>Reporter: Thomas Rampelberg
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: newbie
>
> When first setting up a Mesos cluster, it is possible to get into a state
> where your slaves are constantly re-registering. This happens because the
> slave pid is not reachable from the master.
> Currently, the master logs make it hard to figure out that this is the
> problem. It would be helpful to have a clearer explanation in the logs,
> something like:
> Unable to connect to slave X at x.x.x.x:5051. Please make sure that host
> is reachable from your master.
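
A minimal sketch of the kind of check that could produce the proposed log line. This is not Mesos code: the function names, the injected `connect` parameter, and the 2-second timeout are assumptions made for illustration; only the message wording and the 5051 port come from the ticket.

```python
import socket
from typing import Callable, Optional


def unreachable_message(slave_id: str, host: str, port: int) -> str:
    """Build the log line suggested in the ticket (wording from MESOS-1826)."""
    return (f"Unable to connect to slave {slave_id} at {host}:{port}. "
            "Please make sure that host is reachable from your master.")


def check_slave(slave_id: str, host: str, port: int = 5051,
                timeout: float = 2.0,
                connect: Callable = socket.create_connection) -> Optional[str]:
    """Try a TCP connection; return a log line on failure, None on success.

    `connect` is injectable (hypothetical design choice) so the check can be
    exercised without a real network.
    """
    try:
        conn = connect((host, port), timeout)
    except OSError:
        return unreachable_message(slave_id, host, port)
    conn.close()
    return None
```

A caller would log the returned string at warning level whenever it is not None.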



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-2560) Remove RunTaskMessage.framework_id

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2560:
--
Shepherd:   (was: Adam B)

> Remove RunTaskMessage.framework_id
> --
>
> Key: MESOS-2560
> URL: https://issues.apache.org/jira/browse/MESOS-2560
> Project: Mesos
>  Issue Type: Task
>  Components: framework
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> The previous release doesn't use framework_id and so it can be safely removed.
> This should land only after https://issues.apache.org/jira/browse/MESOS-2559 
> has been shipped.





[jira] [Updated] (MESOS-2513) FrameworkID not set in FrameworkInfo sent to Slave as part of RunTaskMessage

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2513:
--
Shepherd:   (was: Adam B)

> FrameworkID not set in FrameworkInfo sent to Slave as part of RunTaskMessage
> 
>
> Key: MESOS-2513
> URL: https://issues.apache.org/jira/browse/MESOS-2513
> Project: Mesos
>  Issue Type: Bug
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> While working on creating the decorator/undecorator hooks we realized that in 
> one of our slave hooks, we receive a FrameworkInfo which doesn't have the 
> 'id' field set. As it turns out, the FrameworkInfo sent by Master to Slave as 
> part of the RunTaskMessage is missing the id as well. In fact, the Master 
> also sends a FrameworkID separately as part of the RunTaskMessage.
> Why not modify the Master to always set 'id' in FrameworkInfo before sending
> it with RunTaskMessage? Would this affect correctness?





[jira] [Updated] (MESOS-2731) Allow frameworks to deploy storage drivers on demand.

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2731:
--
Shepherd:   (was: Adam B)

> Allow frameworks to deploy storage drivers on demand.
> -
>
> Key: MESOS-2731
> URL: https://issues.apache.org/jira/browse/MESOS-2731
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jörg Schad
>  Labels: mesosphere
>
> Certain storage options can only be accessed through a storage driver, for
> example an HDFS driver, a Quobyte client, or a database driver.
> When tasks in Mesos require access to such storage, they also need access to
> the respective driver on the node where they are scheduled.
> As it is not desirable to deploy the driver onto all nodes in the cluster, it
> would be good to deploy the driver on demand.
> Use Cases:
> 1. Fetcher Cache pulling resources from user-provided URIs
> 2. Framework executors/tasks requiring r/w access to HDFS/DFS
> 3. Framework executors/tasks requiring r/w database access (which requires
> drivers)





[jira] [Updated] (MESOS-2732) Expose Mount Tables

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2732:
--
Shepherd:   (was: Adam B)

> Expose Mount Tables
> ---
>
> Key: MESOS-2732
> URL: https://issues.apache.org/jira/browse/MESOS-2732
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jörg Schad
>  Labels: mesosphere
>
> When there are multiple distributed/network-attached filesystems connected to 
> a Mesos cluster, clients (e.g. the Mesos fetcher, or a Mesos task) of those 
> filesystems need a clear way to distinguish between them and Mesos needs a 
> way to direct requests to the correct (distributed) filesystem.
> _Use Cases_:
>  - Multiple HDFS clusters on the same Mesos cluster
>  - Connecting HDFS, MapRFS, Ceph, Lustre, GlusterFS, S3, GCS, and other 
> SAN/NAS to a Mesos cluster
>  - The Mesos fetcher may want to pull from any of the above.
>  - An executor or task may want to read or write to multiple filesystems, 
> within the same process.
> _Traditional Operating System Analogy_:
> Each line in Linux's fstab describes a different filesystem to mount into the 
> root filesystem:
>  1. The device name or remote filesystem to be mounted.
>  2. The mount point, where the data is to be attached to the root file system.
>  3. The file system type or algorithm used to interpret the file system.
>  4. Options to be used when mounting (e.g. Read-Only).
> _What we need for each filesystem in the Mesos ecosystem_:
>  1. The metadata server or dfs/san entrypoint host:port
>  2. Mount point, where this filesystem fits into the universal 
> Mesos-accessible filesystem namespace.
>  3. The protocol to speak, perhaps acceptable URI prefixes.
>  4. Options, ACLs for which frameworks/principals can access a particular 
> filesystem, and how.
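
The four per-filesystem items above could be modelled roughly as follows. This is only a sketch under stated assumptions: `MountEntry`, its field names, and the longest-prefix `resolve` helper are hypothetical and do not correspond to any existing Mesos API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MountEntry:
    entrypoint: str       # 1. metadata server or DFS/SAN entrypoint host:port
    mount_point: str      # 2. location in the shared filesystem namespace
    protocol: str         # 3. protocol / accepted URI prefix, e.g. "hdfs"
    allowed_principals: frozenset = frozenset()  # 4. simplistic ACL


def resolve(table: Iterable[MountEntry], path: str,
            principal: str) -> Optional[MountEntry]:
    """Pick the most specific mount covering `path` that `principal` may use."""
    candidates = [
        entry for entry in table
        if principal in entry.allowed_principals
        and (path == entry.mount_point
             or path.startswith(entry.mount_point.rstrip("/") + "/"))
    ]
    # Longest mount point wins, mirroring how nested mounts shadow parents.
    return max(candidates, key=lambda e: len(e.mount_point), default=None)
```

With two HDFS clusters mounted at `/hdfs/prod` and `/hdfs/prod/archive`, a path under the archive resolves to the second cluster and everything else under `/hdfs/prod` to the first.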





[jira] [Updated] (MESOS-3401) Add labels to Resources

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-3401:
--
Shepherd:   (was: Adam B)

> Add labels to Resources
> ---
>
> Key: MESOS-3401
> URL: https://issues.apache.org/jira/browse/MESOS-3401
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Adam B
>  Labels: external-volumes, mesosphere, resources
>
> Similar to how we have added labels to tasks/executors (MESOS-2120), and even 
> FrameworkInfo (MESOS-2841), we should extend Resource to allow arbitrary 
> key/value pairs.
> This could be used to specify that a cpu resource has a certain speed, that a 
> disk resource is SSD, or express any other metadata about a built-in or 
> custom resource type. Only the scalar quantity will be used for determining 
> fair share in the Mesos allocator. The rest will be passed onto frameworks as 
> info they can use for scheduling decisions.
> This would require changes to how the slave specifies its `--resources` 
> (probably as json), how the slave/master reports resources in its web/json 
> API, and how resources are offered to frameworks.





[jira] [Updated] (MESOS-4631) Document how to use custom authentication modules

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4631:
--
Shepherd:   (was: Adam B)

> Document how to use custom authentication modules
> -
>
> Key: MESOS-4631
> URL: https://issues.apache.org/jira/browse/MESOS-4631
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Priority: Minor
>  Labels: authentication, documentation, mesosphere
>
> The authentication doc page talks about custom authentication modules a bit, 
> but doesn't give enough information. For example:
> * What interface does a custom authentication module need to satisfy?
> * Can multiple authentication modules be used?
> * How do I implement a framework that authenticates with a master that uses a 
> non-default authentication module, e.g., one that doesn't use credentials?





[jira] [Updated] (MESOS-7719) CHECK failure in mesos-execute with old master

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7719:
--
Shepherd:   (was: Adam B)

> CHECK failure in mesos-execute with old master
> --
>
> Key: MESOS-7719
> URL: https://issues.apache.org/jira/browse/MESOS-7719
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Neil Conway
>  Labels: mesosphere
>
> If the master does not support reservation refinement, {{mesos-execute}} 
> might hit a {{CHECK}} failure:
> {noformat}
> *** Check failure stack trace: ***
> @0x113042b9a  google::LogMessage::Fail()
> @0x1130408b5  google::LogMessage::SendToLog()
> @0x113041609  google::LogMessage::Flush()
> @0x11304a268  google::LogMessageFatal::~LogMessageFatal()
> @0x113043085  google::LogMessageFatal::~LogMessageFatal()
> @0x111c3b70d  mesos::v1::Resources::isEmpty()
> @0x111c3db03  mesos::v1::Resources::Resource_::isEmpty()
> @0x111c363f1  mesos::v1::Resources::add()
> @0x111c4db2a  mesos::v1::Resources::operator+=()
> @0x111c3e63b  mesos::v1::Resources::operator+=()
> @0x111c3eb6d  mesos::v1::Resources::Resources()
> @0x111c3ec0d  mesos::v1::Resources::Resources()
> @0x10f39374d  CommandScheduler::offers()
> @0x10f3823d3  CommandScheduler::received()
> @0x10f38cd2c  
> _ZZN7process8dispatchI16CommandSchedulerNSt3__15queueIN5mesos2v19scheduler5EventENS2_5dequeIS7_NS2_9allocatorIS7_EESC_EEvRKNS_3PIDIT_EEMSE_FvT0_ET1_ENKUlPNS_11ProcessBaseEE_clESN_
> @0x10f38cb40  
> _ZNSt3__128__invoke_void_return_wrapperIvE6__callIJRZN7process8dispatchI16CommandSchedulerNS_5queueIN5mesos2v19scheduler5EventENS_5dequeISA_NS_9allocatorISA_EESF_EEvRKNS3_3PIDIT_EEMSH_FvT0_ET1_EUlPNS3_11ProcessBaseEE_SQ_EEEvDpOT_
> @0x10f38c869  
> _ZNSt3__110__function6__funcIZN7process8dispatchI16CommandSchedulerNS_5queueIN5mesos2v19scheduler5EventENS_5dequeIS9_NS_9allocatorIS9_EESE_EEvRKNS2_3PIDIT_EEMSG_FvT0_ET1_EUlPNS2_11ProcessBaseEE_NSB_ISQ_EEFvSP_EEclEOSP_
> @0x112e8d5ba  std::__1::function<>::operator()()
> @0x112e8d4fc  process::ProcessBase::visit()
> @0x112efa73e  process::DispatchEvent::visit()
> @0x10f381001  process::ProcessBase::serve()
> @0x112e880d6  process::ProcessManager::resume()
> @0x112f68cb0  
> process::ProcessManager::init_threads()::$_1::operator()()
> @0x112f688d2  
> _ZNSt3__114__thread_proxyINS_5tupleIJZN7process14ProcessManager12init_threadsEvE3$_1EPvS6_
> @ 0x7fffde14693b  _pthread_body
> @ 0x7fffde146887  _pthread_start
> @ 0x7fffde14608d  thread_start
> {noformat}





[jira] [Updated] (MESOS-5343) Behavior of custom HTTP authenticators with disabled HTTP authentication is inconsistent between master and agent

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5343:
--
Shepherd:   (was: Adam B)

> Behavior of custom HTTP authenticators with disabled HTTP authentication is 
> inconsistent between master and agent
> -
>
> Key: MESOS-5343
> URL: https://issues.apache.org/jira/browse/MESOS-5343
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere, security
>
> When setting a custom authenticator with {{http_authenticators}} and also
> specifying {{authenticate_http=false}}, agents currently refuse to start with:
> {code}
> A custom HTTP authenticator was specified with the '--http_authenticators' 
> flag, but HTTP authentication was not enabled via '--authenticate_http'
> {code}
> Masters on the other hand accept this setting.
> Having differing behavior between master and agents is confusing, and we 
> should decide on whether we want to accept these settings or not, and make 
> the implementations consistent.





[jira] [Updated] (MESOS-5032) Remove plain text Credential format (after deprecation cycle)

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5032:
--
Shepherd:   (was: Adam B)

> Remove plain text Credential format (after deprecation cycle)
> -
>
> Key: MESOS-5032
> URL: https://issues.apache.org/jira/browse/MESOS-5032
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, master
>Affects Versions: 1.0.0
>Reporter: Cody Maloney
>Priority: Minor
>  Labels: mesosphere, security, tech-debt
>
> Currently two formats of credentials are supported: JSON
> {code}
> {
>   "credentials": [
>     {
>       "principal": "sherman",
>       "secret": "kitesurf"
>     }
>   ]
> }
> {code}
> And a deprecated newline-delimited file:
> {code}
> principal1 secret1
> principal2 secret2
> {code}
> We deprecated the newline format in 0.29, and should remove it after the
> deprecation cycle ends.
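
For operators still holding a file in the deprecated format, a one-off conversion to the JSON format might look like the sketch below. The function name is hypothetical, and it assumes each non-empty line is exactly "principal secret" separated by whitespace; a line without a secret raises ValueError.

```python
import json


def newline_credentials_to_json(text: str) -> str:
    """Convert 'principal secret' lines into the JSON credentials format."""
    credentials = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so a secret may
        # itself contain spaces (an assumption about the old format).
        principal, secret = line.split(None, 1)
        credentials.append({"principal": principal, "secret": secret})
    return json.dumps({"credentials": credentials}, indent=2)
```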





[jira] [Updated] (MESOS-7247) HTTP Authenticator modules should be able to redirect users

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7247:
--
Shepherd:   (was: Adam B)

> HTTP Authenticator modules should be able to redirect users
> ---
>
> Key: MESOS-7247
> URL: https://issues.apache.org/jira/browse/MESOS-7247
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent, libprocess, master
>Reporter: Silas Snider
>  Labels: mesosphere
>
> Right now, Authenticator modules can only respond with an Unauthorized HTTP
> status code if they need to get auth information from the client. This works
> for Basic auth, but not for authentication types like OAuth, which expect the
> server to redirect the client to the right authorization provider URL.
> We should change AuthenticationResult to allow arbitrary HTTP responses to
> allow for more flexibility here.





[jira] [Updated] (MESOS-2317) Remove deprecated checkpoint=false code

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2317:
--
Shepherd:   (was: Adam B)

> Remove deprecated checkpoint=false code
> ---
>
> Key: MESOS-2317
> URL: https://issues.apache.org/jira/browse/MESOS-2317
> Project: Mesos
>  Issue Type: Epic
>Affects Versions: 0.22.0
>Reporter: Adam B
>Assignee: Jörg Schad
>  Labels: checkpoint, mesosphere
>
> Cody's plan from MESOS-444 was:
> 1) -Make it so the flag can't be changed at the command line-
> 2) -Remove the checkpoint variable entirely from slave/flags.hpp. This is a 
> fairly involved change since a number of unit tests depend on manually 
> setting the flag, as well as the default being non-checkpointing.-
> 3) -Remove logic around checkpointing in the slave, remove logic inside the 
> master.-
> 4) Drop the flag from the SlaveInfo struct (Will require a deprecation cycle).





[jira] [Updated] (MESOS-905) Remove Framework.id in favor of FrameworkInfo.id

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-905:
-
Shepherd:   (was: Adam B)

> Remove Framework.id in favor of FrameworkInfo.id
> 
>
> Key: MESOS-905
> URL: https://issues.apache.org/jira/browse/MESOS-905
> Project: Mesos
>  Issue Type: Story
>  Components: framework
>Reporter: Adam B
>Assignee: Kapil Arya
>  Labels: mesosphere
>
> Framework.id currently holds the correct FrameworkId, but Framework also 
> contains a FrameworkInfo, and the FrameworkInfo.id is not necessarily set.
> I propose that we eliminate the Framework.id member variable and replace it 
> with a Framework.id() accessor that references Framework.FrameworkInfo.id and 
> ensure that it is correctly set.
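
The shape of the proposal, sketched in Python rather than the C++ of the Mesos master. The classes below are simplified stand-ins, not the real `Framework` or `FrameworkInfo` types: the id lives only in the embedded info, and `id()` is an accessor over it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FrameworkInfo:
    name: str
    id: str = ""  # may arrive unset, as described in the ticket


class Framework:
    def __init__(self, info: FrameworkInfo, framework_id: str):
        # Store a copy of the info with the id guaranteed to be set,
        # so there is a single source of truth.
        self.info = replace(info, id=framework_id)

    def id(self) -> str:
        # Accessor replacing the former separate `id` member variable.
        return self.info.id
```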





[jira] [Updated] (MESOS-907) Add Kerberos Authentication support

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-907:
-
Shepherd:   (was: Adam B)

> Add Kerberos Authentication support
> ---
>
> Key: MESOS-907
> URL: https://issues.apache.org/jira/browse/MESOS-907
> Project: Mesos
>  Issue Type: Story
>Reporter: Adam B
>Assignee: Tim Anderegg
>  Labels: security, twitter
>
> MESOS-704 added basic authentication support using CRAM-MD5 through SASL. Now 
> we should integrate Kerberos authentication using GSS-API, which is already 
> supported by SASL. Kerberos is a widely-used industry standard authentication 
> service, and integration with Mesos will make it easier for customers to 
> integrate their existing security process with Mesos.





[jira] [Updated] (MESOS-7260) Authorization for `/role` endpoint should take both VIEW_ROLES and VIEW_FRAMEWORKS into account.

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7260:
--
Shepherd:   (was: Adam B)

> Authorization for `/role` endpoint should take both VIEW_ROLES and 
> VIEW_FRAMEWORKS into account.
> 
>
> Key: MESOS-7260
> URL: https://issues.apache.org/jira/browse/MESOS-7260
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, master
>Reporter: Jay Guo
>Assignee: Jay Guo
>
> Consider the following case: both {{framework1}} and {{framework2}} subscribe
> to {{roleX}}, and {{principal}} is allowed to view {{roleX}} and
> {{framework1}}, but *NOT* {{framework2}}. The {{/role}} endpoint should then
> only contain {{framework1}}, not both frameworks.
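
The intended filtering can be sketched as two nested checks. The callables `can_view_role` and `can_view_framework` are hypothetical stand-ins for the VIEW_ROLES and VIEW_FRAMEWORKS authorization decisions; this is not the master's actual endpoint code.

```python
from typing import Callable, Dict, List


def visible_role_view(principal: str,
                      roles: Dict[str, List[str]],
                      can_view_role: Callable[[str, str], bool],
                      can_view_framework: Callable[[str, str], bool]
                      ) -> Dict[str, List[str]]:
    """Return only the roles, and within them only the frameworks,
    that `principal` is authorized to see."""
    view = {}
    for role, frameworks in roles.items():
        if not can_view_role(principal, role):
            continue  # role hidden entirely
        view[role] = [fw for fw in frameworks
                      if can_view_framework(principal, fw)]
    return view
```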





[jira] [Updated] (MESOS-5406) Validate ACLs on creating an instance of local authorizer.

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5406:
--
Shepherd:   (was: Adam B)

> Validate ACLs on creating an instance of local authorizer.
> --
>
> Key: MESOS-5406
> URL: https://issues.apache.org/jira/browse/MESOS-5406
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rukletsov
>Assignee: Jay Guo
>  Labels: mesosphere, security
>
> Some combinations of ACLs are not allowed, for example, specifying both 
> {{SetQuota}} and {{UpdateQuota}}. We should capture such issues and error out 
> early. 
> This ticket aims to add as many validations as possible to a dedicated 
> {{validate()}} routine, instead of having them implicitly in the codebase.
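
A rough sketch of what a dedicated validation routine could look like, using only the one conflict named in the ticket. The dictionary keys `set_quotas` and `update_quotas` are assumed here to mirror the JSON form of the ACLs; the function is illustrative and not the local authorizer's implementation.

```python
from typing import Dict, List


def validate(acls: Dict[str, list]) -> List[str]:
    """Collect configuration errors up front instead of failing later."""
    errors = []
    # Rule from the ticket: SetQuota and UpdateQuota must not be combined.
    if acls.get("set_quotas") and acls.get("update_quotas"):
        errors.append("ACLs must not specify both 'set_quotas' "
                      "and 'update_quotas'")
    return errors
```

Returning a list rather than raising on the first problem lets the authorizer report every invalid combination at startup.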





[jira] [Updated] (MESOS-2728) Introduce concept of cluster wide resources.

2018-01-12 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-2728:
--
Shepherd:   (was: Adam B)

> Introduce concept of cluster wide resources.
> 
>
> Key: MESOS-2728
> URL: https://issues.apache.org/jira/browse/MESOS-2728
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jörg Schad
>Assignee: Jörg Schad
>  Labels: external-volumes, mesosphere
>
> There are resources which are not provided by a single node. Consider, for
> example, the external network bandwidth of a cluster. Being a limited
> resource, it makes sense for Mesos to manage it, but it is still not a
> resource offered by a single node. A cluster-wide resource is still consumed
> by a task, and when that task completes, the resources are then available to
> be allocated to another framework/task.
> Use Cases:
> 1. Network Bandwidth
> 2. IP Addresses
> 3. Global Service Ports
> 4. Distributed File System Storage
> 5. Software Licences
> 6. SAN Volumes





[jira] [Commented] (MESOS-8362) Verify end-to-end operation status update retry after RP failover

2018-01-08 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317061#comment-16317061
 ] 

Adam B commented on MESOS-8362:
---

Story points, please?

> Verify end-to-end operation status update retry after RP failover
> -
>
> Key: MESOS-8362
> URL: https://issues.apache.org/jira/browse/MESOS-8362
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-8382) Master should bookkeep local resource providers.

2018-01-08 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317060#comment-16317060
 ] 

Adam B commented on MESOS-8382:
---

Story points, please?

> Master should bookkeep local resource providers.
> 
>
> Key: MESOS-8382
> URL: https://issues.apache.org/jira/browse/MESOS-8382
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> This will simplify the handling of `UpdateSlaveMessage`. Also, it'll simplify
> the endpoint serving.





[jira] [Updated] (MESOS-8357) Example frameworks have an inconsistent UX.

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8357:
--
Sprint: Mesosphere Sprint 71, Mesosphere Sprint 72  (was: Mesosphere Sprint 
71)

> Example frameworks have an inconsistent UX.
> ---
>
> Key: MESOS-8357
> URL: https://issues.apache.org/jira/browse/MESOS-8357
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.5.0
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>Priority: Minor
>  Labels: mesosphere
>
> Our example frameworks are a bit inconsistent when it comes to specifying
> things like the framework principal, secret, etc.
> Many of these examples have great value in testing a Mesos cluster. Unifying
> the parameterization would improve the user experience when testing Mesos.
> {{MESOS_AUTHENTICATE_FRAMEWORKS}} is being used by many examples for enabling 
> / disabling authentication. {{load_generator_framework}} as one example 
> however uses {{MESOS_AUTHENTICATE}} for that purpose. The credentials 
> themselves are most commonly expected in environment variables 
> {{DEFAULT_PRINCIPAL}} and {{DEFAULT_SECRET}} while in some cases we chose to 
> use {{MESOS_PRINCIPAL}}, {{MESOS_SECRET}} instead.
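
One way to picture a unified lookup is a helper that accepts either spelling of each variable. The variable names come from the ticket; the helper names, the precedence order, and the rule that "1" or "true" enables authentication are assumptions for illustration only.

```python
import os
from typing import Mapping, Optional


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the value of the first variable in `names` that is set."""
    for name in names:
        value = env.get(name)
        if value is not None:
            return value
    return None


def read_auth_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read authentication settings, tolerating both naming schemes."""
    flag = first_set(env, "MESOS_AUTHENTICATE_FRAMEWORKS", "MESOS_AUTHENTICATE")
    return {
        # Assumed truthiness rule; the real examples may differ.
        "authenticate": flag is not None and flag.lower() in ("1", "true"),
        "principal": first_set(env, "DEFAULT_PRINCIPAL", "MESOS_PRINCIPAL"),
        "secret": first_set(env, "DEFAULT_SECRET", "MESOS_SECRET"),
    }
```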





[jira] [Updated] (MESOS-5362) Add authentication to example frameworks

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5362:
--
Sprint: Mesosphere Sprint 71, Mesosphere Sprint 72  (was: Mesosphere Sprint 
71)

> Add authentication to example frameworks
> 
>
> Key: MESOS-5362
> URL: https://issues.apache.org/jira/browse/MESOS-5362
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Greg Mann
>Assignee: Till Toenshoff
>  Labels: authentication, mesosphere, security
>
> Some example frameworks do not have the ability to authenticate with the 
> master. Adding authentication to the example frameworks that don't already 
> have it implemented would allow us to use these frameworks for testing in 
> authenticated/authorized scenarios.





[jira] [Updated] (MESOS-8375) Use protobuf reflection to simplify upgrading of resources.

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8375:
--
Sprint: Mesosphere Sprint 71, Mesosphere Sprint 72  (was: Mesosphere Sprint 
71)

> Use protobuf reflection to simplify upgrading of resources.
> ---
>
> Key: MESOS-8375
> URL: https://issues.apache.org/jira/browse/MESOS-8375
> Project: Mesos
>  Issue Type: Task
>Reporter: Michael Park
>Assignee: Michael Park
>Priority: Blocker
>
> This is the {{upgradeResources}} half of the protobuf-reflection-based 
> upgrade/downgrade of resources: 
> https://issues.apache.org/jira/browse/MESOS-8221
> We will also add {{state::read}} to complement {{state::checkpoint}} which 
> will be used to read protobufs from disk rather than {{protobuf::read}}.





[jira] [Updated] (MESOS-8373) Test reconciliation after operation is dropped en route to agent

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8373:
--
Sprint: Mesosphere Sprint 71, Mesosphere Sprint 72  (was: Mesosphere Sprint 
71)

> Test reconciliation after operation is dropped en route to agent
> 
>
> Key: MESOS-8373
> URL: https://issues.apache.org/jira/browse/MESOS-8373
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Since new code paths were added to handle operations on resources in 1.5, we 
> should test that such operations are reconciled correctly after an operation 
> is dropped on the way from the master to the agent.





[jira] [Updated] (MESOS-5333) GET /master/maintenance/schedule/ produces 404.

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5333:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71, Mesosphere Sprint 72  
(was: Mesosphere Sprint 70, Mesosphere Sprint 71)

> GET /master/maintenance/schedule/ produces 404.
> ---
>
> Key: MESOS-5333
> URL: https://issues.apache.org/jira/browse/MESOS-5333
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: Nathan Handler
>Assignee: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> Attempts to make a GET request to /master/maintenance/schedule/ result in a 
> 404. However, if I make a GET request to /master/maintenance/schedule 
> (without the trailing /), it works. My current (untested) theory is that this 
> might be related to the fact that there is also a 
> /master/maintenance/schedule/status endpoint (an endpoint built on top of a 
> functioning endpoint), as requests to /help and /help/ (with and without the 
> trailing slash) produce the same functioning result.
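
The behaviour the reporter expects, that a trailing slash does not change which endpoint is hit, can be illustrated with a toy route table. This does not model how libprocess actually dispatches requests; `normalize` and `lookup` are hypothetical helpers.

```python
from typing import Dict, Optional


def normalize(path: str) -> str:
    """Drop trailing slashes, but keep the root path as '/'."""
    return path.rstrip("/") or "/"


def lookup(routes: Dict[str, str], path: str) -> Optional[str]:
    """Find a handler name for `path`, ignoring any trailing slash."""
    return routes.get(normalize(path))
```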





[jira] [Updated] (MESOS-8291) Add documentation about fault domains

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8291:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71, Mesosphere Sprint 72  
(was: Mesosphere Sprint 70, Mesosphere Sprint 71)

> Add documentation about fault domains
> -
>
> Key: MESOS-8291
> URL: https://issues.apache.org/jira/browse/MESOS-8291
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Benno Evers
>
> We need some user docs for fault domains.





[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7790:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71, Mesosphere 
Sprint 72  (was: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 
68, Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71)

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
>             ^
>           /   \
>         /       \
>    eng (90 cpus)   sales (10 cpus)
>      ^
>    /   \
>  /       \
> ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its
> children, and 30 cpus remain undelegated. We need to design how to allocate
> these 30 undelegated cpus. Are they allocated entirely to the "eng"
> role? Are they allocated to the "eng" role tree? If so, how do we determine
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads",
> "eng/build")?
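
The arithmetic behind the 30 undelegated cpus can be written out as a small sketch. The function name is hypothetical, and it assumes roles are named hierarchically with "/" as in the ticket; it computes only the undelegated amounts and says nothing about how the allocator should hand them out.

```python
from typing import Dict


def undelegated_quota(quota: Dict[str, int]) -> Dict[str, int]:
    """For each role, subtract the quota delegated to its direct children."""
    remaining = dict(quota)
    for role, amount in quota.items():
        if "/" in role:
            parent = role.rsplit("/", 1)[0]  # "eng/ads" -> "eng"
            if parent in remaining:
                remaining[parent] -= amount
    return remaining
```

For the tree in the ticket, "eng" is left with 90 - (50 + 10) = 30 cpus, while the leaf roles keep their full quota.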





[jira] [Updated] (MESOS-8382) Master should bookkeep local resource providers.

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8382:
--
Sprint: Mesosphere Sprint 71, Mesosphere Sprint 72  (was: Mesosphere Sprint 
71)

> Master should bookkeep local resource providers.
> 
>
> Key: MESOS-8382
> URL: https://issues.apache.org/jira/browse/MESOS-8382
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> This will simplify the handling of `UpdateSlaveMessage`. Also, it'll simplify
> the endpoint serving.





[jira] [Updated] (MESOS-8361) Example frameworks to support launching mesos-local.

2018-01-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8361:
--
Sprint: Mesosphere Sprint 71, Mesosphere Sprint 72  (was: Mesosphere Sprint 
71)

> Example frameworks to support launching mesos-local.
> 
>
> Key: MESOS-8361
> URL: https://issues.apache.org/jira/browse/MESOS-8361
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework
>Affects Versions: 1.5.0
>Reporter: Till Toenshoff
>Assignee: Till Toenshoff
>Priority: Minor
>  Labels: mesosphere
>
> The scheduler driver and library support implicit launching of mesos-local 
> for a convenient test setup. Some of our example frameworks account for this 
> in supporting implicit ACL rendering and more. 
> We should unify the experience by documenting this behaviour and adding it to 
> all example frameworks.





[jira] [Updated] (MESOS-8279) Persistent volumes are not visible in Mesos UI using default executor on Linux.

2018-01-03 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8279:
--
Target Version/s: 1.6.0

> Persistent volumes are not visible in Mesos UI using default executor on 
> Linux.
> ---
>
> Key: MESOS-8279
> URL: https://issues.apache.org/jira/browse/MESOS-8279
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.2.2, 1.3.1, 1.4.1
>Reporter: Jie Yu
>
> On Linux, if multiple containers in a default executor want to share a 
> persistent volume, a SANDBOX_PATH volume source with type PARENT is used. 
> This is translated into a bind mount in the nested container's mount 
> namespace, so the volume is not visible in the host mount namespace, which 
> is where the Mesos UI operates.
> One potential solution is to create a symlink (instead of just a mkdir) in 
> the sandbox. The symlink will be shadowed by the bind mount in the nested 
> container, but in the host mount namespace it will point to the 
> corresponding persistent volume.





[jira] [Updated] (MESOS-8221) Use protobuf reflection to simplify downgrading of resources.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8221:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70, 
Mesosphere Sprint 71  (was: Mesosphere Sprint 68, Mesosphere Sprint 69, 
Mesosphere Sprint 70)

> Use protobuf reflection to simplify downgrading of resources.
> -
>
> Key: MESOS-8221
> URL: https://issues.apache.org/jira/browse/MESOS-8221
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Michael Park
>Assignee: Michael Park
>
> We currently have a {{downgradeResources}} function which is called on every 
> {{repeated Resource}} field in every message that we checkpoint. We should 
> leverage protobuf reflection to automatically downgrade any instances of 
> {{Resource}} within any protobuf message.





[jira] [Updated] (MESOS-8352) Resources may get over allocated to some roles while fail to meet the quota of other roles.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8352:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71  (was: Mesosphere Sprint 
70)

> Resources may get over allocated to some roles while fail to meet the quota 
> of other roles.
> ---
>
> Key: MESOS-8352
> URL: https://issues.apache.org/jira/browse/MESOS-8352
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation
>Reporter: Meng Zhu
>Assignee: Meng Zhu
>  Labels: multitenancy, quotas
>
> In the quota role allocation stage, if a role gets some resources on an agent 
> to meet its quota, it will also get all other resources on the same agent 
> that it does not have quota for. This may starve roles behind it that have 
> quotas set for those resources.
> To fix that, we need to track quota headroom in the quota role allocation 
> stage. In that stage, if a role has no quota set for a scalar resource, it 
> will get that resource only when two conditions are both met:
> - It got some other resources on the same agent to meet its quota; and
> - After allocating those resources, quota headroom is still above the 
> required amount.
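The two conditions above can be sketched as a single predicate. This is a minimal illustration and not the Mesos allocator: `mayAllocateUnquotaed` and its parameters are hypothetical names, and reading "still above" as greater-than-or-equal is an assumption.

```cpp
// Hypothetical helper; not part of the Mesos allocator API.
//
// Decides whether a role with no quota set for a scalar resource may
// still receive that resource on an agent in the quota allocation stage.
//
//   allocatedForQuota: the role got other resources on this agent toward
//                      its quota.
//   headroomAfter:     quota headroom left once the resources in question
//                      have been allocated.
//   requiredHeadroom:  headroom needed to satisfy the remaining quotas.
bool mayAllocateUnquotaed(
    bool allocatedForQuota,
    double headroomAfter,
    double requiredHeadroom)
{
  // Both conditions must hold (">=" is an assumption about "still above").
  return allocatedForQuota && headroomAfter >= requiredHeadroom;
}
```

A role that received nothing toward its quota on the agent, or whose extra allocation would eat into the required headroom, gets `false` and the resource stays available for roles behind it.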





[jira] [Updated] (MESOS-8303) Add user doc for agent reconfiguration

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8303:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71  (was: Mesosphere Sprint 
70)

> Add user doc for agent reconfiguration
> --
>
> Key: MESOS-8303
> URL: https://issues.apache.org/jira/browse/MESOS-8303
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Benno Evers
>






[jira] [Updated] (MESOS-8184) Implement master's AcknowledgeOfferOperationMessage handler.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8184:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70, 
Mesosphere Sprint 71  (was: Mesosphere Sprint 68, Mesosphere Sprint 69, 
Mesosphere Sprint 70)

> Implement master's AcknowledgeOfferOperationMessage handler.
> 
>
> Key: MESOS-8184
> URL: https://issues.apache.org/jira/browse/MESOS-8184
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>  Labels: mesosphere
>
> This handler should validate the message and forward it to the corresponding 
> agent/ERP.





[jira] [Updated] (MESOS-8144) Add a mock resource provider manager.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8144:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Add a mock resource provider manager.
> -
>
> Key: MESOS-8144
> URL: https://issues.apache.org/jira/browse/MESOS-8144
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> To test a storage local resource provider, we need to inject a mock resource 
> provider manager such that:
> 1. A full agent will start during the test so the resource provider can 
> launch standalone containers for CSI plugins.
> 2. We can inject offer operations through the mock manager to test the 
> resource provider.





[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7506:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, 
> ROOT_IsolatorFlags-badrun.txt, ResourceLimitation-badrun.txt, 
> ResourceLimitation-badrun2.txt, 
> RestartSlaveRequireExecutorAuthentication-badrun.txt, 
> TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> SlaveTest.RestartSlaveRequireExecutorAuthentication
> LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags
> {noformat}





[jira] [Updated] (MESOS-8096) Enqueueing events in MockHTTPScheduler can lead to segfaults.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8096:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -
>
> Key: MESOS-8096
> URL: https://issues.apache.org/jira/browse/MESOS-8096
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler driver, test
> Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: AsyncExecutorProcess-badrun-1.txt, 
> AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt, 
> scheduler-shutdown-invalid-driver.txt
>
>
> Various tests segfault for an as-yet unknown reason. Comparing the attached 
> logs suggests that the problem might be in the scheduler's event queue.





[jira] [Updated] (MESOS-8102) Add a test CSI plugin for storage local resource provider.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8102:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Add a test CSI plugin for storage local resource provider.
> --
>
> Key: MESOS-8102
> URL: https://issues.apache.org/jira/browse/MESOS-8102
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> We need a dummy CSI plugin for testing storage local resource providers. The 
> test CSI plugin would just create subdirectories under its working 
> directories to mimic the behavior of creating volumes, then bind-mount those 
> volumes to mimic publish.





[jira] [Updated] (MESOS-8115) Add a master flag to disallow agents that are not configured with fault domain

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8115:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Add a master flag to disallow agents that are not configured with fault domain
> --
>
> Key: MESOS-8115
> URL: https://issues.apache.org/jira/browse/MESOS-8115
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Vinod Kone
>Assignee: Benno Evers
>
> Once mesos masters and agents in a cluster are *all* upgraded to a version 
> where the fault domains feature is available, it is beneficial to enforce 
> that agents without a fault domain configured are not allowed to join the 
> cluster. 
> This is a safety net for operators who could forget to configure the fault 
> domain of a remote agent and let it join the cluster. If this happens, an 
> agent in a remote region will be considered a local agent by the master and 
> frameworks (because agent's fault domain is not configured) causing tasks to 
> potentially land in a remote agent which is undesirable.
> Note that this has to be a configurable flag and not enforced by default 
> because otherwise upgrades from a fault domain non-configured cluster to a 
> configured cluster will not be possible.
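The admission rule described above might look like the following sketch. `requireFaultDomain` stands in for the proposed master flag; its name, and the function itself, are hypothetical and not taken from the Mesos code base.

```cpp
// Hypothetical admission check; not the Mesos master implementation.
//
//   requireFaultDomain:  stand-in for the proposed master flag
//                        (assumed to default to false).
//   agentHasFaultDomain: whether the registering agent has a fault
//                        domain configured.
bool admitAgent(bool requireFaultDomain, bool agentHasFaultDomain)
{
  // With the flag off, every agent may join, so a cluster without fault
  // domains can still be upgraded to one that has them.
  return !requireFaultDomain || agentHasFaultDomain;
}
```

Only the combination "flag on, no fault domain configured" is rejected, which is the safety net the description asks for.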





[jira] [Updated] (MESOS-8297) Built-in driver-based executors ignore kill task if the task has not been launched.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8297:
--
Sprint: Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  
(was: Mesosphere Sprint 69, Mesosphere Sprint 70)

> Built-in driver-based executors ignore kill task if the task has not been 
> launched.
> ---
>
> Key: MESOS-8297
> URL: https://issues.apache.org/jira/browse/MESOS-8297
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> If the docker executor receives a kill task request and the task has never 
> been launched, the request is ignored. We now know that the executor has 
> never received the registration confirmation, hence has ignored the launch 
> task request, hence the task has never started. This is how the executor 
> enters an idle state, waiting for registration and ignoring kill task 
> requests.
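The chain of events described above can be reproduced with a toy model. This is only a sketch of the reported behaviour; `ToyExecutor` and `killTakesEffect` are hypothetical names and none of this is the actual executor code.

```cpp
// Toy model of the reported behaviour; not the Mesos docker executor.
struct ToyExecutor
{
  bool registered = false;
  bool launched = false;
  bool killed = false;

  void onRegistered() { registered = true; }

  // Returns false when the request is ignored.
  bool onLaunchTask()
  {
    if (!registered) {
      return false;  // No registration confirmation yet: launch ignored.
    }
    launched = true;
    return true;
  }

  // Returns false when the request is ignored.
  bool onKillTask()
  {
    if (!launched) {
      return false;  // Task never launched: kill ignored.
    }
    killed = true;
    return true;
  }
};

// Sends a launch followed by a kill and reports whether the kill took
// effect. `registeredFirst` controls whether registration was confirmed.
bool killTakesEffect(bool registeredFirst)
{
  ToyExecutor executor;
  if (registeredFirst) {
    executor.onRegistered();
  }
  executor.onLaunchTask();
  return executor.onKillTask();
}
```

Without the registration confirmation, both requests are dropped and the model stays idle, which matches the state described in the issue.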





[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7790:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
>                  ^
>               /     \
>            /           \
>    eng (90 cpus)   sales (10 cpus)
>          ^
>        /   \
>      /       \
> ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build")?
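The arithmetic behind the "30 undelegated cpus" can be written out as a small sketch. `RoleQuota`, `undelegatedCpus` and `exampleRoles` are hypothetical names used only for illustration; they do not correspond to Mesos data structures.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical representation of one role in the quota hierarchy.
struct RoleQuota
{
  double cpus;                        // Quota assigned to this role.
  std::vector<std::string> children;  // Names of its direct child roles.
};

// Quota of `role` that is not delegated to its direct children.
double undelegatedCpus(
    const std::map<std::string, RoleQuota>& roles,
    const std::string& role)
{
  const RoleQuota& parent = roles.at(role);

  double delegated = 0.0;
  for (const std::string& child : parent.children) {
    delegated += roles.at(child).cpus;
  }

  return parent.cpus - delegated;
}

// The hierarchy from the example above.
std::map<std::string, RoleQuota> exampleRoles()
{
  std::map<std::string, RoleQuota> roles;
  roles["eng"] = {90.0, {"eng/ads", "eng/build"}};
  roles["eng/ads"] = {50.0, {}};
  roles["eng/build"] = {10.0, {}};
  roles["sales"] = {10.0, {}};
  return roles;
}
```

For "eng" this yields 90 - (50 + 10) = 30 cpus. How those 30 cpus should be distributed is exactly the open design question, so the sketch stops at computing the amount.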





[jira] [Updated] (MESOS-8240) Add an option to build the new CLI and run unit tests.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8240:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71  (was: Mesosphere Sprint 
70)

> Add an option to build the new CLI and run unit tests.
> --
>
> Key: MESOS-8240
> URL: https://issues.apache.org/jira/browse/MESOS-8240
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Armand Grillet
>Assignee: Armand Grillet
>
> An update of the discarded https://reviews.apache.org/r/52543/





[jira] [Updated] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8190:
--
Sprint: Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  
(was: Mesosphere Sprint 69, Mesosphere Sprint 70)

> Update the master to accept OfferOperationIDs from frameworks.
> --
>
> Key: MESOS-8190
> URL: https://issues.apache.org/jira/browse/MESOS-8190
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Master’s {{ACCEPT}} handler should send failed operation updates when a 
> framework sets the {{OfferOperationID}} on an operation destined for an agent 
> without the {{RESOURCE_PROVIDER}} capability.





[jira] [Updated] (MESOS-8108) Process offer operations in storage local resource provider

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8108:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Process offer operations in storage local resource provider
> ---
>
> Key: MESOS-8108
> URL: https://issues.apache.org/jira/browse/MESOS-8108
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> The storage local resource provider receives offer operations for 
> reservations and resource conversions, and invokes the proper CSI calls to 
> implement these operations.





[jira] [Updated] (MESOS-8244) Add operator API to reload local resource providers.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8244:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70, 
Mesosphere Sprint 71  (was: Mesosphere Sprint 68, Mesosphere Sprint 69, 
Mesosphere Sprint 70)

> Add operator API to reload local resource providers.
> 
>
> Key: MESOS-8244
> URL: https://issues.apache.org/jira/browse/MESOS-8244
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> To add, remove and update local resource providers on the fly more 
> conveniently and without restarting agents, we would like to introduce a new 
> operator API to add new config files in the resource provider config 
> directory and trigger a reload for the resource provider.





[jira] [Updated] (MESOS-8291) Add documentation about fault domains

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8291:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71  (was: Mesosphere Sprint 
70)

> Add documentation about fault domains
> -
>
> Key: MESOS-8291
> URL: https://issues.apache.org/jira/browse/MESOS-8291
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Benno Evers
>
> We need some user docs for fault domains.





[jira] [Updated] (MESOS-8143) Publish and unpublish storage local resources through CSI plugins.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8143:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Publish and unpublish storage local resources through CSI plugins.
> --
>
> Key: MESOS-8143
> URL: https://issues.apache.org/jira/browse/MESOS-8143
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The storage local resource provider needs to call the following CSI APIs to 
> publish CSI volumes for tasks to use:
> 1. ControllerPublishVolume (optional)
> 2. NodePublishVolume
> Although we don't need to unpublish CSI volumes after tasks are completed, we 
> still need to unpublish them for DESTROY_VOLUME or DESTROY_BLOCK:
> 1. NodeUnpublishVolume
> 2. ControllerUnpublishVolume (optional)
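The two call sequences above mirror each other, which the following sketch makes explicit. It only builds the ordered list of call names and performs no RPCs; the `controllerPublishSupported` parameter is a hypothetical stand-in for whether the plugin offers the optional controller calls.

```cpp
#include <string>
#include <vector>

// Ordered CSI calls needed to publish a volume for a task.
std::vector<std::string> publishCalls(bool controllerPublishSupported)
{
  std::vector<std::string> calls;
  if (controllerPublishSupported) {
    calls.push_back("ControllerPublishVolume");  // Optional first step.
  }
  calls.push_back("NodePublishVolume");
  return calls;
}

// Ordered CSI calls needed before DESTROY_VOLUME or DESTROY_BLOCK.
// The order is the reverse of publishing: node first, controller last.
std::vector<std::string> unpublishCalls(bool controllerPublishSupported)
{
  std::vector<std::string> calls;
  calls.push_back("NodeUnpublishVolume");
  if (controllerPublishSupported) {
    calls.push_back("ControllerUnpublishVolume");  // Optional last step.
  }
  return calls;
}
```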





[jira] [Updated] (MESOS-8101) Import resources from CSI plugins in storage local resource provider.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8101:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Import resources from CSI plugins in storage local resource provider.
> -
>
> Key: MESOS-8101
> URL: https://issues.apache.org/jira/browse/MESOS-8101
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The following lists the steps to import resources from a CSI plugin:
> 1. Launch the node plugin
> 1.1 GetSupportedVersions
> 1.2 GetPluginInfo
> 1.3 ProbeNode
> 1.4 GetNodeCapabilities
> 2. Launch the controller plugin
> 2.1 GetSupportedVersions
> 2.2 GetPluginInfo
> 2.3 GetControllerCapabilities
> 3. GetCapacity
> 4. ListVolumes
> 5. Report to the resource provider through UPDATE_TOTAL_RESOURCES





[jira] [Updated] (MESOS-5333) GET /master/maintenance/schedule/ produces 404.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-5333:
--
Sprint: Mesosphere Sprint 70, Mesosphere Sprint 71  (was: Mesosphere Sprint 
70)

> GET /master/maintenance/schedule/ produces 404.
> ---
>
> Key: MESOS-5333
> URL: https://issues.apache.org/jira/browse/MESOS-5333
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: Nathan Handler
>Assignee: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> Attempts to make a GET request to /master/maintenance/schedule/ result in a 
> 404. However, if I make a GET request to /master/maintenance/schedule 
> (without the trailing /), it works. My current (untested) theory is that this 
> might be related to the fact that there is also a 
> /master/maintenance/schedule/status endpoint (an endpoint built on top of a 
> functioning endpoint), as requests to /help and /help/ (with and without the 
> trailing slash) produce the same functioning result.
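One way to make both spellings of the URL behave the same is to normalize the path before looking up a route. The helper below is a sketch of that idea only; `normalizePath` is a hypothetical name, and this is not how libprocess resolves routes, nor a confirmed fix for this issue.

```cpp
#include <string>

// Strips trailing '/' characters from a request path so that
// "/master/maintenance/schedule/" and "/master/maintenance/schedule"
// map to the same route key. A lone "/" is kept intact.
std::string normalizePath(const std::string& path)
{
  std::string::size_type end = path.size();
  while (end > 1 && path[end - 1] == '/') {
    --end;
  }
  return path.substr(0, end);
}
```

A client can apply the same normalization as a workaround until the server treats the trailing slash consistently.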





[jira] [Updated] (MESOS-7550) Publish Local Resource Provider resources in the agent before container launch or update.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7550:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70, Mesosphere Sprint 71  (was: 
Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere 
Sprint 69, Mesosphere Sprint 70)

> Publish Local Resource Provider resources in the agent before container 
> launch or update.
> -
>
> Key: MESOS-7550
> URL: https://issues.apache.org/jira/browse/MESOS-7550
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The agent will ask the RP manager to publish the resources before a container 
> can start to use them. The SLRP (storage local resource provider) will be 
> responsible for making sure the CSI volume is made available on the host. This 
> will involve calling the `ControllerPublishVolume` and `NodePublishVolume` RPCs 
> of the CSI plugin.
> This will happen when a workload (i.e., task/executor) that uses a CSI volume 
> as a persistent volume is being launched on the agent. During the creation 
> of a CSI volume, the SLRP will generate a fixed mount point under the agent's 
> work directory based on the ID of the CSI volume, and store the mount point 
> in the `Resource.disk.source.path.root` or `Resource.disk.source.path.mount` 
> fields. Prior to a workload launch, the SLRP will mount the CSI volume to the 
> same path, then the Docker containerizer or the Mesos containerizer will 
> again bind-mount the volume into the container of the workload. Since the 
> containerizers know nothing about the resource providers, they would extract 
> the mount point of the CSI volume from the `Resource.disk.source.path.root` 
> or `Resource.disk.source.path.mount` fields.
> For a storage local resource provider, the agent's work directory is known 
> during the creation of the CSI volume since it will be created and used on the 
> same agent. However, in the case of a storage external resource provider, 
> where a CSI volume might be created on one agent X and published on another 
> agent Y, the work directory of agent Y might not be known at the creation of 
> a CSI volume on X. To support it in the future, we introduce new semantics 
> for `Resource.disk.source.path.root` and `Resource.disk.source.path.mount`, 
> such that if these fields are set to relative paths, they are relative to the 
> agent's work directory, so the containerizer can extract the mount point by 
> prefixing the relative paths with the agent's work directory.
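The proposed semantics for relative paths can be sketched as a small resolution helper. `resolveMountPoint` is a hypothetical name, the example paths in the note below are made up, and plain string handling is used here instead of whatever path utilities the containerizers actually rely on.

```cpp
#include <string>

// Resolves the mount point stored in `Resource.disk.source.path.root` or
// `Resource.disk.source.path.mount`:
//   - an absolute path is used as is;
//   - a relative path is prefixed with the agent's work directory.
std::string resolveMountPoint(
    const std::string& agentWorkDir,
    const std::string& path)
{
  if (!path.empty() && path[0] == '/') {
    return path;  // Already absolute.
  }

  // Avoid a doubled separator when the work directory ends with '/'.
  if (agentWorkDir.empty() || agentWorkDir.back() == '/') {
    return agentWorkDir + path;
  }

  return agentWorkDir + "/" + path;
}
```

With a work directory of `/var/lib/mesos`, a stored value of `csi/volumes/vol1` resolves to `/var/lib/mesos/csi/volumes/vol1` on whichever agent publishes the volume, while an absolute value is left untouched.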





[jira] [Updated] (MESOS-8265) Add state recovery for storage local resource provider.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8265:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70, 
Mesosphere Sprint 71  (was: Mesosphere Sprint 68, Mesosphere Sprint 69, 
Mesosphere Sprint 70)

> Add state recovery for storage local resource provider.
> ---
>
> Key: MESOS-8265
> URL: https://issues.apache.org/jira/browse/MESOS-8265
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The storage local resource provider needs to checkpoint its total resources 
> and pending operations atomically, and recover them after failing over.
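A common way to checkpoint several pieces of state atomically is to serialize them into one blob, write it to a temporary file, and rename it over the previous checkpoint. The sketch below illustrates that general technique under stated assumptions: POSIX rename semantics, no `fsync` (so it is not crash-durable as written), and hypothetical function names. It is not the SLRP implementation.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes `state` (e.g. serialized total resources plus pending
// operations) so that readers see either the old or the new checkpoint,
// never a partial one. Relies on rename() replacing the target
// atomically, which holds on POSIX file systems.
bool checkpoint(const std::string& path, const std::string& state)
{
  const std::string temp = path + ".tmp";

  {
    std::ofstream out(temp, std::ios::binary | std::ios::trunc);
    if (!out) {
      return false;
    }
    out << state;
    out.flush();
    if (!out) {
      std::remove(temp.c_str());
      return false;
    }
  }  // The stream is closed here, before the rename.

  if (std::rename(temp.c_str(), path.c_str()) != 0) {
    std::remove(temp.c_str());
    return false;
  }

  return true;
}

// Reads the last checkpoint back; returns an empty string if none exists.
std::string recover(const std::string& path)
{
  std::ifstream in(path, std::ios::binary);
  return std::string(
      std::istreambuf_iterator<char>(in),
      std::istreambuf_iterator<char>());
}
```

Keeping both the total resources and the pending operations in the same file is what makes the update atomic across the two; separate files could be observed in an inconsistent combination after a failover.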





[jira] [Updated] (MESOS-8032) Launch CSI plugins in storage local resource provider.

2017-12-22 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8032:
--
Sprint: Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere 
Sprint 70, Mesosphere Sprint 71  (was: Mesosphere Sprint 64, Mesosphere Sprint 
65, Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70)

> Launch CSI plugins in storage local resource provider.
> --
>
> Key: MESOS-8032
> URL: https://issues.apache.org/jira/browse/MESOS-8032
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> Launching a CSI plugin requires the following steps:
> 1. Verify the configuration.
> 2. Prepare a directory in the work directory of the resource provider where 
> the socket file should be placed, and construct the path of the socket file.
> 3. If the socket file already exists and the plugin is already running, we 
> should not launch another plugin instance.
> 4. Otherwise, launch a standalone container to run the plugin and connect to 
> it through the socket file.





[jira] [Commented] (MESOS-7697) Mesos scheduler v1 HTTP API may generate 404 errors for temporary conditions

2017-12-15 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16293510#comment-16293510
 ] 

Adam B commented on MESOS-7697:
---

cc: [~vinodkone], [~anandmazumdar]

> Mesos scheduler v1 HTTP API may generate 404 errors for temporary conditions
> 
>
> Key: MESOS-7697
> URL: https://issues.apache.org/jira/browse/MESOS-7697
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, libprocess
>Reporter: James DeFelice
>  Labels: mesosphere
>
> Returning a 404 error for a condition that's a known temporary condition is 
> confusing from a client's perspective. A client wants to know how to recover 
> from various error conditions. A 404 error condition should be distinct from 
> a "server is not yet ready, but will be shortly" condition (which should 
> probably be reported as a 503 "unavailable" error).
> https://github.com/apache/mesos/blob/72752fc6deb8ebcbfbd5448dc599ef3774339d31/src/scheduler/scheduler.cpp#L593
> {code}
> if (response->code == process::http::Status::NOT_FOUND) {
>   // This could happen if the master libprocess process has not yet set up
>   // HTTP routes.
>   LOG(WARNING) << "Received '" << response->status << "' ("
><< response->body << ") for " << call.type();
>   return;
> }
> {code}
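From the client's side, the distinction argued for above could be acted on as follows. This sketch assumes the master reported the temporary condition as 503, as the issue proposes; `ClientAction` and `classify` are hypothetical names and not part of the Mesos scheduler library.

```cpp
// Hypothetical client-side classification of HTTP status codes.
enum class ClientAction
{
  PROCEED,      // 2xx: the call was accepted.
  RETRY_LATER,  // 503: the server is not ready yet, but will be shortly.
  GIVE_UP       // Anything else, including 404: do not retry blindly.
};

ClientAction classify(int statusCode)
{
  if (statusCode >= 200 && statusCode < 300) {
    return ClientAction::PROCEED;
  }

  if (statusCode == 503) {
    return ClientAction::RETRY_LATER;
  }

  return ClientAction::GIVE_UP;
}
```

As long as the temporary condition is reported as 404, a client using this kind of logic cannot tell "routes not set up yet" apart from a genuinely missing endpoint, which is the confusion the issue describes.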





[jira] [Updated] (MESOS-8297) Built-in driver-based executors ignore kill task if the task has not been launched.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8297:
--
Sprint: Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 
69)

> Built-in driver-based executors ignore kill task if the task has not been 
> launched.
> ---
>
> Key: MESOS-8297
> URL: https://issues.apache.org/jira/browse/MESOS-8297
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>Priority: Blocker
>  Labels: mesosphere
>
> If the docker executor receives a kill task request and the task has never 
> been launched, the request is ignored. We now know that the executor has 
> never received the registration confirmation, hence has ignored the launch 
> task request, hence the task has never started. This is how the executor 
> enters an idle state, waiting for registration and ignoring kill task 
> requests.





[jira] [Updated] (MESOS-8190) Update the master to accept OfferOperationIDs from frameworks.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8190:
--
Sprint: Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 
69)

> Update the master to accept OfferOperationIDs from frameworks.
> --
>
> Key: MESOS-8190
> URL: https://issues.apache.org/jira/browse/MESOS-8190
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>Assignee: Greg Mann
>  Labels: mesosphere
>
> Master’s {{ACCEPT}} handler should send failed operation updates when a 
> framework sets the {{OfferOperationID}} on an operation destined for an agent 
> without the {{RESOURCE_PROVIDER}} capability.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-7550) Publish Local Resource Provider resources in the agent before container launch or update.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7550:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Publish Local Resource Provider resources in the agent before container 
> launch or update.
> -
>
> Key: MESOS-7550
> URL: https://issues.apache.org/jira/browse/MESOS-7550
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The agent will ask the RP manager to publish the resources before a container can 
> start to use them. SLRP (storage local resource provider) will be responsible 
> for making sure the CSI volume is made available on the host. This will 
> involve calling the `ControllerPublishVolume` and `NodePublishVolume` RPCs of 
> the CSI plugin.
> This will happen when a workload (i.e., task/executor) is being launched on 
> the agent that uses a CSI volume as a persistent volume. During the creation 
> of a CSI volume, the SLRP will generate a fixed mount point under the agent's 
> work directory based on the ID of the CSI volume, and store the mount point 
> in the `Resource.disk.source.path.root` or `Resource.disk.source.path.mount` 
> fields. Prior to a workload launch, SLRP will mount the CSI volume to the 
> same path, then the Docker containerizer or the Mesos containerizer will 
> again bind-mount the volume into the container of the workload. Since the 
> containerizers know nothing about the resource providers, they would extract 
> the mount point of the CSI volume from the `Resource.disk.source.path.root` 
> or `Resource.disk.source.path.mount` fields.
> For storage local resource provider, the agent's work directory is known 
> during the creation of the CSI volume since it will be created and used on the 
> same agent. However, in the case of a storage external resource provider, 
> where a CSI volume might be created on one agent X and published on another 
> agent Y, the work directory of agent Y might not be known at the creation of 
> a CSI volume on X. To support it in the future, we introduce new semantics 
> for `Resource.disk.source.path.root` and `Resource.disk.source.path.mount`, 
> such that if these fields are set to relative paths, they are relative to the 
> agent's work directory, so the containerizer can extract the mount point by 
> prefixing the relative paths with the agent's work directory.
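
The relative-path semantics proposed above can be illustrated with a small sketch. The `resolveMountPoint` helper and the example paths are hypothetical and only show the intended prefixing rule; they are not the actual containerizer code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: resolves the value stored in
// `Resource.disk.source.path.root` or `Resource.disk.source.path.mount`
// to an absolute mount point on the agent.
std::string resolveMountPoint(
    const std::string& workDir,
    const std::string& path)
{
  // Absolute paths are used as-is.
  if (!path.empty() && path.front() == '/') {
    return path;
  }

  // Relative paths are prefixed with the agent's work directory,
  // taking care not to produce a double slash.
  if (!workDir.empty() && workDir.back() == '/') {
    return workDir + path;
  }

  return workDir + "/" + path;
}
```

For example, with an assumed work directory of `/var/lib/mesos`, the relative path `csi/volumes/vol-1` would resolve to `/var/lib/mesos/csi/volumes/vol-1`, while an absolute path would be returned unchanged.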





[jira] [Updated] (MESOS-8265) Add state recovery for storage local resource provider.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8265:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70  
(was: Mesosphere Sprint 68, Mesosphere Sprint 69)

> Add state recovery for storage local resource provider.
> ---
>
> Key: MESOS-8265
> URL: https://issues.apache.org/jira/browse/MESOS-8265
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The storage local resource provider needs to checkpoint its total resources 
> and pending operations atomically, and recover them after failing over.





[jira] [Updated] (MESOS-8143) Publish and unpublish storage local resources through CSI plugins.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8143:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Publish and unpublish storage local resources through CSI plugins.
> --
>
> Key: MESOS-8143
> URL: https://issues.apache.org/jira/browse/MESOS-8143
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> Storage local resource provider needs to call the following CSI API to 
> publish CSI volumes for tasks to use:
> 1. ControllerPublishVolume (optional)
> 2. NodePublishVolume
> Although we don't need to unpublish CSI volumes after tasks are completed, we 
> still need to unpublish them for DESTROY_VOLUME or DESTROY_BLOCK:
> 1. NodeUnpublishVolume
> 2. ControllerUnpublishVolume (optional)
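
The call ordering listed above can be sketched as follows. The helper names and the `controllerPublishSupported` flag are assumptions made for illustration (the ticket only says the controller calls are optional); the functions merely return the CSI call names in the order they would be issued.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the CSI calls needed to publish a volume, in order.
// `controllerPublishSupported` is an assumed flag telling whether the
// optional controller-side call applies to the plugin.
std::vector<std::string> publishCalls(bool controllerPublishSupported)
{
  std::vector<std::string> calls;
  if (controllerPublishSupported) {
    calls.push_back("ControllerPublishVolume");
  }
  calls.push_back("NodePublishVolume");
  return calls;
}

// Returns the CSI calls needed to unpublish a volume, in order.
// The node-side call comes first, mirroring the publish sequence.
std::vector<std::string> unpublishCalls(bool controllerPublishSupported)
{
  std::vector<std::string> calls;
  calls.push_back("NodeUnpublishVolume");
  if (controllerPublishSupported) {
    calls.push_back("ControllerUnpublishVolume");
  }
  return calls;
}
```

Unpublishing runs in the reverse order of publishing, so the node-level mount is torn down before the controller detaches the volume.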





[jira] [Updated] (MESOS-8244) Add operator API to reload local resource providers.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8244:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70  
(was: Mesosphere Sprint 68, Mesosphere Sprint 69)

> Add operator API to reload local resource providers.
> 
>
> Key: MESOS-8244
> URL: https://issues.apache.org/jira/browse/MESOS-8244
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> To add, remove and update local resource providers on the fly more 
> conveniently and without restarting agents, we would like to introduce a new 
> operator API to add new config files in the resource provider config 
> directory and trigger a reload for the resource provider.





[jira] [Updated] (MESOS-8108) Process offer operations in storage local resource provider

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8108:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Process offer operations in storage local resource provider
> ---
>
> Key: MESOS-8108
> URL: https://issues.apache.org/jira/browse/MESOS-8108
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> The storage local resource provider receives offer operations for 
> reservations and resource conversions, and invokes the proper CSI calls to 
> implement these operations.





[jira] [Updated] (MESOS-7361) Command checks via agent pollute agent logs.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7361:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70  
(was: Mesosphere Sprint 68, Mesosphere Sprint 69)

> Command checks via agent pollute agent logs.
> 
>
> Key: MESOS-7361
> URL: https://issues.apache.org/jira/browse/MESOS-7361
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Alexander Rukletsov
>Assignee: Armand Grillet
>  Labels: check, health-check, mesosphere
>
> Command checks via agent leverage the debug container API of the agent to start 
> checks. Each such invocation triggers a bunch of logs on the agent, because 
> the API was not originally designed with periodic invocations in mind. We 
> should find a way to avoid excessive logging on the agent.





[jira] [Updated] (MESOS-8221) Use protobuf reflection to simplify downgrading of resources.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8221:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70  
(was: Mesosphere Sprint 68, Mesosphere Sprint 69)

> Use protobuf reflection to simplify downgrading of resources.
> -
>
> Key: MESOS-8221
> URL: https://issues.apache.org/jira/browse/MESOS-8221
> Project: Mesos
>  Issue Type: Improvement
>  Components: agent
>Reporter: Michael Park
>Assignee: Michael Park
>
> We currently have a {{downgradeResources}} function which is called on every 
> {{repeated Resource}} field in every message that we checkpoint. We should 
> leverage protobuf reflection to automatically downgrade any instances of 
> {{Resource}} within any protobuf message.





[jira] [Updated] (MESOS-8144) Add a mock resource provider manager.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8144:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Add a mock resource provider manager.
> -
>
> Key: MESOS-8144
> URL: https://issues.apache.org/jira/browse/MESOS-8144
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: storage
>
> To test a storage local resource provider, we need to inject a mock resource 
> provider manager such that:
> 1. A full agent will start during the test so the resource provider can 
> launch standalone containers for CSI plugins.
> 2. We can inject offer operations through the mock manager to test the 
> resource provider.





[jira] [Updated] (MESOS-7506) Multiple tests leave orphan containers.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7506:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Multiple tests leave orphan containers.
> ---
>
> Key: MESOS-7506
> URL: https://issues.apache.org/jira/browse/MESOS-7506
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
> Environment: Ubuntu 16.04
> Fedora 23
> other Linux distros
>Reporter: Alexander Rukletsov
>Assignee: Andrei Budnik
>  Labels: containerizer, flaky-test, mesosphere
> Attachments: KillMultipleTasks-badrun.txt, 
> ROOT_IsolatorFlags-badrun.txt, ResourceLimitation-badrun.txt, 
> ResourceLimitation-badrun2.txt, 
> RestartSlaveRequireExecutorAuthentication-badrun.txt, 
> TaskWithFileURI-badrun.txt
>
>
> I've observed a number of flaky tests that leave orphan containers upon 
> cleanup. A typical log looks like this:
> {noformat}
> ../../src/tests/cluster.cpp:580: Failure
> Value of: containers->empty()
>   Actual: false
> Expected: true
> Failed to destroy containers: { da3e8aa8-98e7-4e72-a8fd-5d0bae960014 }
> {noformat}
> All currently affected tests:
> {noformat}
> SlaveTest.RestartSlaveRequireExecutorAuthentication
> LinuxCapabilitiesIsolatorFlagsTest.ROOT_IsolatorFlags
> {noformat}





[jira] [Updated] (MESOS-8309) Introduce a UUID message type

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8309:
--
Sprint: Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 
69)

> Introduce a UUID message type
> -
>
> Key: MESOS-8309
> URL: https://issues.apache.org/jira/browse/MESOS-8309
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> Currently when a UUID needs to be part of a protobuf message, we use a byte 
> array field for that. This has some drawbacks, especially when it comes to 
> outputting the UUID in logs: to stringify the UUID field, we first have to 
> create a stout UUID, then call {{.toString()}} on it. It would help to 
> have a UUID type in {{mesos.proto}} and provide a stringification function 
> for it in {{type_utils.hpp}}.
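
As a rough illustration of the kind of stringification function meant above, the following standalone sketch formats 16 raw bytes in the canonical 8-4-4-4-12 hexadecimal form. The `stringifyUuid` name is hypothetical; this is neither the stout UUID class nor the function that would go into {{type_utils.hpp}}.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats 16 raw bytes as a lowercase UUID string with dashes after
// the 4th, 6th, 8th and 10th byte.
std::string stringifyUuid(const std::array<std::uint8_t, 16>& bytes)
{
  std::string out;
  char hex[3];  // Two hex digits plus the terminating NUL.

  for (std::size_t i = 0; i < bytes.size(); ++i) {
    if (i == 4 || i == 6 || i == 8 || i == 10) {
      out += '-';
    }
    std::snprintf(hex, sizeof(hex), "%02x", static_cast<unsigned>(bytes[i]));
    out += hex;
  }

  return out;
}
```

With such a helper, logging code could print the byte-array field directly instead of constructing an intermediate UUID object first.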





[jira] [Updated] (MESOS-8096) Enqueueing events in MockHTTPScheduler can lead to segfaults.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8096:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -
>
> Key: MESOS-8096
> URL: https://issues.apache.org/jira/browse/MESOS-8096
> Project: Mesos
>  Issue Type: Bug
>  Components: scheduler driver, test
> Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Attachments: AsyncExecutorProcess-badrun-1.txt, 
> AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt, 
> scheduler-shutdown-invalid-driver.txt
>
>
> Various tests segfault for an as yet unknown reason. Comparing logs (attached) 
> hints that the problem might be in the scheduler's event queue.





[jira] [Updated] (MESOS-8070) Bundled GRPC build does not build on Debian 8

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8070:
--
Sprint: Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67, 
Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70  (was: 
Mesosphere Sprint 65, Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere 
Sprint 68, Mesosphere Sprint 69)

> Bundled GRPC build does not build on Debian 8
> -
>
> Key: MESOS-8070
> URL: https://issues.apache.org/jira/browse/MESOS-8070
> Project: Mesos
>  Issue Type: Bug
>Reporter: Zhitao Li
>Assignee: Chun-Hung Hsiao
> Fix For: 1.5.0
>
>
> Debian 8 includes an outdated version of libc-ares-dev, which prevents the 
> bundled GRPC from building.
> I believe [~chhsia0] already has a fix.





[jira] [Updated] (MESOS-8032) Launch CSI plugins in storage local resource provider.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8032:
--
Sprint: Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere 
Sprint 70  (was: Mesosphere Sprint 64, Mesosphere Sprint 65, Mesosphere Sprint 
66, Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Launch CSI plugins in storage local resource provider.
> --
>
> Key: MESOS-8032
> URL: https://issues.apache.org/jira/browse/MESOS-8032
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> Launching a CSI plugin requires the following steps:
> 1. Verify the configuration.
> 2. Prepare a directory in the work directory of the resource provider where 
> the socket file should be placed, and construct the path of the socket file.
> 3. If the socket file already exists and the plugin is already running, we 
> should not launch another plugin instance.
> 4. Otherwise, launch a standalone container to run the plugin and connect to 
> it through the socket file.
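
The decision logic in steps 1, 3 and 4 above could be sketched like this. The `PluginState` struct, the `LaunchAction` enum and `decideLaunch` are hypothetical names used only for illustration; step 2 (preparing the directory and constructing the socket path) is not covered.

```cpp
#include <cassert>

// Hypothetical summary of what the resource provider observed before launch.
struct PluginState
{
  bool configValid;    // Step 1: the configuration passed verification.
  bool socketExists;   // Step 3: the socket file is already present.
  bool pluginRunning;  // Step 3: a plugin instance is already running.
};

enum class LaunchAction
{
  REJECT,  // Invalid configuration; do not launch anything.
  REUSE,   // Connect to the plugin instance that is already running.
  LAUNCH   // Start a standalone container for the plugin.
};

LaunchAction decideLaunch(const PluginState& state)
{
  if (!state.configValid) {
    return LaunchAction::REJECT;
  }

  // Only skip the launch if both conditions hold; a stale socket file
  // without a running plugin still leads to a new launch.
  if (state.socketExists && state.pluginRunning) {
    return LaunchAction::REUSE;
  }

  return LaunchAction::LAUNCH;
}
```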





[jira] [Updated] (MESOS-7790) Design hierarchical quota allocation.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7790:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Design hierarchical quota allocation.
> -
>
> Key: MESOS-7790
> URL: https://issues.apache.org/jira/browse/MESOS-7790
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Benjamin Mahler
>Assignee: Michael Park
>  Labels: multitenancy
>
> When quota is assigned in the role hierarchy (see MESOS-6375), it's possible 
> for there to be "undelegated" quota for a role. For example:
> {noformat}
>                  ^
>                /   \
>              /       \
>    eng (90 cpus)   sales (10 cpus)
>          ^
>        /   \
>      /       \
>  ads (50 cpus)   build (10 cpus)
> {noformat}
> Here, the "eng" role has 60 of its 90 cpus of quota delegated to its 
> children, and 30 cpus remain undelegated. We need to design how to allocate 
> these 30 undelegated cpus. Are they allocated entirely to the "eng" 
> role? Are they allocated to the "eng" role tree? If so, how do we determine 
> how much is allocated to each role in the "eng" tree (i.e. "eng", "eng/ads", 
> "eng/build")?
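
The arithmetic behind the "undelegated" quota in the example above (90 - (50 + 10) = 30) can be written down as a small sketch. The `undelegatedQuota` function and the flat role-path map are assumptions for illustration and say nothing about how the allocator would actually distribute the remainder.

```cpp
#include <cassert>
#include <map>
#include <string>

// Computes the quota of `role` that is not delegated to its direct
// children. Roles are keyed by path, e.g. "eng" and "eng/ads".
double undelegatedQuota(
    const std::map<std::string, double>& quota,
    const std::string& role)
{
  const std::string prefix = role + "/";
  double delegated = 0.0;

  for (const auto& entry : quota) {
    const std::string& name = entry.first;

    // A direct child starts with "<role>/" and has no further '/'.
    const bool isChild =
      name.size() > prefix.size() &&
      name.compare(0, prefix.size(), prefix) == 0 &&
      name.find('/', prefix.size()) == std::string::npos;

    if (isChild) {
      delegated += entry.second;
    }
  }

  return quota.at(role) - delegated;
}
```

For the hierarchy shown in the ticket, calling this with the "eng" role yields the 30 cpus whose allocation still needs to be designed.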





[jira] [Updated] (MESOS-8101) Import resources from CSI plugins in storage local resource provider.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8101:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Import resources from CSI plugins in storage local resource provider.
> -
>
> Key: MESOS-8101
> URL: https://issues.apache.org/jira/browse/MESOS-8101
> Project: Mesos
>  Issue Type: Task
>  Components: storage
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> The following lists the steps to import resources from a CSI plugin:
> 1. Launch the node plugin
> 1.1 GetSupportedVersions
> 1.2 GetPluginInfo
> 1.3 ProbeNode
> 1.4 GetNodeCapabilities
> 2. Launch the controller plugin
> 2.1 GetSupportedVersions
> 2.2 GetPluginInfo
> 2.3 GetControllerCapabilities
> 3. GetCapacity
> 4. ListVolumes
> 5. Report to the resource provider through UPDATE_TOTAL_RESOURCES





[jira] [Updated] (MESOS-8102) Add a test CSI plugin for storage local resource provider.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8102:
--
Sprint: Mesosphere Sprint 66, Mesosphere Sprint 67, Mesosphere Sprint 68, 
Mesosphere Sprint 69, Mesosphere Sprint 70  (was: Mesosphere Sprint 66, 
Mesosphere Sprint 67, Mesosphere Sprint 68, Mesosphere Sprint 69)

> Add a test CSI plugin for storage local resource provider.
> --
>
> Key: MESOS-8102
> URL: https://issues.apache.org/jira/browse/MESOS-8102
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> We need a dummy CSI plugin for testing storage local resource providers. The 
> test CSI plugin would just create subdirectories under its working 
> directories to mimic the behavior of creating volumes, then bind-mount those 
> volumes to mimic publish.





[jira] [Updated] (MESOS-4527) Roles can exceed limit allocation via reservations.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-4527:
--
Sprint: Mesosphere Sprint 27, Mesosphere Sprint 69, Mesosphere Sprint 70  
(was: Mesosphere Sprint 27, Mesosphere Sprint 69)

> Roles can exceed limit allocation via reservations.
> ---
>
> Key: MESOS-4527
> URL: https://issues.apache.org/jira/browse/MESOS-4527
> Project: Mesos
>  Issue Type: Task
>  Components: allocation
>Reporter: Michael Park
>Assignee: Meng Zhu
>  Labels: mesosphere, multitenancy
>
> Since unallocated reservations are not counted towards the guarantee (which 
> today is also a limit), we might exceed the limit.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (MESOS-8184) Implement master's AcknowledgeOfferOperationMessage handler.

2017-12-08 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8184:
--
Sprint: Mesosphere Sprint 68, Mesosphere Sprint 69, Mesosphere Sprint 70  
(was: Mesosphere Sprint 68, Mesosphere Sprint 69)

> Implement master's AcknowledgeOfferOperationMessage handler.
> 
>
> Key: MESOS-8184
> URL: https://issues.apache.org/jira/browse/MESOS-8184
> Project: Mesos
>  Issue Type: Task
>Reporter: Gastón Kleiman
>  Labels: mesosphere
>
> This handler should validate the message and forward it to the corresponding 
> agent/ERP.





[jira] [Assigned] (MESOS-8312) Pass resource provider information to master as part of UpdateSlaveMessage

2017-12-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B reassigned MESOS-8312:
-

Assignee: Benjamin Bannier

> Pass resource provider information to master as part of UpdateSlaveMessage
> --
>
> Key: MESOS-8312
> URL: https://issues.apache.org/jira/browse/MESOS-8312
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> We extended {{UpdateSlaveMessage}} so updates to an agent's total resources 
> from resource providers are possible. We realized that we will need to 
> explicitly pass resource provider details (here for now: 
> {{ResourceProviderInfo}}) to the master so it can be queried for the 
> providers present on certain agents. This should happen as part of 
> {{UpdateSlaveMessage}} so a single synchronization channel is used for this 
> kind of information.
> We need to adjust {{UpdateSlaveMessage}} for these requirements. This should 
> happen before 1.5.0 gets released so we do not need to deprecate a never 
> really used message format.





[jira] [Updated] (MESOS-8309) Introduce a UUID message type

2017-12-07 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8309:
--
Target Version/s: 1.5.0

> Introduce a UUID message type
> -
>
> Key: MESOS-8309
> URL: https://issues.apache.org/jira/browse/MESOS-8309
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>Assignee: Jan Schlicht
>  Labels: mesosphere
> Fix For: 1.5.0
>
>
> Currently when a UUID needs to be part of a protobuf message, we use a byte 
> array field for that. This has some drawbacks, especially when it comes to 
> outputting the UUID in logs: to stringify the UUID field, we first have to 
> create a stout UUID, then call {{.toString()}} on it. It would help to 
> have a UUID type in {{mesos.proto}} and provide a stringification function 
> for it in {{type_utils.hpp}}.





[jira] [Commented] (MESOS-8244) Add operator API to reload local resource providers.

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281461#comment-16281461
 ] 

Adam B commented on MESOS-8244:
---

[~chhsia0] Is this in progress or in review yet?

> Add operator API to reload local resource providers.
> 
>
> Key: MESOS-8244
> URL: https://issues.apache.org/jira/browse/MESOS-8244
> Project: Mesos
>  Issue Type: Task
>Reporter: Chun-Hung Hsiao
>Assignee: Chun-Hung Hsiao
>  Labels: mesosphere, storage
>
> To add, remove and update local resource providers on the fly more 
> conveniently and without restarting agents, we would like to introduce a new 
> operator API to add new config files in the resource provider config 
> directory and trigger a reload for the resource provider.





[jira] [Updated] (MESOS-7330) Add resource provider to offer

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7330:
--
Labels: csi-post-mvp external-resources mesosphere storage  (was: 
external-resources mesosphere storage)

> Add resource provider to offer
> --
>
> Key: MESOS-7330
> URL: https://issues.apache.org/jira/browse/MESOS-7330
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: csi-post-mvp, external-resources, mesosphere, storage
>
> In order to introduce external resource providers we need to add an 
> {{optional}} resource provider field to the {{Offer}} message which can be 
> used to unambiguously identify the provider. In addition, the existing 
> {{slave_id}} will become {{optional}} with the requirement that either 
> {{slave_id}} or {{resource_provider_id}} is set,
> {code}
> message Offer {
>   // ..
>   optional SlaveID slave_id = 3;
>   optional ResourceProviderID resource_provider_id = 11;
>   // ..
> }
> {code}
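
The requirement that one of the two IDs is set could be validated along these lines. This is a sketch under assumptions: the `Offer` struct below only mirrors presence of the two optional fields, `validateOffer` is a hypothetical helper, and "either ... or" is read here as "at least one", which the ticket does not spell out.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the two optional fields; the has* flags mimic
// protobuf's presence tracking for `optional` fields.
struct Offer
{
  bool hasSlaveId;
  bool hasResourceProviderId;
};

// Returns an empty string for a valid offer, an error message otherwise.
std::string validateOffer(const Offer& offer)
{
  if (!offer.hasSlaveId && !offer.hasResourceProviderId) {
    return "Offer must set 'slave_id' or 'resource_provider_id'";
  }

  return "";
}
```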





[jira] [Updated] (MESOS-7523) Whitelist devices in bulk on a per-container basis

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7523:
--
Labels: csi-post-mvp mesosphere storage  (was: mesosphere storage)

> Whitelist devices in bulk on a per-container basis
> --
>
> Key: MESOS-7523
> URL: https://issues.apache.org/jira/browse/MESOS-7523
> Project: Mesos
>  Issue Type: Bug
>Reporter: James DeFelice
>  Labels: csi-post-mvp, mesosphere, storage
>
> Continuation of the work in MESOS-6791
> It should be possible to whitelist a range (R) of devices such that R may be 
> exposed to a container launched by an agent. Not all containers should have 
> access to R by default, only those containers whose ContainerInfo specifies 
> such access.
> For example, it may be useful to whitelist the range of devices matching the 
> glob expressions `/dev/\{s,h,xv}d\[a-z]*` and `/dev/dm-\*` and 
> `/dev/mapper/\*` for a container that intends to manage storage devices.
> /cc [~jieyu]





[jira] [Updated] (MESOS-7303) Support Isolator capabilities.

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7303:
--
Target Version/s: 1.5.0

> Support Isolator capabilities.
> --
>
> Key: MESOS-7303
> URL: https://issues.apache.org/jira/browse/MESOS-7303
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Jie Yu
>Assignee: Joseph Wu
>  Labels: mesosphere, storage
>
> Currently, isolators have one capability: whether they support nesting or not. 
> To support launching containers that are not tied to Mesos tasks or executors 
> (standalone containers), we need to add another capability to the Isolator 
> interface so that we can avoid invoking those isolators that do not yet 
> support it when launching standalone containers.





[jira] [Commented] (MESOS-7330) Add resource provider to offer

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281443#comment-16281443
 ] 

Adam B commented on MESOS-7330:
---

[~bbannier] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Add resource provider to offer
> --
>
> Key: MESOS-7330
> URL: https://issues.apache.org/jira/browse/MESOS-7330
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Priority: Minor
>  Labels: external-resources, mesosphere, storage
>
> In order to introduce external resource providers we need to add an 
> {{optional}} resource provider field to the {{Offer}} message which can be 
> used to unambiguously identify the provider. In addition, the existing 
> {{slave_id}} will become {{optional}} with the requirement that either 
> {{slave_id}} or {{resource_provider_id}} is set,
> {code}
> message Offer {
>   // ..
>   optional SlaveID slave_id = 3;
>   optional ResourceProviderID resource_provider_id = 11;
>   // ..
> }
> {code}





[jira] [Commented] (MESOS-7523) Whitelist devices in bulk on a per-container basis

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281445#comment-16281445
 ] 

Adam B commented on MESOS-7523:
---

[~jieyu] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Whitelist devices in bulk on a per-container basis
> --
>
> Key: MESOS-7523
> URL: https://issues.apache.org/jira/browse/MESOS-7523
> Project: Mesos
>  Issue Type: Bug
>Reporter: James DeFelice
>  Labels: mesosphere, storage
>
> Continuation of the work in MESOS-6791
> It should be possible to whitelist a range (R) of devices such that R may be 
> exposed to a container launched by an agent. Not all containers should have 
> access to R by default, only those containers whose ContainerInfo specifies 
> such access.
> For example, it may be useful to whitelist the range of devices matching the 
> glob expressions `/dev/\{s,h,xv}d\[a-z]*` and `/dev/dm-\*` and 
> `/dev/mapper/\*` for a container that intends to manage storage devices.
> /cc [~jieyu]





[jira] [Updated] (MESOS-8282) Take pending offer operations into account when calculating framework allocated resources

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8282:
--
Labels: csi-post-mvp  (was: )

> Take pending offer operations into account when calculating framework 
> allocated resources
> -
>
> Key: MESOS-8282
> URL: https://issues.apache.org/jira/browse/MESOS-8282
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Reporter: Benjamin Bannier
>  Labels: csi-post-mvp
>
> When calculating a framework's allocated resources in the agent (currently 
> used just in endpoints) in {{slave.cpp}}'s {{Framework::allocatedResources}} 
> we currently only take tasks into account. We should update this code to take 
> non-terminal, non-speculated offer operations for the framework into account.





[jira] [Commented] (MESOS-8086) Update ACCEPT call handler in master for new operations.

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281439#comment-16281439
 ] 

Adam B commented on MESOS-8086:
---

[~nfnt] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Update ACCEPT call handler in master for new operations.
> 
>
> Key: MESOS-8086
> URL: https://issues.apache.org/jira/browse/MESOS-8086
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>  Labels: storage
>
> Please follow the details here
> https://docs.google.com/document/d/1RrrLVATZUyaURpEOeGjgxA6ccshuLo94G678IbL-Yco/edit#
> Old and new operations will be handled differently. Also, the message we 
> send to the agent depends on the agent's capabilities.





[jira] [Commented] (MESOS-7328) Validate offer operations for converting disk resources

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281438#comment-16281438
 ] 

Adam B commented on MESOS-7328:
---

[~nfnt] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Validate offer operations for converting disk resources
> ---
>
> Key: MESOS-7328
> URL: https://issues.apache.org/jira/browse/MESOS-7328
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>  Labels: mesosphere, storage, validation
>
> Validation logic for the new operations created in MESOS-7314 needs to be 
> implemented. For example, a {{CREATE_VOLUME}} operation must only use disk 
> resources with a {{RAW}} source.
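
As an illustration of the rule in the example above, a validation function could look roughly like the following. The `DiskSource` enum, the `DiskResource` struct, and the empty-string-means-valid convention are assumptions made for this sketch; Mesos itself models resources as protobuf messages and reports validation errors differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a disk resource's source type.
enum class DiskSource { NONE, RAW, PATH, MOUNT };

struct DiskResource
{
  std::string name;
  DiskSource source;
};

const char* toString(DiskSource source)
{
  switch (source) {
    case DiskSource::NONE: return "NONE";
    case DiskSource::RAW: return "RAW";
    case DiskSource::PATH: return "PATH";
    case DiskSource::MOUNT: return "MOUNT";
  }

  assert(false && "unknown DiskSource");
  return "";
}

// Returns an empty string if `resource` may be used by a CREATE_VOLUME
// operation, or a description of the problem otherwise.
std::string validateCreateVolume(const DiskResource& resource)
{
  if (resource.name != "disk") {
    return "CREATE_VOLUME requires a 'disk' resource, got '" +
           resource.name + "'";
  }

  if (resource.source != DiskSource::RAW) {
    return std::string("CREATE_VOLUME requires a RAW disk source, got ") +
           toString(resource.source);
  }

  return "";
}
```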





[jira] [Commented] (MESOS-7553) Distinguish between different resource provider states in RP Manager.

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281437#comment-16281437
 ] 

Adam B commented on MESOS-7553:
---

[~nfnt] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Distinguish between different resource provider states in RP Manager.
> -
>
> Key: MESOS-7553
> URL: https://issues.apache.org/jira/browse/MESOS-7553
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>  Labels: mesosphere, storage
>
> In preparation for supporting timeouts for resource provider re-registrations, 
> the RP manager needs to be able to distinguish between registered, 
> unreachable and gone resource providers, so that resources aren't offered 
> when not registered. For that, internal resource provider states have to be 
> added to the RP manager, as it is already implemented for agents (i.e. the 
> {{completed}}, {{registered}}, {{removed}} maps in {{master.cpp}}).
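
One possible shape for the state tracking described above, as a sketch only: the `ProviderState` enum and the `ProviderStates` class are hypothetical names, and this example assumes that a provider marked gone may never come back.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class ProviderState { REGISTERED, UNREACHABLE, GONE };

// Tracks the state of each resource provider by ID.
class ProviderStates
{
public:
  // Registers or re-registers a provider. Returns false if the provider
  // was already marked gone (an assumption of this sketch).
  bool markRegistered(const std::string& id)
  {
    auto it = states_.find(id);
    if (it != states_.end() && it->second == ProviderState::GONE) {
      return false;
    }

    states_[id] = ProviderState::REGISTERED;
    return true;
  }

  void markUnreachable(const std::string& id)
  {
    transition(id, ProviderState::UNREACHABLE);
  }

  void markGone(const std::string& id)
  {
    transition(id, ProviderState::GONE);
  }

  // Resources are only offered for providers that are registered.
  bool offerable(const std::string& id) const
  {
    auto it = states_.find(id);
    return it != states_.end() && it->second == ProviderState::REGISTERED;
  }

private:
  void transition(const std::string& id, ProviderState next)
  {
    auto it = states_.find(id);
    assert(it != states_.end());  // Only known providers change state.
    it->second = next;
  }

  std::map<std::string, ProviderState> states_;
};
```

A single map keyed by ID is used here for brevity; keeping one container per state would work equally well for this purpose.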





[jira] [Commented] (MESOS-7556) Wait for resource provider re-registrations after an agent failover

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281428#comment-16281428
 ] 

Adam B commented on MESOS-7556:
---

[~nfnt] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Wait for resource provider re-registrations after an agent failover
> ---
>
> Key: MESOS-7556
> URL: https://issues.apache.org/jira/browse/MESOS-7556
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>  Labels: mesosphere, storage
>
> Recover all resource provider IDs from registrar after a failover and set up 
> timeouts for resource providers to re-register.





[jira] [Updated] (MESOS-8181) Add tests that a failed offer operation on resource provider resources leads to a clock update

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8181:
--
Labels: csi-post-mvp storage  (was: storage)

> Add tests that a failed offer operation on resource provider resources leads 
> to a clock update
> --
>
> Key: MESOS-8181
> URL: https://issues.apache.org/jira/browse/MESOS-8181
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master, test
>Reporter: Benjamin Bannier
>  Labels: csi-post-mvp, storage
>
> When an offer operation fails in a resource provider, the resource provider 
> is expected to update its internal resource version and communicate the 
> change back to the RP manager/agent which in turn will propagate it to the 
> master.
> We should add tests ensuring that we propagate this update all the way.





[jira] [Commented] (MESOS-7533) Add a function stub for resource provider re-registration

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281427#comment-16281427
 ] 

Adam B commented on MESOS-7533:
---

[~nfnt] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Add a function stub for resource provider re-registration
> -
>
> Key: MESOS-7533
> URL: https://issues.apache.org/jira/browse/MESOS-7533
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>  Labels: mesosphere, storage
>
> In case there is a resource provider failover it is expected that a resource 
> provider will re-register with the master using its ID that was provided by 
> the master. A function needs to be added to the master to support this. I.e. 
> if a resource provider {{SUBSCRIBE}}s using a {{ResourceProviderInfo}} that 
> contains an ID, check if that was already registered. A later task would be 
> to implement resource reconciliation.





[jira] [Commented] (MESOS-7309) Support specifying devices for a container.

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281421#comment-16281421
 ] 

Adam B commented on MESOS-7309:
---

[~jieyu] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Support specifying devices for a container.
> ---
>
> Key: MESOS-7309
> URL: https://issues.apache.org/jira/browse/MESOS-7309
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Jie Yu
>  Labels: mesosphere, storage
>
> Some containers require certain devices to be available in the container 
> (e.g., /dev/fuse). Currently, the default devices are hard coded if the 
> rootfs image is specified for the container.
> We should allow frameworks to specify additional devices that will be made 
> available to the container. Besides bind mounting the device file, the devices 
> cgroup needs to be configured properly to allow access to that device.





[jira] [Updated] (MESOS-7537) Add functionality to disconnect resource providers in the master

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7537:
--
Labels: csi-post-mvp external-resources mesosphere storage  (was: 
external-resources mesosphere storage)

> Add functionality to disconnect resource providers in the master
> 
>
> Key: MESOS-7537
> URL: https://issues.apache.org/jira/browse/MESOS-7537
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>  Labels: csi-post-mvp, external-resources, mesosphere, storage
>
> Similar to the existing {{disconnect}} methods for frameworks and agents, a 
> corresponding function for resource providers has to be added to the master.
> It needs to be called in {{Master::exited}}, i.e. when the master detects that 
> a resource provider is no longer reachable.
> For local resource providers, it also has to be called when the agent on 
> which they are running disconnects.





[jira] [Updated] (MESOS-7554) Add a re-registration timeout for resource providers

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7554:
--
Labels: csi-post-mvp mesosphere storage  (was: mesosphere storage)

> Add a re-registration timeout for resource providers
> 
>
> Key: MESOS-7554
> URL: https://issues.apache.org/jira/browse/MESOS-7554
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Jan Schlicht
>  Labels: csi-post-mvp, mesosphere, storage
>
> This re-registration timeout will be started when a resource provider seems 
> to have disconnected, similar to how it's done for agents. While waiting for 
> the resource provider to reconnect, it will be deactivated. On 
> re-registration the timeout will be canceled and the resource provider 
> activated again. In case of a timeout, the internal state will be changed to 
> {{unreachable}} (as it is for agents in that situation) and considered gone.
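
The timeout behaviour described above can be sketched as a small state machine. Time is passed in explicitly as seconds so the example stays deterministic; `ReregistrationTimer` and `ConnectionState` are hypothetical names, and a real implementation would presumably use libprocess timers rather than a `tick` call.

```cpp
#include <cassert>
#include <cstdint>

enum class ConnectionState { ACTIVE, DEACTIVATED, UNREACHABLE };

class ReregistrationTimer
{
public:
  explicit ReregistrationTimer(std::int64_t timeoutSeconds)
    : timeout_(timeoutSeconds)
  {
    assert(timeoutSeconds > 0);
  }

  // The provider seems to have disconnected: deactivate it and start
  // the re-registration timeout.
  void disconnected(std::int64_t now)
  {
    if (state_ == ConnectionState::ACTIVE) {
      state_ = ConnectionState::DEACTIVATED;
      deadline_ = now + timeout_;
    }
  }

  // The provider re-registered: cancel the timeout and activate it again.
  // Returns false if the timeout already fired and the provider is
  // considered gone.
  bool reregistered()
  {
    if (state_ == ConnectionState::UNREACHABLE) {
      return false;
    }

    state_ = ConnectionState::ACTIVE;
    return true;
  }

  // Called periodically; marks the provider unreachable once the
  // deadline has passed without a re-registration.
  void tick(std::int64_t now)
  {
    if (state_ == ConnectionState::DEACTIVATED && now >= deadline_) {
      state_ = ConnectionState::UNREACHABLE;
    }
  }

  ConnectionState state() const { return state_; }

private:
  const std::int64_t timeout_;
  std::int64_t deadline_ = 0;
  ConnectionState state_ = ConnectionState::ACTIVE;
};
```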





[jira] [Commented] (MESOS-7535) Distinguish between active and inactive resource providers in RP Manager

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281415#comment-16281415
 ] 

Adam B commented on MESOS-7535:
---

[~nfnt], Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0 linking to the commits.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.

> Distinguish between active and inactive resource providers in RP Manager
> 
>
> Key: MESOS-7535
> URL: https://issues.apache.org/jira/browse/MESOS-7535
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>  Labels: mesosphere, storage
>
> To support re-registration with the master after a resource provider 
> failover, the master should be able to distinguish between active and 
> inactive resource providers. In the case that a resource provider disconnects 
> (handled in a different ticket), it would be marked as inactive until it 
> re-registers. While it is inactive, its resources won't be offered.





[jira] [Commented] (MESOS-8088) Introduce Lamport timestamp for offer operations.

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281414#comment-16281414
 ] 

Adam B commented on MESOS-8088:
---

[~bbannier] Is this completed already? If so, please close the ticket with 
FixVersion 1.5.0.
If not, please set the TargetVersion for 1.5.0 (now!), 1.6.0, or close it as 
Won't Do.


> Introduce Lamport timestamp for offer operations.
> -
>
> Key: MESOS-8088
> URL: https://issues.apache.org/jira/browse/MESOS-8088
> Project: Mesos
>  Issue Type: Task
>Reporter: Jie Yu
>  Labels: storage
>
> We need to use a Lamport clock 
> (https://en.wikipedia.org/wiki/Lamport_timestamps) to establish a partial 
> ordering between offer operations and the resources they operate on.
> It is used to establish happens-before relations so that RPs can reject 
> operations that apply to a stale snapshot of the resources due to 
> speculation failures.
> See more details in this doc:
> https://docs.google.com/document/d/1RrrLVATZUyaURpEOeGjgxA6ccshuLo94G678IbL-Yco/edit#
> Given that the Lamport clock needs to be transferred between agents and 
> masters, it needs to be serialized to protobuf. We probably need to define 
> the following methods for it:
> ```
> merge(...); // Take the max of the two clocks.
> increment();
> operator<(...);
> // Copy constructor and assignment operator.
> ```
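
For reference, the methods listed above might be fleshed out roughly as follows. This is a sketch under assumptions: the class name `LamportClock`, the 64-bit counter, and the `isStale` helper are invented here, and protobuf serialization is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical Lamport clock. Copy construction and assignment use the
// compiler-generated defaults.
class LamportClock
{
public:
  LamportClock() = default;
  explicit LamportClock(std::uint64_t value) : value_(value) {}

  // Advance the clock for a local event.
  void increment()
  {
    assert(value_ < std::numeric_limits<std::uint64_t>::max());
    ++value_;
  }

  // Take the maximum of this clock and `other`.
  void merge(const LamportClock& other)
  {
    value_ = std::max(value_, other.value_);
  }

  std::uint64_t value() const { return value_; }

  bool operator<(const LamportClock& other) const
  {
    return value_ < other.value_;
  }

  bool operator==(const LamportClock& other) const
  {
    return value_ == other.value_;
  }

private:
  std::uint64_t value_ = 0;
};

// Returns true if an operation stamped with `operationClock` was built
// against a snapshot older than the provider's `resourceClock`, i.e. it
// should be rejected.
inline bool isStale(
    const LamportClock& operationClock,
    const LamportClock& resourceClock)
{
  return operationClock < resourceClock;
}
```

A resource provider would call `increment()` whenever its resources change (for instance after a failed operation) and compare incoming operations against its current clock with `isStale`.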





[jira] [Updated] (MESOS-7536) Test that a resource provider failover results in a re-registration with the master

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7536:
--
Labels: csi-post-mvp mesosphere storage  (was: mesosphere storage)

> Test that a resource provider failover results in a re-registration with the 
> master
> ---
>
> Key: MESOS-7536
> URL: https://issues.apache.org/jira/browse/MESOS-7536
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>  Labels: csi-post-mvp, mesosphere, storage
>
> Multiple scenarios need to be tested:
> * A failover of a local resource provider will lead to a re-registration
> * A failover of an agent that is running a local resource provider will lead 
> to a re-registration of the local resource provider (following a 
> re-registration of the agent) 





[jira] [Updated] (MESOS-7388) Update allocator interfaces to support resource providers

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7388:
--
Target Version/s: 1.5.0

> Update allocator interfaces to support resource providers
> -
>
> Key: MESOS-7388
> URL: https://issues.apache.org/jira/browse/MESOS-7388
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: storage
>






[jira] [Updated] (MESOS-8278) Mesos Containerizer cannot recover due to check failure.

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-8278:
--
Labels: containerizer csi-post-mvp standalone  (was: containerizer 
standalone)

> Mesos Containerizer cannot recover due to check failure.
> 
>
> Key: MESOS-8278
> URL: https://issues.apache.org/jira/browse/MESOS-8278
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Priority: Critical
>  Labels: containerizer, csi-post-mvp, standalone
>
> Mesos containerizer cannot recover due to a check failure on a nested 
> container's sandbox directory.
> {noformat}
> I1129 22:00:42.556479  5812 containerizer.cpp:670] Recovering containerizer
> F1129 22:00:42.560739  5812 containerizer.cpp:912] CHECK_SOME(directory): is 
> NONE 
> *** Check failure stack trace: ***
> @ 0x7f7e6cf1294d  google::LogMessage::Fail()
> @ 0x7f7e6cf11d1e  google::LogMessage::SendToLog()
> @ 0x7f7e6cf1261d  google::LogMessage::Flush()
> @ 0x7f7e6cf15a98  google::LogMessageFatal::~LogMessageFatal()
> @ 0x55ca72a95197  _CheckFatal::~_CheckFatal()
> @ 0x7f7e6bb23770  
> mesos::internal::slave::MesosContainerizerProcess::recover()
> @ 0x7f7e6bbe643c  
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS4_5state10SlaveStateEESB_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_EOT2_ENKUlRS9_PNS_11ProcessBaseEE_clESP_SR_
> @ 0x7f7e6bbe6295  
> _ZNSt5_BindIFZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS5_5state10SlaveStateEESC_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_EOT2_EUlRSA_PNS0_11ProcessBaseEE_SA_St12_PlaceholderILi16__callIvJOSS_EJLm0ELm1SE_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f7e6bbe61f6  
> _ZNSt5_BindIFZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS5_5state10SlaveStateEESC_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_EOT2_EUlRSA_PNS0_11ProcessBaseEE_SA_St12_PlaceholderILi1clIJSS_EvEESH_DpOT_
> @ 0x7f7e6bbe5f02  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEESt5_BindIFZNS0_8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS9_5state10SlaveStateEESG_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSL_FSJ_T1_EOT2_EUlRSE_S2_E_SE_St12_PlaceholderILi1E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7e6ce37cf4  std::function<>::operator()()
> @ 0x7f7e6ce1ded4  process::ProcessBase::visit()
> @ 0x7f7e6cea38fe  process::DispatchEvent::visit()
> @ 0x7f7e6a9741b1  process::ProcessBase::serve()
> @ 0x7f7e6ce1a8eb  process::ProcessManager::resume()
> @ 0x7f7e6ce2b86e  
> process::ProcessManager::init_threads()::$_7::operator()()
> @ 0x7f7e6ce2b715  
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_7vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f7e6ce2b6e5  std::_Bind_simple<>::operator()()
> @ 0x7f7e6ce2b6bc  std::thread::_Impl<>::_M_run()
> @ 0x7f7e6617d030  (unknown)
> @ 0x7f7e65c966aa  start_thread
> @ 0x7f7e659cbe9d  (unknown)
> {noformat}
> Maybe related to the change of standalone container support.





[jira] [Updated] (MESOS-7329) Authorize offer operations for converting disk resources

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7329:
--
Labels: csi-post-mvp mesosphere security storage  (was: mesosphere security 
storage)

> Authorize offer operations for converting disk resources
> 
>
> Key: MESOS-7329
> URL: https://issues.apache.org/jira/browse/MESOS-7329
> Project: Mesos
>  Issue Type: Task
>  Components: master, security
>Reporter: Jan Schlicht
>  Labels: csi-post-mvp, mesosphere, security, storage
>
> All offer operations are authorized, hence authorization logic has to be 
> added to new offer operations as well.





[jira] [Updated] (MESOS-7557) Test that resource providers can re-register after agent fails over.

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7557:
--
Labels: csi-post-mvp mesosphere storage  (was: mesosphere storage)

> Test that resource providers can re-register after agent fails over.
> 
>
> Key: MESOS-7557
> URL: https://issues.apache.org/jira/browse/MESOS-7557
> Project: Mesos
>  Issue Type: Task
>Reporter: Jan Schlicht
>  Labels: csi-post-mvp, mesosphere, storage
>
> Restarting a master in a test environment should trigger a resource provider 
> re-registration.





[jira] [Updated] (MESOS-7854) Authorize resource calls to provider manager api

2017-12-06 Thread Adam B (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam B updated MESOS-7854:
--
Labels: csi-post-mvp mesosphere storage  (was: mesosphere storage)

> Authorize resource calls to provider manager api
> 
>
> Key: MESOS-7854
> URL: https://issues.apache.org/jira/browse/MESOS-7854
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Priority: Critical
>  Labels: csi-post-mvp, mesosphere, storage
>
> The resource provider manager provides a function
> {code}
> process::Future<process::http::Response> api(
>     const process::http::Request& request,
>     const Option<process::http::authentication::Principal>& principal) const;
> {code}
> which is exposed, e.g., as an agent endpoint.
> We need to add authorization to this function in order to, e.g., stop rogue 
> callers.





[jira] [Commented] (MESOS-8278) Mesos Containerizer cannot recover due to check failure.

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281410#comment-16281410
 ] 

Adam B commented on MESOS-8278:
---

cc: [~kaysoky]

> Mesos Containerizer cannot recover due to check failure.
> 
>
> Key: MESOS-8278
> URL: https://issues.apache.org/jira/browse/MESOS-8278
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Gilbert Song
>Priority: Critical
>  Labels: containerizer, standalone
>
> Mesos containerizer cannot recover due to a check failure on a nested 
> container's sandbox directory.
> {noformat}
> I1129 22:00:42.556479  5812 containerizer.cpp:670] Recovering containerizer
> F1129 22:00:42.560739  5812 containerizer.cpp:912] CHECK_SOME(directory): is 
> NONE 
> *** Check failure stack trace: ***
> @ 0x7f7e6cf1294d  google::LogMessage::Fail()
> @ 0x7f7e6cf11d1e  google::LogMessage::SendToLog()
> @ 0x7f7e6cf1261d  google::LogMessage::Flush()
> @ 0x7f7e6cf15a98  google::LogMessageFatal::~LogMessageFatal()
> @ 0x55ca72a95197  _CheckFatal::~_CheckFatal()
> @ 0x7f7e6bb23770  
> mesos::internal::slave::MesosContainerizerProcess::recover()
> @ 0x7f7e6bbe643c  
> _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS4_5state10SlaveStateEESB_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSG_FSE_T1_EOT2_ENKUlRS9_PNS_11ProcessBaseEE_clESP_SR_
> @ 0x7f7e6bbe6295  
> _ZNSt5_BindIFZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS5_5state10SlaveStateEESC_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_EOT2_EUlRSA_PNS0_11ProcessBaseEE_SA_St12_PlaceholderILi16__callIvJOSS_EJLm0ELm1SE_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
> @ 0x7f7e6bbe61f6  
> _ZNSt5_BindIFZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS5_5state10SlaveStateEESC_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_EOT2_EUlRSA_PNS0_11ProcessBaseEE_SA_St12_PlaceholderILi1clIJSS_EvEESH_DpOT_
> @ 0x7f7e6bbe5f02  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEESt5_BindIFZNS0_8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERK6OptionINS9_5state10SlaveStateEESG_EENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSL_FSJ_T1_EOT2_EUlRSE_S2_E_SE_St12_PlaceholderILi1E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7e6ce37cf4  std::function<>::operator()()
> @ 0x7f7e6ce1ded4  process::ProcessBase::visit()
> @ 0x7f7e6cea38fe  process::DispatchEvent::visit()
> @ 0x7f7e6a9741b1  process::ProcessBase::serve()
> @ 0x7f7e6ce1a8eb  process::ProcessManager::resume()
> @ 0x7f7e6ce2b86e  
> process::ProcessManager::init_threads()::$_7::operator()()
> @ 0x7f7e6ce2b715  
> _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvE3$_7vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
> @ 0x7f7e6ce2b6e5  std::_Bind_simple<>::operator()()
> @ 0x7f7e6ce2b6bc  std::thread::_Impl<>::_M_run()
> @ 0x7f7e6617d030  (unknown)
> @ 0x7f7e65c966aa  start_thread
> @ 0x7f7e659cbe9d  (unknown)
> {noformat}
> Maybe related to the change of standalone container support.





[jira] [Commented] (MESOS-7388) Update allocator interfaces to support resource providers

2017-12-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281407#comment-16281407
 ] 

Adam B commented on MESOS-7388:
---

[~bbannier], is this done? What's left before we can close this out?

> Update allocator interfaces to support resource providers
> -
>
> Key: MESOS-7388
> URL: https://issues.apache.org/jira/browse/MESOS-7388
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: storage
>





