Re: Issues 16 and Issue 12
Swapnil,

Looked over both docs (HA and NM restart). They're pretty high level, so I'll look forward to the details. Initial thoughts:

1. Getting framework reconciliation going would likely eliminate certain issues, such as sendFrameworkMessage being unreliable, so it should be implemented sooner rather than later.
2. How stable is the RMStateStore API? If there are changes between versions of Hadoop, it might be best to use Mesos's State API.
3. There was no mention of running two RMs as in traditional Hadoop RM HA (maybe even in Marathon), but this should be considered a possibility. That may have been implicit.

Saw the PR; will look at it.

Darin

> Hi Darin,
> The Myriad HA work will involve work related to issue 16. I already have the Myriad HA design doc for review. Your feedback on it would be really helpful. I also plan to send out for review parts of the Myriad HA implementation (although it does not address task reconciliation yet). I was planning to work on it next.
> Regards
> Swapnil
>
> On Mon, Aug 3, 2015 at 12:08 PM, Darin Johnson dbjohnson1...@gmail.com wrote:
>> Is anyone actively working on these? I'm interested in both of these and should have some cycles to work on them. One question I have on issue 12 is how to generalize scheduling policies if we have autoscaling, fine-grained scheduling, and fixed resources (with a flexup/flexdown option). Currently it seems as though FGS is embedded pretty deeply. Ideally, though, we could have a SchedulerPolicy interface, and users could specify the SchedulerPolicy via the Myriad config. If I don't get a response, I'll probably start issue 16, as it's straightforward, and write something up on 12.
>> Darin
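To make the SchedulerPolicy idea concrete, here is a minimal Python sketch of a pluggable policy selected from configuration. Everything in it is hypothetical: `Offer`, `SchedulerPolicy`, the two implementations and the `schedulerPolicy` config key are illustrations, not existing Myriad classes or settings. An autoscaling policy would be a third implementation of the same interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Offer:
    """Simplified stand-in for a Mesos resource offer (hypothetical)."""
    cpus: float
    mem_mb: int


class SchedulerPolicy(ABC):
    """Hypothetical interface: decide what to do with each resource offer."""

    @abstractmethod
    def handle_offer(self, offer: Offer) -> str:
        ...


class FixedResourcesPolicy(SchedulerPolicy):
    """Launch an NM only when a flexup has been requested; otherwise decline."""

    def __init__(self) -> None:
        self.pending_flexups = 0

    def flex_up(self, count: int) -> None:
        self.pending_flexups += count

    def handle_offer(self, offer: Offer) -> str:
        if self.pending_flexups > 0:
            self.pending_flexups -= 1
            return "launch-nm"
        return "decline"


class FineGrainedPolicy(SchedulerPolicy):
    """Hand every offer's resources to YARN as extra NM capacity."""

    def handle_offer(self, offer: Offer) -> str:
        return "add-capacity"


# Registry mapping a config value to a policy class (names are illustrative).
POLICIES = {
    "fixed": FixedResourcesPolicy,
    "fine-grained": FineGrainedPolicy,
}


def policy_from_config(config: dict) -> SchedulerPolicy:
    """Instantiate the policy named by a hypothetical 'schedulerPolicy' key."""
    name = config.get("schedulerPolicy", "fixed")
    if name not in POLICIES:
        raise ValueError(f"unknown scheduler policy: {name}")
    return POLICIES[name]()
```

The scheduler's offer callback would then only delegate to `policy.handle_offer(...)`, which is one way to keep FGS-specific logic out of the core scheduler.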
[jira] [Updated] (MYRIAD-22) Support Mesos Framework Authentication
[ https://issues.apache.org/jira/browse/MYRIAD-22?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-22:
Fix Version/s: (was: March HackWeek)

Support Mesos Framework Authentication

Key: MYRIAD-22
URL: https://issues.apache.org/jira/browse/MYRIAD-22
Project: Myriad
Issue Type: Bug
Reporter: Adam B

See the [Mesos protobuf for Credential|https://github.com/apache/mesos/blob/master/include/mesos/mesos.proto#L820]. Also see Marathon's issue/implementation: https://github.com/mesosphere/marathon/issues/638

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MYRIAD-14) Support fine grained scaling.
[ https://issues.apache.org/jira/browse/MYRIAD-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-14:
Fix Version/s: (was: March HackWeek)

Support fine grained scaling.

Key: MYRIAD-14
URL: https://issues.apache.org/jira/browse/MYRIAD-14
Project: Myriad
Issue Type: Improvement
Reporter: Santosh Marella
Assignee: Santosh Marella

Currently, Myriad supports scaling at the NodeManager level, i.e. it launches a NodeManager instance for every resource offer it receives. We would like to support scaling at YARN's container level, i.e. launch a YARN container per Mesos resource offer.
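The difference between the two scaling modes can be shown with a toy model. This Python sketch uses hypothetical `Offer` and `NodeManager` types (not Myriad classes) and, for the fine-grained case, assumes all offers come from the node where the single NM already runs.

```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    cpus: float
    mem_mb: int


@dataclass
class NodeManager:
    """Toy model of the capacity a NodeManager advertises to the RM."""
    cpus: float = 0.0
    mem_mb: int = 0
    containers: list = field(default_factory=list)


def coarse_grained(offers):
    """Current behavior described in MYRIAD-14: one NM per offer."""
    return [NodeManager(o.cpus, o.mem_mb) for o in offers]


def fine_grained(nm, offers):
    """Proposed behavior: each offer backs one YARN container on an existing NM.

    Assumes every offer comes from the host this NM runs on.
    """
    for o in offers:
        nm.cpus += o.cpus
        nm.mem_mb += o.mem_mb
        nm.containers.append((o.cpus, o.mem_mb))
    return nm
```

With two offers, the coarse-grained path yields two NMs, while the fine-grained path yields one NM whose capacity grows by both offers, each backing one container.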
[jira] [Reopened] (MYRIAD-47) Support custom users in Mesos
[ https://issues.apache.org/jira/browse/MYRIAD-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B reopened MYRIAD-47:
Reopening to close as duplicate.

Support custom users in Mesos

Key: MYRIAD-47
URL: https://issues.apache.org/jira/browse/MYRIAD-47
Project: Myriad
Issue Type: Bug
Reporter: Adam B
Fix For: March HackWeek

The YAML already specifies which user to start each NM as, but the TaskFactory sets the CommandInfo user to root, and the Scheduler sets the FrameworkInfo user to the current user. These should be configurable too, and should not force root.
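One way the user could be made configurable is a simple precedence chain that refuses root by default. This is a hypothetical Python sketch: the `profile_user`, `framework_user` and `allow_root` settings are illustrations, not existing Myriad config keys.

```python
import getpass


class UserConfigError(ValueError):
    """Raised when the resolved task user is not allowed."""


def resolve_task_user(profile_user=None, framework_user=None, allow_root=False):
    """Pick the user for a NodeManager task (hypothetical config fields).

    Precedence: per-NM user from the YAML profile, then a framework-wide
    setting, then the user running the scheduler. Root is refused unless
    explicitly allowed.
    """
    # getpass.getuser() is only evaluated when neither setting is given.
    user = profile_user or framework_user or getpass.getuser()
    if user == "root" and not allow_root:
        raise UserConfigError("refusing to launch NodeManager as root")
    return user
```

The resolved value would then be set as the CommandInfo user (and an analogous setting used for the FrameworkInfo user) instead of hard-coding root.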
[jira] [Updated] (MYRIAD-13) High Availability Mode for Myriad
[ https://issues.apache.org/jira/browse/MYRIAD-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-13:
Fix Version/s: (was: March HackWeek)

High Availability Mode for Myriad

Key: MYRIAD-13
URL: https://issues.apache.org/jira/browse/MYRIAD-13
Project: Myriad
Issue Type: Improvement
Reporter: Mohit Soni

When recovering from a failure, either a ResourceManager/Myriad JVM failure (new process) or a driver crash (same process), Myriad should be able to:
1. reconstruct its existing state during recovery
2. reconcile the TaskStatus of non-terminal tasks

To achieve 1, Myriad needs to persist its state externally so that the state outlives a Myriad process run. State can be stored either in ZooKeeper or in the replicated log abstraction provided by Mesos. (Issue MYRIAD-15)

To achieve 2, Myriad needs to leverage the reconciliation feature. Ben Mahler's [document|https://gist.github.com/bmahler/18409fc4f052df43f403] on reconciliation discusses an algorithm which frameworks can use to reconcile tasks. This should be implemented and used until reconciliation is managed by the SchedulerDriver itself. (Issue MYRIAD-16)
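Point 1 (persist state so it outlives the process, then reconstruct it on restart) can be sketched as below. This is a minimal Python illustration under stated assumptions: `SchedulerState`, its fields, the `InMemoryStore` stand-in for ZooKeeper or the Mesos replicated log, and the `STATE_KEY` name are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SchedulerState:
    """Hypothetical subset of Myriad scheduler state worth persisting."""
    framework_id: str = ""
    pending_tasks: list = field(default_factory=list)
    active_tasks: list = field(default_factory=list)


class InMemoryStore:
    """Stand-in for ZooKeeper or the Mesos replicated log: a key -> bytes map."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


STATE_KEY = "myriad/scheduler-state"  # hypothetical key name


def persist(store, state: SchedulerState):
    """Serialize the state and write it to the external store."""
    store.put(STATE_KEY, json.dumps(asdict(state)).encode("utf-8"))


def recover(store) -> SchedulerState:
    """Rebuild state after a restart; start fresh if nothing was stored."""
    raw = store.get(STATE_KEY)
    if raw is None:
        return SchedulerState()
    return SchedulerState(**json.loads(raw.decode("utf-8")))
```

Persisting the framework ID matters because re-registering with the same ID is what lets a restarted scheduler reclaim its running tasks; the task lists then feed point 2, reconciliation.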
[jira] [Updated] (MYRIAD-16) Reconcile tasks when recovering from failure
[ https://issues.apache.org/jira/browse/MYRIAD-16?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-16:
Fix Version/s: (was: March HackWeek)

Reconcile tasks when recovering from failure

Key: MYRIAD-16
URL: https://issues.apache.org/jira/browse/MYRIAD-16
Project: Myriad
Issue Type: Improvement
Reporter: Mohit Soni
Assignee: Mohit Soni

After recovering from failure, reconcile the TaskStatus of non-terminal tasks.

Notes: Ben Mahler's reconciliation [document|https://gist.github.com/bmahler/18409fc4f052df43f403] recommends an algorithm to be used by frameworks for reconciliation.
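The explicit reconciliation algorithm, as described in the Mesos reconciliation documentation, is roughly: record a start time, ask the master about every non-terminal task, wait for status updates with truncated exponential backoff, and re-ask only about tasks that have not produced an update since the start time. A minimal Python sketch, assuming a hypothetical driver abstraction made of three callables (`send_reconcile`, `wait_for_updates`, `now`); in the real Mesos API the request goes through `SchedulerDriver.reconcileTasks` and updates arrive via the scheduler's `statusUpdate` callback.

```python
TERMINAL = {"TASK_FINISHED", "TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR"}


def reconcile(tasks, send_reconcile, wait_for_updates, now,
              initial_backoff=1.0, max_backoff=60.0, max_rounds=20):
    """Explicit reconciliation loop.

    tasks: dict task_id -> last known state (updated in place)
    send_reconcile(ids): ask the master for the latest status of ids
    wait_for_updates(timeout): list of (task_id, state, arrival_time)
    now(): current time
    Returns the task IDs still unreconciled after max_rounds (empty on success).
    """
    start = now()
    remaining = {t for t, state in tasks.items() if state not in TERMINAL}
    last_arrival = {}
    backoff = initial_backoff
    rounds = 0
    while remaining and rounds < max_rounds:
        send_reconcile(sorted(remaining))
        for task_id, state, arrived in wait_for_updates(backoff):
            tasks[task_id] = state
            last_arrival[task_id] = arrived
        # Keep only tasks with no update received since reconciliation began.
        remaining = {t for t in remaining
                     if last_arrival.get(t, float("-inf")) < start}
        backoff = min(backoff * 2, max_backoff)  # truncated exponential backoff
        rounds += 1
    return remaining
```

Mesos also supports implicit reconciliation: calling `reconcileTasks` with an empty list asks the master to send updates for all tasks it knows about for the framework, which the Mesos docs suggest doing periodically.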
[jira] [Updated] (MYRIAD-15) SchedulerState store for Myriad
[ https://issues.apache.org/jira/browse/MYRIAD-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MYRIAD-15:
Fix Version/s: (was: March HackWeek)

SchedulerState store for Myriad

Key: MYRIAD-15
URL: https://issues.apache.org/jira/browse/MYRIAD-15
Project: Myriad
Issue Type: Improvement
Reporter: Mohit Soni

Implement a state store for Myriad. Details TBD.
[jira] [Reopened] (MYRIAD-46) Configurable Mesos Role
[ https://issues.apache.org/jira/browse/MYRIAD-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B reopened MYRIAD-46:

Configurable Mesos Role

Key: MYRIAD-46
URL: https://issues.apache.org/jira/browse/MYRIAD-46
Project: Myriad
Issue Type: Bug
Reporter: Adam B
Fix For: March HackWeek

Enables support for resource reservations, as well as weighted fair sharing. Just register with a FrameworkInfo using a non-default role (i.e. not "*"), and you can then be offered resources reserved just for that role.
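A sketch of which resources a framework can be offered once it registers with a role, under the single-role model of this era: unreserved resources (role `"*"`, the Mesos default) plus resources reserved for the framework's own role. The `Resource` type here is a simplified, hypothetical stand-in loosely modeled on the role field of the Mesos `Resource` message.

```python
from dataclasses import dataclass

UNRESERVED = "*"  # Mesos default role


@dataclass(frozen=True)
class Resource:
    name: str      # e.g. "cpus", "mem"
    value: float
    role: str = UNRESERVED


def offerable(resources, framework_role):
    """Resources a framework registered with framework_role may be offered."""
    return [r for r in resources if r.role in (UNRESERVED, framework_role)]


def totals(resources):
    """Sum resource values by name."""
    out = {}
    for r in resources:
        out[r.name] = out.get(r.name, 0) + r.value
    return out
```

For example, on an agent with 4 unreserved CPUs, 2 CPUs reserved for "yarn" and memory reserved for "spark", a framework registered as "yarn" sees 6 CPUs, while one using the default role sees only the 4 unreserved CPUs.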
[jira] [Reopened] (MYRIAD-37) Distribute executor/NM binaries rather than assume they already exist on-node.
[ https://issues.apache.org/jira/browse/MYRIAD-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B reopened MYRIAD-37:

Distribute executor/NM binaries rather than assume they already exist on-node.

Key: MYRIAD-37
URL: https://issues.apache.org/jira/browse/MYRIAD-37
Project: Myriad
Issue Type: Bug
Reporter: Adam B
Fix For: March HackWeek

In a cluster using the HDFS framework, or without HDFS, the YARN/Hadoop binaries may not already be present on the node. In this case, the executor/NM binaries could be distributed via HDFS, S3, etc. Dockerization is handled in a separate issue.
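Mesos can already ship binaries this way: the fetcher downloads each `CommandInfo.URI` into the task sandbox and, when `extract` is true, unpacks archives there. This Python sketch builds a CommandInfo-shaped dict for an NM launched from a fetched Hadoop tarball. The helper name, the example URL and the assumption that the archive unpacks into a directory named after the tarball are all hypothetical; the URI scheme must be one the agent's fetcher can handle.

```python
import posixpath
from urllib.parse import urlparse


def nm_command_info(tarball_uri, user, nm_args=""):
    """Build a CommandInfo-shaped dict that fetches and runs a Hadoop tarball.

    Assumes the archive unpacks to a top-level directory named after the
    tarball (e.g. hadoop-2.7.0.tar.gz -> hadoop-2.7.0/).
    """
    archive = posixpath.basename(urlparse(tarball_uri).path)
    top_dir = archive
    for suffix in (".tar.gz", ".tgz"):
        if top_dir.endswith(suffix):
            top_dir = top_dir[: -len(suffix)]
            break
    # The command runs from the sandbox, where the fetcher extracted the archive.
    command = f"./{top_dir}/bin/yarn nodemanager {nm_args}".strip()
    return {
        "uris": [{"value": tarball_uri, "extract": True}],
        "value": command,
        "user": user,
    }
```

Because the tarball is fetched per task, nodes need nothing pre-installed beyond the Mesos agent, at the cost of a download on each launch (the fetcher's cache option can reduce that).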
[jira] [Resolved] (MYRIAD-46) Configurable Mesos Role
[ https://issues.apache.org/jira/browse/MYRIAD-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B resolved MYRIAD-46.
Resolution: Fixed
Assignee: Shingo Omura

Configurable Mesos Role

Key: MYRIAD-46
URL: https://issues.apache.org/jira/browse/MYRIAD-46
Project: Myriad
Issue Type: Bug
Reporter: Adam B
Assignee: Shingo Omura
Fix For: March HackWeek

Enables support for resource reservations, as well as weighted fair sharing. Just register with a FrameworkInfo using a non-default role (i.e. not "*"), and you can then be offered resources reserved just for that role.
Myriad HA state store implementation
Hi all,

I have sent a pull request for the Myriad HA state store implementation here: https://github.com/mesos/myriad/pull/123

* I want to mention that the design for Myriad HA has been strongly influenced by Paul Read's work here https://github.com/pdread100/myriad/tree/issue-13 and here https://github.com/pdread100/myriad/tree/issue-15
* I have used Paul's code to serialize and deserialize the scheduler state to the state store (see commit 1). I have made minor additions so that the frameworkId is stored and retrieved. I have made sure to commit this code with Paul as the author.
* The pull request stores the Myriad scheduler state to the DFS.
* To use the state store implementation, you need to add the following properties to yarn-site.xml on the RM:

    <property>
      <name>yarn.resourcemanager.recovery.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.store.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.MyriadFileSystemRMStateStore</value>
    </property>
    <property>
      <name>yarn.resourcemanager.fs.state-store.uri</name>
      <value>/var/mapr/cluster/yarn/rm/system</value>
      <!-- Replace this with the desired path -->
    </property>

  You should be able to see a directory structure similar to this on the DFS:

    hadoop fs -ls /var/mapr/cluster/yarn/rm/system/FSRMStateRoot
    Found 4 items
    drwxr-xr-x   - mapr mapr  5 2015-07-23 13:40 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot
    drwxr-xr-x   - mapr mapr 65 2015-08-02 17:19 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMDTSecretManagerRoot
    drwxr-xr-x   - mapr mapr  1 2015-07-27 17:21 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMMyriadRoot   <-- Myriad state root folder
    -rwxr-xr-x   3 mapr mapr  4 2015-07-21 10:37 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMVersionNode

    hadoop fs -ls /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMMyriadRoot
    Found 1 items
    -rwxr-xr-x   3 mapr mapr 80 2015-07-27 17:21 /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMMyriadRoot/MyriadState   <-- Myriad state file

* This pull request does not yet do the following (work in progress):
  1. Reconcile state with the Mesos master and restart NMs if they are lost during a Myriad scheduler restart.
  2. In the case of FGS, update the RM's view of NM resources for NMs running containers.
* The detailed design doc for Myriad HA is here: https://docs.google.com/document/d/1BkcDChhOLU5TDU6ZQEpIh-WBKoCwYPPi9OV-__mQlmQ/edit?usp=sharing

Please let me know your thoughts, suggestions, etc.

Regards
Swapnil
Myriad HA design doc
Hi All,

I am posting a draft of the Myriad HA design doc using Google Docs. You should be able to use the link below to add your comments.

https://docs.google.com/document/d/1BkcDChhOLU5TDU6ZQEpIh-WBKoCwYPPi9OV-__mQlmQ/edit?usp=sharing

Please let me know your thoughts and suggestions. I should be able to send a pull request for storing and retrieving Myriad state from the state store shortly.

Regards
Swapnil
Re: Jira ready (mesos/myriad github issues imported)
I just disabled our GitHub issues link and added a link to the new JIRA at the top of the GitHub page. Please use JIRA from now on for any new issues or updates to existing issues. You probably need to create an account if you don't already have one. Let us know if you need to be added to the Developers group to assign an issue to yourself. All committers have JIRA admin access.

Thanks!

On Sun, Aug 2, 2015 at 6:23 AM, Gavin McDonald gmcdon...@apache.org wrote:
> Hi All,
> mesos/myriad GitHub issues have been imported into the ASF JIRA, as per https://issues.apache.org/jira/browse/INFRA-9516
> Please now stop using GitHub and start using JIRA: https://issues.apache.org/jira/browse/MYRIAD
> Please comment on the INFRA-9516 ticket if you spot anything; otherwise, please confirm all is OK and close the ticket.
> Thanks
> Gav…