[jira] [Commented] (AURORA-1973) Documentation issue in installation docs
[ https://issues.apache.org/jira/browse/AURORA-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043709#comment-17043709 ] Renan DelValle commented on AURORA-1973: [~asutosh_pandya] the Apache version of this project has now been archived. If you would like to submit a patch for documentation, I highly suggest sending it over to the spiritual successor https://github.com/aurora-scheduler/aurora > Documentation issue in installation docs > > > Key: AURORA-1973 > URL: https://issues.apache.org/jira/browse/AURORA-1973 > Project: Aurora > Issue Type: Bug >Reporter: Tokuhiro Matsuno >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > In Installation docs, `sudo systemctl start aurora` was specified. But it's > incorrect. > It should be `sudo systemctl start aurora-scheduler` > https://github.com/apache/aurora/commit/537e052cf9bdd69b1454962d77bb90a3b7f8ebc4 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (AURORA-1997) Consider using checksum-dependency-plugin for dependency verification
[ https://issues.apache.org/jira/browse/AURORA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle closed AURORA-1997.
--
Resolution: Later

Hello [~vladimirsitnikov],

While we appreciate your suggestion, there are currently no plans on our roadmap to integrate this plugin. In a perfect world this would be a priority, but we simply don't have the dev power right now to upgrade to Gradle 6.x, which makes integrating with this plugin a serious challenge.

If you believe you can help us upgrade to Gradle 6.x, we would be extremely grateful for a pull request on GitHub: [https://github.com/apache/aurora]

Until then, unfortunately, I will have to close this without a promise of getting to it in the future.

-Renan

> Consider using checksum-dependency-plugin for dependency verification
>
> Key: AURORA-1997
> URL: https://issues.apache.org/jira/browse/AURORA-1997
> Project: Aurora
> Issue Type: Story
> Components: Build, Scheduler, Security
> Reporter: Vladimir Sitnikov
> Priority: Trivial
> Labels: newbie
>
> {{checksum-dependency-plugin}} [1] is a superset of {{gradle-witness}}, and it makes it possible to increase the level of security.
> Key features:
> * Gradle plugins can be verified (gradle-witness doesn't track plugins)
> * All Gradle configurations are supported (e.g. the `java-library` plugin is supported). `checksum-dependency-plugin` intercepts detached configurations as well (e.g. the ones that are created on demand)
> * PGP can be used for verification, with or without a checksum. PGP makes it possible to detect and prevent issues like [https://blog.autsoft.hu/a-confusing-dependency/]
> {{checksum-dependency-plugin}} aims to provide insulation against MITM attacks via Maven dependency downloads.
> It is trivial to integrate, and it is not that hard to maintain (e.g. checksum.xml could be updated automatically)
> [1] [https://github.com/vlsi/vlsi-release-plugins/tree/master/plugins/checksum-dependency-plugin]
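The general idea behind checksum-based dependency verification can be sketched in a few lines of Python. This is not the plugin's implementation or its file format, only a minimal illustration that assumes a SHA-256 digest was pinned for each artifact ahead of time; `sha256_hex`, `verify_artifact` and the sample bytes are hypothetical names and data made up for the example.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Check downloaded bytes against a digest pinned ahead of time.

    hmac.compare_digest performs a constant-time comparison of the two
    hex strings.
    """
    return hmac.compare_digest(sha256_hex(data), expected_sha256.lower())


# Hypothetical artifact contents and the digest recorded when it was trusted.
trusted_bytes = b"example-library-1.0.jar contents"
pinned_digest = sha256_hex(trusted_bytes)

# An untouched download verifies; a modified one does not.
print(verify_artifact(trusted_bytes, pinned_digest))
print(verify_artifact(trusted_bytes + b"!", pinned_digest))
```

A checksum alone only proves the bytes match what was pinned; it says nothing about who produced them, which is where the PGP verification mentioned in the issue would come in.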
[jira] [Commented] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc
[ https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669141#comment-16669141 ]

Renan DelValle commented on AURORA-1988:

The issue has not been fixed yet, but it will be fixed by the time 0.22.0 is released. That release will be compatible with Mesos 1.6 and 1.7, in keeping with our +-1 compatibility rule. I suggest you keep an eye on that GitHub PR. Once that PR is merged, it should be more or less safe to upgrade to Mesos 1.6, though more testing should be done for Mesos 1.7 after that.

> Report "[Errno 13] Permission denied" when run hello world when follow latest doc
>
> Key: AURORA-1988
> URL: https://issues.apache.org/jira/browse/AURORA-1988
> Project: Aurora
> Issue Type: Bug
> Components: Executor
> Affects Versions: 0.20.0
> Environment: Mesos Version: 1.6.0
>   Aurora Version: 0.20.0
>   Aurora RPM:
>   aurora-scheduler-0.20.0-1.el7.centos.aurora.x86_64.rpm
>   aurora-executor-0.20.0-1.el7.centos.aurora.x86_64.rpm
> Reporter: Geng Gang
> Priority: Blocker
> Labels: beginner
> Attachments: screen1.jpg
>
> Hi
> I am a new Aurora user. When I follow the latest hello world doc ([http://aurora.apache.org/documentation/latest/getting-started/tutorial/]) to run my first hello world Aurora job, I hit the issue below:
> D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error trying to execute hello_world: {color:#FF}[Errno 13] Permission denied{color}:
> How do I solve this issue?
>
> +_*Below is the "thermos_runner.DEBUG" log on the Mesos agent:*_+
> D0605 16:17:01.721050 8320 process.py:445] Wrapped cmdline: ['/bin/bash', '-c', u'echo "gang---hello aurora---";']
> D0605 16:17:01.721333 8320 process.py:455] ENV is: \{'HOME': '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox', 'LOGNAME': 'www-data', 'USER': 'www-data', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'}
> D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error trying to execute hello_world: {color:#FF}[Errno 13] Permission denied{color}: '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox'
> D0605 16:17:01.722104 8320 process.py:155] [process: 8320=hello_world]: Coordinator exiting.
>
> +_*Below is /etc/aurora/clusters.json:*_+
> [root@cloudpoc3 ~]# more /etc/aurora/clusters.json
> [
>   {
>     "auth_mechanism": "UNAUTHENTICATED",
>     "name": "devcluster",
>     "scheduler_zk_path": "/aurora/scheduler",
>     "slave_root": "/var/lib/mesos",
>     "slave_run_directory": "latest",
>     "zk": "127.0.0.1"
>   }
> ]
>
> +_*Below is the hello_world.aurora file:*_+
> pkg_path = '/opt/aurora_test/hello_world.py'
> # we use a trick here to make the configuration change with
> # the contents of the file, for simplicity. in a normal setting, packages would be
> # versioned, and the version number would be changed in the configuration.
> import hashlib
> with open(pkg_path, 'rb') as f:
>   pkg_checksum = hashlib.md5(f.read()).hexdigest()
> # copy hello_world.py into the local sandbox
> install = Process(
>   name = 'fetch_package',
>   cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
> # run the script
> # cmdline = 'python -u hello_world.py'
> {color:#FF}hello_world = Process({color}
> {color:#FF} name = 'hello_world',{color}
> {color:#FF} cmdline = 'echo "gang---hello aurora---";'){color}
> # describe the task
> hello_world_task = SequentialTask(
>   processes = [hello_world],
>   resources = Resources(cpu = 2, ram = 4096*MB, disk=4096*MB))
> jobs = [
>   Service(cluster = 'devcluster',
>           environment = 'devel',
>           role = 'www-data',
>           name = 'hello_world',
>           task = hello_world_task)
> ]
>
> +_*In the Mesos WebUI everything seems normal:*_+
> Please see attached screen1.jpg

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc
[ https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle closed AURORA-1988.
--
Resolution: Cannot Reproduce

> Report "[Errno 13] Permission denied" when run hello world when follow latest doc
>
> Key: AURORA-1988
> URL: https://issues.apache.org/jira/browse/AURORA-1988
[jira] [Commented] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc
[ https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668876#comment-16668876 ]

Renan DelValle commented on AURORA-1988:

[~clems4ever], this is not the same issue you folks are experiencing. There was a change in Mesos 1.6 to the default permissions of the sandbox, from 755 to 750. An e-mail went to the dev list regarding this issue because it is still not solved: [https://github.com/apache/aurora/pull/42] and [https://lists.apache.org/thread.html/c1cf974461bfdf696e3ac2596c6177761406cadf3a8b493929be690f@%3Cdev.aurora.apache.org%3E]

Furthermore, we would not recommend running Aurora 0.16.0 with anything higher than Mesos 1.1.0, as there is only a guarantee of +-1 version compatibility with Aurora. Aurora 0.16.0 was released in Sept. 2016, so it was impossible to foresee changes such as the ones made in Mesos 1.6.

> Report "[Errno 13] Permission denied" when run hello world when follow latest doc
>
> Key: AURORA-1988
> URL: https://issues.apache.org/jira/browse/AURORA-1988
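To see why a change in the sandbox's default mode from 755 to 750 can surface as `[Errno 13] Permission denied`, the sketch below decodes the permission bits for each class of user. It is an illustration only, built on Python's standard `stat` module; the helper name `can_enter` is hypothetical, and the sketch ignores root, ACLs and capabilities.

```python
import stat


def can_enter(mode: int, is_owner: bool, in_group: bool) -> bool:
    """Return True if a user may traverse (cd into) a directory with this mode.

    Only the first matching class (owner, then group, then other) is
    consulted, which mirrors the classic Unix permission check.
    """
    if is_owner:
        return bool(mode & stat.S_IXUSR)
    if in_group:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


for mode in (0o755, 0o750):
    other_ok = can_enter(mode, is_owner=False, in_group=False)
    label = stat.filemode(stat.S_IFDIR | mode)
    print(f"{label}: other users can enter = {other_ok}")
```

With mode 750, a user who is neither the owner of the directory nor a member of its group is locked out. Whether that is exactly the situation of the task user here depends on who owns the sandbox, which the thread does not spell out.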
[jira] [Resolved] (AURORA-1991) TaskEvents in API Thrift should have optional parameters
[ https://issues.apache.org/jira/browse/AURORA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle resolved AURORA-1991.
Resolution: Fixed
Assignee: Ezequiel Torres
Fix Version/s: 0.21

> TaskEvents in API Thrift should have optional parameters
>
> Key: AURORA-1991
> URL: https://issues.apache.org/jira/browse/AURORA-1991
> Project: Aurora
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.19.1
> Reporter: Ezequiel Torres
> Assignee: Ezequiel Torres
> Priority: Minor
> Fix For: 0.21
>
> h1. *+What?+*
> Struct [TaskQuery|https://git-wip-us.apache.org/repos/asf?p=aurora.git;a=blob;f=api/src/main/thrift/org/apache/aurora/gen/api.thrift;h=7265b11103aa12743c42355163ae64e98e965d7f;hb=HEAD#l579] should have optional parameters so that it can be used in languages like Go, where types do not have a null value by default.
> The following is the code autogenerated by Thrift for Golang, without and with optional parameters:
> +*_Without Optional Parameters_*+
> {code}
> type TaskQuery struct {
>   // unused field # 1
>   JobName string `thrift:"jobName,2" json:"jobName"`
>   // unused field # 3
>   TaskIds map[string]bool `thrift:"taskIds,4" json:"taskIds"`
>   Statuses map[ScheduleStatus]bool `thrift:"statuses,5" json:"statuses"`
>   // unused field # 6
>   InstanceIds map[int32]bool `thrift:"instanceIds,7" json:"instanceIds"`
>   // unused field # 8
>   Environment string `thrift:"environment,9" json:"environment"`
>   SlaveHosts map[string]bool `thrift:"slaveHosts,10" json:"slaveHosts"`
>   JobKeys map[*JobKey]bool `thrift:"jobKeys,11" json:"jobKeys"`
>   Offset int32 `thrift:"offset,12" json:"offset"`
>   Limit int32 `thrift:"limit,13" json:"limit"`
>   Role string `thrift:"role,14" json:"role"`
> }
> {code}
> _*+With Optional Parameters+*_
> {code}
> type TaskQuery struct {
>   // unused field # 1
>   JobName *string `thrift:"jobName,2" json:"jobName"`
>   // unused field # 3
>   TaskIds map[string]bool `thrift:"taskIds,4" json:"taskIds"`
>   Statuses map[ScheduleStatus]bool `thrift:"statuses,5" json:"statuses"`
>   // unused field # 6
>   InstanceIds map[int32]bool `thrift:"instanceIds,7" json:"instanceIds"`
>   // unused field # 8
>   Environment *string `thrift:"environment,9" json:"environment"`
>   SlaveHosts map[string]bool `thrift:"slaveHosts,10" json:"slaveHosts"`
>   JobKeys map[*JobKey]bool `thrift:"jobKeys,11" json:"jobKeys"`
>   Offset *int32 `thrift:"offset,12" json:"offset"`
>   Limit *int32 `thrift:"limit,13" json:"limit"`
>   Role *string `thrift:"role,14" json:"role"`
> }
> {code}
> It can be seen that, with optional parameters, fields like JobName, Role and Environment can now be set to a null value.
> h1. *+Why?+*
> With the current structure of the TaskQuery object, it is not possible to make queries without explicitly setting all the fields of the TaskQuery object in Golang. Moreover, the lack of a null value in the structure of the TaskQuery object limits the types of queries that can be made against the Aurora Thrift API in Golang, since a parameter cannot be skipped.
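The distinction the issue draws between "unset" and "zero value" can also be sketched in Python, used here only as a neutral illustration; it is not the generated Thrift code and says nothing about how the Go client serialises requests. The `TaskQuerySketch` class and the `to_filters` helper are hypothetical, and the three fields are a subset chosen for brevity.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TaskQuerySketch:
    """Hypothetical stand-in for a query whose fields are all optional."""

    job_name: Optional[str] = None
    role: Optional[str] = None
    limit: Optional[int] = None


def to_filters(query: TaskQuerySketch) -> dict:
    """Keep only the fields the caller actually set.

    None means "not specified", so an explicit "" or 0 still counts as set.
    A plain (non-pointer) Go string or int32 cannot express that difference.
    """
    return {
        f.name: getattr(query, f.name)
        for f in fields(query)
        if getattr(query, f.name) is not None
    }


print(to_filters(TaskQuerySketch(role="www-data")))
print(to_filters(TaskQuerySketch(role="www-data", limit=0)))
```

In the second call `limit=0` survives as a real filter, whereas with non-pointer Go fields a zero limit and an omitted limit would look identical.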
[jira] [Commented] (AURORA-1991) TaskEvents in API Thrift should have optional parameters
[ https://issues.apache.org/jira/browse/AURORA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617039#comment-16617039 ]

Renan DelValle commented on AURORA-1991:

[~jingc] thanks for calling attention to this. [~ezetowers] submitted a patch to fix this and it landed in master in July :) [https://github.com/apache/aurora/commit/efe8656512373389771aff88c2141940f925ad58]

Closing this as fixed!

> TaskEvents in API Thrift should have optional parameters
>
> Key: AURORA-1991
> URL: https://issues.apache.org/jira/browse/AURORA-1991
[jira] [Closed] (AURORA-1993) Aurora crashes when handling an unknown custom resource
[ https://issues.apache.org/jira/browse/AURORA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle closed AURORA-1993. -- Resolution: Fixed Fix Version/s: 0.17.0 This was fixed in 0.17.0 https://github.com/apache/aurora/commit/4797dfe33ba08183fa9596a46ac8be51a64e08bb > Aurora crashes when handling an unknown custom resource > --- > > Key: AURORA-1993 > URL: https://issues.apache.org/jira/browse/AURORA-1993 > Project: Aurora > Issue Type: Bug >Affects Versions: 0.16.0 >Reporter: Clément Michaud >Priority: Major > Fix For: 0.17.0 > > > While we tried to declare network bandwidth as a custom resource in Mesos, we > faced a crash in Aurora with the following stacktrace: > {code:java} > Jul 18, 2018 1:35:19 PM > com.google.common.util.concurrent.ServiceManager$ServiceListener failed > SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING > state. > java.lang.NullPointerException: Unknown Mesos resource: name: > "network_bandwidth" > type: SCALAR > scalar { > value: 2000.0 > } > role: "*" > 11: "\n\adefault" > at java.util.Objects.requireNonNull(Objects.java:228) > at > org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355) > at > org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52) > at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at java.util.Iterator.forEachRemaining(Iterator.java:115) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at 
java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274) > at > org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239) > at > org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153) > at > org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168) > at > org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130) > at > com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189) > at com.google.common.util.concurrent.Callables$3.run(Callables.java:100) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > E0718 13:35:19.240 [SlotSizeCounterService RUNNING, > GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService > [FAILED] faile > I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting > down application > I0718 13:35:19.240 [SlotSizeCounterService RUNNING, > ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands. 
> I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] > SchedulerLifecycle state machine transition ACTIVE -> DEAD > I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver > I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework > 2a905643-b76f-4f17-a406-524d406f49f8- > I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] > storage state machine transition READY -> STOPPED > I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver > exited, terminating lifecycle. > I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] > SchedulerLifecycle state machine transition DEAD -> DEAD > I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown > already invoked, ignoring extra call. > I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down > Quartz cron scheduler. > I0718 13:35:19.255
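The failure mode in the trace, a lookup that throws on an unrecognised resource name and takes the whole service down with it, can be contrasted with a tolerant lookup in a short Python sketch. This is not Aurora's code and not a description of the fix that landed in 0.17.0; the `ResourceType` enum, the `bag_from_resources` helper and the `strict` flag are hypothetical names that only echo the ones in the stack trace.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class ResourceType(Enum):
    """Hypothetical set of resource names the scheduler knows about."""

    CPUS = "cpus"
    MEM = "mem"
    DISK = "disk"


KNOWN = {rt.value: rt for rt in ResourceType}


def bag_from_resources(
    resources: Iterable[Tuple[str, float]], strict: bool
) -> Dict[ResourceType, float]:
    """Sum scalar resources by type.

    strict=True raises on an unknown name (the failure mode in the trace);
    strict=False skips it, which is one possible tolerant behaviour.
    """
    bag: Dict[ResourceType, float] = {}
    for name, value in resources:
        resource_type = KNOWN.get(name)
        if resource_type is None:
            if strict:
                raise KeyError(f"Unknown Mesos resource: {name}")
            continue
        bag[resource_type] = bag.get(resource_type, 0.0) + value
    return bag


offer = [("cpus", 4.0), ("mem", 8192.0), ("network_bandwidth", 2000.0)]
print(bag_from_resources(offer, strict=False))
```

Skipping silently is only one option; logging the unknown name or counting it in a metric would make the mismatch visible without stopping the scheduler.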
[jira] [Created] (AURORA-1989) make-pycharm-virtualenv broken after pip drops --egg
Renan DelValle created AURORA-1989:
--
Summary: make-pycharm-virtualenv broken after pip drops --egg
Key: AURORA-1989
URL: https://issues.apache.org/jira/browse/AURORA-1989
Project: Aurora
Issue Type: Bug
Components: Client
Reporter: Renan DelValle

pip has dropped the --egg option in pip 10.0.1 ([https://pip.pypa.io/en/stable/news/#b1-2018-03-31]), which has broken our make-pycharm-virtualenv script, needed to make development of the client easier on PyCharm.

Running the script results in the following error:

{{+ VIRTUALENV_VERSION=16.0.0}}
{{+ which python2.7}}
{{++ which python2.7}}
{{+ PY=/usr/local/bin/python2.7}}
{{+ echo 'Using /usr/local/bin/python2.7'}}
{{Using /usr/local/bin/python2.7}}
{{+++ dirname ./build-support/virtualenv}}
{{++ cd ./build-support}}
{{++ pwd}}
{{+ HERE=/Users/rdelvalle/git/aurora/build-support}}
{{+ '[' -f /Users/rdelvalle/git/aurora/build-support/virtualenv-16.0.0/BOOTSTRAPPED ']'}}
{{+ exec /usr/local/bin/python2.7 /Users/rdelvalle/git/aurora/build-support/virtualenv-16.0.0/virtualenv.py --no-download build-support/python/pycharm.venv}}
{{New python executable in /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python2.7}}
{{Also creating executable in /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python}}
{{Installing setuptools, pip, wheel...done.}}
{{Usage:}}
{{/Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] [package-index-options] ...}}
{{/Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] -r [package-index-options] ...}}
{{/Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] [-e] ...}}
{{/Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] [-e] ...}}
{{/Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] ...}}
{{no such option: --egg}}
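One way a wrapper script could cope with the removed option is to pass `--egg` only to pip versions that still accept it. The sketch below is a hypothetical illustration, not the change made to make-pycharm-virtualenv: `parse_version`, `pip_install_args` and the package name `example-package` are invented for the example, and the cut-off at major version 10 follows the release notes linked in the issue.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '10.0.1' into (10, 0, 1).

    Only plain numeric components are handled; pre-release tags are not.
    """
    return tuple(int(part) for part in text.split("."))


def pip_install_args(pip_version: str, requirement: str) -> List[str]:
    """Build a `pip install` argument list, adding --egg only for pip < 10."""
    args = ["install"]
    if parse_version(pip_version) < (10,):
        args.append("--egg")
    args.append(requirement)
    return args


print(pip_install_args("9.0.3", "example-package"))
print(pip_install_args("10.0.1", "example-package"))
```

Whether the script needs `--egg` at all on older pip versions is a separate question; dropping the flag unconditionally may be the simpler fix.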
[jira] [Commented] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc
[ https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512884#comment-16512884 ] Renan DelValle commented on AURORA-1988: [~ggeng1] This looks like the python script doesn't have the correct permissions to run on the mesos-agent on which the task is getting scheduled. In the example, the executor is run as the part of the configuration. Therefore, this task is trying to run under the user www-data. If the python script being placed in the box by this task doesn't have the right permissions, user www-data will not be able to execute it. > Report "[Errno 13] Permission denied" when run hello world when follow latest > doc > -- > > Key: AURORA-1988 > URL: https://issues.apache.org/jira/browse/AURORA-1988 > Project: Aurora > Issue Type: Bug > Components: Executor >Affects Versions: 0.20.0 > Environment: Mesos Version: 1.6.0 > Aurora Version: 0.20.0 > Aurora RPM: > aurora-scheduler-0.20.0-1.el7.centos.aurora.x86_64.rpm > aurora-executor-0.20.0-1.el7.centos.aurora.x86_64.rpm >Reporter: Geng Gang >Priority: Blocker > Labels: beginner > Attachments: screen1.jpg > > > Hi > I am new user for aurora. When I follow latest hello world doc > ([http://aurora.apache.org/documentation/latest/getting-started/tutorial/)] > to run first hello world aurora job, I meet below issues: > D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error > trying to execute hello_world: {color:#FF}[Errno 13] Permission > denied{color}: > How to solve this issue? 
> > +_*The below is "thermos_runner.DEBUG" log in Mesos Agent:*_+ > D0605 16:17:01.721050 8320 process.py:445] Wrapped cmdline: ['/bin/bash', > '-c', u'echo "gang---hello aurora---";'] > D0605 16:17:01.721333 8320 process.py:455] ENV is: \{'HOME': > '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox', > 'LOGNAME': 'www-data', 'USER': 'www-data', 'PATH': > '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'} > D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error > trying to execute hello_world: {color:#FF}[Errno 13] Permission > denied{color}: > '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox' > D0605 16:17:01.722104 8320 process.py:155] [process: 8320=hello_world]: > Coordinator exiting. > > > +_*The below is /etc/aurora/clusters.json:*_+ > [root@cloudpoc3 ~]# more /etc/aurora/clusters.json > [ > { > "auth_mechanism": "UNAUTHENTICATED", > "name": "devcluster", > "scheduler_zk_path": "/aurora/scheduler", > "slave_root": "/var/lib/mesos", > "slave_run_directory": "latest", > "zk": "127.0.0.1" > } > ] > +_*The below is hello_world.aurora file:*_+ > pkg_path = '/opt/aurora_test/hello_world.py' > # we use a trick here to make the configuration change with > # the contents of the file, for simplicity. in a normal setting, packages > would be > # versioned, and the version number would be changed in the configuration. > import hashlib > with open(pkg_path, 'rb') as f: > pkg_checksum = hashlib.md5(f.read()).hexdigest() > # copy hello_world.py into the local sandbox > install = Process( > name = 'fetch_package', > cmdline = 'cp %s . 
&& echo %s && chmod +x hello_world.py' % (pkg_path, > pkg_checksum)) > # run the script > # cmdline = 'python -u hello_world.py' > {color:#FF}hello_world = Process({color} > {color:#FF} name = 'hello_world',{color} > {color:#FF} cmdline = 'echo "gang---hello aurora---";'){color} > # describe the task > hello_world_task = SequentialTask( > processes = [hello_world], > resources = Resources(cpu = 2, ram = 4096*MB, disk=4096*MB)) > jobs = [ > Service(cluster = 'devcluster', > environment = 'devel', > role = 'www-data', > name = 'hello_world', > task = hello_world_task) > ] > +_*From Mesos WebUI, it seems normal:*_+ > Please see attached screen1.jpg > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
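A note on narrowing down the "[Errno 13] Permission denied" above: the comment suggests the task user (www-data) lacks permission on something the task touches. The sketch below is one rough way to look for that on the agent. It walks from the filesystem root down to a given path and lists every entry that lacks the world-execute bit. The function names are made up for illustration and are not part of Aurora or Mesos. It is a deliberate simplification: owner and group bits, ACLs, and SELinux are ignored, so a clean result does not prove that www-data has access.

```python
import os
import stat


def ancestors(path):
    """Return `path` and each of its parent directories, outermost first."""
    path = os.path.abspath(path)
    chain = []
    while True:
        chain.append(path)
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return list(reversed(chain))


def missing_other_execute(path):
    """Return the entries on the way to `path` that lack the 'other' execute bit.

    Simplification (assumption): only the world-execute bit is checked;
    owner/group permissions, ACLs, and SELinux are not considered.
    """
    return [p for p in ancestors(path)
            if not os.stat(p).st_mode & stat.S_IXOTH]


if __name__ == "__main__":
    # Path taken from the hello_world.aurora file quoted above.
    for entry in missing_other_execute("/opt/aurora_test/hello_world.py"):
        print("no world-execute bit:", entry)
```

Running it against the sandbox directory named in the log, or against the script path in the job file, should show whether any directory along the way is closed to users other than its owner.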
[jira] [Comment Edited] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL
[ https://issues.apache.org/jira/browse/AURORA-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463105#comment-16463105 ] Renan DelValle edited comment on AURORA-1982 at 5/3/18 9:48 PM: [https://reviews.apache.org/r/66537/] by Steve Salevan was (Author: rdelvalle): [https://reviews.apache.org/r/66537/] > Add support for using Mesos fetcher from Aurora DSL > --- > > Key: AURORA-1982 > URL: https://issues.apache.org/jira/browse/AURORA-1982 > Project: Aurora > Issue Type: Sub-task > Components: Client >Reporter: Renan DelValle >Priority: Major > > The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. > However, there is currently no way to allow users to specify which artifacts > should be downloaded onto the sandbox. Mimicking this feature is possible in > Thermos but custom executors may lack this ability. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL
[ https://issues.apache.org/jira/browse/AURORA-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463105#comment-16463105 ] Renan DelValle commented on AURORA-1982: [https://reviews.apache.org/r/66537/] > Add support for using Mesos fetcher from Aurora DSL > --- > > Key: AURORA-1982 > URL: https://issues.apache.org/jira/browse/AURORA-1982 > Project: Aurora > Issue Type: Sub-task > Components: Client >Reporter: Renan DelValle >Priority: Major > > The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. > However, there is currently no way to allow users to specify which artifacts > should be downloaded onto the sandbox. Mimicking this feature is possible in > Thermos but custom executors may lack this ability. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1983) Support for Docker Volume Isolator
[ https://issues.apache.org/jira/browse/AURORA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424709#comment-16424709 ] Renan DelValle commented on AURORA-1983: Thanks for the patch Justin! Would it be possible for you to submit this patch via ReviewBoard? You can find instructions to do this here: http://aurora.apache.org/documentation/latest/contributing/ > Support for Docker Volume Isolator > -- > > Key: AURORA-1983 > URL: https://issues.apache.org/jira/browse/AURORA-1983 > Project: Aurora > Issue Type: Story >Reporter: Justin Venus >Priority: Minor > > It would be really useful to support > [docker/volume|http://mesos.apache.org/documentation/latest/isolators/docker-volume/] > isolation in Aurora. This would allow for example ... operators in AWS to > be able to easily attach EBS volumes to their containers. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library
[ https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1467: --- Fix Version/s: 0.19.1 > Replace org.apache.aurora.common.args with a standard third-party library > - > > Key: AURORA-1467 > URL: https://issues.apache.org/jira/browse/AURORA-1467 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Bill Farner >Assignee: Bill Farner >Priority: Major > Labels: newbie > Fix For: 0.19.1 > > > Our args parsing/processing system was inherited from Twitter Commons and > should be considered for replacement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1825) Enable async logging by default
[ https://issues.apache.org/jira/browse/AURORA-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416329#comment-16416329 ] Renan DelValle commented on AURORA-1825: [~jingc] I see that you closed this issue as Done, can you provide a link to the review as well as the version this landed in? Thanks! > Enable async logging by default > --- > > Key: AURORA-1825 > URL: https://issues.apache.org/jira/browse/AURORA-1825 > Project: Aurora > Issue Type: Task >Reporter: Zameer Manji >Assignee: Jing Chen >Priority: Minor > > Based on my experience while working on AURORA-1823 and [~StephanErb]'s work > on logging recently, I think it would be best if we enabled async logging. > For example if one attempts to parallelize the work inside > {{StateManagerImpl}} there isn't much benefit because all of the state > transitions are logged and all of the threads would contend for the lock. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1981) Add support for choosing task Executor using Aurora DSL
[ https://issues.apache.org/jira/browse/AURORA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1981: --- Fix Version/s: (was: 0.21.0) 0.20.0 > Add support for choosing task Executor using Aurora DSL > --- > > Key: AURORA-1981 > URL: https://issues.apache.org/jira/browse/AURORA-1981 > Project: Aurora > Issue Type: Sub-task > Components: Client >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > Fix For: 0.20.0 > > > The Aurora scheduler supports launching tasks using custom executors. > However, there is currently no way to change the executor used for launching > a Job's tasks using the Aurora DSL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AURORA-1974) Update sample Docker jobs for Vagrant tutorial
[ https://issues.apache.org/jira/browse/AURORA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1974. Resolution: Fixed Fix Version/s: 0.20.0 Fixed by https://github.com/apache/aurora/commit/b6e898b5e9f70b13db42db366b6d98c5baadcb57 > Update sample Docker jobs for Vagrant tutorial > -- > > Key: AURORA-1974 > URL: https://issues.apache.org/jira/browse/AURORA-1974 > Project: Aurora > Issue Type: Task > Components: Docker, Documentation >Affects Versions: 0.19.1 >Reporter: Mathias Sulser >Assignee: Renan DelValle >Priority: Trivial > Fix For: 0.20.0 > > > h2. Problem > As discussed with [~rdelvalle] on Slack, I am filing what is likely a > regression caused by the recent Vagrant upgrade in > [https://github.com/apache/aurora/commit/c52137e20bd2863234dc09116e1339364ffed77a] > As of now, submitting any jobs in > {{examples/jobs/hello_docker_engine.aurora}} or > {{examples/jobs/hello_docker_image.aurora}} will fail due to the following > error: > {code:java} > Traceback (most recent call last): > File "apache/aurora/executor/bin/thermos_executor_main.py", line 47, in > > from mesos.executor import MesosExecutorDriver > File > "/root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/_init_.py", > line 17, in > from ._executor import MesosExecutorDriverImpl as MesosExecutorDriver > ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version > `GLIBCXX_3.4.21' not found (required by > /root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/_executor.so) > {code} > h2. Solution > Changing the docker image from {{python:2.7}} to {{python:2.7-slim-stretch}} > will fix this. > > Hat-tip to [~rdelvalle] for figuring this out so quickly (y) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AURORA-1974) Update sample Docker jobs for Vagrant tutorial
[ https://issues.apache.org/jira/browse/AURORA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reassigned AURORA-1974: -- Assignee: Renan DelValle > Update sample Docker jobs for Vagrant tutorial > -- > > Key: AURORA-1974 > URL: https://issues.apache.org/jira/browse/AURORA-1974 > Project: Aurora > Issue Type: Task > Components: Docker, Documentation >Affects Versions: 0.19.1 >Reporter: Mathias Sulser >Assignee: Renan DelValle >Priority: Trivial > > h2. Problem > As discussed with [~rdelvalle] on Slack, I am filing what is likely a > regression caused by the recent Vagrant upgrade in > [https://github.com/apache/aurora/commit/c52137e20bd2863234dc09116e1339364ffed77a] > As of now, submitting any jobs in > {{examples/jobs/hello_docker_engine.aurora}} or > {{examples/jobs/hello_docker_image.aurora}} will fail due to the following > error: > {code:java} > Traceback (most recent call last): > File "apache/aurora/executor/bin/thermos_executor_main.py", line 47, in > > from mesos.executor import MesosExecutorDriver > File > "/root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/_init_.py", > line 17, in > from ._executor import MesosExecutorDriverImpl as MesosExecutorDriver > ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version > `GLIBCXX_3.4.21' not found (required by > /root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/_executor.so) > {code} > h2. Solution > Changing the docker image from {{python:2.7}} to {{python:2.7-slim-stretch}} > will fix this. > > Hat-tip to [~rdelvalle] for figuring this out so quickly (y) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AURORA-1981) Add support for choosing task Executor using Aurora DSL
[ https://issues.apache.org/jira/browse/AURORA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1981. Resolution: Implemented Fix Version/s: 0.21.0 https://reviews.apache.org/r/66154/ > Add support for choosing task Executor using Aurora DSL > --- > > Key: AURORA-1981 > URL: https://issues.apache.org/jira/browse/AURORA-1981 > Project: Aurora > Issue Type: Sub-task > Components: Client >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > Fix For: 0.21.0 > > > The Aurora scheduler supports launching tasks using custom executors. > However, there is currently no way to change the executor used for launching > a Job's tasks using the Aurora DSL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1981) Add support for choosing task Executor using Aurora DSL
[ https://issues.apache.org/jira/browse/AURORA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1981: --- Issue Type: Sub-task (was: Task) Parent: AURORA-1744 > Add support for choosing task Executor using Aurora DSL > --- > > Key: AURORA-1981 > URL: https://issues.apache.org/jira/browse/AURORA-1981 > Project: Aurora > Issue Type: Sub-task > Components: Client >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > > The Aurora scheduler supports launching tasks using custom executors. > However, there is currently no way to change the executor used for launching > a Job's tasks using the Aurora DSL. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL
[ https://issues.apache.org/jira/browse/AURORA-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1982: --- Issue Type: Sub-task (was: Task) Parent: AURORA-1744 > Add support for using Mesos fetcher from Aurora DSL > --- > > Key: AURORA-1982 > URL: https://issues.apache.org/jira/browse/AURORA-1982 > Project: Aurora > Issue Type: Sub-task > Components: Client >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > > The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. > However, there is currently no way to allow users to specify which artifacts > should be downloaded onto the sandbox. Mimicking this feature is possible in > Thermos but custom executors may lack this ability. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1744) Add end to end testing for custom executors
[ https://issues.apache.org/jira/browse/AURORA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1744: --- Issue Type: Story (was: Task) Summary: Add end to end testing for custom executors (was: Add end to end testing for multiple executors) > Add end to end testing for custom executors > --- > > Key: AURORA-1744 > URL: https://issues.apache.org/jira/browse/AURORA-1744 > Project: Aurora > Issue Type: Story > Components: Testing >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > > Now that Aurora is capable of using multiple executors on a single scheduler, > it would be beneficial to add end to end testing for this feature. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL
Renan DelValle created AURORA-1982: -- Summary: Add support for using Mesos fetcher from Aurora DSL Key: AURORA-1982 URL: https://issues.apache.org/jira/browse/AURORA-1982 Project: Aurora Issue Type: Task Components: Client Reporter: Renan DelValle Assignee: Renan DelValle The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. However, there is currently no way to allow users to specify which artifacts should be downloaded onto the sandbox. Mimicking this feature is possible in Thermos but custom executors may lack this ability. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1980) Integration tests fail with a pants exception: File name too long
[ https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397708#comment-16397708 ] Renan DelValle commented on AURORA-1980: Tried {noformat} ./pants --pants-workdir=/tmp/pawd{noformat} Got the following error: {noformat} Pants working directory should end with '.pants.d', currently it is /tmp/pawd{noformat} So then I tried: {noformat} ./pants --pants-workdir=/tmp/pawd.pants.d {noformat} which results in: {noformat} FAILURE Exception caught: () Exception message: Spec has un-normalized path part '../../../../tmp/pawd.pants.d/gen/thrift-py/252d64521cf9/api.src.main.thrift.org.apache.aurora.gen._test/current'{noformat} Changing the pants version did work though! > Integration tests fail with a pants exception: File name too long > - > > Key: AURORA-1980 > URL: https://issues.apache.org/jira/browse/AURORA-1980 > Project: Aurora > Issue Type: Bug >Reporter: Renan DelValle >Priority: Major > > When running the integration tests the following error happens: > {noformat} > Executing tasks in goals: gen -> pyprep -> test > 17:13:42 00:01 [gen] > 17:13:42 00:01 [thrift-py] > 17:13:42 00:01 [cache] > No cached artifacts for 4 targets. > Invalidated 4 targets. > 17:13:42 00:01 [pyprep] > 17:13:42 00:01 [interpreter] > 17:13:46 00:05 [requirements] > 17:13:46 00:05 [cache] > No cached artifacts for 37 targets. > Invalidated 37 targets. > 17:14:06 00:25 [sources] > Waiting for background workers to finish. > 17:14:06 00:25 [complete] > FAILURE > Exception caught: () > > Exception message: [Errno 36] File name too long: > u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} > > Where `` is longer than five characters causing a violation of > the 255 character filename limit in Linux. 
> -- This message was sent by Atlassian JIRA (v7.6.3#76005)
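A note on the "File name too long" error above: the limit mentioned in the description applies to each path component (each name between slashes), not to the path as a whole. The sketch below is one way to spot an offending component before a build trips over it. The helper name is hypothetical and not part of pants. The 255-byte figure is assumed to be the per-component limit on common Linux filesystems such as ext4; other filesystems may differ.

```python
import os

# Assumption: 255 bytes is the per-component limit (NAME_MAX) on common
# Linux filesystems such as ext4; other filesystems may use another value.
NAME_MAX_BYTES = 255


def overlong_components(path, limit=NAME_MAX_BYTES):
    """Return the components of `path` whose UTF-8 encoding exceeds `limit` bytes."""
    parts = [p for p in path.split(os.sep) if p]
    return [p for p in parts if len(p.encode("utf-8")) > limit]


if __name__ == "__main__":
    # Hypothetical path for illustration only, not the one from the report.
    example = os.sep.join(["", "home", "someuser", "a" * 300 + ".hash"])
    for part in overlong_components(example):
        print(len(part.encode("utf-8")), part[:40] + "...")
```

The check counts encoded bytes rather than characters, since the filesystem limit is assumed here to be expressed in bytes.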
[jira] [Updated] (AURORA-1980) Integration tests fail with a pants exception: File name too long
[ https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1980: --- Description: When running the integration tests the following error happens: {noformat} Executing tasks in goals: gen -> pyprep -> test 17:13:42 00:01 [gen] 17:13:42 00:01 [thrift-py] 17:13:42 00:01 [cache] No cached artifacts for 4 targets. Invalidated 4 targets. 17:13:42 00:01 [pyprep] 17:13:42 00:01 [interpreter] 17:13:46 00:05 [requirements] 17:13:46 00:05 [cache] No cached artifacts for 37 targets. Invalidated 37 targets. 17:14:06 00:25 [sources] Waiting for background workers to finish. 17:14:06 00:25 [complete] FAILURE Exception caught: () Exception message: [Errno 36] File name too long: u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} Where `` is longer than five characters causing a violation of the 255 character filename limit in Linux. was: When running the integration tests the following error happens: {noformat} Executing tasks in goals: gen -> pyprep -> test 17:13:42 00:01 [gen] 17:13:42 00:01 [thrift-py] 17:13:42 00:01 [cache] No cached artifacts for 4 targets. Invalidated 4 targets. 17:13:42 00:01 [pyprep] 17:13:42 00:01 [interpreter] 17:13:46 00:05 [requirements] 17:13:46 00:05 [cache] No cached artifacts for 37 targets. Invalidated 37 targets. 17:14:06 00:25 [sources] Waiting for background workers to finish. 
17:14:06 00:25 [complete] FAILURE Exception caught: () Exception message: [Errno 36] File name too long: u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} > Integration tests fail with a pants exception: File name too long > - > > Key: AURORA-1980 > URL: https://issues.apache.org/jira/browse/AURORA-1980 > Project: Aurora > Issue Type: Bug >Reporter: Renan DelValle >Priority: Major > > When running the integration tests the following error happens: > {noformat} > Executing tasks in goals: gen -> pyprep -> test > 17:13:42 00:01 [gen] > 17:13:42 00:01 [thrift-py] > 17:13:42 00:01 [cache] > No cached artifacts for 4 targets. > Invalidated 4 targets. > 17:13:42 00:01 [pyprep] > 17:13:42 00:01 [interpreter] > 17:13:46 00:05 [requirements] > 17:13:46 00:05 [cache] > No cached artifacts for 37 targets. > Invalidated 37 targets. > 17:14:06 00:25 [sources] > Waiting for background workers to finish. > 17:14:06 00:25 [complete] > FAILURE > Exception caught: () > > Exception message: [Errno 36] File name too long: > u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} > > Where `` is longer than five characters causing a violation of > the 255 character filename limit in Linux. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1980) Integration tests fail with a pants exception: File name too long
[ https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1980: --- Description: When running the integration tests the following error happens: {noformat} Executing tasks in goals: gen -> pyprep -> test 17:13:42 00:01 [gen] 17:13:42 00:01 [thrift-py] 17:13:42 00:01 [cache] No cached artifacts for 4 targets. Invalidated 4 targets. 17:13:42 00:01 [pyprep] 17:13:42 00:01 [interpreter] 17:13:46 00:05 [requirements] 17:13:46 00:05 [cache] No cached artifacts for 37 targets. Invalidated 37 targets. 17:14:06 00:25 [sources] Waiting for background workers to finish. 17:14:06 00:25 [complete] FAILURE Exception caught: () Exception message: [Errno 36] File name too long: u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} was: When running the integration tests the following error happens: {noformat} Executing tasks in goals: gen -> pyprep -> test 17:13:42 00:01 [gen] 17:13:42 00:01 [thrift-py] 17:13:42 00:01 [cache] No cached artifacts for 4 targets. Invalidated 4 targets. 17:13:42 00:01 [pyprep] 17:13:42 00:01 [interpreter] 17:13:46 00:05 [requirements] 17:13:46 00:05 [cache] No cached artifacts for 37 targets. Invalidated 37 targets. 17:14:06 00:25 [sources] Waiting for background workers to finish. 
17:14:06 00:25 [complete] FAILURE Exception caught: () Exception message: [Errno 36] File name too long: u'/home/user/aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} > Integration tests fail with a pants exception: File name too long > - > > Key: AURORA-1980 > URL: https://issues.apache.org/jira/browse/AURORA-1980 > Project: Aurora > Issue Type: Bug >Reporter: Renan DelValle >Priority: Major > > When running the integration tests the following error happens: > {noformat} > Executing tasks in goals: gen -> pyprep -> test > 17:13:42 00:01 [gen] > 17:13:42 00:01 [thrift-py] > 17:13:42 00:01 [cache] > No cached artifacts for 4 targets. > Invalidated 4 targets. > 17:13:42 00:01 [pyprep] > 17:13:42 00:01 [interpreter] > 17:13:46 00:05 [requirements] > 17:13:46 00:05 [cache] > No cached artifacts for 37 targets. > Invalidated 37 targets. > 17:14:06 00:25 [sources] > Waiting for background workers to finish. > 17:14:06 00:25 [complete] > FAILURE > Exception caught: () > > Exception message: [Errno 36] File name too long: > u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AURORA-1980) Pants exception: File name too long
Renan DelValle created AURORA-1980: -- Summary: Pants exception: File name too long Key: AURORA-1980 URL: https://issues.apache.org/jira/browse/AURORA-1980 Project: Aurora Issue Type: Bug Reporter: Renan DelValle When running the integration tests the following error happens: {noformat} Executing tasks in goals: gen -> pyprep -> test 17:13:42 00:01 [gen] 17:13:42 00:01 [thrift-py] 17:13:42 00:01 [cache] No cached artifacts for 4 targets. Invalidated 4 targets. 17:13:42 00:01 [pyprep] 17:13:42 00:01 [interpreter] 17:13:46 00:05 [requirements] 17:13:46 00:05 [cache] No cached artifacts for 37 targets. Invalidated 37 targets. 17:14:06 00:25 [sources] Waiting for background workers to finish. 17:14:06 00:25 [complete] FAILURE Exception caught: () Exception message: [Errno 36] File name too long: u'/home/user/aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1980) Integration tests fail with a pants exception: File name too long
[ https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1980: --- Summary: Integration tests fail with a pants exception: File name too long (was: Pants exception: File name too long) > Integration tests fail with a pants exception: File name too long > - > > Key: AURORA-1980 > URL: https://issues.apache.org/jira/browse/AURORA-1980 > Project: Aurora > Issue Type: Bug >Reporter: Renan DelValle >Priority: Major > > When running the integration tests the following error happens: > {noformat} > Executing tasks in goals: gen -> pyprep -> test > 17:13:42 00:01 [gen] > 17:13:42 00:01 [thrift-py] > 17:13:42 00:01 [cache] > No cached artifacts for 4 targets. > Invalidated 4 targets. > 17:13:42 00:01 [pyprep] > 17:13:42 00:01 [interpreter] > 17:13:46 00:05 [requirements] > 17:13:46 00:05 [cache] > No cached artifacts for 37 targets. > Invalidated 37 targets. > 17:14:06 00:25 [sources] > Waiting for background workers to finish. > 17:14:06 00:25 [complete] > FAILURE > Exception caught: () > > Exception message: [Errno 36] File name too long: > u'/home/user/aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AURORA-1734) Configurable Metadata prefix
[ https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle closed AURORA-1734. -- Resolution: Won't Do > Configurable Metadata prefix > > > Key: AURORA-1734 > URL: https://issues.apache.org/jira/browse/AURORA-1734 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Priority: Trivial > > Currently, a prefix ("org.apache.aurora.metadata.") is injected into the > metadata key in the scheduler. It would be beneficial to allow users to set > their own metadata prefix (including an empty string). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library
[ https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reassigned AURORA-1467: -- Assignee: Bill Farner > Replace org.apache.aurora.common.args with a standard third-party library > - > > Key: AURORA-1467 > URL: https://issues.apache.org/jira/browse/AURORA-1467 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Bill Farner >Assignee: Bill Farner >Priority: Major > Labels: newbie > > Our args parsing/processing system was inherited from Twitter Commons and > should be considered for replacement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library
[ https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reassigned AURORA-1467: -- Assignee: (was: Bill Farner) > Replace org.apache.aurora.common.args with a standard third-party library > - > > Key: AURORA-1467 > URL: https://issues.apache.org/jira/browse/AURORA-1467 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Bill Farner >Priority: Major > Labels: newbie > > Our args parsing/processing system was inherited from Twitter Commons and > should be considered for replacement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library
[ https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383014#comment-16383014 ] Renan DelValle commented on AURORA-1467: [~wfarner], given that we have moved on to use JCommander should we close this ticket? > Replace org.apache.aurora.common.args with a standard third-party library > - > > Key: AURORA-1467 > URL: https://issues.apache.org/jira/browse/AURORA-1467 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Bill Farner >Priority: Major > Labels: newbie > > Our args parsing/processing system was inherited from Twitter Commons and > should be considered for replacement. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1966) TASK_UNKNOWN to PARTITIONED mapping puts Scheduler to kill non-exist Task indefinitely
[ https://issues.apache.org/jira/browse/AURORA-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383006#comment-16383006 ] Renan DelValle commented on AURORA-1966: [~davmclau] should we close this since the patch was committed? > TASK_UNKNOWN to PARTITIONED mapping puts Scheduler to kill non-exist Task > indefinitely > -- > > Key: AURORA-1966 > URL: https://issues.apache.org/jira/browse/AURORA-1966 > Project: Aurora > Issue Type: Bug >Reporter: Santhosh Kumar Shanmugham >Assignee: David McLaughlin >Priority: Major > > When a Task launch fails, it is moved from ASSIGNED to LOST, which performs a > RESCHEDULE and KILL. Unfortunately the KILL of a non-existent task to the > Mesos master results in a TASK_UNKNOWN status update, which gets mapped to > PARTITIONED. While the transition from LOST to PARTITIONED is not allowed, > some callbacks get executed despite the fact, resulting in a KILL and > RESCHEDULE action. This new KILL triggers another TASK_UNKNOWN and hence > PARTITIONED status update for the same task, putting the Scheduler to > indefinitely attempt KILLing the non-existent task. Attempting a client job > killall results in the same state for the scheduler. > Since the scheduler uses the LOST state for black-holing task the > {{TaskStateMachine}} needs to take those into account. > I was able to reproduce this in the Vagrant image by faking a launch failure. 
> {code:java} > I0124 05:48:23.198 [qtp1791010542-40, StateMachine] > vagrant-test-fail-partition_aware_disabled-0-07bec0cb-d6a3-4caa-9b6e-60e6d0934606 > state machine transition INIT -> PENDING I0124 05:48:23.213508 9748 > log.cpp:560] Attempting to append 1679 bytes to the log I0124 05:48:23.214570 > 9748 coordinator.cpp:348] Coordinator attempting to write APPEND action at > position 24778 I0124 05:48:23.214834 9748 replica.cpp:540] Replica received > write request for position 24778 from __req_res__(4)@192.168.33.7:8083 I0124 > 05:48:23.221982 9748 leveldb.cpp:341] Persisting action (1700 bytes) to > leveldb took 6.772102ms I0124 05:48:23.222174 9748 replica.cpp:711] Persisted > action APPEND at position 24778 I0124 05:48:23.222901 9748 replica.cpp:694] > Replica received learned notice for position 24778 from > log-network(1)@192.168.33.7:8083 I0124 05:48:23.226833 9748 leveldb.cpp:341] > Persisting action (1702 bytes) to leveldb took 3.227779ms I0124 > 05:48:23.227008 9748 replica.cpp:711] Persisted action APPEND at position > 24778 I0124 05:48:23.262 [qtp1791010542-40, RequestLog] 127.0.0.1 - - > [24/Jan/2018:05:48:23 +] "POST //aurora.local/api HTTP/1.1" 200 78 I0124 > 05:48:23.267 [qtp1791010542-40, LoggingInterceptor] > getTasksWithoutConfigs(TaskQuery(role:null, environment:null, jobName:null, > taskIds:null, statuses:null, instanceIds:null, slaveHosts:null, > jobKeys:[JobKey(role:vagrant, environment:test, > name:fail-partition_aware_disabled)], offset:0, limit:0)) I0124 05:48:23.285 > [qtp1791010542-40, RequestLog] 127.0.0.1 - - [24/Jan/2018:05:48:23 +] > "POST //aurora.local/api HTTP/1.1" 200 794 I0124 05:48:23.349 > [TaskGroupBatchWorker, StateMachine] Callback transition PENDING to ASSIGNED, > allow: true I0124 05:48:23.353 [TaskGroupBatchWorker, StateMachine] > vagrant-test-fail-partition_aware_disabled-0-07bec0cb-d6a3-4caa-9b6e-60e6d0934606 > state machine transition PENDING -> ASSIGNED I0124 05:48:23.356 > [TaskGroupBatchWorker, TaskAssignerImpl] 
Offer on agent 192.168.33.7 (id > fe8bc641-aa02-4363-a990-318d20de1bac-S0) is being assigned task for > vagrant-test-fail-partition_aware_disabled-0-07bec0cb-d6a3-4caa-9b6e-60e6d0934606. > W0124 05:48:23.445 [TaskGroupBatchWorker, TaskAssignerImpl] Failed to launch > task. org.apache.aurora.scheduler.offers.OfferManager$LaunchException: Failed > to launch task. at > org.apache.aurora.scheduler.offers.OfferManagerImpl.launchTask(OfferManagerImpl.java:212) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.scheduling.TaskAssignerImpl.launchUsingOffer(TaskAssignerImpl.java:126) > at > org.apache.aurora.scheduler.scheduling.TaskAssignerImpl.maybeAssign(TaskAssignerImpl.java:262) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.scheduling.TaskSchedulerImpl.scheduleTasks(TaskSchedulerImpl.java:154) > at > org.apache.aurora.scheduler.scheduling.TaskSchedulerImpl.schedule(TaskSchedulerImpl.java:108) > at > org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83) > at > org.apache.aurora.scheduler.scheduling.TaskGroups$1.lambda$run$0(TaskGroups.java:174) > at
[jira] [Commented] (AURORA-1973) Documentation issue in installation docs
[ https://issues.apache.org/jira/browse/AURORA-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371767#comment-16371767 ] Renan DelValle commented on AURORA-1973: Hi [~tokuhirom], you are correct, this should be aurora-scheduler. Do you mind sending in a patch? [http://aurora.apache.org/documentation/latest/contributing/] We would really appreciate it! > Documentation issue in installation docs > > > Key: AURORA-1973 > URL: https://issues.apache.org/jira/browse/AURORA-1973 > Project: Aurora > Issue Type: Bug >Reporter: Tokuhiro Matsuno >Priority: Trivial > > In Installation docs, `sudo systemctl start aurora` was specified. But it's > incorrect. > It should be `sudo systemctl start aurora-scheduler` > https://github.com/apache/aurora/commit/537e052cf9bdd69b1454962d77bb90a3b7f8ebc4 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AURORA-1964) Move Vagrant setup from Trusty to Xenial
[ https://issues.apache.org/jira/browse/AURORA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1964. Resolution: Fixed Fix Version/s: 0.20.0 > Move Vagrant setup from Trusty to Xenial > > > Key: AURORA-1964 > URL: https://issues.apache.org/jira/browse/AURORA-1964 > Project: Aurora > Issue Type: Task >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > Fix For: 0.20.0 > > > We're really behind the curve on this one as the next LTS will be released in > April. > The move is made difficult by the change in init systems between Trusty and > Xenial. > Furthermore, our recent upgrade to Thrift 0.10.0 has caused some issues with > our Packer set up as the deb packages for 0.10.0 are not in the correct > repository. Latest version in the repository is 0.9.3: > http://dl.bintray.com/apache/thrift/debian/dists/ > Making Packer fail at: > https://github.com/apache/aurora/blob/master/build-support/packer/build.sh#L118 > [~jfarrell] any chance you can help us unblock this by releasing official > packages? > Otherwise, we could compile the 0.10.0 from scratch in our packer process but > that might balloon the image size somewhat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AURORA-1962) Incorrect parsing of empty strings into list command line options
[ https://issues.apache.org/jira/browse/AURORA-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1962: --- Fix Version/s: 0.19.1 > Incorrect parsing of empty strings into list command line options > - > > Key: AURORA-1962 > URL: https://issues.apache.org/jira/browse/AURORA-1962 > Project: Aurora > Issue Type: Bug > Components: Scheduler >Affects Versions: 0.19.0 >Reporter: Bill Farner >Assignee: Renan DelValle >Priority: Major > Fix For: 0.19.1 > > > When the scheduler parses a command line option like > {{-thermos_executor_resources=}}, which maps to {{List}}, the result > is equivalent to {{[""]}} (list of size 1 containing an empty string), while > we would expect {{[]}} (an empty list). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
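The behaviour described in AURORA-1962 is easy to reproduce with a plain comma split. What follows is a minimal sketch of the symptom and of one possible guard, not the scheduler's actual argument parser; the class and method names (ListOptionParser, parseNaive, parseExpected) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Aurora's argument-parsing code.
public class ListOptionParser {

  // Naive parsing: String.split on an empty input returns a one-element
  // array holding the empty string, so the list ends up with size 1.
  public static List<String> parseNaive(String raw) {
    return new ArrayList<>(Arrays.asList(raw.split(",")));
  }

  // Guarded parsing: treat an empty option value as "no elements".
  public static List<String> parseExpected(String raw) {
    if (raw.isEmpty()) {
      return new ArrayList<>();
    }
    return new ArrayList<>(Arrays.asList(raw.split(",")));
  }

  public static void main(String[] args) {
    System.out.println("naive size:    " + parseNaive("").size());
    System.out.println("expected size: " + parseExpected("").size());
    System.out.println("two values:    " + parseExpected("a,b"));
  }
}
```

The guard is deliberately narrow: it only special-cases the empty value, which is the case the ticket reports, and leaves every other input to the same split as before.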
[jira] [Commented] (AURORA-1967) Move from FindBugs to SpotBugs
[ https://issues.apache.org/jira/browse/AURORA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345620#comment-16345620 ] Renan DelValle commented on AURORA-1967: Sorry, must have missed it! But this is the best turnaround time I've had on any JIRA ticket I've ever filed. > Move from FindBugs to SpotBugs > -- > > Key: AURORA-1967 > URL: https://issues.apache.org/jira/browse/AURORA-1967 > Project: Aurora > Issue Type: Task >Reporter: Renan DelValle >Priority: Minor > > The FindBugs project is dead: > [https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2017-September/004383.html] > We should switch to its successor, SpotBugs > ([https://spotbugs.github.io/]), as soon as > possible to benefit from the enhancements introduced since > FindBugs 3.0.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AURORA-1967) Move from FindBugs to SpotBugs
Renan DelValle created AURORA-1967: -- Summary: Move from FindBugs to SpotBugs Key: AURORA-1967 URL: https://issues.apache.org/jira/browse/AURORA-1967 Project: Aurora Issue Type: Task Reporter: Renan DelValle The FindBugs project is dead: [https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2017-September/004383.html] We should switch to its successor, SpotBugs ([https://spotbugs.github.io/]), as soon as possible to benefit from the enhancements introduced since FindBugs 3.0.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AURORA-1964) Move Vagrant setup from Trusty to Xenial
[ https://issues.apache.org/jira/browse/AURORA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332534#comment-16332534 ] Renan DelValle commented on AURORA-1964: Works for me! Thanks for input guys, I'll see if I can't send a PR by the end of the day. > Move Vagrant setup from Trusty to Xenial > > > Key: AURORA-1964 > URL: https://issues.apache.org/jira/browse/AURORA-1964 > Project: Aurora > Issue Type: Task >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Major > > We're really behind the curve on this one as the next LTS will be released in > April. > The move is made difficult by the change in init systems between Trusty and > Xenial. > Furthermore, our recent upgrade to Thrift 0.10.0 has caused some issues with > our Packer set up as the deb packages for 0.10.0 are not in the correct > repository. Latest version in the repository is 0.9.3: > http://dl.bintray.com/apache/thrift/debian/dists/ > Making Packer fail at: > https://github.com/apache/aurora/blob/master/build-support/packer/build.sh#L118 > [~jfarrell] any chance you can help us unblock this by releasing official > packages? > Otherwise, we could compile the 0.10.0 from scratch in our packer process but > that might balloon the image size somewhat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AURORA-1964) Move Vagrant setup from Trusty to Xenial
Renan DelValle created AURORA-1964: -- Summary: Move Vagrant setup from Trusty to Xenial Key: AURORA-1964 URL: https://issues.apache.org/jira/browse/AURORA-1964 Project: Aurora Issue Type: Task Reporter: Renan DelValle Assignee: Renan DelValle We're really behind the curve on this one as the next LTS will be released in April. The move is made difficult by the change in init systems between Trusty and Xenial. Furthermore, our recent upgrade to Thrift 0.10.0 has caused some issues with our Packer set up as the deb packages for 0.10.0 are not in the correct repository. Latest version in the repository is 0.9.3: http://dl.bintray.com/apache/thrift/debian/dists/ Making Packer fail at: https://github.com/apache/aurora/blob/master/build-support/packer/build.sh#L118 [~jfarrell] any chance you can help us unblock this by releasing official packages? Otherwise, we could compile the 0.10.0 from scratch in our packer process but that might balloon the image size somewhat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AURORA-1734) Configurable Metadata prefix
[ https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reassigned AURORA-1734: -- Assignee: (was: Renan DelValle) Unassigning this from myself as I don't think it's worth the investment to change it any more. > Configurable Metadata prefix > > > Key: AURORA-1734 > URL: https://issues.apache.org/jira/browse/AURORA-1734 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Priority: Trivial > > Currently, a prefix ("org.apache.aurora.metadata.") is injected into the > metadata key in the scheduler. It would be beneficial to allow users to set > their own metadata prefix (including an empty string). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AURORA-1944) Aurora is unable to elect leader after losing ZK for an extended period of time
[ https://issues.apache.org/jira/browse/AURORA-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1944: --- Description: Using Apache Curator as the Zookeeper library causes an issue where Aurora is unable to elect a leader if Zookeeper loses quorum for an extended period of time. Scheduler seems to crash around: {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to leave leadership: org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to abdicate leadership of group at /aurora/scheduler}} When the init system brings the scheduler back up, it is unable to elect a leader if ZK is still down. Specifically, the redirect monitor fails: {{E0802 14:09:37.063 [RedirectMonitor STARTING, GuavaUtils$LifecycleShutdownListener] Service: RedirectMonitor [FAILED] failed unexpectedly. Triggering shutdown.}} Leading to every scheduler showing the following: {{W0802 14:16:34.646 [qtp576711849-43, LeaderRedirect] No serviceGroupMonitor in host set, will not redirect despite not being leader.}} Once the scheduler enters this state, it is unable to snap out of it until it is manually restarted. was: Using Apache Curator as the Zookeeper library causes an issue where Aurora is unable to elect a leader if Zookeeper loses quorum for an extended period of time. Scheduler seems to crash around: {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to leave leadership: org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to abdicate leadership of group at /aurora/scheduler }} When the init system brings the scheduler back up, it is unable to elect a leader if ZK is still down. Once the scheduler enters this state, it is unable to snap out of it until it is manually restarted. 
> Aurora is unable to elect leader after losing ZK for an extended period of > time > --- > > Key: AURORA-1944 > URL: https://issues.apache.org/jira/browse/AURORA-1944 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Environment: Running on 0.17.0 >Reporter: Renan DelValle > Attachments: aurora-0.log, aurora-1.log, aurora-2.log > > > Using Apache Curator as the Zookeeper library causes an issue where Aurora is > unable to elect a leader if Zookeeper loses quorum for an extended period of > time. > Scheduler seems to crash around: > {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to > leave leadership: > org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to > abdicate leadership of group at /aurora/scheduler}} > When the init system brings the scheduler back up, it is unable to elect a > leader if ZK is still down. > Specifically, the redirect monitor fails: > {{E0802 14:09:37.063 [RedirectMonitor STARTING, > GuavaUtils$LifecycleShutdownListener] Service: RedirectMonitor [FAILED] > failed unexpectedly. Triggering shutdown.}} > Leading to every scheduler showing the following: > {{W0802 14:16:34.646 [qtp576711849-43, LeaderRedirect] No serviceGroupMonitor > in host set, will not redirect despite not being leader.}} > Once the scheduler enters this state, it is unable to snap out of it until it > is manually restarted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AURORA-1944) Aurora is unable to elect leader after losing ZK for an extended period of time
[ https://issues.apache.org/jira/browse/AURORA-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1944: --- Attachment: aurora-1.log aurora-2.log Attaching logs for the other 2 schedulers. > Aurora is unable to elect leader after losing ZK for an extended period of > time > --- > > Key: AURORA-1944 > URL: https://issues.apache.org/jira/browse/AURORA-1944 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Environment: Running on 0.17.0 >Reporter: Renan DelValle > Attachments: aurora-0.log, aurora-1.log, aurora-2.log > > > Using Apache Curator as the Zookeeper library causes an issue where Aurora is > unable to elect a leader if Zookeeper loses quorum for an extended period of > time. > Scheduler seems to crash around: > {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to > leave leadership: > org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to > abdicate leadership of group at /aurora/scheduler }} > When the init system brings the scheduler back up, it is unable to elect a > leader if ZK is still down. > Once the scheduler enters this state, it is unable to snap out of it until it > is manually restarted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (AURORA-1944) Aurora is unable to elect leader after losing ZK for an extended period of time
[ https://issues.apache.org/jira/browse/AURORA-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1944: --- Attachment: aurora-0.log Aurora scheduler 1 out of 3. > Aurora is unable to elect leader after losing ZK for an extended period of > time > --- > > Key: AURORA-1944 > URL: https://issues.apache.org/jira/browse/AURORA-1944 > Project: Aurora > Issue Type: Bug > Components: Scheduler > Environment: Running on 0.17.0 >Reporter: Renan DelValle > Attachments: aurora-0.log > > > Using Apache Curator as the Zookeeper library causes an issue where Aurora is > unable to elect a leader if Zookeeper loses quorum for an extended period of > time. > Scheduler seems to crash around: > {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to > leave leadership: > org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to > abdicate leadership of group at /aurora/scheduler }} > When the init system brings the scheduler back up, it is unable to elect a > leader if ZK is still down. > Once the scheduler enters this state, it is unable to snap out of it until it > is manually restarted. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (AURORA-1942) Improve Aurora behavior with regards to Mesos Agents violating reregistration timeouts
Renan DelValle created AURORA-1942: -- Summary: Improve Aurora behavior with regards to Mesos Agents violating reregistration timeouts Key: AURORA-1942 URL: https://issues.apache.org/jira/browse/AURORA-1942 Project: Aurora Issue Type: Task Components: Scheduler Reporter: Renan DelValle A Mesos Agent Lost message can be received in two scenarios, with different outcomes: 1) A Mesos Agent can fail the health check done by the Mesos Master (max_agent_ping_timeouts violation), which leads to an Agent Lost message along with TASK_LOST messages for each task running on the unhealthy Agent. 2) A Mesos Agent can fail to re-register after an election has taken place (agent_reregister_timeout violation). In this situation the newly elected Mesos Master is unable to send TASK_LOST messages for the tasks that were running on the Agent that failed to re-register, because Masters do not store any information about the tasks that are currently running. Scenario 2 can lead to (a) "missing" instances for the tasks scheduled on the rogue Agent until an explicit reconciliation is done and/or (b) "leaked" tasks, if the Agent re-registers after Aurora has replaced the missing tasks, which will only be cleaned up upon an implicit reconciliation. For (a), one solution is to transition tasks on a missing Agent to the LOST state upon receiving an Agent Lost message. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
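The solution proposed for (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not Aurora's scheduler code: the class AgentLostHandler, its methods, and the two-value State enum are all hypothetical, and the in-memory maps stand in for whatever task store the scheduler really uses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and storage are assumptions.
public class AgentLostHandler {
  public enum State { RUNNING, LOST }

  // Which agent each task runs on, and the task's current state.
  private final Map<String, String> agentByTask = new HashMap<>();
  private final Map<String, State> stateByTask = new HashMap<>();

  public void recordRunning(String taskId, String agentId) {
    agentByTask.put(taskId, agentId);
    stateByTask.put(taskId, State.RUNNING);
  }

  // On an agent-lost notification, move every task still RUNNING on that
  // agent to LOST without waiting for per-task TASK_LOST messages.
  public List<String> onAgentLost(String agentId) {
    List<String> transitioned = new ArrayList<>();
    for (Map.Entry<String, String> entry : agentByTask.entrySet()) {
      String taskId = entry.getKey();
      if (entry.getValue().equals(agentId)
          && stateByTask.get(taskId) == State.RUNNING) {
        stateByTask.put(taskId, State.LOST);
        transitioned.add(taskId);
      }
    }
    return transitioned;
  }

  public State stateOf(String taskId) {
    return stateByTask.get(taskId);
  }

  public static void main(String[] args) {
    AgentLostHandler handler = new AgentLostHandler();
    handler.recordRunning("task-1", "agent-A");
    handler.recordRunning("task-2", "agent-B");
    System.out.println("transitioned: " + handler.onAgentLost("agent-A"));
    System.out.println("task-2 state: " + handler.stateOf("task-2"));
  }
}
```

Note that this sketch only addresses (a). It does nothing about (b): if the agent later re-registers, the original tasks may still be running there and would need to be cleaned up by reconciliation.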
[jira] [Closed] (AURORA-1712) Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty
[ https://issues.apache.org/jira/browse/AURORA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle closed AURORA-1712. -- Resolution: Fixed Fix Version/s: 0.17.0 Added builder and test environment for Xenial as well as updated instructions on how to test it. Added distribution to release-candidate script. Bugs closed: AURORA-1872 Reviewed at https://reviews.apache.org/r/52437/ > Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty > --- > > Key: AURORA-1712 > URL: https://issues.apache.org/jira/browse/AURORA-1712 > Project: Aurora > Issue Type: Bug >Reporter: Stephan Erb >Assignee: Renan DelValle > Fix For: 0.17.0 > > > The Debian packaging scripts for Trusty and Jessie share the same > override mechanism for the pants third_party repository. We therefore end up > using egg files built for Ubuntu on Debian as well > (https://github.com/apache/aurora-packaging/blob/master/specs/debian/aurora-pants.ini). > This seems to more or less work, but is clearly not optimal. > We should extend > https://github.com/apache/aurora/blob/master/build-support/python/make-mesos-native-egg > to support Debian and then make use of it in our packaging infrastructure > https://github.com/apache/aurora-packaging. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AURORA-1751) Update org.apache.aurora/aurora-api in Maven
[ https://issues.apache.org/jira/browse/AURORA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840652#comment-15840652 ] Renan DelValle commented on AURORA-1751: I wonder if it wouldn't be a good idea to ask the community if it was OK to drop hosting this on Maven but have a way to generate this locally with Gradle. It seems like this doesn't get used enough to justify the overhead keeping an updated version on Maven brings. > Update org.apache.aurora/aurora-api in Maven > > > Key: AURORA-1751 > URL: https://issues.apache.org/jira/browse/AURORA-1751 > Project: Aurora > Issue Type: Task > Components: Packaging >Affects Versions: 0.13.0 >Reporter: Derek Slager >Assignee: Jake Farrell >Priority: Minor > > Currently the version of org.apache.aurora/aurora-api available on Maven > Central is 0.8.0, which is several versions out of date. It would be ideal to > have up-to-date versions available as new Aurora releases are cut. > https://mvnrepository.com/artifact/org.apache.aurora/aurora-api -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678503#comment-15678503 ] Renan DelValle commented on AURORA-1780: Second stab at this: https://reviews.apache.org/r/53923/ I think this time I managed to take care of the corner cases where fromResource() gets called for Protos.Resource. The earlier error occurred because the filters that call fromResource() (such as NON_REVOCABLE) were placed before the SUPPORTED_RESOURCE filter, so fromResource() was called before the SUPPORTED_RESOURCE filter had a chance to filter out unsupported resources. It went something like this: Iterables.filter(Iterables.filter(resources, NON_REVOCABLE), SUPPORTED_RESOURCE), allowing the first filter to call fromResource() on an unknown resource and crash the scheduler. > Offers with unknown resources types to Aurora crash the scheduler > - > > Key: AURORA-1780 > URL: https://issues.apache.org/jira/browse/AURORA-1780 > Project: Aurora > Issue Type: Bug > Environment: vagrant >Reporter: Renan DelValle >Assignee: Renan DelValle > Fix For: 0.17.0 > > > Taking offers from Agents which have resources that are not known to Aurora > causes the Scheduler to crash. 
> Steps to reproduce: > {code} > vagrant up > sudo service mesos-slave stop > echo > "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" > | sudo tee /etc/mesos-slave/resources > sudo rm -f /var/lib/mesos/meta/slaves/latest > sudo service mesos-slave start > {code} > Wait around a few moments for the offer to be made to Aurora > {code} > I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification > of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0" > I0922 02:42:30.585597 2999 log.cpp:577] Attempting to append 109 bytes to > the log > I0922 02:42:30.585654 2999 coordinator.cpp:348] Coordinator attempting to > write APPEND action at position 4 > I0922 02:42:30.585747 2999 replica.cpp:537] Replica received write request > for position 4 from (10)@192.168.33.7:8083 > I0922 02:42:30.586858 2999 leveldb.cpp:341] Persisting action (125 bytes) to > leveldb took 1.086601ms > I0922 02:42:30.586897 2999 replica.cpp:712] Persisted action at 4 > I0922 02:42:30.587020 2999 replica.cpp:691] Replica received learned notice > for position 4 from @0.0.0.0:0 > I0922 02:42:30.587785 2999 leveldb.cpp:341] Persisting action (127 bytes) to > leveldb took 746999ns > I0922 02:42:30.587805 2999 replica.cpp:712] Persisted action at 4 > I0922 02:42:30.587811 2999 replica.cpp:697] Replica learned APPEND action at > position 4 > I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] > Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction. > Sep 22, 2016 2:42:38 AM > com.google.common.util.concurrent.ServiceManager$ServiceListener failed > SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING > state. 
> java.lang.NullPointerException: Unknown Mesos resource: name: "test" > type: SCALAR > scalar { > value: 200.0 > } > role: "*" > at java.util.Objects.requireNonNull(Objects.java:228) > at > org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355) > at > org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52) > at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at java.util.Iterator.forEachRemaining(Iterator.java:115) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274) > at > org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239) > at > org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153) > at > org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168) > at >
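The filter-ordering problem described in the comment on this ticket can be reproduced in isolation. The sketch below is a hedged illustration, not Aurora's ResourceManager: it uses java.util.stream instead of Guava's Iterables.filter so that it is self-contained, and the names FilterOrder, KNOWN, typeOf, unsafeOrder and safeOrder, as well as the set of "known" resource names, are assumptions made up for the example. Only the predicate names NON_REVOCABLE and SUPPORTED_RESOURCE echo the comment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; not Aurora's resource-handling code.
public class FilterOrder {
  // Illustrative set of resource names the lookup understands.
  static final Set<String> KNOWN =
      new HashSet<>(Arrays.asList("cpus", "mem", "disk"));

  // Stand-in for a lookup that throws when given an unknown resource name.
  public static String typeOf(String name) {
    if (!KNOWN.contains(name)) {
      throw new NullPointerException("Unknown resource: " + name);
    }
    return name.toUpperCase();
  }

  // Safe on any input: only checks membership.
  static final Predicate<String> SUPPORTED_RESOURCE = KNOWN::contains;

  // Stand-in for a filter that must resolve the type before it can decide.
  static final Predicate<String> NON_REVOCABLE =
      name -> !typeOf(name).isEmpty();

  // Type-resolving filter first: it sees unknown names and throws.
  public static List<String> unsafeOrder(List<String> names) {
    return names.stream()
        .filter(NON_REVOCABLE)
        .filter(SUPPORTED_RESOURCE)
        .collect(Collectors.toList());
  }

  // Supported-resource filter first: unknown names never reach typeOf.
  public static List<String> safeOrder(List<String> names) {
    return names.stream()
        .filter(SUPPORTED_RESOURCE)
        .filter(NON_REVOCABLE)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> offered = Arrays.asList("cpus", "test", "mem");
    System.out.println("safe order:   " + safeOrder(offered));
    try {
      unsafeOrder(offered);
    } catch (NullPointerException e) {
      System.out.println("unsafe order: " + e.getMessage());
    }
  }
}
```

The same reasoning applies to nested lazy filters in general: whichever predicate is applied to the raw input first must be able to handle every element, so the cheap membership check belongs on the inside.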
[jira] [Reopened] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reopened AURORA-1780: The filter gets ignored at some point once again by the SlotSizeCounter. At least the bug now takes a little longer to show up. Currently investigating the root of the problem. > Offers with unknown resources types to Aurora crash the scheduler > - > > Key: AURORA-1780 > URL: https://issues.apache.org/jira/browse/AURORA-1780 > Project: Aurora > Issue Type: Bug > Environment: vagrant >Reporter: Renan DelValle >Assignee: Renan DelValle > Fix For: 0.17.0 > > > Taking offers from Agents which have resources that are not known to Aurora > causes the Scheduler to crash. > Steps to reproduce: > {code} > vagrant up > sudo service mesos-slave stop > echo > "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" > | sudo tee /etc/mesos-slave/resources > sudo rm -f /var/lib/mesos/meta/slaves/latest > sudo service mesos-slave start > {code} > Wait around a few moments for the offer to be made to Aurora > {code} > I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification > of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0" > I0922 02:42:30.585597 2999 log.cpp:577] Attempting to append 109 bytes to > the log > I0922 02:42:30.585654 2999 coordinator.cpp:348] Coordinator attempting to > write APPEND action at position 4 > I0922 02:42:30.585747 2999 replica.cpp:537] Replica received write request > for position 4 from (10)@192.168.33.7:8083 > I0922 02:42:30.586858 2999 leveldb.cpp:341] Persisting action (125 bytes) to > leveldb took 1.086601ms > I0922 02:42:30.586897 2999 replica.cpp:712] Persisted action at 4 > I0922 02:42:30.587020 2999 replica.cpp:691] Replica received learned notice > for position 4 from @0.0.0.0:0 > I0922 02:42:30.587785 2999 leveldb.cpp:341] Persisting action (127 bytes) to > leveldb took 746999ns > I0922 02:42:30.587805 2999 
replica.cpp:712] Persisted action at 4 > I0922 02:42:30.587811 2999 replica.cpp:697] Replica learned APPEND action at > position 4 > I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] > Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction. > Sep 22, 2016 2:42:38 AM > com.google.common.util.concurrent.ServiceManager$ServiceListener failed > SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING > state. > java.lang.NullPointerException: Unknown Mesos resource: name: "test" > type: SCALAR > scalar { > value: 200.0 > } > role: "*" > at java.util.Objects.requireNonNull(Objects.java:228) > at > org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355) > at > org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52) > at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675) > at > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > at > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > at java.util.Iterator.forEachRemaining(Iterator.java:115) > at > java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) > at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) > at > java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) > at > java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) > at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) > at > java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) > at > org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274) > at > org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239) > at > org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153) > at > 
org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168) > at > org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130) > at > com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189) > at com.google.common.util.concurrent.Callables$3.run(Callables.java:100) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at >
[jira] [Resolved] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle resolved AURORA-1780.

    Resolution: Fixed
    Fix Version/s: 0.17.0

> Offers with unknown resources types to Aurora crash the scheduler
> -----------------------------------------------------------------
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
> Issue Type: Bug
> Environment: vagrant
> Reporter: Renan DelValle
> Assignee: Renan DelValle
> Fix For: 0.17.0
>
> Taking offers from Agents which have resources that are not known to Aurora
> cause the Scheduler to crash.
[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672216#comment-15672216 ]

Renan DelValle commented on AURORA-1780:

Review request available: https://reviews.apache.org/r/53831/

Would like some feedback on this approach. It seemed the best way to address this ticket without going overboard as support for arbitrary resources is somewhere in the pipeline (AURORA-1328). I'm open to other ways of tackling this issue.

> Offers with unknown resources types to Aurora crash the scheduler
> -----------------------------------------------------------------
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
> Issue Type: Bug
> Environment: vagrant
> Reporter: Renan DelValle
> Assignee: Renan DelValle
>
> Taking offers from Agents which have resources that are not known to Aurora
> cause the Scheduler to crash.
[jira] [Assigned] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle reassigned AURORA-1780:
--------------------------------------

    Assignee: Renan DelValle

> Offers with unknown resources types to Aurora crash the scheduler
> -----------------------------------------------------------------
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
> Issue Type: Bug
> Environment: vagrant
> Reporter: Renan DelValle
> Assignee: Renan DelValle
>
> Taking offers from Agents which have resources that are not known to Aurora
> cause the Scheduler to crash.
[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634485#comment-15634485 ]

Renan DelValle commented on AURORA-1780:

Would everyone be OK with ignoring unknown resource types and letting the scheduler carry on for now?

> Offers with unknown resources types to Aurora crash the scheduler
> -----------------------------------------------------------------
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
> Issue Type: Bug
> Environment: vagrant
> Reporter: Renan DelValle
>
> Taking offers from Agents which have resources that are not known to Aurora
> cause the Scheduler to crash.
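The idea raised in this comment, skipping resource types the scheduler does not recognize instead of failing on them, can be sketched as follows. This is only an illustration in Go, not the scheduler's Java code and not the patch that went to review: the `Resource` struct, the `knownTypes` set, and the `bagFromResources` helper (named after the method in the stack trace) are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"log"
)

// Resource is a hypothetical, simplified stand-in for a scalar Mesos resource.
type Resource struct {
	Name  string
	Value float64
}

// knownTypes is an assumed set of resource names the scheduler understands.
var knownTypes = map[string]bool{
	"cpus": true, "mem": true, "disk": true, "gpus": true, "ports": true,
}

// bagFromResources sums known resources by name. Unknown names are logged
// and skipped rather than treated as a fatal error.
func bagFromResources(offer []Resource) map[string]float64 {
	bag := make(map[string]float64)
	for _, r := range offer {
		if !knownTypes[r.Name] {
			log.Printf("ignoring unknown resource type %q", r.Name)
			continue
		}
		bag[r.Name] += r.Value
	}
	return bag
}

func main() {
	// Mirrors the resources string from the reproduction steps,
	// including the unknown "test" resource.
	offer := []Resource{
		{"cpus", 0.5}, {"cpus", 3.5}, {"mem", 1024},
		{"disk", 2}, {"gpus", 4}, {"test", 200},
	}
	fmt.Println(bagFromResources(offer))
}
```

Logging the skipped name keeps the situation visible in the scheduler logs even though the offer is still processed.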
[jira] [Assigned] (AURORA-1712) Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty
[ https://issues.apache.org/jira/browse/AURORA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle reassigned AURORA-1712:
--------------------------------------

    Assignee: Renan DelValle

> Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty
> --------------------------------------------------------------------------
>
> Key: AURORA-1712
> URL: https://issues.apache.org/jira/browse/AURORA-1712
> Project: Aurora
> Issue Type: Bug
> Reporter: Stephan Erb
> Assignee: Renan DelValle
>
> The Debian packaging scripts for Trusty and Jessie are sharing the same
> override mechanism for the pants third_party repository. We therefore end up
> using egg files built for Ubuntu on Debian as well
> (https://github.com/apache/aurora-packaging/blob/master/specs/debian/aurora-pants.ini).
> It seems like this is kind of working, but is clearly not optimal.
> We should extend
> https://github.com/apache/aurora/blob/master/build-support/python/make-mesos-native-egg
> to support Debian and then make use of it in our packaging infrastructure
> https://github.com/apache/aurora-packaging.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (AURORA-1712) Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty
[ https://issues.apache.org/jira/browse/AURORA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546838#comment-15546838 ]

Renan DelValle commented on AURORA-1712:

https://reviews.apache.org/r/52531/

> Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty
> --------------------------------------------------------------------------
>
> Key: AURORA-1712
> URL: https://issues.apache.org/jira/browse/AURORA-1712
> Project: Aurora
> Issue Type: Bug
> Reporter: Stephan Erb
> Assignee: Renan DelValle
>
> The Debian packaging scripts for Trusty and Jessie are sharing the same
> override mechanism for the pants third_party repository. We therefore end up
> using egg files built for Ubuntu on Debian as well
> (https://github.com/apache/aurora-packaging/blob/master/specs/debian/aurora-pants.ini).
> It seems like this is kind of working, but is clearly not optimal.
> We should extend
> https://github.com/apache/aurora/blob/master/build-support/python/make-mesos-native-egg
> to support Debian and then make use of it in our packaging infrastructure
> https://github.com/apache/aurora-packaging.
[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler
[ https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514331#comment-15514331 ]

Renan DelValle commented on AURORA-1780:

FWIW, I have another framework running that relies on arbitrary resources (research oriented). To run Aurora on our cluster I have to shut it all down, remove the arbitrary resources, and bring the cluster back up, and then do the reverse when I run my research framework. So, all in all, this issue is a pretty big thorn in my side.

On systems running systemd (tested on Ubuntu Xenial) this is an even nastier issue because the system brings the scheduler back up after it crashes, hiding the issue in plain sight until the logs are checked.

> Offers with unknown resources types to Aurora crash the scheduler
> -----------------------------------------------------------------
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
> Issue Type: Bug
> Environment: vagrant
> Reporter: Renan DelValle
>
> Taking offers from Agents which have resources that are not known to Aurora
> cause the Scheduler to crash.
[jira] [Created] (AURORA-1780) Offers with unknown resources to Aurora crash the scheduler
Renan DelValle created AURORA-1780:
--------------------------------------

             Summary: Offers with unknown resources to Aurora crash the scheduler
                 Key: AURORA-1780
                 URL: https://issues.apache.org/jira/browse/AURORA-1780
             Project: Aurora
          Issue Type: Bug
         Environment: vagrant
            Reporter: Renan DelValle

Taking offers from Agents which have resources that are not known to Aurora cause the Scheduler to crash.

Steps to reproduce:
vagrant up
sudo service mesos-slave stop
echo "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" | sudo tee /etc/mesos-slave/resources
sudo rm -f /var/lib/mesos/meta/slaves/latest
sudo service mesos-slave start

Wait around a few moments for the offer to be made to Aurora

{code}
I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0"
I0922 02:42:30.585597 2999 log.cpp:577] Attempting to append 109 bytes to the log
I0922 02:42:30.585654 2999 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 4
I0922 02:42:30.585747 2999 replica.cpp:537] Replica received write request for position 4 from (10)@192.168.33.7:8083
I0922 02:42:30.586858 2999 leveldb.cpp:341] Persisting action (125 bytes) to leveldb took 1.086601ms
I0922 02:42:30.586897 2999 replica.cpp:712] Persisted action at 4
I0922 02:42:30.587020 2999 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
I0922 02:42:30.587785 2999 leveldb.cpp:341] Persisting action (127 bytes) to leveldb took 746999ns
I0922 02:42:30.587805 2999 replica.cpp:712] Persisted action at 4
I0922 02:42:30.587811 2999 replica.cpp:697] Replica learned APPEND action at position 4
I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction.
Sep 22, 2016 2:42:38 AM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
java.lang.NullPointerException: Unknown Mesos resource: name: "test"
type: SCALAR
scalar {
  value: 200.0
}
role: "*"
    at java.util.Objects.requireNonNull(Objects.java:228)
    at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
    at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at java.util.Iterator.forEachRemaining(Iterator.java:115)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
    at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
    at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
    at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
    at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
    at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
    at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
E0922 02:42:38.353 [SlotSizeCounterService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService [FAILED] failed unexpectedly. Triggering shutdown.
I0922
[jira] [Resolved] (AURORA-1739) createJob thrift api for golang consistently failing with empty CronSchedule
[ https://issues.apache.org/jira/browse/AURORA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renan DelValle resolved AURORA-1739.

    Resolution: Fixed
    Fix Version/s: 0.16.0

https://reviews.apache.org/r/51973/

> createJob thrift api for golang consistently failing with empty CronSchedule
> ----------------------------------------------------------------------------
>
> Key: AURORA-1739
> URL: https://issues.apache.org/jira/browse/AURORA-1739
> Project: Aurora
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.15.0
> Reporter: Jimmy Wu
> Assignee: Renan DelValle
> Priority: Critical
> Fix For: 0.16.0
>
> Trying to create non cron job via the thrift api for golang but consistently
> getting error "Cron jobs may only be created/updated by calling
> scheduleCronJob.". Root cause: CronSchedule is not set in JobConfiguration
> hence an empty string is used, then create job request gets rejected because
> Aurora now treats empty cron schedule as failure (related changes
> https://reviews.apache.org/r/28571/). This issue breaks all createJob
> requests submitted from golang thrift api because empty string is default
> value for string instead of nil.
[jira] [Commented] (AURORA-1739) createJob thrift api for golang consistently failing with empty CronSchedule
[ https://issues.apache.org/jira/browse/AURORA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471614#comment-15471614 ]

Renan DelValle commented on AURORA-1739:

Good to hear that, I'll gear up the patch then.

Unfortunately, non-pointer variables can't be set to nil :/ In Go, every variable has a zero value; for a string it's the empty string, which causes the scheduler to always think that {{cronSchedule}} is set. Marking the field as optional causes {{cronSchedule}} to be generated as a pointer, which can indeed be set to nil.

> createJob thrift api for golang consistently failing with empty CronSchedule
> ----------------------------------------------------------------------------
>
> Key: AURORA-1739
> URL: https://issues.apache.org/jira/browse/AURORA-1739
> Project: Aurora
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.15.0
> Reporter: Jimmy Wu
> Priority: Critical
>
> Trying to create non cron job via the thrift api for golang but consistently
> getting error "Cron jobs may only be created/updated by calling
> scheduleCronJob.". Root cause: CronSchedule is not set in JobConfiguration
> hence an empty string is used, then create job request gets rejected because
> Aurora now treats empty cron schedule as failure (related changes
> https://reviews.apache.org/r/28571/). This issue breaks all createJob
> requests submitted from golang thrift api because empty string is default
> value for string instead of nil.
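A minimal Go sketch of the zero-value problem described in this comment. The two structs are hand-written stand-ins, not the code Thrift actually generates, and `HasCronSchedule` is a hypothetical helper; the point is only that a plain `string` field cannot distinguish "unset" from "empty", while a `*string` can.

```go
package main

import "fmt"

// RequiredJob is a hypothetical stand-in for bindings where cronSchedule
// is a plain string: an unset field is indistinguishable from "".
type RequiredJob struct {
	CronSchedule string
}

// OptionalJob is a hypothetical stand-in for bindings where cronSchedule
// is a pointer: nil means the field was never set.
type OptionalJob struct {
	CronSchedule *string
}

// HasCronSchedule reports whether a schedule was explicitly provided.
func (j OptionalJob) HasCronSchedule() bool {
	return j.CronSchedule != nil
}

func main() {
	var r RequiredJob
	// The zero value of a string is the empty string, never nil.
	fmt.Printf("required field zero value: %q\n", r.CronSchedule)

	var o OptionalJob
	fmt.Println("optional field set:", o.HasCronSchedule())

	schedule := "*/5 * * * *"
	o.CronSchedule = &schedule
	fmt.Println("optional field set:", o.HasCronSchedule())
}
```

With the pointer form, a client that never touches the field sends nothing for it, so the receiving side does not see an empty cron schedule.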
[jira] [Commented] (AURORA-1739) createJob thrift api for golang consistently failing with empty CronSchedule
[ https://issues.apache.org/jira/browse/AURORA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469144#comment-15469144 ]

Renan DelValle commented on AURORA-1739:

I encountered this as well but, thankfully, [~jfarrell] lent me a hand with this. I'm sure you've fixed this issue by now, but this might help anyone else who runs into it.

This can be fixed by modifying the thrift API from which the Go bindings are generated. This line:
https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L328

has to be changed from:
{code}
4: string cronSchedule
{code}
to:
{code}
4: optional string cronSchedule
{code}

Maybe I should submit a patch for this, but first I have to see whether it causes any issues when bindings for other languages are generated.

> createJob thrift api for golang consistently failing with empty CronSchedule
> ----------------------------------------------------------------------------
>
> Key: AURORA-1739
> URL: https://issues.apache.org/jira/browse/AURORA-1739
> Project: Aurora
> Issue Type: Bug
> Components: Client
> Affects Versions: 0.15.0
> Reporter: Jimmy Wu
> Priority: Critical
>
> Trying to create non cron job via the thrift api for golang but consistently
> getting error "Cron jobs may only be created/updated by calling
> scheduleCronJob.". Root cause: CronSchedule is not set in JobConfiguration
> hence an empty string is used, then create job request gets rejected because
> Aurora now treats empty cron schedule as failure (related changes
> https://reviews.apache.org/r/28571/). This issue breaks all createJob
> requests submitted from golang thrift api because empty string is default
> value for string instead of nil.
[jira] [Commented] (AURORA-1762) /pendingtasks endpoint should show reason tasks are pending
[ https://issues.apache.org/jira/browse/AURORA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468270#comment-15468270 ]

Renan DelValle commented on AURORA-1762:

In that case, I'll ask someone from my research lab to take a crack at this.

> /pendingtasks endpoint should show reason tasks are pending
> -----------------------------------------------------------
>
> Key: AURORA-1762
> URL: https://issues.apache.org/jira/browse/AURORA-1762
> Project: Aurora
> Issue Type: Task
> Reporter: David Robinson
> Priority: Minor
> Labels: newbie
>
> The /pendingtasks endpoint is essentially useless as is: it shows that tasks
> are pending but doesn't show why. The information is also not easily
> discovered via the /scheduler UI.
[jira] [Commented] (AURORA-1734) Configurable Metadata prefix
[ https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427421#comment-15427421 ] Renan DelValle commented on AURORA-1734: So I've been thinking about this quite a bit, and it feels to me that we're jumping the gun a bit on a future "what-if" regarding the name collision. As an alternative, I'd like to propose that we reserve the `org.apache.aurora` namespace for future use and remove the prefix altogether (we can even go as far as rejecting a task that includes a label key with this prefix). I'm interested in hearing what everyone's opinion on this would be. The reason I've come to be against every label key having the prefix is that we need to pass labels to our containers with the compose executor. As such, if we have the prefix, we have to create a special "aurora edition" of the docker compose executor to filter out the prefix. (This will be true of any future Mesos executor that wants to make use of labels as well). Configuring the metadata from the scheduler is an acceptable solution as well; however, it's less flexible for executor devs. For example, if the community decides to make `environment` or `role` labels in the future, we would have to filter these out in the executor on a case by case basis (blacklist), instead of filtering out any label key beginning with the prefix `org.apache.aurora`. As a bonus, the patch to do this is less messy :). Would like to know the community's thoughts on this before moving forward with the patch. > Configurable Metadata prefix > > > Key: AURORA-1734 > URL: https://issues.apache.org/jira/browse/AURORA-1734 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Trivial > > Currently, a prefix ("org.apache.aurora.metadata.") is injected into the > metadata key in the scheduler. It would be beneficial to allow users to set > their own metadata prefix (including an empty string). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
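The reserved-namespace proposal in the comment above can be sketched in a few lines of Python. This is only an illustration of the idea, not scheduler or executor code: the function names and the sample label keys are hypothetical, and the trailing dot on the prefix is an assumption added so that a key such as `org.apache.aurorafoo` is not caught by accident.

```python
# Namespace proposed for reservation in the comment above. The trailing dot
# is an assumption of this sketch, not something stated in the ticket.
RESERVED_PREFIX = "org.apache.aurora."


def split_labels(labels: dict) -> tuple:
    """Separate reserved-namespace labels from user labels with one prefix test."""
    reserved = {k: v for k, v in labels.items() if k.startswith(RESERVED_PREFIX)}
    user = {k: v for k, v in labels.items() if not k.startswith(RESERVED_PREFIX)}
    return reserved, user


def validate_user_labels(labels: dict) -> None:
    """Reject a task whose user-supplied labels intrude on the reserved namespace."""
    reserved, _ = split_labels(labels)
    if reserved:
        raise ValueError("label keys use reserved prefix: %s" % sorted(reserved))


# Hypothetical labels as an executor might receive them.
sample = {
    "org.apache.aurora.environment": "prod",
    "com.example.tier": "frontend",
}
reserved_labels, user_labels = split_labels(sample)
```

A single prefix test keeps working when new reserved keys appear, whereas a per-key blacklist in each executor would need updating every time.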
[jira] [Resolved] (AURORA-1288) Design for supporting custom executor
[ https://issues.apache.org/jira/browse/AURORA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1288. Resolution: Implemented Fix Version/s: 0.16.0 After a year and a few months of work on this, I'm very happy to say support for custom executors in Aurora is now a reality. Thanks to everyone who contributed to this in any way shape or form. > Design for supporting custom executor > - > > Key: AURORA-1288 > URL: https://issues.apache.org/jira/browse/AURORA-1288 > Project: Aurora > Issue Type: Task >Reporter: Meghdoot Bhattacharya >Assignee: Renan DelValle > Fix For: 0.16.0 > > > The goal is to capture the list of changes in the client and the scheduler > required to support any executor other than thermos. This will help non > thermos use cases to adopt aurora easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AURORA-1726) Create support for using multiple executors in the Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1726. Resolution: Implemented Fix Version/s: 0.16.0 > Create support for using multiple executors in the Scheduler > > > Key: AURORA-1726 > URL: https://issues.apache.org/jira/browse/AURORA-1726 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle > Fix For: 0.16.0 > > > Allow a single Aurora scheduler to schedule tasks on Mesos with different > executors. Configuration for executors will be server side and loaded at the > time the Scheduler is started. Users may specify the executor they wish to > use by specifying the name executor they wish their task to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1726) Create support for using multiple executors in the Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408343#comment-15408343 ] Renan DelValle commented on AURORA-1726: Implemented [d0533d2c7ac15a19cc63587481a75b9597613425|https://github.com/apache/aurora/commit/d0533d2c7ac15a19cc63587481a75b9597613425] > Create support for using multiple executors in the Scheduler > > > Key: AURORA-1726 > URL: https://issues.apache.org/jira/browse/AURORA-1726 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle > > Allow a single Aurora scheduler to schedule tasks on Mesos with different > executors. Configuration for executors will be server side and loaded at the > time the Scheduler is started. Users may specify the executor they wish to > use by specifying the name executor they wish their task to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1726) Create support for using multiple executors in the Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388567#comment-15388567 ] Renan DelValle commented on AURORA-1726: I don't foresee modifying the executor resources very often, so I agree it won't be triggered too often. In any case, I thought it was worth bringing up and perhaps documenting it since it could potentially cause some weird behavior. > Create support for using multiple executors in the Scheduler > > > Key: AURORA-1726 > URL: https://issues.apache.org/jira/browse/AURORA-1726 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle > > Allow a single Aurora scheduler to schedule tasks on Mesos with different > executors. Configuration for executors will be server side and loaded at the > time the Scheduler is started. Users may specify the executor they wish to > use by specifying the name executor they wish their task to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (AURORA-1726) Create support for using multiple executors in the Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383446#comment-15383446 ] Renan DelValle edited comment on AURORA-1726 at 7/19/16 1:32 AM: - Currently have a working implementation of this that does its best not to get in the way of how things are run with thermos. One concern I have is about preemption. As far as I can see, the resources available for preemption are calculated using the resources being used by the victim task + executor overhead. This could result in a corner case that may or may not manifest itself. It is not exclusive to using multiple executors but may be magnified by the feature if the resource overhead is changed and the scheduler is restarted with a larger resource overhead. Maybe the more experienced devs can help me understand if this scenario is possible:
{code}
Overhead for thermos is set to C cpus and R ram
task A is submitted with A[cpus] cpus, A[ram] ram, and A[disk] disk.
task A begins to run with A[cpus] + C cpus, A[ram] + R ram, and A[disk] disk.
Overhead is changed to C' cpus and R' ram.
Scheduler is restarted and running tasks are reconciled.
task B is submitted to the scheduler with B[cpus], B[ram] and B[disk].
Preemption calculations begin.
Since the calculations take into account the current overhead set, the resources available for pre-emption are incorrectly calculated to be:
A[cpus] + C', A[ram] + R', A[disk]
When they should be using the values used at the time of scheduling:
A[cpus] + C, A[ram] + R, A[disk]
{code}
If this scenario is possible, we should come up with a suitable solution to this issue which may involve storing the overhead used for tasks at the time of running them. was (Author: rdelvalle): Currently have a working implementation of this that does its best not to get in the way of how things are run with thermos. One concern I have is about preemption. 
As far as I can see, the resources available for preemption are calculated using the resources being used by the victim task + executor overhead. This could result in a corner case that may or may not manifest itself. It is not exclusive to using multiple executors but may be magnified by the feature if the resource overhead is changed and the scheduler is restarted with a larger resource overhead. Maybe the more experienced devs can help me understand if this scenario is possible: Overhead for thermos is set to C cpus and R ram task A is submitted with A[cpus] cpus, A[ram] ram, and A[disk] disk. task A begins to run with A[cpus] + C cpus, A[ram] + R ram, and A[disk] disk. Overhead is changed to C' cpus and R' ram. Scheduler is restarted and running tasks are reconciled. task B is submitted to the scheduler with B[cpus], B[ram] and B[disk]. Preemption calculations begin. Since the calculations take into account the current overhead set, the resources available for pre-emption are incorrectly calculated to be: A[cpus] + C', A[ram] + R', A[disk] When they should be using the values used at the time of scheduling: A[cpus] + C, A[ram] + R, A[disk] If this scenario is possible, we should come up with a suitable solution to this issue which may involve storing the overhead used for tasks at the time of running them. > Create support for using multiple executors in the Scheduler > > > Key: AURORA-1726 > URL: https://issues.apache.org/jira/browse/AURORA-1726 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle > > Allow a single Aurora scheduler to schedule tasks on Mesos with different > executors. Configuration for executors will be server side and loaded at the > time the Scheduler is started. Users may specify the executor they wish to > use by specifying the name executor they wish their task to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
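To put numbers on the scenario above, here is a small worked example in Python. It is a sketch of the arithmetic only, not the scheduler's preemption code: the `Resources` class and every figure in it are hypothetical, chosen so that the old overhead is C = 0.25 cpus and R = 128 MB and the new overhead is C' = 0.5 cpus and R' = 256 MB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; not a type from the Aurora code base."""
    cpus: float
    ram_mb: int
    disk_mb: int

    def plus_overhead(self, overhead_cpus: float, overhead_ram_mb: int) -> "Resources":
        # Executor overhead is added to cpus and ram only, as in the scenario.
        return Resources(self.cpus + overhead_cpus,
                         self.ram_mb + overhead_ram_mb,
                         self.disk_mb)


task_a = Resources(cpus=1.0, ram_mb=512, disk_mb=1024)

# What task A really holds: launched under the old overhead (C, R).
actually_held = task_a.plus_overhead(0.25, 128)

# What a recalculation after the restart would assume: new overhead (C', R').
assumed_after_restart = task_a.plus_overhead(0.5, 256)

# Resources the calculation believes it can reclaim but that were never allocated.
overcount_cpus = assumed_after_restart.cpus - actually_held.cpus
overcount_ram_mb = assumed_after_restart.ram_mb - actually_held.ram_mb
```

Under these made-up numbers, preempting task A would be credited with 0.25 cpus and 128 MB more than it frees, which is the gap that storing the launch-time overhead per task would close.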
[jira] [Commented] (AURORA-1734) Configurable Metadata prefix
[ https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379956#comment-15379956 ] Renan DelValle commented on AURORA-1734: According to the NEWS file in release 0.12.0: Aurora task metadata is now mapped to Mesos task labels. Labels are prefixed with `org.apache.aurora.metadata.` to prevent clashes with other, external label sources. Unsure about the second question. It should be noted that up until this point, as far as I know, there is no way for users to create labels without using a custom thrift client. Another solution to this issue would be to add the prefix on the Aurora Client side. It would require far fewer changes: https://github.com/apache/aurora/compare/master...rdelval:clientLabelPrefix > Configurable Metadata prefix > > > Key: AURORA-1734 > URL: https://issues.apache.org/jira/browse/AURORA-1734 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Trivial > > Currently, a prefix ("org.apache.aurora.metadata.") is injected into the > metadata key in the scheduler. It would be beneficial to allow users to set > their own metadata prefix (including an empty string). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1734) Create scheduler flag to turn off Metadata prefix
[ https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15377507#comment-15377507 ] Renan DelValle commented on AURORA-1734: That's actually a great idea, thanks for suggesting it. I'll go ahead and update the ticket. > Create scheduler flag to turn off Metadata prefix > - > > Key: AURORA-1734 > URL: https://issues.apache.org/jira/browse/AURORA-1734 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Trivial > > Currently, a prefix ("org.apache.aurora.metadata.") is injected into the > metadata key in the scheduler. It would be beneficial for those using custom > clients and/or custom executors to turn off the addition of this prefix to > allow metadata to be treated as a list of plain Mesos labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AURORA-1734) Create Flag to turn off Metadata prefix
Renan DelValle created AURORA-1734: -- Summary: Create Flag to turn off Metadata prefix Key: AURORA-1734 URL: https://issues.apache.org/jira/browse/AURORA-1734 Project: Aurora Issue Type: Task Components: Scheduler Reporter: Renan DelValle Assignee: Renan DelValle Priority: Trivial Currently, a prefix ("org.apache.aurora.metadata.") is injected into the metadata key in the scheduler. It would be beneficial for those using custom clients to turn off the addition of this prefix to allow metadata to be treated as a list of plain Mesos labels. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1376. Resolution: Implemented Fix Version/s: 0.11.0 Support for using a single executor has been included in Aurora 0.11. Moving multiple executor support to its own ticket. > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > Fix For: 0.11.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (AURORA-1288) Design for supporting custom executor
[ https://issues.apache.org/jira/browse/AURORA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reassigned AURORA-1288: -- Assignee: Renan DelValle > Design for supporting custom executor > - > > Key: AURORA-1288 > URL: https://issues.apache.org/jira/browse/AURORA-1288 > Project: Aurora > Issue Type: Task >Reporter: Meghdoot Bhattacharya >Assignee: Renan DelValle > > The goal is to capture the list of changes in the client and the scheduler > required to support any executor other than thermos. This will help non > thermos use cases to adopt aurora easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AURORA-1723) Add support for Mesos Fetcher
[ https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle resolved AURORA-1723. Resolution: Implemented Fix Version/s: 0.15.0 Committed 4e28b9c > Add support for Mesos Fetcher > - > > Key: AURORA-1723 > URL: https://issues.apache.org/jira/browse/AURORA-1723 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Minor > Labels: features > Fix For: 0.15.0 > > > Adding support for Aurora Tasks to be capable of using the [Mesos > Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing > the client to provide arbitrary URIs at which resources can be retrieved. > Resources will be marked non-executable to avoid security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (AURORA-1723) Add support for Mesos Fetcher
[ https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle reassigned AURORA-1723: -- Assignee: Renan DelValle > Add support for Mesos Fetcher > - > > Key: AURORA-1723 > URL: https://issues.apache.org/jira/browse/AURORA-1723 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Assignee: Renan DelValle >Priority: Minor > Labels: features > > Adding support for Aurora Tasks to be capable of using the [Mesos > Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing > the client to provide arbitrary URIs at which resources can be retrieved. > Resources will be marked non-executable to avoid security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1723) Add support for Mesos Fetcher
[ https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348825#comment-15348825 ] Renan DelValle commented on AURORA-1723: Review request: https://reviews.apache.org/r/49218/ > Add support for Mesos Fetcher > - > > Key: AURORA-1723 > URL: https://issues.apache.org/jira/browse/AURORA-1723 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Priority: Minor > Labels: features > > Adding support for Aurora Tasks to be capable of using the [Mesos > Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing > the client to provide arbitrary URIs at which resources can be retrieved. > Resources will be marked non-executable to avoid security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1723) Add support for Mesos Fetcher
[ https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343689#comment-15343689 ] Renan DelValle commented on AURORA-1723: Thanks [~wfarner], this sounds like the right place to look. As usual, thanks for saving me a few hours of poring through code. > Add support for Mesos Fetcher > - > > Key: AURORA-1723 > URL: https://issues.apache.org/jira/browse/AURORA-1723 > Project: Aurora > Issue Type: Task > Components: Scheduler >Reporter: Renan DelValle >Priority: Minor > Labels: features > > Adding support for Aurora Tasks to be capable of using the [Mesos > Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing > the client to provide arbitrary URIs at which resources can be retrieved. > Resources will be marked non-executable to avoid security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AURORA-1723) Add support for Mesos Fetcher
Renan DelValle created AURORA-1723: -- Summary: Add support for Mesos Fetcher Key: AURORA-1723 URL: https://issues.apache.org/jira/browse/AURORA-1723 Project: Aurora Issue Type: Task Components: Scheduler Reporter: Renan DelValle Priority: Minor Adding support for Aurora Tasks to be capable of using the [Mesos Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing the client to provide arbitrary URIs at which resources can be retrieved. Resources will be marked non-executable to avoid security risks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renan DelValle updated AURORA-1376: --- Comment: was deleted (was: I'm working on accepting multiple executors in the optional configuration. I think the best way to do this is to maintain valid JSON formatting by turning the config file into a JSON array like I had in my previous patch. I'm thinking about introducing another argument that will define the default executor. Whenever a name is not specified in the ExecutorConfig.name, the default executor should be used. If anyone has any objections to these ideas, please let me know. ) > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070291#comment-15070291 ] Renan DelValle commented on AURORA-1376: That sounds reasonable. My main concern was that if that contract is broken, it could cause some trouble, but then again, we can just elect to reject that job request for violating the contract. > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070251#comment-15070251 ] Renan DelValle commented on AURORA-1376: I'm working on accepting multiple executors in the optional configuration. I think the best way to do this is to maintain valid JSON formatting by turning the config file into a JSON array like I had in my previous patch. I'm thinking about introducing another argument that will define the default executor. Whenever a name is not specified in the ExecutorConfig.name, the default executor should be used. If anyone has any objections to these ideas, please let me know. > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
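The default-executor idea from the comment above could look roughly like the following Python sketch. The JSON keys, executor names, and paths are hypothetical placeholders (the actual config schema is not given in this thread), and `select_executor` is an illustrative helper rather than anything in the scheduler.

```python
import json

# Hypothetical config file contents: a JSON array with one entry per executor.
CONFIG_TEXT = """
[
  {"name": "thermos", "path": "/path/to/thermos"},
  {"name": "compose", "path": "/path/to/compose"}
]
"""


def select_executor(configs: list, requested_name: str, default_name: str) -> dict:
    """Pick the executor entry for a task, falling back to the default.

    An empty or missing requested name selects the default executor. A name
    that matches no entry raises, which models rejecting the job request.
    """
    by_name = {entry["name"]: entry for entry in configs}
    name = requested_name or default_name
    if name not in by_name:
        raise KeyError("unknown executor: %s" % name)
    return by_name[name]


configs = json.loads(CONFIG_TEXT)
implicit = select_executor(configs, "", default_name="thermos")
explicit = select_executor(configs, "compose", default_name="thermos")
```

Keeping the file a plain JSON array means any standard parser can read it, and the default is a separate argument rather than something encoded by position in the array.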
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070252#comment-15070252 ] Renan DelValle commented on AURORA-1376: I'm working on accepting multiple executors in the optional configuration. I think the best way to do this is to maintain valid JSON formatting by turning the config file into a JSON array like I had in my previous patch. I'm thinking about introducing another argument that will define the default executor. Whenever a name is not specified in the ExecutorConfig.name, the default executor should be used. If anyone has any objections to these ideas, please let me know. > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061095#comment-15061095 ] Renan DelValle commented on AURORA-1376: First stab at using a command line arg to override the configuration: https://reviews.apache.org/r/41473/ I pulled from the Apache git repo right before submitting; looks like it did more harm than good on the diff. > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025210#comment-15025210 ] Renan DelValle commented on AURORA-1376: Glad to see this has survived, I was getting worried it would be dropped entirely after not hearing back for a while. If at all possible, I'd still like to be part of the development of this patch in any way shape or form. > Create support for custom executors in Scheduler > > > Key: AURORA-1376 > URL: https://issues.apache.org/jira/browse/AURORA-1376 > Project: Aurora > Issue Type: Sub-task > Components: Scheduler >Reporter: Renan DelValle > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715636#comment-14715636 ] Renan DelValle commented on AURORA-1376: First patch in a series of patches to add custom executor support: https://reviews.apache.org/r/37818/ Many thanks to [~kevints] for all the suggestions and taking the time at the Mesos hackathon to help me out with this. Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620942#comment-14620942 ] Renan DelValle commented on AURORA-1376: [~jaybuff], you make some great points. The JSON schema comes from trying to simulate the ExecutorSettings data structure in the Aurora Scheduler. As I've said previously, the schema is subject to change, so I appreciate your suggestions. Here are the reasons why ExecutorSettings, as it exists right now, is different from ExecutorInfo: a. The ExecutorSettings data struct uses the CommandUtil (org/apache/aurora/scheduler/base/CommandUtil.java) wrapper to configure the CommandInfo to fetch and execute given URIs for an executor. [~wfarner], is this feature considered part of Aurora or part of Thermos? b. It also stores info about global container mounts, which it uses when creating docker containers. If we create an interface, IMO, it should return a TaskInfo.Builder, as that would spare us from having a special case for the mesos command executor. For what it's worth, an early version of my code is here https://reviews.apache.org/r/36289/. I've fixed all the errors that were caused by my changes and will be putting a new version up later today. As for having the client send in any info to configure the executor, we had a discussion a short while ago and came to the conclusion that, amongst other things, it is considered a security risk due to the fact that Aurora runs as root (https://issues.apache.org/jira/browse/AURORA-1288?focusedCommentId=14601470&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601470). Thus, IMO, using a server-side config file is still the best option. 
Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621069#comment-14621069 ] Renan DelValle commented on AURORA-1376: {quote}But some settings have to come from the user, such as what command can run. e.g. If you're using the mesos command executor, you have to set CommandInfo.value from something submitted by the user. I think that should go into executorConfig.data, perhaps as a json blob that only the mesos-command-executor plugin understands.{quote} Agree with this. I think this also has to be considered from the client side of things, which is really going to depend on what the extension of the DSL to support custom executors is going to look like. [AURORA-1377|https://issues.apache.org/jira/browse/AURORA-1377] I think as soon as we have a more concrete idea of what Pystachio with support for custom executors looks like, we'll be in a better position to determine what this should look like. The JSON blob is a good starting point. Re: the MesosTaskFactory interface, if I understand correctly, the idea would be to have every executor implement a MesosTaskFactory, correct? If so, I think that's a great idea. I'll look into implementing that today. Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612519#comment-14612519 ] Renan DelValle commented on AURORA-1376: None that I can think of, just picked something that was supported by the Apache Commons Configurator to speed things along. YAML sounds good to me; any particular parsers that are compatible with the Apache license? I looked at SnakeYAML but I can't find what license it is released under. Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612519#comment-14612519 ] Renan DelValle edited comment on AURORA-1376 at 7/2/15 9:25 PM: None that I can think of, just picked something that was supported by the Apache Commons Configurator to speed things along. YAML sounds good to me; any particular parsers that are compatible with the Apache license? I looked at SnakeYAML but I can't find what license it is released under. Edit- SnakeYAML is Apache 2.0. was (Author: rdelvalle): None that I can think of, just picked something that was supported by the Apache Commons Configurator to speed things along. YAML sounds good to me; any particular parsers that are compatible with the Apache license? I looked at SnakeYAML but I can't find what license it is released under. Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609444#comment-14609444 ] Renan DelValle edited comment on AURORA-1376 at 7/1/15 2:00 AM: So this is a WIP, but since I wanted to keep pushing this work along, I created a simple XML schema that should cover most cases for custom executors and thermos. If there are any objections to using an XML file for the configuration of the executor settings, I am open to using anything else, I simply wanted to get the ball rolling. I'm using the Apache Commons Configurator library to parse the XML.
{code:xml}
<?xml version="1.0" encoding="ISO-8859-1" ?>
<executors>
  <executor>
    <name>thermos</name>
    <path>/path/to/thermos</path>
    <flags></flags>
    <overhead>
      <disk_mb></disk_mb>
      <ram_mb></ram_mb>
      <cpus></cpus>
      <ports></ports>
    </overhead>
    <resources>
      <uri></uri>
    </resources>
    <observer></observer>
    <cmd></cmd>
  </executor>
</executors>
{code}
was (Author: rdelvalle): So this is a WIP, but since I wanted to keep pushing this work along, I created a simple XML schema that should cover most cases for custom executors and thermos. If there are any objections to using an XML file for the configuration of the executor settings, I am open to using anything else, I simply wanted to get the ball rolling. I'm using the Apache Commons Configurator library to parse the XML.
{code:xml}
<?xml version="1.0" encoding="ISO-8859-1" ?>
<executors>
  <executor>
    <name>thermos</name>
    <path>/test/</path>
    <flags></flags>
    <overhead>
      <disk_mb></disk_mb>
      <ram_mb></ram_mb>
      <cpus></cpus>
      <ports></ports>
    </overhead>
    <resources>
      <uri></uri>
    </resources>
    <observer></observer>
    <cmd></cmd>
  </executor>
</executors>
{code}
Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
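As an illustration of how a file with the element names from the WIP schema above might be read, here is a short Python sketch. It uses Python's standard `xml.etree.ElementTree` rather than the Apache Commons library mentioned in the comment, the numeric values in the sample are made up, and `load_executors` is a hypothetical helper, so treat it as a sketch under those assumptions.

```python
import xml.etree.ElementTree as ET

# Sample document using element names from the proposed schema.
# The overhead values are hypothetical placeholders.
SAMPLE = """<executors>
  <executor>
    <name>thermos</name>
    <path>/path/to/thermos</path>
    <overhead>
      <disk_mb>0</disk_mb>
      <ram_mb>128</ram_mb>
      <cpus>0.25</cpus>
    </overhead>
  </executor>
</executors>"""


def load_executors(xml_text: str) -> dict:
    """Return a mapping of executor name to its path and resource overhead."""
    root = ET.fromstring(xml_text)
    result = {}
    for node in root.findall("executor"):
        overhead = node.find("overhead")
        result[node.findtext("name")] = {
            "path": node.findtext("path"),
            "cpus": float(overhead.findtext("cpus")),
            "ram_mb": int(overhead.findtext("ram_mb")),
            "disk_mb": int(overhead.findtext("disk_mb")),
        }
    return result


EXECUTORS = load_executors(SAMPLE)
```

Because the root element holds a list of `executor` entries, the same loader would cover thermos and any additional custom executors without schema changes.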
[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler
[ https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606031#comment-14606031 ] Renan DelValle commented on AURORA-1376: [~wfarner] or [~wickman], what would be the preferred method of populating the key/value pairs? Would a config file be preferred, or would another approach make more sense? Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AURORA-1376) Create support for custom executors in Scheduler
Renan DelValle created AURORA-1376: -- Summary: Create support for custom executors in Scheduler Key: AURORA-1376 URL: https://issues.apache.org/jira/browse/AURORA-1376 Project: Aurora Issue Type: Sub-task Components: Scheduler Reporter: Renan DelValle -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AURORA-1288) Design for supporting custom executor
[ https://issues.apache.org/jira/browse/AURORA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600531#comment-14600531 ] Renan DelValle commented on AURORA-1288: Right, my line of thought was that Thermos is kind of the default custom executor for Aurora right now. It made sense in my head to give it a special case. I guess, to me, it's more of an issue as to whether Aurora will be configured out of the box for Thermos or something else (like the Command Executor) and how this will be accomplished. In terms of using an ExecutorType enumerator, I was thinking of having only those 3 cases included (maybe even 2 if it is decided that Thermos will fall under the Custom moniker). The idea is that, for any custom executor, only the path is passed from the client side, making every single executor that is not Thermos or the Command Executor fall under the custom umbrella.* I think it's another point where we should come to a consensus on how we want to implement this before moving forward. Let me know if any of this doesn't make sense. *This assumes the custom executor is able to use the information currently generated by the MesosTaskFactory. Design for supporting custom executor - Key: AURORA-1288 URL: https://issues.apache.org/jira/browse/AURORA-1288 Project: Aurora Issue Type: Task Reporter: Meghdoot Bhattacharya The goal is to capture the list of changes in the client and the scheduler required to support any executor other than thermos. This will help non-thermos use cases to adopt Aurora easily. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
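The three-case ExecutorType idea from the comment above could be sketched roughly as follows. Only the enum name and its three cases come from the comment; the class `ExecutorTypeSketch`, the `classify` helper, and the convention that the client sends either a reserved keyword ("thermos" or "command") or a path are hypothetical, chosen only to show how everything else falls under the custom umbrella.

```java
import java.util.Locale;

public class ExecutorTypeSketch {

  /** The three cases discussed in the comment. */
  public enum ExecutorType { THERMOS, COMMAND, CUSTOM }

  /**
   * Hypothetical rule: two reserved keywords select the built-in executors,
   * and any other non-empty value is treated as a path to a custom executor.
   */
  public static ExecutorType classify(String requested) {
    if (requested == null || requested.trim().isEmpty()) {
      // Assumption: with nothing requested, fall back to Thermos as the default.
      return ExecutorType.THERMOS;
    }
    switch (requested.trim().toLowerCase(Locale.ROOT)) {
      case "thermos":
        return ExecutorType.THERMOS;
      case "command":
        return ExecutorType.COMMAND;
      default:
        return ExecutorType.CUSTOM;
    }
  }

  public static void main(String[] args) {
    System.out.println(classify("thermos"));
    System.out.println(classify("/opt/executors/my_executor"));
  }
}
```

If Thermos ends up under the Custom moniker, as the comment allows, the enum shrinks to two cases and the "thermos" branch goes away.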