[jira] [Commented] (AURORA-1973) Documentation issue in installation docs

2020-02-24 Thread Renan DelValle (Jira)


[ 
https://issues.apache.org/jira/browse/AURORA-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17043709#comment-17043709
 ] 

Renan DelValle commented on AURORA-1973:


[~asutosh_pandya] the Apache version of this project has now been archived. If 
you would like to submit a patch for documentation, I highly suggest sending it 
over to the spiritual successor https://github.com/aurora-scheduler/aurora

> Documentation issue in installation docs
> 
>
> Key: AURORA-1973
> URL: https://issues.apache.org/jira/browse/AURORA-1973
> Project: Aurora
> Issue Type: Bug
> Reporter: Tokuhiro Matsuno
> Priority: Trivial
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The installation docs specify `sudo systemctl start aurora`, but that is 
> incorrect.
> It should be `sudo systemctl start aurora-scheduler`.
> https://github.com/apache/aurora/commit/537e052cf9bdd69b1454962d77bb90a3b7f8ebc4



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (AURORA-1997) Consider using checksum-dependency-plugin for dependency verification

2019-12-26 Thread Renan DelValle (Jira)


 [ 
https://issues.apache.org/jira/browse/AURORA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle closed AURORA-1997.
--
Resolution: Later

Hello [~vladimirsitnikov],

While we appreciate your suggestion, there are currently no plans on our 
roadmap to integrate this plugin.

In a perfect world this would be a priority, but we simply don't have the dev 
power right now to upgrade to Gradle 6.x, which makes integrating this plugin 
a serious challenge.

If you believe you can help us upgrade to Gradle 6.x, we would be extremely 
grateful for a pull request on GitHub: [https://github.com/apache/aurora]

Until then, unfortunately, I will have to close this without a promise of 
getting to it in the future.

-Renan

> Consider using checksum-dependency-plugin for dependency verification
> -
>
> Key: AURORA-1997
> URL: https://issues.apache.org/jira/browse/AURORA-1997
> Project: Aurora
> Issue Type: Story
> Components: Build, Scheduler, Security
> Reporter: Vladimir Sitnikov
> Priority: Trivial
> Labels: newbie
>
> {{checksum-dependency-plugin}} [1] is a superset of {{gradle-witness}}, and 
> it makes it possible to increase the level of security.
> Key features:
>  * Gradle plugins can be verified (gradle-witness doesn't track plugins)
>  * All Gradle configurations are supported (e.g. the `java-library` plugin is 
> supported). `checksum-dependency-plugin` intercepts detached configurations 
> as well (e.g. the ones that are created on demand)
>  * PGP can be used for verification, with or without checksums. PGP makes it 
> possible to detect and prevent issues like 
> [https://blog.autsoft.hu/a-confusing-dependency/]
> {{checksum-dependency-plugin}} aims to provide insulation against MITM 
> attacks via Maven dependency downloads.
> It is trivial to integrate, and it is not that hard to maintain (e.g. 
> checksum.xml could be updated automatically)
> [1] 
> [https://github.com/vlsi/vlsi-release-plugins/tree/master/plugins/checksum-dependency-plugin]





[jira] [Commented] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc

2018-10-30 Thread Renan DelValle (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669141#comment-16669141
 ] 

Renan DelValle commented on AURORA-1988:


The issue has not been fixed yet, but it will be fixed by the time 0.22.0 is 
released. That release will be compatible with Mesos 1.6 and 1.7, in keeping 
with our +-1 version compatibility rule. I suggest you keep an eye on that 
GitHub PR. Once it is merged, it should be more or less safe to upgrade to 
Mesos 1.6, though more testing should be done for Mesos 1.7 after that.

> Report "[Errno 13] Permission denied" when run hello world when follow latest 
> doc 
> --
>
> Key: AURORA-1988
> URL: https://issues.apache.org/jira/browse/AURORA-1988
> Project: Aurora
>  Issue Type: Bug
>  Components: Executor
>Affects Versions: 0.20.0
> Environment: Mesos Version: 1.6.0
> Aurora Version: 0.20.0
> Aurora RPM:
> aurora-scheduler-0.20.0-1.el7.centos.aurora.x86_64.rpm
> aurora-executor-0.20.0-1.el7.centos.aurora.x86_64.rpm
>Reporter: Geng Gang
>Priority: Blocker
>  Labels: beginner
> Attachments: screen1.jpg
>
>
> Hi,
> I am a new Aurora user. When I follow the latest hello world doc 
> ([http://aurora.apache.org/documentation/latest/getting-started/tutorial/]) 
> to run my first hello world Aurora job, I hit the issue below:
> D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error 
> trying to execute hello_world: [Errno 13] Permission denied: 
> How can I solve this issue?
>  
> +_*Below is the "thermos_runner.DEBUG" log from the Mesos agent:*_+
> D0605 16:17:01.721050 8320 process.py:445] Wrapped cmdline: ['/bin/bash', '-c', u'echo "gang---hello aurora---";']
> D0605 16:17:01.721333 8320 process.py:455] ENV is: {'HOME': '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox', 'LOGNAME': 'www-data', 'USER': 'www-data', 'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'}
> D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error trying to execute hello_world: [Errno 13] Permission denied: '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox'
> D0605 16:17:01.722104 8320 process.py:155] [process: 8320=hello_world]: Coordinator exiting.
>  
>  
> +_*Below is /etc/aurora/clusters.json:*_+
> [root@cloudpoc3 ~]# more /etc/aurora/clusters.json
> [
>   {
>     "auth_mechanism": "UNAUTHENTICATED",
>     "name": "devcluster",
>     "scheduler_zk_path": "/aurora/scheduler",
>     "slave_root": "/var/lib/mesos",
>     "slave_run_directory": "latest",
>     "zk": "127.0.0.1"
>   }
> ]
> +_*Below is the hello_world.aurora file:*_+
> pkg_path = '/opt/aurora_test/hello_world.py'
> # we use a trick here to make the configuration change with
> # the contents of the file, for simplicity. in a normal setting, packages would be
> # versioned, and the version number would be changed in the configuration.
> import hashlib
> with open(pkg_path, 'rb') as f:
>   pkg_checksum = hashlib.md5(f.read()).hexdigest()
> # copy hello_world.py into the local sandbox
> install = Process(
>   name = 'fetch_package',
>   cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, pkg_checksum))
> # run the script
> # cmdline = 'python -u hello_world.py'
> hello_world = Process(
>   name = 'hello_world',
>   cmdline = 'echo "gang---hello aurora---";')
> # describe the task
> hello_world_task = SequentialTask(
>   processes = [hello_world],
>   resources = Resources(cpu = 2, ram = 4096*MB, disk=4096*MB))
> jobs = [
>   Service(cluster = 'devcluster',
>           environment = 'devel',
>           role = 'www-data',
>           name = 'hello_world',
>           task = hello_world_task)
> ]
> +_*From Mesos WebUI, it seems normal:*_+
> Please see attached screen1.jpg
>  
>  





[jira] [Closed] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc

2018-10-30 Thread Renan DelValle (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle closed AURORA-1988.
--
Resolution: Cannot Reproduce



[jira] [Commented] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc

2018-10-30 Thread Renan DelValle (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668876#comment-16668876
 ] 

Renan DelValle commented on AURORA-1988:


[~clems4ever], this is not the same issue you folks are experiencing.

There was a change in Mesos 1.6 to the default permissions of the sandbox, from 
755 to 750. An e-mail was sent to the dev list about this issue because it is 
still not solved. See [https://github.com/apache/aurora/pull/42] and 
[https://lists.apache.org/thread.html/c1cf974461bfdf696e3ac2596c6177761406cadf3a8b493929be690f@%3Cdev.aurora.apache.org%3E]

Furthermore, we would not recommend running Aurora 0.16.0 with anything higher 
than Mesos 1.1.0, as there is only a guarantee of +-1 version compatibility with 
Aurora. Since Aurora 0.16.0 was released in Sept. 2016, it was impossible for 
that release to anticipate changes such as the ones made in Mesos 1.6.
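
To illustrate why the sandbox permission change from 755 to 750 mentioned above can surface as "[Errno 13] Permission denied", here is a minimal Python sketch. It only inspects mode bits and assumes (this is not stated in the thread) that the task user is neither the owner of the sandbox directory nor a member of its group. The helper name `other_can_enter` is hypothetical.

```python
import stat

def other_can_enter(mode: int) -> bool:
    """Return True if users outside the owner and group may traverse
    a directory that has the given permission bits."""
    return bool(mode & stat.S_IXOTH)

old_default = 0o755  # sandbox default before Mesos 1.6, per the comment above
new_default = 0o750  # sandbox default from Mesos 1.6, per the comment above

for mode in (old_default, new_default):
    # stat.filemode renders the bits the way `ls -l` would.
    print(stat.filemode(stat.S_IFDIR | mode), other_can_enter(mode))
```

With 750, the "other" class loses both read and execute on the directory, so a process running as an unrelated user can no longer change into the sandbox.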



[jira] [Resolved] (AURORA-1991) TaskEvents in API Thrift should have optional parameters

2018-09-16 Thread Renan DelValle (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1991.

   Resolution: Fixed
 Assignee: Ezequiel Torres
Fix Version/s: 0.21

> TaskEvents in API Thrift should have optional parameters
> 
>
> Key: AURORA-1991
> URL: https://issues.apache.org/jira/browse/AURORA-1991
> Project: Aurora
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.19.1
>Reporter: Ezequiel Torres
>Assignee: Ezequiel Torres
>Priority: Minor
> Fix For: 0.21
>
>
> h1. *+What?+*
> Struct 
> [TaskQuery|https://git-wip-us.apache.org/repos/asf?p=aurora.git;a=blob;f=api/src/main/thrift/org/apache/aurora/gen/api.thrift;h=7265b11103aa12743c42355163ae64e98e965d7f;hb=HEAD#l579]
>  should have optional parameters so that it can be used in languages 
> like Go, where types do not have a null value by default.
> The following is the code autogenerated by Thrift for Golang, without and 
> with optional parameters:
> +*_Without Optional Parameters_*+
> {code}
> type TaskQuery struct {
>   // unused field # 1
>   JobName string `thrift:"jobName,2" json:"jobName"`
>   // unused field # 3
>   TaskIds map[string]bool `thrift:"taskIds,4" json:"taskIds"`
>   Statuses map[ScheduleStatus]bool `thrift:"statuses,5" json:"statuses"`
>   // unused field # 6
>   InstanceIds map[int32]bool `thrift:"instanceIds,7" json:"instanceIds"`
>   // unused field # 8
>   Environment string `thrift:"environment,9" json:"environment"`
>   SlaveHosts map[string]bool `thrift:"slaveHosts,10" json:"slaveHosts"`
>   JobKeys map[*JobKey]bool `thrift:"jobKeys,11" json:"jobKeys"`
>   Offset int32 `thrift:"offset,12" json:"offset"`
>   Limit int32 `thrift:"limit,13" json:"limit"`
>   Role string `thrift:"role,14" json:"role"`
> }
> {code}
> _*+With Optional Parameters+*_
> {code}
> type TaskQuery struct {
>   // unused field # 1
>   JobName *string `thrift:"jobName,2" json:"jobName"`
>   // unused field # 3
>   TaskIds map[string]bool `thrift:"taskIds,4" json:"taskIds"`
>   Statuses map[ScheduleStatus]bool `thrift:"statuses,5" json:"statuses"`
>   // unused field # 6
>   InstanceIds map[int32]bool `thrift:"instanceIds,7" json:"instanceIds"`
>   // unused field # 8
>   Environment *string `thrift:"environment,9" json:"environment"`
>   SlaveHosts map[string]bool `thrift:"slaveHosts,10" json:"slaveHosts"`
>   JobKeys map[*JobKey]bool `thrift:"jobKeys,11" json:"jobKeys"`
>   Offset *int32 `thrift:"offset,12" json:"offset"`
>   Limit *int32 `thrift:"limit,13" json:"limit"`
>   Role *string `thrift:"role,14" json:"role"`
> }
> {code}
> It can be seen that optional parameters like JobName, Role and 
> Environment can now be set to a null value.
> h1. *+Why?+*
> With the current structure of the TaskQuery object, it is not possible to 
> make queries without explicitly setting all the fields of the TaskQuery 
> object in Golang. Moreover, the lack of a null value in the structure of the 
> TaskQuery object limits the types of queries that can be made against the 
> Aurora Thrift API in Golang, since a parameter cannot be skipped.
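
The distinction the issue draws, between a field that is unset and a field that holds a zero value, can be sketched in Python. This is an analogy only: the `TaskQuery` dataclass and the `active_filters` helper below are hypothetical stand-ins, not the Thrift-generated code, and only a few of the fields are modeled.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class TaskQuery:
    # None plays the role of Go's nil pointer: "this filter was not set".
    role: Optional[str] = None
    environment: Optional[str] = None
    job_name: Optional[str] = None
    limit: Optional[int] = None

def active_filters(query: TaskQuery) -> dict:
    """Collect only the fields the caller explicitly set."""
    return {
        f.name: getattr(query, f.name)
        for f in fields(query)
        if getattr(query, f.name) is not None
    }

# limit=0 is an explicit value here, distinguishable from "no limit given".
q = TaskQuery(role="www-data", limit=0)
print(active_filters(q))
```

With non-optional fields, an empty string or 0 would be indistinguishable from "not provided", which is the limitation described for the generated Go struct.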





[jira] [Commented] (AURORA-1991) TaskEvents in API Thrift should have optional parameters

2018-09-16 Thread Renan DelValle (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617039#comment-16617039
 ] 

Renan DelValle commented on AURORA-1991:


[~jingc] thanks for calling attention to this. [~ezetowers] submitted a patch 
to fix this and it landed in master in July :) 
[https://github.com/apache/aurora/commit/efe8656512373389771aff88c2141940f925ad58]

Closing this as fixed!



[jira] [Closed] (AURORA-1993) Aurora crashes when handling an unknown custom resource

2018-07-20 Thread Renan DelValle (JIRA)


 [ 
https://issues.apache.org/jira/browse/AURORA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle closed AURORA-1993.
--
   Resolution: Fixed
Fix Version/s: 0.17.0

This was fixed in 0.17.0 
https://github.com/apache/aurora/commit/4797dfe33ba08183fa9596a46ac8be51a64e08bb

> Aurora crashes when handling an unknown custom resource
> ---
>
> Key: AURORA-1993
> URL: https://issues.apache.org/jira/browse/AURORA-1993
> Project: Aurora
>  Issue Type: Bug
>Affects Versions: 0.16.0
>Reporter: Clément Michaud
>Priority: Major
> Fix For: 0.17.0
>
>
> While we tried to declare network bandwidth as a custom resource in Mesos, we 
> faced a crash in Aurora with the following stacktrace:
> {code:java}
> Jul 18, 2018 1:35:19 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "network_bandwidth"
> type: SCALAR
> scalar {
> value: 2000.0
> }
> role: "*"
> 11: "\n\adefault"
>     at java.util.Objects.requireNonNull(Objects.java:228)
>     at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
>     at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
>     at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>     at java.util.Iterator.forEachRemaining(Iterator.java:115)
>     at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>     at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>     at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>     at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>     at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>     at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>     at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
>     at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
>     at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
>     at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
>     at org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
>     at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
>     at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> E0718 13:35:19.240 [SlotSizeCounterService RUNNING, 
> GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService 
> [FAILED] faile
> I0718 13:35:19.240 [SlotSizeCounterService RUNNING, Lifecycle:84] Shutting 
> down application
> I0718 13:35:19.240 [SlotSizeCounterService RUNNING, 
> ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
> I0718 13:35:19.243 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] 
> SchedulerLifecycle state machine transition ACTIVE -> DEAD
> I0718 13:35:19.249073 331 sched.cpp:2021] Asked to stop the driver
> I0718 13:35:19.249344 30748 sched.cpp:1203] Stopping framework 
> 2a905643-b76f-4f17-a406-524d406f49f8-
> I0718 13:35:19.249 [SlotSizeCounterService RUNNING, StateMachine$Builder:389] 
> storage state machine transition READY -> STOPPED
> I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$6:267] Driver 
> exited, terminating lifecycle.
> I0718 13:35:19.250 [BlockingDriverJoin, StateMachine$Builder:389] 
> SchedulerLifecycle state machine transition DEAD -> DEAD
> I0718 13:35:19.250 [BlockingDriverJoin, SchedulerLifecycle$7:287] Shutdown 
> already invoked, ignoring extra call.
> I0718 13:35:19.255 [CronLifecycle STOPPING, CronLifecycle:90] Shutting down 
> Quartz cron scheduler.
> I0718 13:35:19.255 
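
The stack trace above shows a lookup (ResourceType.fromResource) rejecting the unrecognized "network_bandwidth" resource from inside a periodic service, whose failure then shut the scheduler down. The failure mode can be sketched in Python as follows. This is illustrative only: it is not Aurora's Java code, the set of known resource names is a hypothetical placeholder, and the tolerant variant is one possible approach, not necessarily what the 0.17.0 commit does.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical set of resource names the scheduler understands.
KNOWN_RESOURCES = {"cpus", "mem", "disk", "ports"}

Resource = Tuple[str, float]

def strict_bag(resources: Iterable[Resource]) -> Dict[str, float]:
    """Sum resources by name; raise on any name that is not recognized."""
    bag: Dict[str, float] = {}
    for name, value in resources:
        if name not in KNOWN_RESOURCES:
            raise ValueError(f"Unknown Mesos resource: {name}")
        bag[name] = bag.get(name, 0.0) + value
    return bag

def tolerant_bag(resources: Iterable[Resource]) -> Dict[str, float]:
    """Sum resources by name; silently skip names that are not recognized."""
    bag: Dict[str, float] = {}
    for name, value in resources:
        if name not in KNOWN_RESOURCES:
            continue
        bag[name] = bag.get(name, 0.0) + value
    return bag

offer = [("cpus", 4.0), ("mem", 8192.0), ("network_bandwidth", 2000.0)]
print(tolerant_bag(offer))
```

The strict variant mirrors the reported behavior: a single custom resource in an offer is enough to abort the whole computation.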

[jira] [Created] (AURORA-1989) make-pycharm-virtualenv broken after pip drops --egg

2018-06-14 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1989:
--

 Summary: make-pycharm-virtualenv broken after pip drops --egg
 Key: AURORA-1989
 URL: https://issues.apache.org/jira/browse/AURORA-1989
 Project: Aurora
  Issue Type: Bug
  Components: Client
Reporter: Renan DelValle


pip dropped the --egg option in pip 10.0.1 
[https://pip.pypa.io/en/stable/news/#b1-2018-03-31], which has broken our 
make-pycharm-virtualenv script, needed to make development of the client easier 
in PyCharm.

Running the script results in the following error:

+ VIRTUALENV_VERSION=16.0.0
+ which python2.7
++ which python2.7
+ PY=/usr/local/bin/python2.7
+ echo 'Using /usr/local/bin/python2.7'
Using /usr/local/bin/python2.7
+++ dirname ./build-support/virtualenv
++ cd ./build-support
++ pwd
+ HERE=/Users/rdelvalle/git/aurora/build-support
+ '[' -f /Users/rdelvalle/git/aurora/build-support/virtualenv-16.0.0/BOOTSTRAPPED ']'
+ exec /usr/local/bin/python2.7 /Users/rdelvalle/git/aurora/build-support/virtualenv-16.0.0/virtualenv.py --no-download build-support/python/pycharm.venv
New python executable in /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python2.7
Also creating executable in /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python
Installing setuptools, pip, wheel...done.
Usage:
 /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options]  [package-index-options] ...
 /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] -r  [package-index-options] ...
 /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] [-e]  ...
 /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options] [-e]  ...
 /Users/rdelvalle/git/aurora/build-support/python/pycharm.venv/bin/python -m pip install [options]  ...
no such option: --egg
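
One way a wrapper script could avoid the "no such option: --egg" error is to pass the flag only to pip versions that still accept it. The sketch below is a hypothetical helper, not the fix applied to make-pycharm-virtualenv; the cutoff at major version 10 is an assumption based on the description above, and whether the resulting install layout still suits the script is not verified here.

```python
from typing import List

def pip_install_args(pip_version: str, package: str) -> List[str]:
    """Build `pip install` arguments, adding --egg only for pip older than 10."""
    major = int(pip_version.split(".")[0])
    args = ["install"]
    if major < 10:
        # Assumed cutoff: the option is reported above as dropped in pip 10.
        args.append("--egg")
    args.append(package)
    return args

print(pip_install_args("9.0.3", "some-package"))
print(pip_install_args("10.0.1", "some-package"))
```

The version string could come from `pip.__version__` inside the virtualenv; `some-package` is a placeholder name.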





[jira] [Commented] (AURORA-1988) Report "[Errno 13] Permission denied" when run hello world when follow latest doc

2018-06-14 Thread Renan DelValle (JIRA)


[ 
https://issues.apache.org/jira/browse/AURORA-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512884#comment-16512884
 ] 

Renan DelValle commented on AURORA-1988:


[~ggeng1] This looks like the python script doesn't have the correct 
permissions to run on the mesos-agent on which the task is getting scheduled. 
In the example, the executor is run as the  part of the configuration. 
Therefore, this task is trying to run under the user www-data. If the python 
script being placed in the box by this task doesn't have the right permissions, 
user www-data will not be able to execute it.

> Report "[Errno 13] Permission denied" when run hello world when follow latest 
> doc 
> --
>
> Key: AURORA-1988
> URL: https://issues.apache.org/jira/browse/AURORA-1988
> Project: Aurora
>  Issue Type: Bug
>  Components: Executor
>Affects Versions: 0.20.0
> Environment: Mesos Version: 1.6.0
> Aurora Version: 0.20.0
> Aurora RPM:
> aurora-scheduler-0.20.0-1.el7.centos.aurora.x86_64.rpm
> aurora-executor-0.20.0-1.el7.centos.aurora.x86_64.rpm
>Reporter: Geng Gang
>Priority: Blocker
>  Labels: beginner
> Attachments: screen1.jpg
>
>
> Hi 
> I am new user for aurora. When I follow latest hello world doc 
> ([http://aurora.apache.org/documentation/latest/getting-started/tutorial/)] 
> to run first hello world aurora job, I meet below issues:
> D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error 
> trying to execute hello_world: {color:#FF}[Errno 13] Permission 
> denied{color}: 
> How to solve this issue?
>  
> +_*The below is "thermos_runner.DEBUG" log in Mesos Agent:*_+ 
> D0605 16:17:01.721050 8320 process.py:445] Wrapped cmdline: ['/bin/bash', 
> '-c', u'echo "gang---hello aurora---";']
> D0605 16:17:01.721333 8320 process.py:455] ENV is: \{'HOME': 
> '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox',
>  'LOGNAME': 'www-data', 'USER': 'www-data', 'PATH': 
> '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'}
> D0605 16:17:01.721896 8320 process.py:155] [process: 8320=hello_world]: Error 
> trying to execute hello_world: {color:#FF}[Errno 13] Permission 
> denied{color}: 
> '/var/lib/mesos/slaves/65c1f16f-4292-464f-954c-3471fafdc988-S0/frameworks/b6da6477-5047-4d02-a323-101dd2e6d8b6-/executors/thermos-www-data-devel-hello_world-0-1d35ab50-a8eb-42da-a4f0-1de0bfbcd8fe/runs/b1d09fc1-ff87-4ec3-9341-85b21040f304/sandbox'
> D0605 16:17:01.722104 8320 process.py:155] [process: 8320=hello_world]: 
> Coordinator exiting.
>  
>  
> +_*The below is /etc/aurora/clusters.json:*_+
> [root@cloudpoc3 ~]# more /etc/aurora/clusters.json
> [
>  {
>  "auth_mechanism": "UNAUTHENTICATED",
>  "name": "devcluster",
>  "scheduler_zk_path": "/aurora/scheduler",
>  "slave_root": "/var/lib/mesos",
>  "slave_run_directory": "latest",
>  "zk": "127.0.0.1"
>  }
> ]
> +_*The below is hello_world.aurora file:*_+
> pkg_path = '/opt/aurora_test/hello_world.py'
> # we use a trick here to make the configuration change with
> # the contents of the file, for simplicity. in a normal setting, packages would be
> # versioned, and the version number would be changed in the configuration.
> import hashlib
> with open(pkg_path, 'rb') as f:
>  pkg_checksum = hashlib.md5(f.read()).hexdigest()
> # copy hello_world.py into the local sandbox
> install = Process(
>  name = 'fetch_package',
>  cmdline = 'cp %s . && echo %s && chmod +x hello_world.py' % (pkg_path, 
> pkg_checksum))
> # run the script 
> # cmdline = 'python -u hello_world.py'
> {color:#FF}hello_world = Process({color}
> {color:#FF} name = 'hello_world',{color}
> {color:#FF} cmdline = 'echo "gang---hello aurora---";'){color}
> # describe the task
> hello_world_task = SequentialTask(
>  processes = [hello_world],
>  resources = Resources(cpu = 2, ram = 4096*MB, disk=4096*MB))
> jobs = [
>  Service(cluster = 'devcluster',
>  environment = 'devel',
>  role = 'www-data',
>  name = 'hello_world',
>  task = hello_world_task)
> ]
> +_*From Mesos WebUI, it seems normal:*_+
> Please see attached screen1.jpg
>  
>  





[jira] [Comment Edited] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL

2018-05-03 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463105#comment-16463105
 ] 

Renan DelValle edited comment on AURORA-1982 at 5/3/18 9:48 PM:


[https://reviews.apache.org/r/66537/] by Steve Salevan


was (Author: rdelvalle):
[https://reviews.apache.org/r/66537/]

> Add support for using Mesos fetcher from Aurora DSL
> ---
>
> Key: AURORA-1982
> URL: https://issues.apache.org/jira/browse/AURORA-1982
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Renan DelValle
>Priority: Major
>
> The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. 
> However, there is currently no way to allow users to specify which artifacts 
> should be downloaded onto the sandbox. Mimicking this feature is possible in 
> Thermos but custom executors may lack this ability.





[jira] [Commented] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL

2018-05-03 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463105#comment-16463105
 ] 

Renan DelValle commented on AURORA-1982:


[https://reviews.apache.org/r/66537/]

> Add support for using Mesos fetcher from Aurora DSL
> ---
>
> Key: AURORA-1982
> URL: https://issues.apache.org/jira/browse/AURORA-1982
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Renan DelValle
>Priority: Major
>
> The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. 
> However, there is currently no way to allow users to specify which artifacts 
> should be downloaded onto the sandbox. Mimicking this feature is possible in 
> Thermos but custom executors may lack this ability.





[jira] [Commented] (AURORA-1983) Support for Docker Volume Isolator

2018-04-03 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424709#comment-16424709
 ] 

Renan DelValle commented on AURORA-1983:


Thanks for the patch, Justin! Would it be possible for you to submit this patch 
via ReviewBoard? You can find instructions for doing so here: 
http://aurora.apache.org/documentation/latest/contributing/

> Support for Docker Volume Isolator
> --
>
> Key: AURORA-1983
> URL: https://issues.apache.org/jira/browse/AURORA-1983
> Project: Aurora
>  Issue Type: Story
>Reporter: Justin Venus
>Priority: Minor
>
> It would be really useful to support 
> [docker/volume|http://mesos.apache.org/documentation/latest/isolators/docker-volume/]
>  isolation in Aurora.  This would allow for example ... operators in AWS to 
> be able to easily attach EBS volumes to their containers.





[jira] [Updated] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library

2018-03-27 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1467:
---
Fix Version/s: 0.19.1

> Replace org.apache.aurora.common.args with a standard third-party library
> -
>
> Key: AURORA-1467
> URL: https://issues.apache.org/jira/browse/AURORA-1467
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Bill Farner
>Assignee: Bill Farner
>Priority: Major
>  Labels: newbie
> Fix For: 0.19.1
>
>
> Our args parsing/processing system was inherited from Twitter Commons and 
> should be considered for replacement.





[jira] [Commented] (AURORA-1825) Enable async logging by default

2018-03-27 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416329#comment-16416329
 ] 

Renan DelValle commented on AURORA-1825:


[~jingc] I see that you closed this issue as Done. Can you provide a link to 
the review, as well as the version this landed in? 

Thanks!

> Enable async logging by default
> ---
>
> Key: AURORA-1825
> URL: https://issues.apache.org/jira/browse/AURORA-1825
> Project: Aurora
>  Issue Type: Task
>Reporter: Zameer Manji
>Assignee: Jing Chen
>Priority: Minor
>
> Based on my experience while working on AURORA-1823 and [~StephanErb]'s work 
> on logging recently, I think it would be best if we enabled async logging.
> For example if one attempts to parallelize the work inside 
> {{StateManagerImpl}} there isn't much benefit because all of the state 
> transitions are logged and all of the threads would contend for the lock.
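
To illustrate the contention argument in the description, here is a minimal sketch of asynchronous logging. It is an analogy in Python, not the scheduler's Java logging setup: worker threads only enqueue log records, and a single background listener does the actual writing, so callers do not wait on the output handler's lock. The names `ListHandler`, `make_async_logger` and `state_manager_sketch` are hypothetical.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Stand-in for a slow output handler; collects messages in memory."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


def make_async_logger(name, target_handler):
    """Return a logger whose records are handed to target_handler
    by a background thread instead of the calling thread."""
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # Callers only pay for putting the record on the queue.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener thread drains the queue and calls the real handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener


sink = ListHandler()
logger, listener = make_async_logger("state_manager_sketch", sink)
for i in range(3):
    logger.info("task %d transitioned", i)
listener.stop()  # flushes everything still queued, then joins the thread
```

The trade-off of this design is that records still sitting in the queue can be lost if the process dies before the listener drains them.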





[jira] [Updated] (AURORA-1981) Add support for choosing task Executor using Aurora DSL

2018-03-27 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1981:
---
Fix Version/s: 0.20.0  (was: 0.21.0)

> Add support for choosing task Executor using Aurora DSL
> ---
>
> Key: AURORA-1981
> URL: https://issues.apache.org/jira/browse/AURORA-1981
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
> Fix For: 0.20.0
>
>
> The Aurora scheduler supports launching tasks using custom executors. 
> However, there is currently no way to change the executor used for launching 
> a Job's tasks using the Aurora DSL.





[jira] [Resolved] (AURORA-1974) Update sample Docker jobs for Vagrant tutorial

2018-03-27 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1974.

   Resolution: Fixed
Fix Version/s: 0.20.0

Fixed by 
https://github.com/apache/aurora/commit/b6e898b5e9f70b13db42db366b6d98c5baadcb57

> Update sample Docker jobs for Vagrant tutorial
> --
>
> Key: AURORA-1974
> URL: https://issues.apache.org/jira/browse/AURORA-1974
> Project: Aurora
>  Issue Type: Task
>  Components: Docker, Documentation
>Affects Versions: 0.19.1
>Reporter: Mathias Sulser
>Assignee: Renan DelValle
>Priority: Trivial
> Fix For: 0.20.0
>
>
> h2. Problem
> As discussed with [~rdelvalle] on Slack, I am filing what is likely a 
> regression caused by the recent Vagrant upgrade in 
> [https://github.com/apache/aurora/commit/c52137e20bd2863234dc09116e1339364ffed77a]
> As of now, submitting any jobs in 
> {{examples/jobs/hello_docker_engine.aurora}} or 
> {{examples/jobs/hello_docker_image.aurora}} will fail due to the following 
> error:
> {code:java}
> Traceback (most recent call last):
> File "apache/aurora/executor/bin/thermos_executor_main.py", line 47, in 
> 
> from mesos.executor import MesosExecutorDriver
> File 
> "/root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/__init__.py",
>  line 17, in 
> from ._executor import MesosExecutorDriverImpl as MesosExecutorDriver
> ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version 
> `GLIBCXX_3.4.21' not found (required by 
> /root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/_executor.so)
> {code}
> h2. Solution
> Changing the docker image from {{python:2.7}} to {{python:2.7-slim-stretch}} 
> will fix this.
>  
> Hat-tip to [~rdelvalle] for figuring this out so quickly (y)
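
The ImportError above says that the libstdc++.so.6 inside the container image does not provide the symbol version GLIBCXX_3.4.21 that _executor.so was linked against. As a rough way to compare images, the sketch below scans a library file for the GLIBCXX version tags it contains, much like running `strings` on the file and filtering for GLIBCXX. It is plain Python and not part of Aurora; the function name `glibcxx_versions` is made up, and the demonstration runs against a fake file rather than a real library.

```python
import os
import re
import tempfile


def glibcxx_versions(lib_path):
    """Return the GLIBCXX_x.y.z version tags found in a library file,
    sorted numerically."""
    with open(lib_path, "rb") as f:
        data = f.read()
    found = {m.decode("ascii") for m in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data)}
    return sorted(found, key=lambda v: tuple(int(part) for part in v.split(".")))


# Demonstration on a fake file that only contains a few version tags.
with tempfile.TemporaryDirectory() as tmp:
    fake_lib = os.path.join(tmp, "libstdc++.so.6")
    with open(fake_lib, "wb") as f:
        f.write(b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.20\x00GLIBCXX_3.4\x00")
    versions = glibcxx_versions(fake_lib)
    has_required = "3.4.21" in versions
```

Pointing `glibcxx_versions` at /usr/lib/x86_64-linux-gnu/libstdc++.so.6 in each candidate image would show whether 3.4.21 is among the versions it offers.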





[jira] [Assigned] (AURORA-1974) Update sample Docker jobs for Vagrant tutorial

2018-03-25 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1974:
--

Assignee: Renan DelValle

> Update sample Docker jobs for Vagrant tutorial
> --
>
> Key: AURORA-1974
> URL: https://issues.apache.org/jira/browse/AURORA-1974
> Project: Aurora
>  Issue Type: Task
>  Components: Docker, Documentation
>Affects Versions: 0.19.1
>Reporter: Mathias Sulser
>Assignee: Renan DelValle
>Priority: Trivial
>
> h2. Problem
> As discussed with [~rdelvalle] on Slack, I am filing what is likely a 
> regression caused by the recent Vagrant upgrade in 
> [https://github.com/apache/aurora/commit/c52137e20bd2863234dc09116e1339364ffed77a]
> As of now, submitting any jobs in 
> {{examples/jobs/hello_docker_engine.aurora}} or 
> {{examples/jobs/hello_docker_image.aurora}} will fail due to the following 
> error:
> {code:java}
> Traceback (most recent call last):
> File "apache/aurora/executor/bin/thermos_executor_main.py", line 47, in 
> 
> from mesos.executor import MesosExecutorDriver
> File 
> "/root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/__init__.py",
>  line 17, in 
> from ._executor import MesosExecutorDriverImpl as MesosExecutorDriver
> ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version 
> `GLIBCXX_3.4.21' not found (required by 
> /root/.pex/install/mesos.executor-1.4.0-py2.7-linux-x86_64.egg.bf19bd50eea04a23374924ed382340b7a2557be3/mesos.executor-1.4.0-py2.7-linux-x86_64.egg/mesos/executor/_executor.so)
> {code}
> h2. Solution
> Changing the docker image from {{python:2.7}} to {{python:2.7-slim-stretch}} 
> will fix this.
>  
> Hat-tip to [~rdelvalle] for figuring this out so quickly (y)





[jira] [Resolved] (AURORA-1981) Add support for choosing task Executor using Aurora DSL

2018-03-20 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1981.

   Resolution: Implemented
Fix Version/s: 0.21.0

https://reviews.apache.org/r/66154/

> Add support for choosing task Executor using Aurora DSL
> ---
>
> Key: AURORA-1981
> URL: https://issues.apache.org/jira/browse/AURORA-1981
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
> Fix For: 0.21.0
>
>
> The Aurora scheduler supports launching tasks using custom executors. 
> However, there is currently no way to change the executor used for launching 
> a Job's tasks using the Aurora DSL.





[jira] [Updated] (AURORA-1981) Add support for choosing task Executor using Aurora DSL

2018-03-20 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1981:
---
Issue Type: Sub-task  (was: Task)
Parent: AURORA-1744

> Add support for choosing task Executor using Aurora DSL
> ---
>
> Key: AURORA-1981
> URL: https://issues.apache.org/jira/browse/AURORA-1981
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
>
> The Aurora scheduler supports launching tasks using custom executors. 
> However, there is currently no way to change the executor used for launching 
> a Job's tasks using the Aurora DSL.





[jira] [Updated] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL

2018-03-20 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1982:
---
Issue Type: Sub-task  (was: Task)
Parent: AURORA-1744

> Add support for using Mesos fetcher from Aurora DSL
> ---
>
> Key: AURORA-1982
> URL: https://issues.apache.org/jira/browse/AURORA-1982
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Client
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
>
> The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. 
> However, there is currently no way to allow users to specify which artifacts 
> should be downloaded onto the sandbox. Mimicking this feature is possible in 
> Thermos but custom executors may lack this ability.





[jira] [Updated] (AURORA-1744) Add end to end testing for custom executors

2018-03-20 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1744:
---
Issue Type: Story  (was: Task)
   Summary: Add end to end testing for custom executors  (was: Add end to 
end testing for multiple executors)

> Add end to end testing for custom executors
> ---
>
> Key: AURORA-1744
> URL: https://issues.apache.org/jira/browse/AURORA-1744
> Project: Aurora
>  Issue Type: Story
>  Components: Testing
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
>
> Now that Aurora is capable of using multiple executors on a single scheduler, 
> it would be beneficial to add end to end testing for this feature.





[jira] [Created] (AURORA-1982) Add support for using Mesos fetcher from Aurora DSL

2018-03-19 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1982:
--

 Summary: Add support for using Mesos fetcher from Aurora DSL
 Key: AURORA-1982
 URL: https://issues.apache.org/jira/browse/AURORA-1982
 Project: Aurora
  Issue Type: Task
  Components: Client
Reporter: Renan DelValle
Assignee: Renan DelValle


The Aurora Scheduler supports fetching artifacts using the Mesos Fetcher. 
However, there is currently no way to allow users to specify which artifacts 
should be downloaded onto the sandbox. Mimicking this feature is possible in 
Thermos but custom executors may lack this ability.
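
For context, "mimicking this feature in Thermos" usually means adding a process that downloads the artifact before the main process runs. The sketch below only builds the shell command for such a process; it is plain Python, the helper name `fetch_cmdline` and the example URL are hypothetical, and it assumes curl is available on the agent.

```python
import shlex


def fetch_cmdline(url, dest):
    """Build a shell command that downloads url into dest.

    Both arguments are shell-quoted so that spaces or special
    characters cannot break the command line.
    """
    return "curl --fail --silent --show-error --location -o {} {}".format(
        shlex.quote(dest), shlex.quote(url)
    )


cmd = fetch_cmdline("http://example.com/artifacts/app.tar.gz", "app.tar.gz")
quoted_dest = fetch_cmdline("http://example.com/a", "my file")

# In a .aurora file the resulting string could be passed to a Thermos
# process that runs first, for example:
#   fetch = Process(name = 'fetch_package', cmdline = cmd)
```

A custom executor has no Thermos process list to put this in, which is the gap the ticket describes.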





[jira] [Commented] (AURORA-1980) Integration tests fail with a pants exception: File name too long

2018-03-13 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397708#comment-16397708
 ] 

Renan DelValle commented on AURORA-1980:


I tried:

{noformat}
./pants --pants-workdir=/tmp/pawd
{noformat}

and got the following error:

{noformat}
Pants working directory should end with '.pants.d', currently it is /tmp/pawd
{noformat}

So then I tried:

{noformat}
./pants --pants-workdir=/tmp/pawd.pants.d
{noformat}

which results in:

{noformat}
FAILURE
Exception caught: ()

Exception message: Spec has un-normalized path part 
'../../../../tmp/pawd.pants.d/gen/thrift-py/252d64521cf9/api.src.main.thrift.org.apache.aurora.gen._test/current'
{noformat}

Changing the pants version did work, though!

> Integration tests fail with a pants exception: File name too long
> -
>
> Key: AURORA-1980
> URL: https://issues.apache.org/jira/browse/AURORA-1980
> Project: Aurora
>  Issue Type: Bug
>Reporter: Renan DelValle
>Priority: Major
>
> When running the integration tests the following error happens:
> {noformat}
>    Executing tasks in goals: gen -> pyprep -> test
> 17:13:42 00:01   [gen]
> 17:13:42 00:01     [thrift-py]
> 17:13:42 00:01       [cache]    
>                    No cached artifacts for 4 targets.
>                    Invalidated 4 targets.
> 17:13:42 00:01   [pyprep]
> 17:13:42 00:01     [interpreter]
> 17:13:46 00:05     [requirements]
> 17:13:46 00:05       [cache]                                     
>                    No cached artifacts for 37 targets.
>                    Invalidated 37 targets.
> 17:14:06 00:25     [sources]
>                Waiting for background workers to finish.
> 17:14:06 00:25   [complete]
>                FAILURE
> Exception caught: ()
>  
> Exception message: [Errno 36] File name too long: 
> u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
>  
> Where `` is longer than five characters, causing a violation of 
> the 255 character filename limit in Linux.
>         
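
Errno 36 (ENAMETOOLONG) on Linux is commonly hit when a single path component exceeds the per-name limit, which is 255 bytes on typical filesystems such as ext4. The sketch below reports which components of a path are over that limit. It is plain Python and unrelated to pants itself; the function name `overlong_components` and the sample paths are made up for illustration.

```python
import os

NAME_MAX = 255  # assumed per-component limit in bytes (typical on Linux)


def overlong_components(path, name_max=NAME_MAX):
    """Return the components of path whose encoded length exceeds name_max."""
    parts = [part for part in path.split(os.sep) if part]
    return [part for part in parts if len(os.fsencode(part)) > name_max]


ok_path = "/home/user/aurora/.pants.d/" + "a" * 255
bad_path = "/home/user/aurora/.pants.d/" + "a" * 256

ok_result = overlong_components(ok_path)
bad_result = overlong_components(bad_path)
```

Because the limit is counted in bytes per component, every extra character folded into a generated file name counts against it.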





[jira] [Updated] (AURORA-1980) Integration tests fail with a pants exception: File name too long

2018-03-13 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1980:
---
Description: 
When running the integration tests the following error happens:
{noformat}
   Executing tasks in goals: gen -> pyprep -> test
17:13:42 00:01   [gen]
17:13:42 00:01     [thrift-py]
17:13:42 00:01       [cache]    
                   No cached artifacts for 4 targets.
                   Invalidated 4 targets.
17:13:42 00:01   [pyprep]
17:13:42 00:01     [interpreter]
17:13:46 00:05     [requirements]
17:13:46 00:05       [cache]                                     
                   No cached artifacts for 37 targets.
                   Invalidated 37 targets.
17:14:06 00:25     [sources]
               Waiting for background workers to finish.
17:14:06 00:25   [complete]
               FAILURE
Exception caught: ()
 
Exception message: [Errno 36] File name too long: 
u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
 

Where `` is longer than five characters, causing a violation of 
the 255 character filename limit in Linux.
        

  was:
When running the integration tests the following error happens:
{noformat}
   Executing tasks in goals: gen -> pyprep -> test
17:13:42 00:01   [gen]
17:13:42 00:01     [thrift-py]
17:13:42 00:01       [cache]    
                   No cached artifacts for 4 targets.
                   Invalidated 4 targets.
17:13:42 00:01   [pyprep]
17:13:42 00:01     [interpreter]
17:13:46 00:05     [requirements]
17:13:46 00:05       [cache]                                     
                   No cached artifacts for 37 targets.
                   Invalidated 37 targets.
17:14:06 00:25     [sources]
               Waiting for background workers to finish.
17:14:06 00:25   [complete]
               FAILURE
Exception caught: ()
 
Exception message: [Errno 36] File name too long: 
u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
 
        


> Integration tests fail with a pants exception: File name too long
> -
>
> Key: AURORA-1980
> URL: https://issues.apache.org/jira/browse/AURORA-1980
> Project: Aurora
>  Issue Type: Bug
>Reporter: Renan DelValle
>Priority: Major
>
> When running the integration tests the following error happens:
> {noformat}
>    Executing tasks in goals: gen -> pyprep -> test
> 17:13:42 00:01   [gen]
> 17:13:42 00:01     [thrift-py]
> 17:13:42 00:01       [cache]    
>                    No cached artifacts for 4 targets.
>                    Invalidated 4 targets.
> 17:13:42 00:01   [pyprep]
> 17:13:42 00:01     [interpreter]
> 17:13:46 00:05     [requirements]
> 17:13:46 00:05       [cache]                                     
>                    No cached artifacts for 37 targets.
>                    Invalidated 37 targets.
> 17:14:06 00:25     [sources]
>                Waiting for background workers to finish.
> 17:14:06 00:25   [complete]
>                FAILURE
> Exception caught: ()
>  
> Exception message: [Errno 36] File name too long: 
> u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
>  
> Where `` is longer than five characters, causing a violation of 
> the 255 character filename limit in Linux.
>         





[jira] [Updated] (AURORA-1980) Integration tests fail with a pants exception: File name too long

2018-03-13 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1980:
---
Description: 
When running the integration tests the following error happens:
{noformat}
   Executing tasks in goals: gen -> pyprep -> test
17:13:42 00:01   [gen]
17:13:42 00:01     [thrift-py]
17:13:42 00:01       [cache]    
                   No cached artifacts for 4 targets.
                   Invalidated 4 targets.
17:13:42 00:01   [pyprep]
17:13:42 00:01     [interpreter]
17:13:46 00:05     [requirements]
17:13:46 00:05       [cache]                                     
                   No cached artifacts for 37 targets.
                   Invalidated 37 targets.
17:14:06 00:25     [sources]
               Waiting for background workers to finish.
17:14:06 00:25   [complete]
               FAILURE
Exception caught: ()
 
Exception message: [Errno 36] File name too long: 
u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
 
        

  was:
When running the integration tests the following error happens:
{noformat}
   Executing tasks in goals: gen -> pyprep -> test
17:13:42 00:01   [gen]
17:13:42 00:01     [thrift-py]
17:13:42 00:01       [cache]    
                   No cached artifacts for 4 targets.
                   Invalidated 4 targets.
17:13:42 00:01   [pyprep]
17:13:42 00:01     [interpreter]
17:13:46 00:05     [requirements]
17:13:46 00:05       [cache]                                     
                   No cached artifacts for 37 targets.
                   Invalidated 37 targets.
17:14:06 00:25     [sources]
               Waiting for background workers to finish.
17:14:06 00:25   [complete]
               FAILURE
Exception caught: ()
 
Exception message: [Errno 36] File name too long: 
u'/home/user/aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
 
       


> Integration tests fail with a pants exception: File name too long
> -
>
> Key: AURORA-1980
> URL: https://issues.apache.org/jira/browse/AURORA-1980
> Project: Aurora
>  Issue Type: Bug
>Reporter: Renan DelValle
>Priority: Major
>
> When running the integration tests the following error happens:
> {noformat}
>    Executing tasks in goals: gen -> pyprep -> test
> 17:13:42 00:01   [gen]
> 17:13:42 00:01     [thrift-py]
> 17:13:42 00:01       [cache]    
>                    No cached artifacts for 4 targets.
>                    Invalidated 4 targets.
> 17:13:42 00:01   [pyprep]
> 17:13:42 00:01     [interpreter]
> 17:13:46 00:05     [requirements]
> 17:13:46 00:05       [cache]                                     
>                    No cached artifacts for 37 targets.
>                    Invalidated 37 targets.
> 17:14:06 00:25     [sources]
>                Waiting for background workers to finish.
> 17:14:06 00:25   [complete]
>                FAILURE
> Exception caught: ()
>  
> Exception message: [Errno 36] File name too long: 
> u'/home//aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
>  
>         





[jira] [Created] (AURORA-1980) Pants exception: File name too long

2018-03-12 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1980:
--

 Summary: Pants exception: File name too long
 Key: AURORA-1980
 URL: https://issues.apache.org/jira/browse/AURORA-1980
 Project: Aurora
  Issue Type: Bug
Reporter: Renan DelValle


When running the integration tests the following error happens:
{noformat}
   Executing tasks in goals: gen -> pyprep -> test
17:13:42 00:01   [gen]
17:13:42 00:01     [thrift-py]
17:13:42 00:01       [cache]    
                   No cached artifacts for 4 targets.
                   Invalidated 4 targets.
17:13:42 00:01   [pyprep]
17:13:42 00:01     [interpreter]
17:13:46 00:05     [requirements]
17:13:46 00:05       [cache]                                     
                   No cached artifacts for 37 targets.
                   Invalidated 37 targets.
17:14:06 00:25     [sources]
               Waiting for background workers to finish.
17:14:06 00:25   [complete]
               FAILURE
Exception caught: ()
 
Exception message: [Errno 36] File name too long: 
u'/home/user/aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
 
       





[jira] [Updated] (AURORA-1980) Integration tests fail with a pants exception: File name too long

2018-03-12 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1980:
---
Summary: Integration tests fail with a pants exception: File name too long  
(was: Pants exception: File name too long)

> Integration tests fail with a pants exception: File name too long
> -
>
> Key: AURORA-1980
> URL: https://issues.apache.org/jira/browse/AURORA-1980
> Project: Aurora
>  Issue Type: Bug
>Reporter: Renan DelValle
>Priority: Major
>
> When running the integration tests the following error happens:
> {noformat}
>    Executing tasks in goals: gen -> pyprep -> test
> 17:13:42 00:01   [gen]
> 17:13:42 00:01     [thrift-py]
> 17:13:42 00:01       [cache]    
>                    No cached artifacts for 4 targets.
>                    Invalidated 4 targets.
> 17:13:42 00:01   [pyprep]
> 17:13:42 00:01     [interpreter]
> 17:13:46 00:05     [requirements]
> 17:13:46 00:05       [cache]                                     
>                    No cached artifacts for 37 targets.
>                    Invalidated 37 targets.
> 17:14:06 00:25     [sources]
>                Waiting for background workers to finish.
> 17:14:06 00:25   [complete]
>                FAILURE
> Exception caught: ()
>  
> Exception message: [Errno 36] File name too long: 
> u'/home/user/aurora/.pants.d/build_invalidator/7/pants_backend_python_tasks2_gather_sources_GatherSources/.pants.d.gen.thrift-py.252d64521cf9.api.src.main.thrift.org.apache.aurora.gen._storage.current.api.src.main.thrift.org.apache.aurora.gen._storage.hash'{noformat}
>  
>        





[jira] [Closed] (AURORA-1734) Configurable Metadata prefix

2018-03-02 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle closed AURORA-1734.
--
Resolution: Won't Do

> Configurable Metadata prefix
> 
>
> Key: AURORA-1734
> URL: https://issues.apache.org/jira/browse/AURORA-1734
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Priority: Trivial
>
> Currently, a prefix ("org.apache.aurora.metadata.") is injected into the 
> metadata key in the scheduler. It would be beneficial to allow users to set 
> their own metadata prefix (including an empty string).
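
The requested behaviour can be sketched in a few lines. This is an illustration in Python only, not the scheduler's Java code: the function name `prefix_metadata` and the sample key are hypothetical, and the default prefix is the one quoted in the description.

```python
DEFAULT_PREFIX = "org.apache.aurora.metadata."


def prefix_metadata(metadata, prefix=DEFAULT_PREFIX):
    """Return a copy of metadata with prefix prepended to every key.

    Passing prefix="" leaves the keys unchanged, which is the
    "empty string" case mentioned in the ticket.
    """
    return {prefix + key: value for key, value in metadata.items()}


labels = prefix_metadata({"team": "infra"})
unprefixed = prefix_metadata({"team": "infra"}, prefix="")
```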





[jira] [Assigned] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library

2018-03-01 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1467:
--

Assignee: Bill Farner

> Replace org.apache.aurora.common.args with a standard third-party library
> -
>
> Key: AURORA-1467
> URL: https://issues.apache.org/jira/browse/AURORA-1467
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Bill Farner
>Assignee: Bill Farner
>Priority: Major
>  Labels: newbie
>
> Our args parsing/processing system was inherited from Twitter Commons and 
> should be considered for replacement.





[jira] [Assigned] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library

2018-03-01 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1467:
--

Assignee: (was: Bill Farner)

> Replace org.apache.aurora.common.args with a standard third-party library
> -
>
> Key: AURORA-1467
> URL: https://issues.apache.org/jira/browse/AURORA-1467
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Bill Farner
>Priority: Major
>  Labels: newbie
>
> Our args parsing/processing system was inherited from Twitter Commons and 
> should be considered for replacement.





[jira] [Commented] (AURORA-1467) Replace org.apache.aurora.common.args with a standard third-party library

2018-03-01 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383014#comment-16383014
 ] 

Renan DelValle commented on AURORA-1467:


[~wfarner], given that we have moved to JCommander, should we close this 
ticket?

> Replace org.apache.aurora.common.args with a standard third-party library
> -
>
> Key: AURORA-1467
> URL: https://issues.apache.org/jira/browse/AURORA-1467
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Bill Farner
>Priority: Major
>  Labels: newbie
>
> Our args parsing/processing system was inherited from Twitter Commons and 
> should be considered for replacement.





[jira] [Commented] (AURORA-1966) TASK_UNKNOWN to PARTITIONED mapping puts Scheduler to kill non-exist Task indefinitely

2018-03-01 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16383006#comment-16383006
 ] 

Renan DelValle commented on AURORA-1966:


[~davmclau] should we close this since the patch was committed?

> TASK_UNKNOWN to PARTITIONED mapping puts Scheduler to kill non-exist Task 
> indefinitely
> --
>
> Key: AURORA-1966
> URL: https://issues.apache.org/jira/browse/AURORA-1966
> Project: Aurora
>  Issue Type: Bug
>Reporter: Santhosh Kumar Shanmugham
>Assignee: David McLaughlin
>Priority: Major
>
> When a Task launch fails, it is moved from ASSIGNED to LOST, which performs a 
> RESCHEDULE and KILL. Unfortunately, the KILL of a non-existent task sent to 
> the Mesos master results in a TASK_UNKNOWN status update, which gets mapped 
> to PARTITIONED. Although the transition from LOST to PARTITIONED is not 
> allowed, some callbacks are executed anyway, resulting in a KILL and 
> RESCHEDULE action. This new KILL triggers another TASK_UNKNOWN, and hence 
> another PARTITIONED status update for the same task, so the Scheduler 
> attempts to KILL the non-existent task indefinitely. Attempting a client job 
> killall leaves the scheduler in the same state.
> Since the scheduler uses the LOST state for black-holing tasks, the 
> {{TaskStateMachine}} needs to take those into account.
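
The loop described above comes from side effects running even though the transition itself is rejected. The following is a minimal Java sketch of the general remedy, which is to run callbacks only after the allowed-transition check has passed. It is not Aurora's TaskStateMachine: the class GuardedTransitionSketch, its three-state enum, and the transition table are all assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not Aurora's TaskStateMachine. */
final class GuardedTransitionSketch {
  enum State { ASSIGNED, LOST, PARTITIONED }

  // Assumed transition table, for illustration only.
  private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
  static {
    ALLOWED.put(State.ASSIGNED, EnumSet.of(State.LOST, State.PARTITIONED));
    ALLOWED.put(State.PARTITIONED, EnumSet.of(State.LOST));
    ALLOWED.put(State.LOST, EnumSet.noneOf(State.class));
  }

  private State current;
  private int killsIssued;

  GuardedTransitionSketch(State initial) {
    this.current = initial;
  }

  /** Applies the transition and its callback only if the table allows it. */
  boolean transition(State next) {
    if (!ALLOWED.get(current).contains(next)) {
      return false; // rejected: state unchanged and no callback runs
    }
    current = next;
    if (next == State.LOST) {
      killsIssued++; // stand-in for the KILL and RESCHEDULE callbacks
    }
    return true;
  }

  State current() {
    return current;
  }

  int killsIssued() {
    return killsIssued;
  }
}
```

In this sketch, a PARTITIONED update arriving for a task that is already LOST is rejected without issuing another KILL, so the cycle of KILL and TASK_UNKNOWN cannot start.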
> I was able to reproduce this in the Vagrant image by faking a launch failure.
> {code:java}
> I0124 05:48:23.198 [qtp1791010542-40, StateMachine] vagrant-test-fail-partition_aware_disabled-0-07bec0cb-d6a3-4caa-9b6e-60e6d0934606 state machine transition INIT -> PENDING
> I0124 05:48:23.213508 9748 log.cpp:560] Attempting to append 1679 bytes to the log
> I0124 05:48:23.214570 9748 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 24778
> I0124 05:48:23.214834 9748 replica.cpp:540] Replica received write request for position 24778 from __req_res__(4)@192.168.33.7:8083
> I0124 05:48:23.221982 9748 leveldb.cpp:341] Persisting action (1700 bytes) to leveldb took 6.772102ms
> I0124 05:48:23.222174 9748 replica.cpp:711] Persisted action APPEND at position 24778
> I0124 05:48:23.222901 9748 replica.cpp:694] Replica received learned notice for position 24778 from log-network(1)@192.168.33.7:8083
> I0124 05:48:23.226833 9748 leveldb.cpp:341] Persisting action (1702 bytes) to leveldb took 3.227779ms
> I0124 05:48:23.227008 9748 replica.cpp:711] Persisted action APPEND at position 24778
> I0124 05:48:23.262 [qtp1791010542-40, RequestLog] 127.0.0.1 - - [24/Jan/2018:05:48:23 +] "POST //aurora.local/api HTTP/1.1" 200 78
> I0124 05:48:23.267 [qtp1791010542-40, LoggingInterceptor] getTasksWithoutConfigs(TaskQuery(role:null, environment:null, jobName:null, taskIds:null, statuses:null, instanceIds:null, slaveHosts:null, jobKeys:[JobKey(role:vagrant, environment:test, name:fail-partition_aware_disabled)], offset:0, limit:0))
> I0124 05:48:23.285 [qtp1791010542-40, RequestLog] 127.0.0.1 - - [24/Jan/2018:05:48:23 +] "POST //aurora.local/api HTTP/1.1" 200 794
> I0124 05:48:23.349 [TaskGroupBatchWorker, StateMachine] Callback transition PENDING to ASSIGNED, allow: true
> I0124 05:48:23.353 [TaskGroupBatchWorker, StateMachine] vagrant-test-fail-partition_aware_disabled-0-07bec0cb-d6a3-4caa-9b6e-60e6d0934606 state machine transition PENDING -> ASSIGNED
> I0124 05:48:23.356 [TaskGroupBatchWorker, TaskAssignerImpl] Offer on agent 192.168.33.7 (id fe8bc641-aa02-4363-a990-318d20de1bac-S0) is being assigned task for vagrant-test-fail-partition_aware_disabled-0-07bec0cb-d6a3-4caa-9b6e-60e6d0934606.
> W0124 05:48:23.445 [TaskGroupBatchWorker, TaskAssignerImpl] Failed to launch task.
> org.apache.aurora.scheduler.offers.OfferManager$LaunchException: Failed to launch task.
>   at org.apache.aurora.scheduler.offers.OfferManagerImpl.launchTask(OfferManagerImpl.java:212)
>   at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>   at org.apache.aurora.scheduler.scheduling.TaskAssignerImpl.launchUsingOffer(TaskAssignerImpl.java:126)
>   at org.apache.aurora.scheduler.scheduling.TaskAssignerImpl.maybeAssign(TaskAssignerImpl.java:262)
>   at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>   at org.apache.aurora.scheduler.scheduling.TaskSchedulerImpl.scheduleTasks(TaskSchedulerImpl.java:154)
>   at org.apache.aurora.scheduler.scheduling.TaskSchedulerImpl.schedule(TaskSchedulerImpl.java:108)
>   at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
>   at org.apache.aurora.scheduler.scheduling.TaskGroups$1.lambda$run$0(TaskGroups.java:174)
>  at 

[jira] [Commented] (AURORA-1973) Documentation issue in installation docs

2018-02-21 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371767#comment-16371767
 ] 

Renan DelValle commented on AURORA-1973:


Hi [~tokuhirom], you are correct, this should be aurora-scheduler. Do you mind 
sending in a patch? 
[http://aurora.apache.org/documentation/latest/contributing/]

We would really appreciate it!

> Documentation issue in installation docs
> 
>
> Key: AURORA-1973
> URL: https://issues.apache.org/jira/browse/AURORA-1973
> Project: Aurora
>  Issue Type: Bug
>Reporter: Tokuhiro Matsuno
>Priority: Trivial
>
> The installation docs specify `sudo systemctl start aurora`, which is 
> incorrect.
> It should be `sudo systemctl start aurora-scheduler`.
> https://github.com/apache/aurora/commit/537e052cf9bdd69b1454962d77bb90a3b7f8ebc4





[jira] [Resolved] (AURORA-1964) Move Vagrant setup from Trusty to Xenial

2018-02-21 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1964.

   Resolution: Fixed
Fix Version/s: 0.20.0

> Move Vagrant setup from Trusty to Xenial
> 
>
> Key: AURORA-1964
> URL: https://issues.apache.org/jira/browse/AURORA-1964
> Project: Aurora
>  Issue Type: Task
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
> Fix For: 0.20.0
>
>
> We're really behind the curve on this one as the next LTS will be released in 
> April.
> The move is made difficult by the change in init systems between Trusty and 
> Xenial.
> Furthermore, our recent upgrade to Thrift 0.10.0 has caused some issues with 
> our Packer setup, as the deb packages for 0.10.0 are not in the correct 
> repository. The latest version in the repository is 0.9.3: 
> http://dl.bintray.com/apache/thrift/debian/dists/
> This makes Packer fail at: 
> https://github.com/apache/aurora/blob/master/build-support/packer/build.sh#L118
> [~jfarrell] any chance you can help us unblock this by releasing official 
> packages?
> Otherwise, we could compile the 0.10.0 from scratch in our packer process but 
> that might balloon the image size somewhat.





[jira] [Updated] (AURORA-1962) Incorrect parsing of empty strings into list command line options

2018-02-07 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1962:
---
Fix Version/s: 0.19.1

> Incorrect parsing of empty strings into list command line options
> -
>
> Key: AURORA-1962
> URL: https://issues.apache.org/jira/browse/AURORA-1962
> Project: Aurora
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 0.19.0
>Reporter: Bill Farner
>Assignee: Renan DelValle
>Priority: Major
> Fix For: 0.19.1
>
>
> When the scheduler parses a command line option like 
> {{-thermos_executor_resources=}}, which maps to {{List}}, the result 
> is equivalent to {{[""]}} (list of size 1 containing an empty string), while 
> we would expect {{[]}} (an empty list).
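
One way the behaviour described above can arise is a plain comma split, since in Java "".split(",") returns a one-element array holding the empty string. The sketch below is illustrative only and is not the scheduler's actual argument parser; the class ListOptionSketch and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the scheduler's real option parser. */
final class ListOptionSketch {
  private ListOptionSketch() {}

  /** Naive parse: an empty value produces a list holding one empty string. */
  static List<String> naiveParse(String value) {
    return Arrays.asList(value.split(","));
  }

  /** Parse that drops empty elements, so an empty value produces an empty list. */
  static List<String> parse(String value) {
    List<String> result = new ArrayList<>();
    for (String item : value.split(",")) {
      if (!item.isEmpty()) {
        result.add(item);
      }
    }
    return result;
  }
}
```

Note that dropping empty elements also turns a value such as "a,,b" into two items, which may or may not be the desired behaviour for a given option.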





[jira] [Commented] (AURORA-1967) Move from FindBugs to SpotBugs

2018-01-30 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345620#comment-16345620
 ] 

Renan DelValle commented on AURORA-1967:


Sorry, must have missed it! But this is the best turnaround time I've had on 
any JIRA ticket I've ever filed. 

> Move from FindBugs to SpotBugs
> --
>
> Key: AURORA-1967
> URL: https://issues.apache.org/jira/browse/AURORA-1967
> Project: Aurora
>  Issue Type: Task
>Reporter: Renan DelValle
>Priority: Minor
>
> The FindBugs project is dead: 
> [https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2017-September/004383.html]
> We should switch to its successor, SpotBugs 
> ([https://spotbugs.github.io/]), as soon as possible to benefit from the 
> enhancements that have been introduced since FindBugs 3.0.0.





[jira] [Created] (AURORA-1967) Move from FindBugs to SpotBugs

2018-01-29 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1967:
--

 Summary: Move from FindBugs to SpotBugs
 Key: AURORA-1967
 URL: https://issues.apache.org/jira/browse/AURORA-1967
 Project: Aurora
  Issue Type: Task
Reporter: Renan DelValle


The FindBugs project is dead: 
[https://mailman.cs.umd.edu/pipermail/findbugs-discuss/2017-September/004383.html]

We should switch to its successor, SpotBugs 
([https://spotbugs.github.io/]), as soon as possible to benefit from the 
enhancements that have been introduced since FindBugs 3.0.0.





[jira] [Commented] (AURORA-1964) Move Vagrant setup from Trusty to Xenial

2018-01-19 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332534#comment-16332534
 ] 

Renan DelValle commented on AURORA-1964:


Works for me! Thanks for the input, guys. I'll see if I can send a PR by the 
end of the day.

> Move Vagrant setup from Trusty to Xenial
> 
>
> Key: AURORA-1964
> URL: https://issues.apache.org/jira/browse/AURORA-1964
> Project: Aurora
>  Issue Type: Task
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Major
>
> We're really behind the curve on this one as the next LTS will be released in 
> April.
> The move is made difficult by the change in init systems between Trusty and 
> Xenial.
> Furthermore, our recent upgrade to Thrift 0.10.0 has caused some issues with 
> our Packer setup, as the deb packages for 0.10.0 are not in the correct 
> repository. The latest version in the repository is 0.9.3: 
> http://dl.bintray.com/apache/thrift/debian/dists/
> This makes Packer fail at: 
> https://github.com/apache/aurora/blob/master/build-support/packer/build.sh#L118
> [~jfarrell] any chance you can help us unblock this by releasing official 
> packages?
> Otherwise, we could compile the 0.10.0 from scratch in our packer process but 
> that might balloon the image size somewhat.





[jira] [Created] (AURORA-1964) Move Vagrant setup from Trusty to Xenial

2018-01-18 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1964:
--

 Summary: Move Vagrant setup from Trusty to Xenial
 Key: AURORA-1964
 URL: https://issues.apache.org/jira/browse/AURORA-1964
 Project: Aurora
  Issue Type: Task
Reporter: Renan DelValle
Assignee: Renan DelValle


We're really behind the curve on this one as the next LTS will be released in 
April.

The move is made difficult by the change in init systems between Trusty and 
Xenial.

Furthermore, our recent upgrade to Thrift 0.10.0 has caused some issues with 
our Packer setup, as the deb packages for 0.10.0 are not in the correct 
repository. The latest version in the repository is 0.9.3: 
http://dl.bintray.com/apache/thrift/debian/dists/

This makes Packer fail at: 
https://github.com/apache/aurora/blob/master/build-support/packer/build.sh#L118

[~jfarrell] any chance you can help us unblock this by releasing official 
packages?

Otherwise, we could compile the 0.10.0 from scratch in our packer process but 
that might balloon the image size somewhat.





[jira] [Assigned] (AURORA-1734) Configurable Metadata prefix

2017-09-28 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1734:
--

Assignee: (was: Renan DelValle)

Unassigning this from myself as I don't think it's worth the investment to 
change it any more.

> Configurable Metadata prefix
> 
>
> Key: AURORA-1734
> URL: https://issues.apache.org/jira/browse/AURORA-1734
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Priority: Trivial
>
> Currently, a prefix ("org.apache.aurora.metadata.") is injected into the 
> metadata key in the scheduler. It would be beneficial to allow users to set 
> their own metadata prefix (including an empty string).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (AURORA-1944) Aurora is unable to elect leader after losing ZK for an extended period of time

2017-08-04 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1944:
---
Description: 
Using Apache Curator as the Zookeeper library causes an issue where Aurora is 
unable to elect a leader if Zookeeper loses quorum for an extended period of 
time.

Scheduler seems to crash around:

{{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to leave 
leadership: org.apache.aurora.common.zookeeper.SingletonService$LeaveException: 
Failed to abdicate leadership of group at /aurora/scheduler}}

When the init system brings the scheduler back up, it is unable to elect a 
leader if ZK is still down.

Specifically, the redirect monitor fails:

{{E0802 14:09:37.063 [RedirectMonitor STARTING, 
GuavaUtils$LifecycleShutdownListener] Service: RedirectMonitor [FAILED] failed 
unexpectedly. Triggering shutdown.}}

Leading to every scheduler showing the following:

{{W0802 14:16:34.646 [qtp576711849-43, LeaderRedirect] No serviceGroupMonitor 
in host set, will not redirect despite not being leader.}}

Once the scheduler enters this state, it is unable to snap out of it until it 
is manually restarted.



  was:
Using Apache Curator as the Zookeeper library causes an issue where Aurora is 
unable to elect a leader if Zookeeper loses quorum for an extended period of 
time.

Scheduler seems to crash around:
{{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to leave 
leadership: org.apache.aurora.common.zookeeper.SingletonService$LeaveException: 
Failed to abdicate leadership of group at /aurora/scheduler }}

When the init system brings the scheduler back up, it is unable to elect a 
leader if ZK is still down.

Once the scheduler enters this state, it is unable to snap out of it until it 
is manually restarted.





> Aurora is unable to elect leader after losing ZK for an extended period of 
> time
> ---
>
> Key: AURORA-1944
> URL: https://issues.apache.org/jira/browse/AURORA-1944
> Project: Aurora
>  Issue Type: Bug
>  Components: Scheduler
> Environment: Running on 0.17.0
>Reporter: Renan DelValle
> Attachments: aurora-0.log, aurora-1.log, aurora-2.log
>
>
> Using Apache Curator as the Zookeeper library causes an issue where Aurora is 
> unable to elect a leader if Zookeeper loses quorum for an extended period of 
> time.
> Scheduler seems to crash around:
> {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to 
> leave leadership: 
> org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to 
> abdicate leadership of group at /aurora/scheduler}}
> When the init system brings the scheduler back up, it is unable to elect a 
> leader if ZK is still down.
> Specifically, the redirect monitor fails:
> {{E0802 14:09:37.063 [RedirectMonitor STARTING, 
> GuavaUtils$LifecycleShutdownListener] Service: RedirectMonitor [FAILED] 
> failed unexpectedly. Triggering shutdown.}}
> Leading to every scheduler showing the following:
> {{W0802 14:16:34.646 [qtp576711849-43, LeaderRedirect] No serviceGroupMonitor 
> in host set, will not redirect despite not being leader.}}
> Once the scheduler enters this state, it is unable to snap out of it until it 
> is manually restarted.





[jira] [Updated] (AURORA-1944) Aurora is unable to elect leader after losing ZK for an extended period of time

2017-08-04 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1944:
---
Attachment: aurora-1.log
aurora-2.log

Attaching logs for the other 2 schedulers.

> Aurora is unable to elect leader after losing ZK for an extended period of 
> time
> ---
>
> Key: AURORA-1944
> URL: https://issues.apache.org/jira/browse/AURORA-1944
> Project: Aurora
>  Issue Type: Bug
>  Components: Scheduler
> Environment: Running on 0.17.0
>Reporter: Renan DelValle
> Attachments: aurora-0.log, aurora-1.log, aurora-2.log
>
>
> Using Apache Curator as the Zookeeper library causes an issue where Aurora is 
> unable to elect a leader if Zookeeper loses quorum for an extended period of 
> time.
> Scheduler seems to crash around:
> {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to 
> leave leadership: 
> org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to 
> abdicate leadership of group at /aurora/scheduler }}
> When the init system brings the scheduler back up, it is unable to elect a 
> leader if ZK is still down.
> Once the scheduler enters this state, it is unable to snap out of it until it 
> is manually restarted.





[jira] [Updated] (AURORA-1944) Aurora is unable to elect leader after losing ZK for an extended period of time

2017-08-04 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1944:
---
Attachment: aurora-0.log

Aurora scheduler 1 out of 3.

> Aurora is unable to elect leader after losing ZK for an extended period of 
> time
> ---
>
> Key: AURORA-1944
> URL: https://issues.apache.org/jira/browse/AURORA-1944
> Project: Aurora
>  Issue Type: Bug
>  Components: Scheduler
> Environment: Running on 0.17.0
>Reporter: Renan DelValle
> Attachments: aurora-0.log
>
>
> Using Apache Curator as the Zookeeper library causes an issue where Aurora is 
> unable to elect a leader if Zookeeper loses quorum for an extended period of 
> time.
> Scheduler seems to crash around:
> {{W0802 14:01:14.436 [TaskEventBatchWorker, SchedulerLifecycle] Failed to 
> leave leadership: 
> org.apache.aurora.common.zookeeper.SingletonService$LeaveException: Failed to 
> abdicate leadership of group at /aurora/scheduler }}
> When the init system brings the scheduler back up, it is unable to elect a 
> leader if ZK is still down.
> Once the scheduler enters this state, it is unable to snap out of it until it 
> is manually restarted.





[jira] [Created] (AURORA-1942) Improve Aurora behavior with regards to Mesos Agents violating reregistration timeouts

2017-07-18 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1942:
--

 Summary: Improve Aurora behavior with regards to Mesos Agents 
violating reregistration timeouts
 Key: AURORA-1942
 URL: https://issues.apache.org/jira/browse/AURORA-1942
 Project: Aurora
  Issue Type: Task
  Components: Scheduler
Reporter: Renan DelValle


A Mesos Agent Lost message can be received in two scenarios resulting in 
different outcomes:

1) A Mesos Agent can fail the health check done by the Mesos Master 
(max_agent_ping_timeouts violation) which leads to an Agent Lost message along 
with TASK_LOST messages for each task running on the unhealthy Agent.

2) A Mesos Agent can fail to re-register after an election has taken place 
(agent_reregister_timeout violation). In this situation the newly elected Mesos 
Master, because Masters do not store any information concerning the tasks that 
are currently running, is unable to send a TASK_LOST message for the tasks that 
were running on the Agent that failed to re-register.

Scenario 2 can lead to (a) "missing" instances for the tasks scheduled on the 
rogue Agent until an explicit reconciliation is done and/or (b) "leaked" tasks, 
if the Agent re-registers after Aurora has replaced the missing tasks, which 
will only be cleaned up upon an implicit reconciliation.

For (a), one solution is to transition the tasks on a missing Agent to the LOST 
state upon receiving an Agent Lost message.
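
The proposal for (a) could be sketched roughly as follows. This is a self-contained Java illustration using in-memory maps, not Aurora's storage or scheduler callbacks; the class AgentLostSketch, its methods, and the two-value Status enum are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not Aurora's scheduler or storage layer. */
final class AgentLostSketch {
  enum Status { RUNNING, LOST }

  // Task id to agent id, and task id to current status.
  private final Map<String, String> agentByTask = new LinkedHashMap<>();
  private final Map<String, Status> statusByTask = new LinkedHashMap<>();

  void addRunningTask(String taskId, String agentId) {
    agentByTask.put(taskId, agentId);
    statusByTask.put(taskId, Status.RUNNING);
  }

  /** Marks every running task on the lost agent as LOST and returns the affected task ids. */
  List<String> onAgentLost(String agentId) {
    List<String> affected = new ArrayList<>();
    for (Map.Entry<String, String> entry : agentByTask.entrySet()) {
      String taskId = entry.getKey();
      if (entry.getValue().equals(agentId) && statusByTask.get(taskId) == Status.RUNNING) {
        statusByTask.put(taskId, Status.LOST);
        affected.add(taskId);
      }
    }
    return affected;
  }

  Status statusOf(String taskId) {
    return statusByTask.get(taskId);
  }
}
```

With this approach the scheduler would not depend on per-task TASK_LOST messages from the Master. It does not address case (b), where the Agent later re-registers with tasks still running.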





[jira] [Closed] (AURORA-1712) Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty

2017-05-11 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle closed AURORA-1712.
--
   Resolution: Fixed
Fix Version/s: 0.17.0

Added builder and test environment for Xenial as well as updated instructions
on how to test it. Added distribution to release-candidate script.

Bugs closed: AURORA-1872

Reviewed at https://reviews.apache.org/r/52437/

> Debian Jessie packages are embedding the mesos egg build for Ubuntu trusty
> ---
>
> Key: AURORA-1712
> URL: https://issues.apache.org/jira/browse/AURORA-1712
> Project: Aurora
>  Issue Type: Bug
>Reporter: Stephan Erb
>Assignee: Renan DelValle
> Fix For: 0.17.0
>
>
> The Debian packaging scripts for Trusty and Jessie share the same override 
> mechanism for the pants third_party repository. We therefore end up using 
> egg files built for Ubuntu on Debian as well 
> (https://github.com/apache/aurora-packaging/blob/master/specs/debian/aurora-pants.ini).
> This seems to work, but it is clearly not optimal.
> We should extend 
> https://github.com/apache/aurora/blob/master/build-support/python/make-mesos-native-egg
>  to support Debian and then make use of it in our packaging infrastructure 
> https://github.com/apache/aurora-packaging.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (AURORA-1751) Update org.apache.aurora/aurora-api in Maven

2017-01-26 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840652#comment-15840652
 ] 

Renan DelValle commented on AURORA-1751:


I wonder if it wouldn't be a good idea to ask the community if it was OK to 
drop hosting this on Maven but have a way to generate this locally with Gradle. 
It seems like this doesn't get used enough to justify the overhead keeping an 
updated version on Maven brings.

> Update org.apache.aurora/aurora-api in Maven
> 
>
> Key: AURORA-1751
> URL: https://issues.apache.org/jira/browse/AURORA-1751
> Project: Aurora
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.13.0
>Reporter: Derek Slager
>Assignee: Jake Farrell
>Priority: Minor
>
> Currently the version of org.apache.aurora/aurora-api available on Maven 
> Central is 0.8.0, which is several versions out of date. It would be ideal to 
> have up-to-date versions available as new Aurora releases are cut.
> https://mvnrepository.com/artifact/org.apache.aurora/aurora-api



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-18 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15678503#comment-15678503
 ] 

Renan DelValle commented on AURORA-1780:


Second stab at this:
https://reviews.apache.org/r/53923/

I think this time I managed to take care of the corner cases where 
fromResource() gets called for a Protos.Resource. The earlier error was due to 
the filters that call fromResource() (such as NON_REVOCABLE) being placed 
before the SUPPORTED_RESOURCE filter. fromResource() was then called before the 
SUPPORTED_RESOURCE filter had a chance to filter out unsupported resources.

So it went something like this: Iterables.filter(Iterables.filter(resources, 
NON_REVOCABLE), SUPPORTED_RESOURCE), allowing the first filter to call 
fromResource() on an unknown resource and crash the scheduler.
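
The ordering problem described above can be reproduced in isolation. The sketch below uses java.util.stream rather than Guava's Iterables.filter, and plain strings rather than Protos.Resource, so it only mirrors the shape of the bug; the class FilterOrderSketch, the KNOWN set, and the bodies of the two predicates are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of the filter-ordering issue; not Aurora's ResourceManager. */
final class FilterOrderSketch {
  /** Assumed stand-in for the resource names the scheduler knows about. */
  static final Set<String> KNOWN = new HashSet<>(Arrays.asList("cpus", "mem", "disk", "gpus"));

  /** Stand-in for fromResource(): fails on an unknown resource name. */
  static String fromResource(String name) {
    if (!KNOWN.contains(name)) {
      throw new NullPointerException("Unknown Mesos resource: " + name);
    }
    return name;
  }

  /** Passes only known resources, without resolving the type. */
  static final Predicate<String> SUPPORTED_RESOURCE = KNOWN::contains;

  /** Stand-in for a filter that must resolve the resource type before deciding. */
  static final Predicate<String> NON_REVOCABLE = name -> !fromResource(name).isEmpty();

  private FilterOrderSketch() {}

  /** Resolves the type first, so an unknown resource throws before it can be filtered out. */
  static List<String> unsafeOrder(List<String> resources) {
    return resources.stream()
        .filter(NON_REVOCABLE)
        .filter(SUPPORTED_RESOURCE)
        .collect(Collectors.toList());
  }

  /** Filters out unsupported resources first, so fromResource() only sees known names. */
  static List<String> safeOrder(List<String> resources) {
    return resources.stream()
        .filter(SUPPORTED_RESOURCE)
        .filter(NON_REVOCABLE)
        .collect(Collectors.toList());
  }
}
```

For an offer containing "cpus", "test" and "mem", safeOrder returns the two known resources, while unsafeOrder throws as soon as it reaches "test".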

> Offers with unknown resources types to Aurora crash the scheduler
> -
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
>  Issue Type: Bug
> Environment: vagrant
>Reporter: Renan DelValle
>Assignee: Renan DelValle
> Fix For: 0.17.0
>
>
> Taking offers from Agents which have resources that are not known to Aurora 
> causes the Scheduler to crash.
> Steps to reproduce:
> {code}
> vagrant up
> sudo service mesos-slave stop
> echo "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" | sudo tee /etc/mesos-slave/resources
> sudo rm -f /var/lib/mesos/meta/slaves/latest
> sudo service mesos-slave start
> {code}
> Wait a few moments for the offer to be made to Aurora:
> {code}
> I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0"
> I0922 02:42:30.585597  2999 log.cpp:577] Attempting to append 109 bytes to the log
> I0922 02:42:30.585654  2999 coordinator.cpp:348] Coordinator attempting to write APPEND action at position 4
> I0922 02:42:30.585747  2999 replica.cpp:537] Replica received write request for position 4 from (10)@192.168.33.7:8083
> I0922 02:42:30.586858  2999 leveldb.cpp:341] Persisting action (125 bytes) to leveldb took 1.086601ms
> I0922 02:42:30.586897  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587020  2999 replica.cpp:691] Replica received learned notice for position 4 from @0.0.0.0:0
> I0922 02:42:30.587785  2999 leveldb.cpp:341] Persisting action (127 bytes) to leveldb took 746999ns
> I0922 02:42:30.587805  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587811  2999 replica.cpp:697] Replica learned APPEND action at position 4
> I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction.
> Sep 22, 2016 2:42:38 AM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "test"
> type: SCALAR
> scalar {
>   value: 200.0
> }
> role: "*"
>   at java.util.Objects.requireNonNull(Objects.java:228)
>   at org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
>   at org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
>   at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
>   at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
>   at org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
>   at org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
>   at org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
>   at 
> 

[jira] [Reopened] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-18 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reopened AURORA-1780:


The filter once again gets ignored at some point by the SlotSizeCounter. At 
least the bug now takes a little longer to show up. I am currently 
investigating the root of the problem.

> Offers with unknown resources types to Aurora crash the scheduler
> -
>
> Key: AURORA-1780
> URL: https://issues.apache.org/jira/browse/AURORA-1780
> Project: Aurora
>  Issue Type: Bug
> Environment: vagrant
>Reporter: Renan DelValle
>Assignee: Renan DelValle
> Fix For: 0.17.0
>
>
> Taking offers from Agents which have resources that are not known to Aurora 
> causes the Scheduler to crash.
> Steps to reproduce:
> {code}
> vagrant up
> sudo service mesos-slave stop
> echo "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" | sudo tee /etc/mesos-slave/resources
> sudo rm -f /var/lib/mesos/meta/slaves/latest
> sudo service mesos-slave start
> {code}
> Wait a few moments for the offer to be made to Aurora.
> {code}
> I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification 
> of lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0"
> I0922 02:42:30.585597  2999 log.cpp:577] Attempting to append 109 bytes to 
> the log
> I0922 02:42:30.585654  2999 coordinator.cpp:348] Coordinator attempting to 
> write APPEND action at position 4
> I0922 02:42:30.585747  2999 replica.cpp:537] Replica received write request 
> for position 4 from (10)@192.168.33.7:8083
> I0922 02:42:30.586858  2999 leveldb.cpp:341] Persisting action (125 bytes) to 
> leveldb took 1.086601ms
> I0922 02:42:30.586897  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587020  2999 replica.cpp:691] Replica received learned notice 
> for position 4 from @0.0.0.0:0
> I0922 02:42:30.587785  2999 leveldb.cpp:341] Persisting action (127 bytes) to 
> leveldb took 746999ns
> I0922 02:42:30.587805  2999 replica.cpp:712] Persisted action at 4
> I0922 02:42:30.587811  2999 replica.cpp:697] Replica learned APPEND action at 
> position 4
> I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] 
> Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction.
> Sep 22, 2016 2:42:38 AM 
> com.google.common.util.concurrent.ServiceManager$ServiceListener failed
> SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING 
> state.
> java.lang.NullPointerException: Unknown Mesos resource: name: "test"
> type: SCALAR
> scalar {
>   value: 200.0
> }
> role: "*"
>   at java.util.Objects.requireNonNull(Objects.java:228)
>   at 
> org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
>   at 
> org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
>   at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>   at java.util.Iterator.forEachRemaining(Iterator.java:115)
>   at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
>   at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
>   at 
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
>   at 
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
>   at 
> org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
>   at 
> org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
>   at 
> org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
>   at 
> org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
>   at 
> org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
>   at 
> com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
>   at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> 

[jira] [Resolved] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-17 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1780.

   Resolution: Fixed
Fix Version/s: 0.17.0


[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-16 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15672216#comment-15672216
 ] 

Renan DelValle commented on AURORA-1780:


Review request available: https://reviews.apache.org/r/53831/

I would like some feedback on this approach. It seemed like the best way to 
address this ticket without going overboard, since support for arbitrary 
resources is somewhere in the pipeline (AURORA-1328). I'm open to other ways 
of tackling this issue.


[jira] [Assigned] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-03 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1780:
--

Assignee: Renan DelValle


[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-11-03 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634485#comment-15634485
 ] 

Renan DelValle commented on AURORA-1780:


Would everyone be OK with ignoring unknown resource types and letting the 
scheduler carry on for now?
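
To make the proposal concrete, here is a minimal sketch of what "ignoring unknown resource types" could look like. It is not the Aurora code: the `Resource` class, the `KNOWN` set and `bagOf` are hypothetical stand-ins, and the offer values are taken from the reproduction steps in the ticket. The idea is only that resources with an unrecognized name are dropped before they are aggregated, so nothing downstream has to look them up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IgnoreUnknownResources {
    // Hypothetical set of resource names the scheduler understands.
    static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("cpus", "mem", "disk", "gpus"));

    // Hypothetical, simplified stand-in for a scalar Mesos resource.
    static final class Resource {
        final String name;
        final double value;

        Resource(String name, double value) {
            this.name = name;
            this.value = value;
        }
    }

    // Sums the offered resources by name, skipping names that are not known.
    static Map<String, Double> bagOf(List<Resource> offered) {
        Map<String, Double> bag = new LinkedHashMap<>();
        for (Resource r : offered) {
            if (!KNOWN.contains(r.name)) {
                continue; // unknown type: ignore it instead of failing
            }
            bag.merge(r.name, r.value, Double::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        // Same resources as in the reproduction steps, including "test".
        List<Resource> offer = Arrays.asList(
            new Resource("cpus", 0.5),
            new Resource("cpus", 3.5),
            new Resource("mem", 1024),
            new Resource("disk", 2),
            new Resource("gpus", 4),
            new Resource("test", 200));
        System.out.println(bagOf(offer));
    }
}
```

The trade-off is that an ignored resource is silently invisible to the scheduler, so a real change would probably want to log the skipped names at least once.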


[jira] [Assigned] (AURORA-1712) Debian Jessie packages are embedding the mesos egg built for Ubuntu trusty

2016-10-04 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1712:
--

Assignee: Renan DelValle

> Debian Jessie packages are embedding the mesos egg built for Ubuntu trusty
> ---
>
> Key: AURORA-1712
> URL: https://issues.apache.org/jira/browse/AURORA-1712
> Project: Aurora
>  Issue Type: Bug
>Reporter: Stephan Erb
>Assignee: Renan DelValle
>
> The Debian packaging scripts for Trusty and Jessie share the same 
> override mechanism for the pants third_party repository. We therefore end up 
> using egg files built for Ubuntu on Debian as well 
> (https://github.com/apache/aurora-packaging/blob/master/specs/debian/aurora-pants.ini).
> This seems to more or less work, but it is clearly not optimal.
> We should extend 
> https://github.com/apache/aurora/blob/master/build-support/python/make-mesos-native-egg
>  to support Debian and then make use of it in our packaging infrastructure 
> https://github.com/apache/aurora-packaging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AURORA-1712) Debian Jessie packages are embedding the mesos egg built for Ubuntu trusty

2016-10-04 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546838#comment-15546838
 ] 

Renan DelValle commented on AURORA-1712:


https://reviews.apache.org/r/52531/



[jira] [Commented] (AURORA-1780) Offers with unknown resources types to Aurora crash the scheduler

2016-09-22 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15514331#comment-15514331
 ] 

Renan DelValle commented on AURORA-1780:


FWIW, I have another framework running that relies on arbitrary resources 
(research oriented). To run Aurora on our cluster I have to shut it all down, 
remove the arbitrary resources, and bring the cluster back up, and then do the 
reverse when I run my research framework. So, all in all, this issue is a 
pretty big thorn in my side. On systems running systemd (tested on Ubuntu 
Xenial) this is an even nastier issue because the system brings the scheduler 
back up after it crashes, hiding the issue in plain sight until the logs are 
checked.


[jira] [Created] (AURORA-1780) Offers with unknown resources to Aurora crash the scheduler

2016-09-21 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1780:
--

 Summary: Offers with unknown resources to Aurora crash the 
scheduler
 Key: AURORA-1780
 URL: https://issues.apache.org/jira/browse/AURORA-1780
 Project: Aurora
  Issue Type: Bug
 Environment: vagrant
Reporter: Renan DelValle


Taking offers from Agents which have resources that are not known to Aurora 
causes the Scheduler to crash.
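
For readers unfamiliar with the failure mode, the exception in the log output of this report (a `NullPointerException` raised from `Objects.requireNonNull` with the message "Unknown Mesos resource: ...") is what you get when a name-to-type lookup misses and the result is required to be non-null. The sketch below reproduces that pattern in isolation. It is an illustration only: `KnownType`, `BY_MESOS_NAME` and `fromName` are hypothetical names, not the contents of `ResourceType.java`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class UnknownResourceLookup {
    // Hypothetical stand-in for the resource types the scheduler knows about.
    enum KnownType { CPUS, RAM_MB, DISK_MB, GPUS }

    // Hypothetical lookup table from Mesos resource name to known type.
    static final Map<String, KnownType> BY_MESOS_NAME = new HashMap<>();
    static {
        BY_MESOS_NAME.put("cpus", KnownType.CPUS);
        BY_MESOS_NAME.put("mem", KnownType.RAM_MB);
        BY_MESOS_NAME.put("disk", KnownType.DISK_MB);
        BY_MESOS_NAME.put("gpus", KnownType.GPUS);
    }

    // A missing key makes get() return null, and requireNonNull turns that
    // null into a NullPointerException carrying the supplied message.
    static KnownType fromName(String mesosName) {
        return Objects.requireNonNull(
            BY_MESOS_NAME.get(mesosName),
            "Unknown Mesos resource: " + mesosName);
    }

    public static void main(String[] args) {
        System.out.println(fromName("cpus"));
        try {
            fromName("test"); // same resource name as in the reproduction steps
        } catch (NullPointerException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```

When such a lookup runs inside a periodic service and the exception is not caught, the service fails, which matches the "SlotSizeCounterService [FAILED]" line in the log.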

Steps to reproduce:
{code}
vagrant up
sudo service mesos-slave stop
echo "cpus(aurora-role):0.5;cpus(*):3.5;mem(aurora-role):1024;disk:2;gpus(*):4;test:200" | sudo tee /etc/mesos-slave/resources
sudo rm -f /var/lib/mesos/meta/slaves/latest
sudo service mesos-slave start
{code}

Wait a few moments for the offer to be made to Aurora.

{code}
I0922 02:41:57.839 [Thread-19, MesosSchedulerImpl:142] Received notification of 
lost agent: value: "cadaf569-171d-42fc-a417-fbd608ea5bab-S0"

I0922 02:42:30.585597  2999 log.cpp:577] Attempting to append 109 bytes to the 
log
I0922 02:42:30.585654  2999 coordinator.cpp:348] Coordinator attempting to 
write APPEND action at position 4
I0922 02:42:30.585747  2999 replica.cpp:537] Replica received write request for 
position 4 from (10)@192.168.33.7:8083
I0922 02:42:30.586858  2999 leveldb.cpp:341] Persisting action (125 bytes) to 
leveldb took 1.086601ms
I0922 02:42:30.586897  2999 replica.cpp:712] Persisted action at 4
I0922 02:42:30.587020  2999 replica.cpp:691] Replica received learned notice 
for position 4 from @0.0.0.0:0
I0922 02:42:30.587785  2999 leveldb.cpp:341] Persisting action (127 bytes) to 
leveldb took 746999ns
I0922 02:42:30.587805  2999 replica.cpp:712] Persisted action at 4
I0922 02:42:30.587811  2999 replica.cpp:697] Replica learned APPEND action at 
position 4
I0922 02:42:30.601 [SchedulerImpl-0, OfferManager$OfferManagerImpl:185] 
Returning offers for cadaf569-171d-42fc-a417-fbd608ea5bab-S1 for compaction.
Sep 22, 2016 2:42:38 AM 
com.google.common.util.concurrent.ServiceManager$ServiceListener failed
SEVERE: Service SlotSizeCounterService [FAILED] has failed in the RUNNING state.
java.lang.NullPointerException: Unknown Mesos resource: name: "test"
type: SCALAR
scalar {
  value: 200.0
}
role: "*"

at java.util.Objects.requireNonNull(Objects.java:228)
at 
org.apache.aurora.scheduler.resources.ResourceType.fromResource(ResourceType.java:355)
at 
org.apache.aurora.scheduler.resources.ResourceManager.lambda$static$0(ResourceManager.java:52)
at com.google.common.collect.Iterators$7.computeNext(Iterators.java:675)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at java.util.Iterator.forEachRemaining(Iterator.java:115)
at 
java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.aurora.scheduler.resources.ResourceManager.bagFromResources(ResourceManager.java:274)
at 
org.apache.aurora.scheduler.resources.ResourceManager.bagFromMesosResources(ResourceManager.java:239)
at 
org.apache.aurora.scheduler.stats.AsyncStatsModule$OfferAdapter.get(AsyncStatsModule.java:153)
at 
org.apache.aurora.scheduler.stats.SlotSizeCounter.run(SlotSizeCounter.java:168)
at 
org.apache.aurora.scheduler.stats.AsyncStatsModule$SlotSizeCounterService.runOneIteration(AsyncStatsModule.java:130)
at 
com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:189)
at com.google.common.util.concurrent.Callables$3.run(Callables.java:100)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

E0922 02:42:38.353 [SlotSizeCounterService RUNNING, 
GuavaUtils$LifecycleShutdownListener:55] Service: SlotSizeCounterService 
[FAILED] failed unexpectedly. Triggering shutdown.
I0922 

[jira] [Resolved] (AURORA-1739) createJob thrift api for golang consistenly failing with empty CronSchedule

2016-09-16 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1739.

   Resolution: Fixed
Fix Version/s: 0.16.0

https://reviews.apache.org/r/51973/

> createJob thrift api for golang consistenly failing with empty CronSchedule
> ---
>
> Key: AURORA-1739
> URL: https://issues.apache.org/jira/browse/AURORA-1739
> Project: Aurora
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.15.0
>Reporter: Jimmy Wu
>Assignee: Renan DelValle
>Priority: Critical
> Fix For: 0.16.0
>
>
> trying to create non cron job via the thrift api for golang  but consistently 
> getting error "Cron jobs may only be created/updated by calling 
> scheduleCronJob.".  Root cause :  CronSchedule is not set in JobConfiguration 
> hence an empty string is used, then create job request gets rejected because 
> aurora now treats empty cron schedule as failure (related changes 
> https://reviews.apache.org/r/28571/).  This issue breaks all createJob 
> requests submitted from golang thrift api because empty string is default 
> value for string instead of nil.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AURORA-1739) createJob thrift api for golang consistenly failing with empty CronSchedule

2016-09-07 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471614#comment-15471614
 ] 

Renan DelValle commented on AURORA-1739:


Good to hear that, I'll gear up the patch then.

Unfortunately, non-pointer variables can't be set to nil :/

In Go, all variables have a zero value. For a string it's the empty string, 
which causes the scheduler to always think that {{cronSchedule}} is set.

Changing the type to optional causes {{cronSchedule}} to be generated as a 
pointer, which can indeed be set to nil.
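
For anyone unfamiliar with Go, a small sketch of the difference. The struct 
names below are hypothetical stand-ins and are not the actual generated 
bindings; they only mimic how a required string field and an optional string 
field end up being represented.

```go
package main

import "fmt"

// requiredConfig is a hypothetical stand-in for a struct generated from
// `4: string cronSchedule`: a plain string, so "unset" and "" look the same.
type requiredConfig struct {
	CronSchedule string
}

// optionalConfig is a hypothetical stand-in for a struct generated from
// `4: optional string cronSchedule`: a pointer, so nil can mean "unset".
type optionalConfig struct {
	CronSchedule *string
}

// isSet reports whether an optional string field carries a value.
func isSet(s *string) bool {
	return s != nil
}

func main() {
	var r requiredConfig
	var o optionalConfig

	// The zero value of a string is the empty string, never nil.
	fmt.Printf("required field: %q\n", r.CronSchedule)
	// The zero value of a pointer is nil, so the field reads as unset.
	fmt.Println("optional field set:", isSet(o.CronSchedule))

	schedule := "*/5 * * * *"
	o.CronSchedule = &schedule
	fmt.Println("optional field set:", isSet(o.CronSchedule))
}
```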



> createJob thrift api for golang consistenly failing with empty CronSchedule
> ---
>
> Key: AURORA-1739
> URL: https://issues.apache.org/jira/browse/AURORA-1739
> Project: Aurora
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.15.0
>Reporter: Jimmy Wu
>Priority: Critical
>
> trying to create non cron job via the thrift api for golang  but consistently 
> getting error "Cron jobs may only be created/updated by calling 
> scheduleCronJob.".  Root cause :  CronSchedule is not set in JobConfiguration 
> hence an empty string is used, then create job request gets rejected because 
> aurora now treats empty cron schedule as failure (related changes 
> https://reviews.apache.org/r/28571/).  This issue breaks all createJob 
> requests submitted from golang thrift api because empty string is default 
> value for string instead of nil.  





[jira] [Commented] (AURORA-1739) createJob thrift api for golang consistenly failing with empty CronSchedule

2016-09-06 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15469144#comment-15469144
 ] 

Renan DelValle commented on AURORA-1739:


I encountered this as well but, thankfully, [~jfarrell] lent me a hand with 
this.  I'm sure you've fixed this issue by now, but for anyone else this might 
help.

This can be fixed by modifying the thrift API from which the go bindings get 
created.

This line: 
https://github.com/apache/aurora/blob/master/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L328

Has to be changed from:
{code}
  4: string cronSchedule
{code}
to:
{code}
  4: optional string cronSchedule
{code}

I may submit a patch for this, but first I have to check whether the change 
causes any issues when the bindings for other languages are generated.

> createJob thrift api for golang consistenly failing with empty CronSchedule
> ---
>
> Key: AURORA-1739
> URL: https://issues.apache.org/jira/browse/AURORA-1739
> Project: Aurora
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 0.15.0
>Reporter: Jimmy Wu
>Priority: Critical
>
> trying to create non cron job via the thrift api for golang  but consistently 
> getting error "Cron jobs may only be created/updated by calling 
> scheduleCronJob.".  Root cause :  CronSchedule is not set in JobConfiguration 
> hence an empty string is used, then create job request gets rejected because 
> aurora now treats empty cron schedule as failure (related changes 
> https://reviews.apache.org/r/28571/).  This issue breaks all createJob 
> requests submitted from golang thrift api because empty string is default 
> value for string instead of nil.  





[jira] [Commented] (AURORA-1762) /pendingtasks endpoint should show reason tasks are pending

2016-09-06 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468270#comment-15468270
 ] 

Renan DelValle commented on AURORA-1762:


In that case, I'll ask someone from my research lab to take a crack at this.

> /pendingtasks endpoint should show reason tasks are pending
> ---
>
> Key: AURORA-1762
> URL: https://issues.apache.org/jira/browse/AURORA-1762
> Project: Aurora
>  Issue Type: Task
>Reporter: David Robinson
>Priority: Minor
>  Labels: newbie
>
> the /pendingtasks endpoint is essentially useless as is, it shows that tasks 
> are pending but doesn't show why. The information is also not easily 
> discovered via the /scheduler UI.





[jira] [Commented] (AURORA-1734) Configurable Metadata prefix

2016-08-18 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427421#comment-15427421
 ] 

Renan DelValle commented on AURORA-1734:


So I've been thinking about this quite a bit. And it feels to me that we're 
jumping the gun a bit on a future "what-if" regarding the name collision.

As an alternative, I'd like to propose that we reserve the `org.apache.aurora` 
namespace for future use and remove the prefix altogether (we can even go as 
far as rejecting a task that includes a label key with this prefix). I'm 
interested in hearing what everyone's opinion on this would be.

The reason I've come to be against every label key having the prefix is that we 
need to pass labels to our containers with the compose executor. As such, if we 
have the prefix, we have to create a special "aurora edition" of the docker 
compose executor to filter out the prefix. (This will be true of any future 
Mesos executor that wants to make use of labels as well.)

Configuring the metadata prefix from the scheduler is an acceptable solution as 
well; however, it's less flexible for executor devs. For example, if the 
community decides to make `environment` or `role` labels in the future, we 
would have to filter these out in the executor on a case-by-case basis 
(blacklist), instead of filtering any label key that begins with the prefix 
`org.apache.aurora`. As a bonus, the patch to do this is less messy :).

Would like to know the community's thoughts on this before moving forward with 
the patch.

> Configurable Metadata prefix
> 
>
> Key: AURORA-1734
> URL: https://issues.apache.org/jira/browse/AURORA-1734
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Trivial
>
> Currently, a prefix ("org.apache.aurora.metadata.") is injected into the 
> metadata key in the scheduler. It would be beneficial to allow users to set 
> their own metadata prefix (including an empty string).





[jira] [Resolved] (AURORA-1288) Design for supporting custom executor

2016-08-04 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1288.

   Resolution: Implemented
Fix Version/s: 0.16.0

After a year and a few months of work on this, I'm very happy to say support 
for custom executors in Aurora is now a reality. Thanks to everyone who 
contributed to this in any way, shape, or form.

> Design for supporting custom executor
> -
>
> Key: AURORA-1288
> URL: https://issues.apache.org/jira/browse/AURORA-1288
> Project: Aurora
>  Issue Type: Task
>Reporter: Meghdoot Bhattacharya
>Assignee: Renan DelValle
> Fix For: 0.16.0
>
>
> The goal is to capture the list of changes in the client and the scheduler 
> required to support any executor other than thermos. This will help non 
> thermos use cases to adopt aurora easily.





[jira] [Resolved] (AURORA-1726) Create support for using multiple executors in the Scheduler

2016-08-04 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1726.

   Resolution: Implemented
Fix Version/s: 0.16.0

> Create support for using multiple executors in the Scheduler
> 
>
> Key: AURORA-1726
> URL: https://issues.apache.org/jira/browse/AURORA-1726
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
> Fix For: 0.16.0
>
>
> Allow a single Aurora scheduler to schedule tasks on Mesos with different 
> executors. Configuration for executors will be server side and loaded at the 
> time the Scheduler is started. Users may specify the executor they wish to 
> use by specifying the name of the executor they wish their task to use.





[jira] [Commented] (AURORA-1726) Create support for using multiple executors in the Scheduler

2016-08-04 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15408343#comment-15408343
 ] 

Renan DelValle commented on AURORA-1726:


Implemented 
[d0533d2c7ac15a19cc63587481a75b9597613425|https://github.com/apache/aurora/commit/d0533d2c7ac15a19cc63587481a75b9597613425]

> Create support for using multiple executors in the Scheduler
> 
>
> Key: AURORA-1726
> URL: https://issues.apache.org/jira/browse/AURORA-1726
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>
> Allow a single Aurora scheduler to schedule tasks on Mesos with different 
> executors. Configuration for executors will be server side and loaded at the 
> time the Scheduler is started. Users may specify the executor they wish to 
> use by specifying the name of the executor they wish their task to use.





[jira] [Commented] (AURORA-1726) Create support for using multiple executors in the Scheduler

2016-07-21 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15388567#comment-15388567
 ] 

Renan DelValle commented on AURORA-1726:


I don't foresee modifying the executor resources very often, so I agree it 
won't be triggered too often. In any case, I thought it was worth bringing up 
and perhaps documenting it since it could potentially cause some weird behavior.

> Create support for using multiple executors in the Scheduler
> 
>
> Key: AURORA-1726
> URL: https://issues.apache.org/jira/browse/AURORA-1726
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>
> Allow a single Aurora scheduler to schedule tasks on Mesos with different 
> executors. Configuration for executors will be server side and loaded at the 
> time the Scheduler is started. Users may specify the executor they wish to 
> use by specifying the name of the executor they wish their task to use.





[jira] [Comment Edited] (AURORA-1726) Create support for using multiple executors in the Scheduler

2016-07-18 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15383446#comment-15383446
 ] 

Renan DelValle edited comment on AURORA-1726 at 7/19/16 1:32 AM:
-

Currently have a working implementation of this that does its best not to get 
in the way of how things are run with thermos.

One concern I have is about preemption. As far as I can see, the resources 
available for preemption are calculated using the resources being used by the 
victim task + executor overhead.

This could result in a corner case that may or may not manifest itself. It is 
not exclusive to using multiple executors but may be magnified by the feature 
if the resource overhead is changed and the scheduler is restarted with a 
larger resource overhead.

Maybe the more experienced devs can help me understand if this scenario is 
possible:
{code}
Overhead for thermos is set to C cpus and R ram
task A is submitted with A[cpus] cpus, A[ram] ram, and A[disk] disk. 
task A begins to run with A[cpus] + C cpus, A[ram] + R ram, and A[disk] disk.

Overhead is changed to C' cpus and R' ram. Scheduler is restarted and running 
tasks are reconciled.

task B is submitted to the scheduler with B[cpus], B[ram] and B[disk].

Preemption calculations begin. Since the calculations take into account the 
current overhead set,  the resources available for pre-emption are incorrectly 
calculated to be:
A[cpus] + C', A[ram] + R', A[disk]
When they should be using the values used at the time of scheduling:
A[cpus] + C, A[ram] + R, A[disk]
{code}
If this scenario is possible, we should come up with a suitable solution to 
this issue which may involve storing the overhead used for tasks at the time of 
running them.


was (Author: rdelvalle):
Currently have a working implementation of this that does its best not to get 
in the way of how things are run with thermos.

One concern I have is about preemption. As far as I can see, the resources 
available for preemption are calculated using the resources being used by the 
victim task + executor overhead.

This could result in a corner case that may or may not manifest itself. It is 
not exclusive to using multiple executors but may be magnified by the feature 
if the resource overhead is changed and the scheduler is restarted with a 
larger resource overhead.

Maybe the more experienced devs can help me understand if this scenario is 
possible:
Overhead for thermos is set to C cpus and R ram
task A is submitted with A[cpus] cpus, A[ram] ram, and A[disk] disk. 
task A begins to run with A[cpus] + C cpus, A[ram] + R ram, and A[disk] disk.

Overhead is changed to C' cpus and R' ram. Scheduler is restarted and running 
tasks are reconciled.

task B is submitted to the scheduler with B[cpus], B[ram] and B[disk].

Preemption calculations begin. Since the calculations take into account the 
current overhead set,  the resources available for pre-emption are incorrectly 
calculated to be:
A[cpus] + C', A[ram] + R', A[disk]
When they should be using the values used at the time of scheduling:
A[cpus] + C, A[ram] + R, A[disk]

If this scenario is possible, we should come up with a suitable solution to 
this issue which may involve storing the overhead used for tasks at the time of 
running them.

> Create support for using multiple executors in the Scheduler
> 
>
> Key: AURORA-1726
> URL: https://issues.apache.org/jira/browse/AURORA-1726
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>
> Allow a single Aurora scheduler to schedule tasks on Mesos with different 
> executors. Configuration for executors will be server side and loaded at the 
> time the Scheduler is started. Users may specify the executor they wish to 
> use by specifying the name executor they wish their task to use.





[jira] [Commented] (AURORA-1734) Configurable Metadata prefix

2016-07-15 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15379956#comment-15379956
 ] 

Renan DelValle commented on AURORA-1734:


According to the NEWS file in release 0.12.0:

  Aurora task metadata is now mapped to Mesos task labels. Labels are prefixed 
  with `org.apache.aurora.metadata.` to prevent clashes with other, external 
  label sources.

Unsure about the second question. 

It should be noted that up until this point, as far as I know, there is no way 
for users to create labels without using a custom thrift client.

Another solution to this issue would be to add the prefix on the Aurora 
client side. It would require far fewer changes:
https://github.com/apache/aurora/compare/master...rdelval:clientLabelPrefix

> Configurable Metadata prefix
> 
>
> Key: AURORA-1734
> URL: https://issues.apache.org/jira/browse/AURORA-1734
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Trivial
>
> Currently, a prefix ("org.apache.aurora.metadata.") is injected into the 
> metadata key in the scheduler. It would be beneficial to allow users to set 
> their own metadata prefix (including an empty string).





[jira] [Commented] (AURORA-1734) Create scheduler flag to turn off Metadata prefix

2016-07-14 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15377507#comment-15377507
 ] 

Renan DelValle commented on AURORA-1734:


That's actually a great idea, thanks for suggesting it. I'll go ahead and 
update the ticket.

> Create scheduler flag to turn off Metadata prefix
> -
>
> Key: AURORA-1734
> URL: https://issues.apache.org/jira/browse/AURORA-1734
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Trivial
>
> Currently, a prefix ("org.apache.aurora.metadata.") is injected into the 
> metadata key in the scheduler. It would be beneficial for those using custom 
> clients and/or custom executors to turn off the addition of this prefix to 
> allow metadata to be treated as a list of plain Mesos labels.





[jira] [Created] (AURORA-1734) Create Flag to turn off Metadata prefix

2016-07-13 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1734:
--

 Summary: Create Flag to turn off Metadata prefix
 Key: AURORA-1734
 URL: https://issues.apache.org/jira/browse/AURORA-1734
 Project: Aurora
  Issue Type: Task
  Components: Scheduler
Reporter: Renan DelValle
Assignee: Renan DelValle
Priority: Trivial


Currently, a prefix ("org.apache.aurora.metadata.") is injected into the 
metadata key in the scheduler. It would be beneficial for those using custom 
clients to turn off the addition of this prefix to allow metadata to be treated 
as a list of plain Mesos labels.





[jira] [Resolved] (AURORA-1376) Create support for custom executors in Scheduler

2016-06-30 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1376.

   Resolution: Implemented
Fix Version/s: 0.11.0

Support for using a single executor has been included in Aurora 0.11. Moving 
multiple executor support to its own ticket.

> Create support for custom executors in Scheduler
> 
>
> Key: AURORA-1376
> URL: https://issues.apache.org/jira/browse/AURORA-1376
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
> Fix For: 0.11.0
>
>






[jira] [Assigned] (AURORA-1288) Design for supporting custom executor

2016-06-30 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1288:
--

Assignee: Renan DelValle

> Design for supporting custom executor
> -
>
> Key: AURORA-1288
> URL: https://issues.apache.org/jira/browse/AURORA-1288
> Project: Aurora
>  Issue Type: Task
>Reporter: Meghdoot Bhattacharya
>Assignee: Renan DelValle
>
> The goal is to capture the list of changes in the client and the scheduler 
> required to support any executor other than thermos. This will help non 
> thermos use cases to adopt aurora easily.





[jira] [Resolved] (AURORA-1723) Add support for Mesos Fetcher

2016-06-29 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle resolved AURORA-1723.

   Resolution: Implemented
Fix Version/s: 0.15.0

Committed 4e28b9c

> Add support for Mesos Fetcher
> -
>
> Key: AURORA-1723
> URL: https://issues.apache.org/jira/browse/AURORA-1723
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Minor
>  Labels: features
> Fix For: 0.15.0
>
>
> Adding support for Aurora Tasks to be capable of using the [Mesos 
> Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing 
> the client to provide arbitrary URIs at which resources can be retrieved. 
> Resources will be marked non-executable to avoid security risks.





[jira] [Assigned] (AURORA-1723) Add support for Mesos Fetcher

2016-06-29 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle reassigned AURORA-1723:
--

Assignee: Renan DelValle

> Add support for Mesos Fetcher
> -
>
> Key: AURORA-1723
> URL: https://issues.apache.org/jira/browse/AURORA-1723
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Assignee: Renan DelValle
>Priority: Minor
>  Labels: features
>
> Adding support for Aurora Tasks to be capable of using the [Mesos 
> Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing 
> the client to provide arbitrary URIs at which resources can be retrieved. 
> Resources will be marked non-executable to avoid security risks.





[jira] [Commented] (AURORA-1723) Add support for Mesos Fetcher

2016-06-24 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15348825#comment-15348825
 ] 

Renan DelValle commented on AURORA-1723:


Review request: https://reviews.apache.org/r/49218/

> Add support for Mesos Fetcher
> -
>
> Key: AURORA-1723
> URL: https://issues.apache.org/jira/browse/AURORA-1723
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Priority: Minor
>  Labels: features
>
> Adding support for Aurora Tasks to be capable of using the [Mesos 
> Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing 
> the client to provide arbitrary URIs at which resources can be retrieved. 
> Resources will be marked non-executable to avoid security risks.





[jira] [Commented] (AURORA-1723) Add support for Mesos Fetcher

2016-06-21 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343689#comment-15343689
 ] 

Renan DelValle commented on AURORA-1723:


Thanks [~wfarner], this sounds like the right place to look. As usual, 
thanks for saving me a few hours of poring over code.

> Add support for Mesos Fetcher
> -
>
> Key: AURORA-1723
> URL: https://issues.apache.org/jira/browse/AURORA-1723
> Project: Aurora
>  Issue Type: Task
>  Components: Scheduler
>Reporter: Renan DelValle
>Priority: Minor
>  Labels: features
>
> Adding support for Aurora Tasks to be capable of using the [Mesos 
> Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing 
> the client to provide arbitrary URIs at which resources can be retrieved. 
> Resources will be marked non-executable to avoid security risks.





[jira] [Created] (AURORA-1723) Add support for Mesos Fetcher

2016-06-21 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1723:
--

 Summary: Add support for Mesos Fetcher
 Key: AURORA-1723
 URL: https://issues.apache.org/jira/browse/AURORA-1723
 Project: Aurora
  Issue Type: Task
  Components: Scheduler
Reporter: Renan DelValle
Priority: Minor


Adding support for Aurora Tasks to be capable of using the [Mesos 
Fetcher|http://mesos.apache.org/documentation/latest/fetcher/] by allowing the 
client to provide arbitrary URIs at which resources can be retrieved. Resources 
will be marked non-executable to avoid security risks.





[jira] [Issue Comment Deleted] (AURORA-1376) Create support for custom executors in Scheduler

2015-12-23 Thread Renan DelValle (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renan DelValle updated AURORA-1376:
---
Comment: was deleted

(was: I'm working on accepting multiple executors in the optional 
configuration. I think the best way to do this is to maintain valid JSON 
formatting by turning the config file into a JSON array like I had in my 
previous patch.

I'm thinking about introducing another argument that will define the default 
executor. Whenever a name is not specified in ExecutorConfig.name, the 
default executor should be used.

If anyone has any objections to these ideas, please let me know.
)

> Create support for custom executors in Scheduler
> 
>
> Key: AURORA-1376
> URL: https://issues.apache.org/jira/browse/AURORA-1376
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>






[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-12-23 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070291#comment-15070291
 ] 

Renan DelValle commented on AURORA-1376:


That sounds reasonable. My main concern was that if that contract is broken, it 
could cause some trouble, but then again, we can just elect to reject that job 
request for violating the contract.

> Create support for custom executors in Scheduler
> 
>
> Key: AURORA-1376
> URL: https://issues.apache.org/jira/browse/AURORA-1376
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-12-23 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070251#comment-15070251
 ] 

Renan DelValle commented on AURORA-1376:


I'm working on accepting multiple executors in the optional configuration. I 
think the best way to do this is to maintain valid JSON formatting by turning 
the config file into a JSON array like I had in my previous patch.

I'm thinking about introducing another argument that will define the default 
executor. Whenever a name is not specified in ExecutorConfig.name, the 
default executor should be used.

If anyone has any objections to these ideas, please let me know.


> Create support for custom executors in Scheduler
> 
>
> Key: AURORA-1376
> URL: https://issues.apache.org/jira/browse/AURORA-1376
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>






[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-12-16 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061095#comment-15061095
 ] 

Renan DelValle commented on AURORA-1376:


First stab at using a command-line arg to override the configuration:

https://reviews.apache.org/r/41473/

I pulled from the Apache git repo right before submitting, and it looks like 
that did more harm than good to the diff.

> Create support for custom executors in Scheduler
> 
>
> Key: AURORA-1376
> URL: https://issues.apache.org/jira/browse/AURORA-1376
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>






[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-11-24 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025210#comment-15025210
 ] 

Renan DelValle commented on AURORA-1376:


Glad to see this has survived; I was getting worried it would be dropped 
entirely after not hearing back for a while. If at all possible, I'd still like 
to be part of the development of this patch in any way, shape, or form.

> Create support for custom executors in Scheduler
> 
>
> Key: AURORA-1376
> URL: https://issues.apache.org/jira/browse/AURORA-1376
> Project: Aurora
>  Issue Type: Sub-task
>  Components: Scheduler
>Reporter: Renan DelValle
>






[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-08-26 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715636#comment-14715636
 ] 

Renan DelValle commented on AURORA-1376:


First patch in a series of patches to add custom executor support:
https://reviews.apache.org/r/37818/

Many thanks to [~kevints] for all the suggestions and for taking the time at 
the Mesos hackathon to help me out with this.

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-07-09 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14620942#comment-14620942
 ] 

Renan DelValle commented on AURORA-1376:


[~jaybuff], you make some great points.

The JSON schema comes from trying to simulate the ExecutorSettings data 
structure in the Aurora Scheduler. As I've said previously, the schema is 
subject to change, so I appreciate your suggestions.

Here are the reasons why ExecutorSettings, as it exists right now, is different 
from ExecutorInfo:

a. The ExecutorSettings data struct uses the CommandUtil 
(org/apache/aurora/scheduler/base/CommandUtil.java) wrapper to configure the 
CommandInfo to fetch and execute the given URIs for an executor. [~wfarner], is 
this feature considered part of Aurora or part of Thermos?

b. It also stores info about global container mounts, which it uses when 
creating Docker containers.


If we create an interface, IMO, it should return a TaskInfo.Builder, as that 
would spare us from having a special case for the Mesos command executor. For 
what it's worth, an early version of my code is here: 
https://reviews.apache.org/r/36289/. I've fixed all the errors that were caused 
by my changes and will be putting a new version up later today.

As for having the client send in any info to configure the executor, we had a 
discussion a short while ago and came to the conclusion that, among other 
things, it is considered a security risk because Aurora runs as root 
(https://issues.apache.org/jira/browse/AURORA-1288?focusedCommentId=14601470&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14601470).
Thus, IMO, using a server-side config file is still the best option.

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-07-09 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621069#comment-14621069
 ] 

Renan DelValle commented on AURORA-1376:


{quote}But some settings have to come from the user, such as what command can 
run. e.g. If you're using the mesos command executor, you have to set 
CommandInfo.value from something submitted by the user. I think that should go 
into executorConfig.data, perhaps as a json blob that only the 
mesos-command-executor plugin understands.{quote}
Agreed. I think this also has to be considered from the client side of 
things, which is really going to depend on what the extension of the DSL to 
support custom executors is going to look like. 
[AURORA-1377|https://issues.apache.org/jira/browse/AURORA-1377]

I think as soon as we have a more concrete idea of what Pystachio with support 
for custom executors looks like, we'll be in a better position to determine what 
this should look like. The JSON blob is a good starting point.

Re: the MesosTaskFactory interface, if I understand correctly, the idea would 
be to have every executor implement a MesosTaskFactory, correct? If so, I think 
that's a great idea. I'll look into implementing that today.
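
As a rough illustration of the idea discussed above, the following Python 
sketch (the scheduler is Java, and none of these class names come from the 
Aurora code base) shows one factory per executor, where only the 
command-executor factory interprets the user-supplied executorConfig.data 
JSON blob. The "command" key, the registry, and the plain dict standing in for 
a Mesos TaskInfo are all assumptions.

```python
import json
from abc import ABC, abstractmethod


class TaskFactory(ABC):
    """Hypothetical stand-in for a per-executor MesosTaskFactory."""

    @abstractmethod
    def create_task(self, task_name, executor_data):
        """Build a task description (a plain dict here, not a real TaskInfo)."""


class CommandExecutorTaskFactory(TaskFactory):
    """Reads the command to run from the user-supplied JSON blob."""

    def create_task(self, task_name, executor_data):
        blob = json.loads(executor_data)
        if "command" not in blob:
            raise ValueError("executor data must contain a 'command' key")
        # Mirrors setting CommandInfo.value from something the user submitted.
        return {"name": task_name, "command": {"value": blob["command"]}}


# Registry keyed by executor name; populated server-side, not by the client.
FACTORIES = {"mesos-command-executor": CommandExecutorTaskFactory()}

task = FACTORIES["mesos-command-executor"].create_task(
    "hello", '{"command": "echo hello"}'
)
print(task)
```

With this shape the scheduler never needs to understand the blob itself; it 
only picks a factory by executor name and hands the data through.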

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-07-02 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612519#comment-14612519
 ] 

Renan DelValle commented on AURORA-1376:


None that I can think of; I just picked something that was supported by the 
Apache Commons Configuration library to speed things along. YAML sounds good 
to me; any particular parsers that are compatible with the Apache license? I 
looked at SnakeYAML but I can't find what license it is released under.

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Comment Edited] (AURORA-1376) Create support for custom executors in Scheduler

2015-07-02 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612519#comment-14612519
 ] 

Renan DelValle edited comment on AURORA-1376 at 7/2/15 9:25 PM:


None that I can think of; I just picked something that was supported by the 
Apache Commons Configuration library to speed things along. YAML sounds good 
to me; any particular parsers that are compatible with the Apache license? I 
looked at SnakeYAML but I can't find what license it is released under.

Edit: SnakeYAML is Apache 2.0.


was (Author: rdelvalle):
None that I can think of; I just picked something that was supported by the 
Apache Commons Configuration library to speed things along. YAML sounds good 
to me; any particular parsers that are compatible with the Apache license? I 
looked at SnakeYAML but I can't find what license it is released under.

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Comment Edited] (AURORA-1376) Create support for custom executors in Scheduler

2015-06-30 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609444#comment-14609444
 ] 

Renan DelValle edited comment on AURORA-1376 at 7/1/15 2:00 AM:


This is a WIP, but since I wanted to keep pushing this work along, I created 
a simple XML schema that should cover most cases for custom executors and 
Thermos. If there are any objections to using an XML file for the configuration 
of the executor settings, I am open to using anything else; I simply wanted to 
get the ball rolling. I'm using the Apache Commons Configuration library to 
parse the XML.

{code:xml}
<?xml version="1.0" encoding="ISO-8859-1" ?>
<executors>
    <executor>
        <name>thermos</name>
        <path>/path/to/thermos</path>
        <flags></flags>
        <overhead>
            <disk_mb></disk_mb>
            <ram_mb></ram_mb>
            <cpus></cpus>
            <ports></ports>
        </overhead>
        <resources>
            <uri></uri>
        </resources>
        <observer></observer>
        <cmd></cmd>
    </executor>
</executors>
{code}
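
For readers who want to see how a file in this draft schema could be consumed, 
here is a small Python sketch using the standard library's 
xml.etree.ElementTree. This swaps in a different parser purely for 
illustration; the comment above uses Apache Commons Configuration in Java. 
The numeric values in the sample and the choice to treat a missing or empty 
element as zero are assumptions.

```python
import xml.etree.ElementTree as ET

# Sample document following the draft schema; the numeric values are made up.
SAMPLE = """<?xml version="1.0" encoding="ISO-8859-1" ?>
<executors>
  <executor>
    <name>thermos</name>
    <path>/path/to/thermos</path>
    <overhead>
      <ram_mb>128</ram_mb>
      <cpus>0.25</cpus>
    </overhead>
  </executor>
</executors>
"""


def _number(parent, tag, cast):
    """Read a numeric child element, treating a missing or empty one as zero."""
    text = parent.findtext(tag, default="") if parent is not None else ""
    return cast(text) if text.strip() else cast(0)


def parse_executors(xml_bytes):
    """Return a dict of executor settings keyed by executor name."""
    root = ET.fromstring(xml_bytes)
    executors = {}
    for node in root.findall("executor"):
        overhead = node.find("overhead")
        executors[node.findtext("name")] = {
            "path": node.findtext("path"),
            "ram_mb": _number(overhead, "ram_mb", int),
            "cpus": _number(overhead, "cpus", float),
        }
    return executors


# Encode to bytes so the parser honours the declared ISO-8859-1 encoding.
settings = parse_executors(SAMPLE.encode("iso-8859-1"))
print(settings["thermos"])
```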


was (Author: rdelvalle):
This is a WIP, but since I wanted to keep pushing this work along, I created 
a simple XML schema that should cover most cases for custom executors and 
Thermos. If there are any objections to using an XML file for the configuration 
of the executor settings, I am open to using anything else; I simply wanted to 
get the ball rolling. I'm using the Apache Commons Configuration library to 
parse the XML.

{code:xml}
<?xml version="1.0" encoding="ISO-8859-1" ?>
<executors>
    <executor>
        <name>thermos</name>
        <path>/test/</path>
        <flags></flags>
        <overhead>
            <disk_mb></disk_mb>
            <ram_mb></ram_mb>
            <cpus></cpus>
            <ports></ports>
        </overhead>
        <resources>
            <uri></uri>
        </resources>
        <observer></observer>
        <cmd></cmd>
    </executor>
</executors>
{code}

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Commented] (AURORA-1376) Create support for custom executors in Scheduler

2015-06-29 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606031#comment-14606031
 ] 

Renan DelValle commented on AURORA-1376:


[~wfarner] or [~wickman],

What would be the preferred method of populating the key-value pairs? Would a 
config file be preferred, or would another approach make more sense?

 Create support for custom executors in Scheduler
 

 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle







[jira] [Created] (AURORA-1376) Create support for custom executors in Scheduler

2015-06-26 Thread Renan DelValle (JIRA)
Renan DelValle created AURORA-1376:
--

 Summary: Create support for custom executors in Scheduler
 Key: AURORA-1376
 URL: https://issues.apache.org/jira/browse/AURORA-1376
 Project: Aurora
  Issue Type: Sub-task
  Components: Scheduler
Reporter: Renan DelValle








[jira] [Commented] (AURORA-1288) Design for supporting custom executor

2015-06-24 Thread Renan DelValle (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14600531#comment-14600531
 ] 

Renan DelValle commented on AURORA-1288:


Right, my line of thought was that Thermos is kind of the default custom 
executor for Aurora right now. It made sense in my head to give it a special 
case. I guess, to me, it's more a question of whether Aurora will be 
configured out of the box for Thermos or something else (like the Command 
Executor) and how this will be accomplished.

In terms of using an ExecutorType enumeration, I was thinking of having only 
those 3 cases included (maybe even 2 if it is decided that Thermos will fall 
under the Custom moniker). The idea is that, for any custom executor, only the 
path is passed from the client side, making every executor that is not 
Thermos or the command executor fall under the custom umbrella.* I think this 
is another point where we should come to a consensus on how we want to 
implement this before moving forward.

Let me know if any of this doesn't make sense.

*This assumes the custom executor is able to use the information currently 
generated by the MesosTaskFactory.
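
The three-case idea above could be sketched roughly as follows. This is a 
Python illustration only (the scheduler is Java), and the member names, the 
string identifiers, and the classify helper are all hypothetical; the comment 
does not specify how the client would name the built-in executors.

```python
from enum import Enum


class ExecutorType(Enum):
    """The three cases discussed: two built-ins plus a catch-all."""

    THERMOS = "thermos"
    COMMAND = "command"
    CUSTOM = "custom"


def classify(executor_id):
    """Map a client-supplied identifier to an ExecutorType.

    Anything that is not one of the two built-in names (for example, a path
    to an executor binary) falls under CUSTOM.
    """
    built_ins = {
        "thermos": ExecutorType.THERMOS,
        "command": ExecutorType.COMMAND,
    }
    return built_ins.get(executor_id, ExecutorType.CUSTOM)


print(classify("thermos"))
print(classify("/opt/executors/my-executor"))
```

If Thermos were folded into the Custom case, the enumeration would simply 
lose its THERMOS member and the lookup table would shrink to one entry.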


 Design for supporting custom executor
 -

 Key: AURORA-1288
 URL: https://issues.apache.org/jira/browse/AURORA-1288
 Project: Aurora
  Issue Type: Task
Reporter: Meghdoot Bhattacharya

 The goal is to capture the list of changes in the client and the scheduler 
 required to support any executor other than thermos. This will help non 
 thermos use cases to adopt aurora easily.


