[jira] [Resolved] (AURORA-1225) Modify executor state transition logic to rely on health checks (if enabled)

2016-09-29 Thread Kai Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Huang resolved AURORA-1225.
---
Resolution: Fixed

> Modify executor state transition logic to rely on health checks (if enabled)
> 
>
> Key: AURORA-1225
> URL: https://issues.apache.org/jira/browse/AURORA-1225
> Project: Aurora
>  Issue Type: Task
>  Components: Executor
>Reporter: Maxim Khutornenko
>Assignee: Kai Huang
>
> Executor needs to start executing user content in STARTING and transition to 
> RUNNING when the required number of successful health checks is reached.
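
A minimal sketch of the transition described above, assuming a simple cumulative count of successful checks. The class and parameter names (`HealthGatedTask`, `required_successes`, `max_failures`) are hypothetical and are not taken from the Aurora executor; the case where health checks are disabled is not modeled here.

```python
from enum import Enum


class TaskState(Enum):
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    FAILED = "FAILED"


class HealthGatedTask:
    """Hypothetical model: stay in STARTING until enough health checks pass."""

    def __init__(self, required_successes, max_failures=0):
        self.required_successes = required_successes
        self.max_failures = max_failures
        self.state = TaskState.STARTING
        self.successes = 0
        self.failures = 0

    def record_health_check(self, healthy):
        # In this sketch, results only matter while the task is still starting.
        if self.state is not TaskState.STARTING:
            return self.state
        if healthy:
            self.successes += 1
            if self.successes >= self.required_successes:
                self.state = TaskState.RUNNING
        else:
            self.failures += 1
            if self.failures > self.max_failures:
                self.state = TaskState.FAILED
        return self.state
```

With `required_successes=2`, the first passing check leaves the task in STARTING and the second moves it to RUNNING.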



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (AURORA-1751) Update org.apache.aurora/aurora-api in Maven

2016-09-29 Thread Jake Farrell (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534950#comment-15534950
 ] 

Jake Farrell commented on AURORA-1751:
--

{code}
# Check out the release tag and publish the artifacts to the local Maven repo.
git checkout rel/0.16.0
./gradlew publishToMavenLocal
cd ~/.m2/lib/java/org/apache/aurora/aurora-api/0.16.0/
# Create an ASCII-armored detached signature for each file to upload.
gpg --armor --output aurora-api-0.16.0.pom.asc --detach-sig aurora-api-0.16.0.pom
gpg --armor --output aurora-api-0.16.0.jar.asc --detach-sig aurora-api-0.16.0.jar
gpg --armor --output aurora-api-0.16.0-sources.jar.asc --detach-sig aurora-api-0.16.0-sources.jar
{code}

Go to https://repository.apache.org/#staging-upload, choose "Artifact(s) with 
a POM", and upload the artifacts together with all signature files. Then go to 
https://repository.apache.org/#stagingRepositories, find the Apache Aurora 
staging repo, and update the URL in the vote email. 

When the vote is done, go back to 
https://repository.apache.org/#stagingRepositories and either release or delete 
the repo, depending on the outcome of the vote. Maven Central will mirror the 
release files within 24 hours.

TODO: 
- Convert sign/upload/staging of files to a gradle task
- Add more PMC members to Nexus
- Document or automate as part of release process
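
As a starting point for automating the signing step, here is a rough sketch that only builds the gpg command lines used in the manual steps above, one per file. It is written in Python rather than as the Gradle task mentioned in the TODO, and the function name `detached_signature_commands` is hypothetical. It does not run gpg or upload anything.

```python
import shlex


def detached_signature_commands(version, artifact="aurora-api"):
    """Return one gpg argv list per file signed in the manual steps."""
    suffixes = [".pom", ".jar", "-sources.jar"]
    commands = []
    for suffix in suffixes:
        filename = f"{artifact}-{version}{suffix}"
        commands.append([
            "gpg", "--armor",
            "--output", f"{filename}.asc",
            "--detach-sig", filename,
        ])
    return commands


def as_shell_lines(commands):
    """Render argv lists as shell-quoted command lines for review."""
    return [shlex.join(argv) for argv in commands]
```

Calling `as_shell_lines(detached_signature_commands("0.16.0"))` yields the three gpg lines from the comment above, which could then be passed to `subprocess.run` from the artifact directory.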



> Update org.apache.aurora/aurora-api in Maven
> 
>
> Key: AURORA-1751
> URL: https://issues.apache.org/jira/browse/AURORA-1751
> Project: Aurora
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.13.0
>Reporter: Derek Slager
>Assignee: Jake Farrell
>Priority: Minor
>
> Currently the version of org.apache.aurora/aurora-api available on Maven 
> Central is 0.8.0, which is several versions out of date. It would be ideal to 
> have up-to-date versions available as new Aurora releases are cut.
> https://mvnrepository.com/artifact/org.apache.aurora/aurora-api





[jira] [Created] (AURORA-1786) -zk_session_timeout option does not work

2016-09-29 Thread David Robinson (JIRA)
David Robinson created AURORA-1786:
--

 Summary: -zk_session_timeout option does not work
 Key: AURORA-1786
 URL: https://issues.apache.org/jira/browse/AURORA-1786
 Project: Aurora
  Issue Type: Bug
Reporter: David Robinson


It looks like the -zk_session_timeout option has no effect. I've set 
-zk_session_timeout="60mins" to try to work around ZK session timeouts (due 
to GC pauses caused by TaskHistoryPruner pruning a huge number of inactive 
tasks), but the default of 30 seconds always seems to be used.

{noformat}
I0929 22:36:10.804 [main, ArgScanner:411] zk_chroot_path: null 
I0929 22:36:10.804 [main, ArgScanner:411] zk_digest_credentials: : 
I0929 22:36:10.805 [main, ArgScanner:411] zk_endpoints: [zk.example.com:2181] 
I0929 22:36:10.805 [main, ArgScanner:411] zk_in_proc: false 
I0929 22:36:10.805 [main, ArgScanner:411] zk_session_timeout: (30, mins) 
I0929 22:36:10.805 [main, ArgScanner:411] zk_use_curator: true 
{noformat}
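
To check which value the scheduler actually parsed, the ArgScanner startup lines can be read back programmatically. The sketch below assumes the `(amount, unit)` log format shown above; the unit table is an assumption (only `mins` appears in this log), and the function name is hypothetical.

```python
import re

# Assumed unit labels and their length in seconds; only "mins" is seen above.
SECONDS_PER_UNIT = {"ms": 0.001, "secs": 1, "mins": 60, "hrs": 3600, "days": 86400}

# Matches e.g. "... ArgScanner:411] zk_session_timeout: (30, mins)".
ARG_LINE = re.compile(r"\]\s+(?P<name>\w+):\s+\((?P<amount>\d+),\s*(?P<unit>\w+)\)")


def parsed_duration_seconds(log_line, arg_name):
    """Return the duration logged for arg_name in seconds, or None if absent."""
    match = ARG_LINE.search(log_line)
    if match is None or match.group("name") != arg_name:
        return None
    return int(match.group("amount")) * SECONDS_PER_UNIT[match.group("unit")]
```

Comparing the result for the `zk_session_timeout` line with the value passed on the command line shows whether the flag was parsed as intended, separately from whether the ZooKeeper client honors it.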

{noformat}
I0929 22:48:37.678 [AsyncProcessor-3, TaskHistoryPruner:137] Pruning inactive 
tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:37.738 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive 
tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
2016-09-29 22:48:37,794:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: 
Exceeded deadline by 12ms
I0929 22:48:37.805 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive 
tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:37.814 [AsyncProcessor-6, MemTaskStore:148] Query took 588 ms: 
ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
offset=0, limit=0} 
I0929 22:48:37.867 [AsyncProcessor-1, TaskHistoryPruner:137] Pruning inactive 
tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:37.873 [AsyncProcessor-2, MemTaskStore:148] Query took 304 ms: 
ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
offset=0, limit=0} 
I0929 22:48:37.875 [AsyncProcessor-7, MemTaskStore:148] Query took 289 ms: 
ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
offset=0, limit=0} 
I0929 22:48:37.886 [AsyncProcessor-4, TaskHistoryPruner:137] Pruning inactive 
tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d, 
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, 
mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621, 
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
I0929 22:48:38.045 [AsyncProcessor-3, MemTaskStore:148] Query took 359 ms: 
ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
offset=0, limit=0} 
I0929 22:48:38.152 [AsyncProcessor-5, MemTaskStore:148] Query took 405 ms: 
ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
offset=0, limit=0} 
I0929 22:48:38.407 [AsyncProcessor-0, MemTaskStore:148] Query took 594 ms: 
ITaskQuery{role=null, environment=null, jobName=null, taskIds=[], 
statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[], slaveHosts=[], 
jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], 
offset=0, limit=0} 
I0929 22:48:38.442 [AsyncProcessor-1, MemTaskStore:148] Query took 566 ms: 
ITaskQuery{role=null, 

[jira] [Created] (AURORA-1785) Populate curator latches with scheduler information

2016-09-29 Thread Zameer Manji (JIRA)
Zameer Manji created AURORA-1785:


 Summary: Populate curator latches with scheduler information
 Key: AURORA-1785
 URL: https://issues.apache.org/jira/browse/AURORA-1785
 Project: Aurora
  Issue Type: Task
Reporter: Zameer Manji
Assignee: John Sirois
Priority: Minor


(Assigning this to John, who is our Curator expert, for triage/feasibility)

If you look at the Mesos ZK node for leader election, you see something like 
this:

{noformat}
 u'json.info_000104',
 u'json.info_000102',
 u'json.info_000101',
 u'json.info_98',
 u'json.info_97'
{noformat}

Each of these nodes contains data about the machine contending for leadership. 
It is a JSON serialized {{MasterInfo}} protobuf. This means an operator can 
inspect who is contending for leadership by checking the content of the nodes.

When you check the Aurora ZK node, you see something like this:

{noformat}
 u'_c_2884a0d3-b5b0-4445-b8d6-b271a6df6220-latch-000774',
 u'_c_86a21335-c5a2-4bcb-b471-4ce128b67616-latch-000776',
 u'_c_a4f8b0f7-d063-4df2-958b-7b3e6f666a95-latch-000775',
 u'_c_120cd9da-3bc1-495b-b02f-2142fb22c0a0-latch-000784',
 u'_c_46547c31-c5c2-4fb1-8a53-237e3cb0292f-latch-000780',
 u'member_000781'
{noformat}

Only the leader node contains information; the Curator latches contain none. 
It is not possible to figure out which machines are contending for leadership 
purely from ZK.

I think we should attach data to the latches, as Mesos does. 
This would be invaluable for debugging issues such as an extra master being 
added to the cluster.
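
For illustration, a small sketch of what an operator-side helper could look like. It sorts child node names like the ones listed above by their trailing sequence number and builds a JSON payload that a latch node might carry. The payload fields (`host`, `port`) and both function names are assumptions, not an existing Aurora or Curator format.

```python
import json
import re

SEQUENCE = re.compile(r"(\d+)$")


def election_order(children):
    """Sort ZK child names by their trailing sequence number, lowest first."""
    def sequence(name):
        match = SEQUENCE.search(name)
        if match is None:
            raise ValueError(f"no sequence suffix in {name!r}")
        return int(match.group(1))
    return sorted(children, key=sequence)


def contender_payload(hostname, port):
    """Hypothetical JSON payload for a latch node (field names assumed)."""
    return json.dumps({"host": hostname, "port": port}, sort_keys=True).encode("utf-8")
```

With a payload like this on each latch node, reading the children in sequence order would show every contender's host, not only the leader's.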





[jira] [Commented] (AURORA-1751) Update org.apache.aurora/aurora-api in Maven

2016-09-29 Thread Jake Farrell (JIRA)

[ 
https://issues.apache.org/jira/browse/AURORA-1751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15534148#comment-15534148
 ] 

Jake Farrell commented on AURORA-1751:
--

This would be on me: I did this by hand and never got back to automating the 
upload as a Gradle task. I'm happy to add anyone else on the PMC to Nexus so 
they can stage and publish the jar.

> Update org.apache.aurora/aurora-api in Maven
> 
>
> Key: AURORA-1751
> URL: https://issues.apache.org/jira/browse/AURORA-1751
> Project: Aurora
>  Issue Type: Task
>  Components: Packaging
>Affects Versions: 0.13.0
>Reporter: Derek Slager
>Priority: Minor
>
> Currently the version of org.apache.aurora/aurora-api available on Maven 
> Central is 0.8.0, which is several versions out of date. It would be ideal to 
> have up-to-date versions available as new Aurora releases are cut.
> https://mvnrepository.com/artifact/org.apache.aurora/aurora-api





[jira] [Assigned] (AURORA-655) Order job update events and instance events by ID rather than timestamp

2016-09-29 Thread Jing Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/AURORA-655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Chen reassigned AURORA-655:


Assignee: Jing Chen

> Order job update events and instance events by ID rather than timestamp
> ---
>
> Key: AURORA-655
> URL: https://issues.apache.org/jira/browse/AURORA-655
> Project: Aurora
>  Issue Type: Story
>  Components: Scheduler
>Reporter: Bill Farner
>Assignee: Jing Chen
>Priority: Trivial
>  Labels: newbie
>
> In {{JobUpdateDetailsMapper.xml}} we order by timestamps, which could be 
> brittle if the system time changes.  Instead of using the timestamp, use the 
> built-in database {{IDENTITY}} for sort order.
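
A small sketch of why ordering by insertion ID is more robust than ordering by timestamp. It uses an in-memory SQLite table with `AUTOINCREMENT` as a stand-in for the `IDENTITY` column; the table and column names are hypothetical and do not come from `JobUpdateDetailsMapper.xml`.

```python
import sqlite3


def load_events(order_by):
    """Insert three events whose clock steps backwards, then read them back."""
    if order_by not in ("id", "timestamp_ms"):
        raise ValueError("unsupported sort column")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "label TEXT NOT NULL, "
        "timestamp_ms INTEGER NOT NULL)"
    )
    # The second event is stamped earlier than the first, as if the
    # system clock had been adjusted backwards between the two writes.
    rows = [("first", 2000), ("second", 1500), ("third", 2500)]
    conn.executemany("INSERT INTO events (label, timestamp_ms) VALUES (?, ?)", rows)
    query = f"SELECT label FROM events ORDER BY {order_by}"
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result
```

Sorting by `id` returns the events in the order they were written, while sorting by `timestamp_ms` swaps the first two.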


