[jira] [Commented] (MESOS-5593) Devolve v1 operator protos before using them in Master/Agent.

2016-06-09 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323838#comment-15323838
 ] 

Jay Guo commented on MESOS-5593:


Sounds reasonable. Just to understand this better: are we going to have 
versioned protobuf messages in {{mesos/v1/}} and unversioned protobuf messages 
in {{mesos/}}? I suppose all versioned messages are aggregated into the 
unversioned one? In that case, if the structure changes in v2, how do we 
maintain compatibility in the unversioned one? Do we simply add the new one?

/J

> Devolve v1 operator protos before using them in Master/Agent.
> -
>
> Key: MESOS-5593
> URL: https://issues.apache.org/jira/browse/MESOS-5593
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Anand Mazumdar
>Assignee: haosdent
>Priority: Critical
>  Labels: mesosphere
>
> We adopted the following workflow for the Scheduler/Executor endpoints on 
> the Master/Agent.
> - The user makes a call to the versioned endpoint with a versioned protobuf, 
> e.g., {{v1::mesos::Call}}
> - We {{devolve}} the versioned protobuf into an unversioned protobuf before 
> using it internally.
> {code}
> scheduler::Call call = devolve(v1Call);
> {code}
> The above approach has the advantage that the internal Mesos code only has to 
> deal with unversioned protobufs. It looks like we have not been following 
> this idiom for the Operator API. We should create an unversioned protobuf file 
> similar to what we did for the Scheduler/Executor API and then {{devolve}} the 
> versioned protobufs (e.g., mesos/master/master.proto).
> The signature of some of the operator endpoints would then change to deal 
> only with unversioned protobufs:
> {code}
> Future<Response> Master::Http::getHealth(
>     const master::Call& call,
>     const Option<std::string>& principal,
>     const ContentType& contentType) const
> {code}
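
The workflow in the description can be sketched roughly as follows. This is a minimal illustration only: plain structs stand in for the generated protobuf classes, and the names {{v1::Call}}, {{internal::Call}} and {{handle}} are hypothetical, not the actual Mesos definitions.

```cpp
#include <string>

// Hypothetical stand-in for a versioned (public) message.
namespace v1 {
struct Call {
  enum Type { UNKNOWN = 0, GET_HEALTH = 1 };
  Type type = UNKNOWN;
  std::string principal;
};
} // namespace v1

// Hypothetical stand-in for the unversioned (internal) message.
namespace internal {
struct Call {
  enum Type { UNKNOWN = 0, GET_HEALTH = 1 };
  Type type = UNKNOWN;
  std::string principal;
};
} // namespace internal

// Convert the versioned message into its unversioned counterpart
// once, at the API boundary.
inline internal::Call devolve(const v1::Call& call)
{
  internal::Call result;
  result.type = static_cast<internal::Call::Type>(call.type);
  result.principal = call.principal;
  return result;
}

// Internal handler: it only ever sees the unversioned type.
inline std::string handle(const internal::Call& call)
{
  return call.type == internal::Call::GET_HEALTH ? "healthy" : "unsupported";
}
```

An endpoint would then call something like {{handle(devolve(v1Call))}}, so a future versioned type would only need its own {{devolve}} overload while the handler stays unchanged.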



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5593) Devolve v1 operator protos before using them in Master/Agent.

2016-06-09 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323870#comment-15323870
 ] 

Jay Guo commented on MESOS-5593:


OK. However, this evolving/devolving sounds heavy, and I wonder whether we will 
run into performance issues. Another thing is that we implement the Operator 
APIs in different styles: some are implemented from scratch, while others reuse 
methods and logic from the previous implementation. I imagine we would have 
similar problems with v2, so we would end up with a very mixed and tedious 
codebase.



[jira] [Comment Edited] (MESOS-5593) Devolve v1 operator protos before using them in Master/Agent.

2016-06-09 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323870#comment-15323870
 ] 

Jay Guo edited comment on MESOS-5593 at 6/10/16 4:55 AM:
-

OK. However, this evolving/devolving sounds heavy, and I wonder whether we will 
run into performance issues. Another thing is that we implement the Operator 
APIs in different styles: some are implemented from scratch, while others reuse 
methods and logic from the previous implementation. I imagine we would have 
similar problems with v2, so we would end up with a very mixed and tedious 
codebase.

I don't have a clear proposal for this, but my point is that we want to 
maintain performance while improving readability. For people who are not 
familiar with the history, the current code is quite intimidating at first 
sight.

Backwards compatibility is always a pain.





[jira] [Commented] (MESOS-5593) Devolve v1 operator protos before using them in Master/Agent.

2016-06-09 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323896#comment-15323896
 ] 

Jay Guo commented on MESOS-5593:


OK.

BTW, do you run any scale tests other than the concurrency test in 
MESOS-5222?



[jira] [Commented] (MESOS-5593) Devolve v1 operator protos before using them in Master/Agent.

2016-06-09 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323895#comment-15323895
 ] 

Jay Guo commented on MESOS-5593:


 [~haosd...@gmail.com] let me know when you are done with internal master.proto 
and I will rebase accordingly.



[jira] [Commented] (MESOS-3302) Scheduler API v1 improvements

2016-05-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300176#comment-15300176
 ] 

Jay Guo commented on MESOS-3302:


[~vinodkone]

We are manually testing the HTTP APIs now, and here are some observations:

*Cluster setup:*
* Bring up 3 masters, 3 agents, and 3 Zookeepers
* Start the agents with the {{--use_http_command_executor}} flag (which uses the 
HTTP command executor)
* Start the long-lived framework (which uses the HTTP scheduler API)

*Test cases:*
* Restart leading master
_The framework is started with {{--master=}}, so it always talks to a fixed 
master, whether that master is the leader or a follower._ 
*Expected:* The master replies with {{307 Temporary Redirect}}, the scheduler 
library follows the redirect and talks to the actual leading master, and all of 
this is transparent to the framework.
*Actual:* The redirect is reported back to the framework.
Is this the intended behaviour? On the other hand, when the framework is started 
with {{--master=zk://...}}, it handles master detection correctly and resumes 
once a new leading master is elected. However, master detection runs 
continuously without a pause. Should we consider introducing an interval?
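
The expected redirect handling could look roughly like the sketch below. It is only an illustration under assumptions: the {{Response}} struct, {{redirectTarget}}, {{resolveLeader}} and the hop limit are hypothetical names, not the scheduler library's actual API, and the transport is passed in as a callable so the logic stays independent of any HTTP implementation.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified HTTP response.
struct Response {
  int status = 200;
  std::map<std::string, std::string> headers;
};

// Returns the URL to retry against, or an empty string if the
// response is not a 307 redirect carrying a Location header.
inline std::string redirectTarget(const Response& response)
{
  if (response.status != 307) {
    return "";
  }
  auto location = response.headers.find("Location");
  return location == response.headers.end() ? "" : location->second;
}

// Sends the call via `send` and follows up to `maxHops` redirects.
// Returns the endpoint that finally answered without a followable
// redirect, or an empty string if the hop limit was exceeded.
template <typename Send>
std::string resolveLeader(std::string url, Send send, int maxHops = 3)
{
  for (int hop = 0; hop <= maxHops; ++hop) {
    const Response response = send(url);
    const std::string next = redirectTarget(response);
    if (next.empty()) {
      return url; // No redirect to follow: treat this endpoint as final.
    }
    url = next;
  }
  return "";
}
```

Bounding the number of hops keeps the library from looping forever if two masters keep redirecting to each other during an election.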

* Restart agent
*Expected:* If the agent stays down longer than the timeout, it is removed and 
the workload is migrated to other agents. If the agent comes back within the 
timeout, it resumes its tasks.
*Actual:* The framework keeps waiting for the agent to recover. It does resume 
working if the agent is back in time; otherwise, it keeps waiting indefinitely.
I guess this is reasonable, since the long-lived framework declines the other 
offers, which are then not offered to it again. I don't see an option to expire 
the decline-offer filter, though; or am I missing something?
There is also a chance that the agent resumes running tasks for a little while 
and is then _asked to terminate_ by the master. This is somewhat flaky and needs 
further investigation.

* Restart long lived framework
*Expected:* Recover
*Actual:* Recover

* Restart all masters at once
Same behaviour as _restarting leading master_

* Emulate network partitions (one-way and two-way) between the long-lived 
framework and the master
_The network partition is emulated at the TCP layer using the iptables rule 
{{iptables -A INPUT -p tcp -s  --dport 5050 -j DROP}}_
** One-way: Master <--X-- Framework
In most cases it works as expected: the framework simply hangs, and the agent 
keeps resending messages since acknowledgements are blocked. When the block is 
lifted, everything resumes working. However, on one occasion the agent kept 
launching new tasks during the partition without the framework being aware of 
it. I need to find a way to reproduce this; I guess it has something to do with 
the state at the moment the network was cut.
** Two-way: WIP

* Restart leading Zookeeper
WIP

* Restart all Zookeepers at once
WIP

> Scheduler API v1 improvements
> -
>
> Key: MESOS-3302
> URL: https://issues.apache.org/jira/browse/MESOS-3302
> Project: Mesos
>  Issue Type: Epic
>Reporter: Marco Massenzio
>  Labels: mesosphere, twitter
>
> This Epic covers all the refinements that we may want to build on top of the 
> {{HTTP API}} MVP epic (MESOS-2288) which was released initially with Mesos 
> {{0.24.0}}.
> The tasks/stories here cover the necessary work to bring the API v1 to what 
> we would regard as "Production-ready" state in preparation for the {{1.0.0}} 
> release.





[jira] [Created] (MESOS-5451) Show Framework ID in log for long-lived-framework

2016-05-25 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5451:
--

 Summary: Show Framework ID in log for long-lived-framework
 Key: MESOS-5451
 URL: https://issues.apache.org/jira/browse/MESOS-5451
 Project: Mesos
  Issue Type: Bug
  Components: framework
Reporter: Jay Guo
Assignee: Jay Guo
Priority: Trivial


The long-lived-framework does not log its framework ID when it registers for 
the first time.





[jira] [Commented] (MESOS-5451) Show Framework ID in log for long-lived-framework

2016-05-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299559#comment-15299559
 ] 

Jay Guo commented on MESOS-5451:


RR: https://reviews.apache.org/r/47816/



[jira] [Assigned] (MESOS-5493) Implement GET_TASKS Call in v1 master API.

2016-05-30 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5493:
--

Assignee: Jay Guo

> Implement GET_TASKS Call in v1 master API.
> --
>
> Key: MESOS-5493
> URL: https://issues.apache.org/jira/browse/MESOS-5493
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-5493) Implement GET_TASKS Call in v1 master API.

2016-05-30 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306802#comment-15306802
 ] 

Jay Guo commented on MESOS-5493:


RR: https://reviews.apache.org/r/48046/

I need some feedback on how to factor out the {{jsonify}} logic, parse its 
output back into a {{JSON::Object}}, and then convert that into the Response 
message.



[jira] [Assigned] (MESOS-5407) Slave/Agent rename: diagrams in docs

2016-05-26 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5407:
--

Assignee: Jay Guo

> Slave/Agent rename: diagrams in docs
> 
>
> Key: MESOS-5407
> URL: https://issues.apache.org/jira/browse/MESOS-5407
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jay Guo
>Assignee: Jay Guo
>Priority: Minor
>
> Rename 'slave' in diagrams





[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.

2016-05-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304360#comment-15304360
 ] 

Jay Guo commented on MESOS-5468:


What is your iptables command? I can consistently reproduce the problem on the 
latest build.

* How long does it take for the master to disconnect the framework after the 
network partition starts (i.e., after the {{iptables}} command is issued)?

* Do the TCP sockets go into the FIN_WAIT_1 state?

I think the key question is how the master notices a network partition. IIUC, 
it relies on the TCP socket timeout, which is typically 13-30 minutes on a 
Linux box (see the tcp man page), and that matches the time I observed between 
the disconnect and the give-up. At that point, the TCP socket informs the 
application (mesos-master) of the broken link while remaining ESTABLISHED. It 
is then up to the application to handle this failure, and I suspect that 
libprocess does not properly close the socket here. I'll need to investigate 
further.
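
To put a rough number on that socket timeout, here is a back-of-the-envelope sketch. It assumes the Linux defaults of {{tcp_retries2 = 15}}, a minimum retransmission timeout (RTO) of 200 ms and a maximum of 120 s, with the RTO doubling after every retry. The real RTO depends on the measured round-trip time, so this is only a lower-bound estimate; the function name is made up for illustration.

```cpp
#include <algorithm>

// Approximate number of seconds before the kernel gives up on an
// unacknowledged segment, assuming the RTO starts at `rtoMin`,
// doubles after each retransmission and is capped at `rtoMax`.
inline double retransmissionTimeout(
    int retries,
    double rtoMin = 0.2,
    double rtoMax = 120.0)
{
  double total = 0.0;
  double rto = rtoMin;

  // One wait after the original transmission plus one after each
  // of the `retries` retransmissions.
  for (int i = 0; i <= retries; ++i) {
    total += std::min(rto, rtoMax);
    rto = std::min(rto * 2.0, rtoMax);
  }

  return total;
}
```

Under these assumptions, {{retransmissionTimeout(15)}} comes to about 924.6 seconds (roughly 15 minutes), which is within the 13-30 minute range quoted above.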

I see other users hitting the {{Transport endpoint is not connected}} error, 
and I have personally seen it many times as well, so I think we should 
definitely take a serious look at it.

Another question: why didn't we use a mature HTTP library from the very 
beginning instead of writing our own implementation?

Cheers,
/J

> Add logic in long-lived-framework to handle network partitions.
> ---
>
> Key: MESOS-5468
> URL: https://issues.apache.org/jira/browse/MESOS-5468
> Project: Mesos
>  Issue Type: Task
>  Components: framework, master
>Reporter: Jay Guo
>
> Currently long-lived-framework does not handle network partitions, i.e., it 
> does not explicitly try to {{reconnect}} with the master upon not receiving 
> {{HEARTBEAT}} events for a prolonged amount of time. If the master 
> disconnects a framework without the framework being aware of it (one-way 
> partition), the framework should explicitly issue a {{reconnect}} request via 
> the scheduler library after a certain period of time.
> *On the other hand*, should we close the TCP socket on the master side when 
> tearing down a framework? Currently the TCP socket is left alive even after 
> the framework has been deactivated. This results in the framework sending 
> invalid {{Call}}s to the master and in re-detection.
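
The reconnect logic described above could be sketched roughly as follows. This is an illustration, not the long-lived-framework's actual code: the class name {{HeartbeatMonitor}}, the heartbeat interval and the number of tolerated missed heartbeats are all assumptions.

```cpp
#include <chrono>

// Tracks when the last HEARTBEAT event arrived and decides when the
// framework should give up waiting and ask for a reconnect.
class HeartbeatMonitor
{
public:
  using Clock = std::chrono::steady_clock;

  // `interval` is the expected time between heartbeats and
  // `missedAllowed` is how many intervals may pass in silence.
  HeartbeatMonitor(
      Clock::duration interval,
      int missedAllowed,
      Clock::time_point start)
    : timeout_(interval * missedAllowed), last_(start) {}

  // Call whenever a HEARTBEAT event is received.
  void onHeartbeat(Clock::time_point now) { last_ = now; }

  // True once no heartbeat has been seen for longer than the
  // timeout, i.e. the framework should issue a reconnect.
  bool shouldReconnect(Clock::time_point now) const
  {
    return now - last_ > timeout_;
  }

private:
  Clock::duration timeout_;
  Clock::time_point last_;
};
```

Passing the current time in explicitly, rather than reading the clock inside the class, keeps the check easy to test; a framework would typically call {{shouldReconnect}} from a periodic timer.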





[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.

2016-05-31 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309244#comment-15309244
 ] 

Jay Guo commented on MESOS-5468:


[~anandmazumdar] Sorry for the delay.
One of the two connections between the framework and the master is closed 
successfully; however, the other one is left ESTABLISHED when the master 
attempts to remove the framework. After the network rejoins, the master 
repeatedly denies subscription calls from the framework. So the question is: is 
the EVENT connection left open intentionally or accidentally?

Here's the full log:
{code:title=master.log}
I0601 12:12:03.671700  2252 master.cpp:5195] Status update TASK_FINISHED (UUID: 
e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- from agent 
edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:12:03.671931  2252 master.cpp:5243] Forwarding status update 
TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:12:03.672360  2252 master.cpp:6853] Updating the state of task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: 
TASK_FINISHED, status update state: TASK_FINISHED)
I0601 12:14:43.677433  2247 master.cpp:5195] Status update TASK_FINISHED (UUID: 
e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- from agent 
edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:14:43.677781  2247 master.cpp:5243] Forwarding status update 
TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:14:43.678387  2247 master.cpp:6853] Updating the state of task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: 
TASK_FINISHED, status update state: TASK_FINISHED)
I0601 12:20:03.679064  2251 master.cpp:5195] Status update TASK_FINISHED (UUID: 
e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- from agent 
edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:20:03.679194  2251 master.cpp:5243] Forwarding status update 
TASK_FINISHED (UUID: e370dac6-2915-4090-876f-c000d0fe71c7) for task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:20:03.679565  2251 master.cpp:6853] Updating the state of task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: 
TASK_FINISHED, status update state: TASK_FINISHED)
E0601 12:25:02.891707  2254 process.cpp:2040] Failed to shutdown socket with fd 
13: Transport endpoint is not connected
I0601 12:25:02.895753  2248 master.cpp:1388] Framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++)) 
disconnected
I0601 12:25:02.896077  2248 master.cpp:2822] Disconnecting framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
I0601 12:25:02.896289  2248 master.cpp:2846] Deactivating framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
W0601 12:25:02.896682  2248 master.hpp:1903] Master attempted to send message 
to disconnected framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived 
Framework (C++))
W0601 12:25:02.897027  2248 master.hpp:1909] Unable to send event to framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++)): 
connection closed
I0601 12:25:02.897341  2248 master.cpp:1401] Giving framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++)) 0ns to 
failover
I0601 12:25:02.896751  2249 hierarchical.cpp:375] Deactivated framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f-
I0601 12:25:02.901005  2251 master.cpp:5608] Framework failover timeout, 
removing framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived 
Framework (C++))
I0601 12:25:02.901053  2251 master.cpp:6338] Removing framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- (Long Lived Framework (C++))
I0601 12:25:02.901409  2251 master.cpp:6853] Updating the state of task 3 of 
framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- (latest state: 
TASK_FINISHED, status update state: TASK_KILLED)
I0601 12:25:02.901449  2251 master.cpp:6919] Removing task 3 with resources 
cpus(*):0.001; mem(*):1 of framework e8288e1d-2c05-4e05-9db7-713a366f7f5f- 
on agent edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 
(ubuntu)
I0601 12:25:02.901721  2251 master.cpp:6948] Removing executor 'default' with 
resources cpus(*):0.1; mem(*):32 of framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f- on agent 
edbc3730-e55b-4390-a1f2-5de5a66497f5-S0 at slave(1)@127.0.1.1:5051 (ubuntu)
I0601 12:25:02.902426  2251 hierarchical.cpp:326] Removed framework 
e8288e1d-2c05-4e05-9db7-713a366f7f5f-
W0601 12:25:08.007905  2253 master.cpp:5291] Ignoring unknown exited executor 
'default' 
{code}

[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.

2016-05-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303589#comment-15303589
 ] 

Jay Guo commented on MESOS-5468:


Another question: how long do we wait before timing out a framework? I don't 
see an option for it in the configuration. Or do we use some mechanism other 
than a timeout to invalidate a framework?



[jira] [Commented] (MESOS-5468) Add logic to long-lived-framework to handle HEARTBEAT timeout

2016-05-26 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303509#comment-15303509
 ] 

Jay Guo commented on MESOS-5468:


To reproduce:
* Start the master and the agent
* Run long-lived-framework
* Issue {{# iptables -A OUTPUT -p tcp -d  --dport 5050 -j DROP}} on the 
framework machine to emulate a network partition
* Wait until the master deactivates the framework
* Remove the iptables rule added above to emulate the network rejoining
* Check the logs of both long-lived-framework and the master. {{netstat -tpn}} 
also shows an enormous number of {{TIME_WAIT}} sockets, which are the result of 
re-detection



[jira] [Created] (MESOS-5468) Add logic to long-lived-framework to handle HEARTBEAT timeout

2016-05-26 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5468:
--

 Summary: Add logic to long-lived-framework to handle HEARTBEAT 
timeout
 Key: MESOS-5468
 URL: https://issues.apache.org/jira/browse/MESOS-5468
 Project: Mesos
  Issue Type: Bug
  Components: framework, master
Reporter: Jay Guo


Currently long-lived-framework does not handle HEARTBEAT timeouts. If the master 
tears down the framework without the framework being aware of it (network 
partition), the framework keeps waiting for an {{Event}} until it is reconnected.

*On the other hand*, should we close the TCP socket on the master side when 
tearing down a framework? Currently the TCP socket is left alive even after the 
framework has been deactivated. This results in the framework sending invalid 
{{Call}}s to the master and in re-detection.





[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.

2016-05-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303587#comment-15303587
 ] 

Jay Guo commented on MESOS-5468:


See steps to reproduce in my first comment.



[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.

2016-05-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15303581#comment-15303581
 ] 

Jay Guo commented on MESOS-5468:


[~anandmazumdar]
The socket is NOT closed successfully and is still left in ESTABLISHED (as can 
be observed with {{netstat}}). I suspect this somehow happens before the master 
explicitly issues the close. Here's the log:
{code:title=master.log}
E0527 05:48:45.564194 13105 process.cpp:2033] Failed to shutdown socket with fd 33: Transport endpoint is not connected
I0527 05:48:45.573005 13101 master.cpp:1383] Framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) disconnected
I0527 05:48:45.573212 13101 master.cpp:2792] Disconnecting framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
I0527 05:48:45.573431 13101 master.cpp:2816] Deactivating framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
W0527 05:48:45.574806 13101 master.hpp:1846] Master attempted to send message to disconnected framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
I0527 05:48:45.575145 13100 hierarchical.cpp:375] Deactivated framework 61100b89-f964-4aa2-b084-e1089d205b83-
W0527 05:48:45.580201 13101 master.hpp:1852] Unable to send event to framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)): connection closed
W0527 05:48:45.581838 13101 master.hpp:1846] Master attempted to send message to disconnected framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
W0527 05:48:45.582034 13101 master.hpp:1852] Unable to send event to framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)): connection closed
W0527 05:48:45.583015 13101 master.hpp:1846] Master attempted to send message to disconnected framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
W0527 05:48:45.583124 13101 master.hpp:1852] Unable to send event to framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)): connection closed
I0527 05:48:45.583395 13101 master.cpp:1396] Giving framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++)) 0ns to failover
I0527 05:48:45.585503 13102 master.cpp:5516] Framework failover timeout, removing framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
I0527 05:48:45.585793 13102 master.cpp:6246] Removing framework 61100b89-f964-4aa2-b084-e1089d205b83- (Long Lived Framework (C++))
I0527 05:48:45.588471 13102 master.cpp:6761] Updating the state of task 2 of framework 61100b89-f964-4aa2-b084-e1089d205b83- (latest state: TASK_FINISHED, status update state: TASK_KILLED)
I0527 05:48:45.589534 13102 master.cpp:6827] Removing task 2 with resources cpus(*):0.001; mem(*):1 of framework 61100b89-f964-4aa2-b084-e1089d205b83- on agent af46d7b0-4e75-443d-9e11-e89d5605f012-S2 at slave(1)@10.11.13.10:5051 (agent-3.novalocal)
I0527 05:48:45.590454 13102 master.cpp:6856] Removing executor 'default' with resources cpus(*):0.1; mem(*):32 of framework 61100b89-f964-4aa2-b084-e1089d205b83- on agent af46d7b0-4e75-443d-9e11-e89d5605f012-S2 at slave(1)@10.11.13.10:5051 (agent-3.novalocal)
I0527 05:48:45.592897 13100 hierarchical.cpp:326] Removed framework 61100b89-f964-4aa2-b084-e1089d205b83-
W0527 05:48:50.662726 13098 master.cpp:5199] Ignoring unknown exited executor 'default' of framework 61100b89-f964-4aa2-b084-e1089d205b83- on agent af46d7b0-4e75-443d-9e11-e89d5605f012-S2 at slave(1)@10.11.13.10:5051 (agent-3.novalocal)
{code}

The build is not super fresh (it is from within the last week), so the line 
numbers may not match the latest code.

> Add logic in long-lived-framework to handle network partitions.
> ---
>
> Key: MESOS-5468
> URL: https://issues.apache.org/jira/browse/MESOS-5468
> Project: Mesos
>  Issue Type: Task
>  Components: framework, master
>Reporter: Jay Guo
>
> Currently long-lived-framework does not handle network partitions, i.e., it 
> does not explicitly try to {{reconnect}} with the master after receiving no 
> {{HEARTBEAT}} events for a prolonged amount of time. If the master 
> disconnects a framework without the framework being aware of it (a one-way 
> partition), the framework should explicitly issue a {{reconnect}} request via 
> the scheduler library after a certain period of time.
> *On the other hand*, should we close the TCP socket on the master side when 
> tearing down a framework? Currently the TCP socket is left alive even after 
> the framework has been deactivated. This results in the framework sending an 
> invalid {{Call}} to the master and in re-detection.
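
The reconnect policy described in the issue could be sketched roughly as below. This is only an illustration under stated assumptions, not the scheduler library's API: the class `HeartbeatWatchdog`, its parameters `heartbeatInterval` and `missedAllowed`, and the rule "reconnect after more than N missed intervals" are all hypothetical.

```cpp
#include <chrono>

// Hypothetical helper (not part of Mesos): remembers when the last HEARTBEAT
// event arrived and reports when the framework should try to reconnect.
class HeartbeatWatchdog
{
public:
  using Clock = std::chrono::steady_clock;

  HeartbeatWatchdog(
      std::chrono::seconds heartbeatInterval,
      int missedAllowed,
      Clock::time_point start)
    : interval(heartbeatInterval), allowed(missedAllowed), last(start) {}

  // Call whenever a HEARTBEAT event is received from the master.
  void heartbeat(Clock::time_point now) { last = now; }

  // True once more than `allowed` heartbeat intervals have elapsed since the
  // last heartbeat, i.e. the framework may be partitioned from the master.
  bool shouldReconnect(Clock::time_point now) const
  {
    return now - last > interval * allowed;
  }

private:
  std::chrono::seconds interval;
  int allowed;
  Clock::time_point last;
};
```

A framework would call `heartbeat()` from its event handler and poll `shouldReconnect()` from a timer; when it returns true, the framework would issue the {{reconnect}} request mentioned above.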



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-5518) Implement GET_CONTAINERS Call in v1 agent API.

2016-05-29 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5518:
--

Assignee: Jay Guo

> Implement GET_CONTAINERS Call in v1 agent API.
> --
>
> Key: MESOS-5518
> URL: https://issues.apache.org/jira/browse/MESOS-5518
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Jay Guo
>






[jira] [Assigned] (MESOS-5490) Implement GET_STATE_SUMMARY Call in v1 master API.

2016-06-22 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5490:
--

Assignee: Jay Guo

> Implement GET_STATE_SUMMARY Call in v1 master API.
> --
>
> Key: MESOS-5490
> URL: https://issues.apache.org/jira/browse/MESOS-5490
> Project: Mesos
>  Issue Type: Task
>Reporter: Vinod Kone
>Assignee: Jay Guo
>






[jira] [Comment Edited] (MESOS-4732) Migrate rest of the endpoints to use `jsonify`

2016-06-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344035#comment-15344035
 ] 

Jay Guo edited comment on MESOS-4732 at 6/22/16 9:56 AM:
-

[~neilconway] [~mcypark]
Is this migration still active? I'm working on the v1 operator API and I 
observed that some of the existing endpoints have not been converted to use 
{{jsonify}}. I wonder whether it makes sense at all to rework those endpoints 
to use {{jsonify}}, since we are refactoring them anyway. The particular API 
I'm looking at right now is {{slave/containers}}.


was (Author: guoger):
[~neilconway][~mcypark]
Is this migration still active? I'm working on the v1 operator API and I 
observed that some of the existing endpoints have not been converted to use 
{{jsonify}}. I wonder whether it makes sense at all to rework those endpoints 
to use {{jsonify}}, since we are refactoring them anyway. The particular API 
I'm looking at right now is {{slave/containers}}.

> Migrate rest of the endpoints to use `jsonify`
> --
>
> Key: MESOS-4732
> URL: https://issues.apache.org/jira/browse/MESOS-4732
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Neil Conway
>
> As an MVP, we shipped `/state` and `/state-summary` to use `jsonify`. We need to 
> follow through with the migration of the rest of the endpoints to use 
> `jsonify` as well.





[jira] [Commented] (MESOS-4732) Migrate rest of the endpoints to use `jsonify`

2016-06-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15344035#comment-15344035
 ] 

Jay Guo commented on MESOS-4732:


[~neilconway][~mcypark]
Is this migration still active? I'm working on the v1 operator API and I 
observed that some of the existing endpoints have not been converted to use 
{{jsonify}}. I wonder whether it makes sense at all to rework those endpoints 
to use {{jsonify}}, since we are refactoring them anyway. The particular API 
I'm looking at right now is {{slave/containers}}.

> Migrate rest of the endpoints to use `jsonify`
> --
>
> Key: MESOS-4732
> URL: https://issues.apache.org/jira/browse/MESOS-4732
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Michael Park
>Assignee: Neil Conway
>
> As an MVP, we shipped `/state` and `/state-summary` to use `jsonify`. We need to 
> follow through with the migration of the rest of the endpoints to use 
> `jsonify` as well.





[jira] [Assigned] (MESOS-3505) Support specifying Docker image by Image ID.

2016-06-24 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3505:
--

Assignee: Jay Guo

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>Assignee: Jay Guo
>  Labels: mesosphere
>
> A common way to specify a Docker image with the Docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However, this combination is neither precise nor immutable.
> For this reason, when an image with a given {{repo:tag}} is already 
> cached locally on an agent host and a task requiring this {{repo:tag}} 
> arrives, the task may use an image that differs from the one the user intended.
> The Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> The native Mesos provisioner should support the same for Docker images. IMO it's 
> fine if image discovery by ID is not supported, so that {{repo:tag}} must 
> still be specified (although it looks like the [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it), but the 
> user should be able to optionally specify an image ID that is matched against 
> the cached or newly pulled image. If the ID doesn't match the cached image, the 
> store can re-pull it; if the ID doesn't match the newly pulled image (manifest), 
> the provisioner can fail the request rather than let the user unknowingly run 
> the task on the wrong image.
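
The matching rule proposed in the description might look like the sketch below. It is a simplified illustration and not the provisioner's actual code: `Action`, `checkImage`, and the convention that an empty `requestedId` means "only {{repo:tag}} was given" are all assumptions made for the example.

```cpp
#include <string>

// Hypothetical outcome of comparing the requested image ID with an image.
enum class Action { USE_IMAGE, REPULL, FAIL };

// `requestedId` is the optional ID from the user (empty if only repo:tag was
// specified). `actualId` is the ID of the image at hand, and `freshlyPulled`
// says whether that image was just pulled or came from the local cache.
Action checkImage(
    const std::string& requestedId,
    const std::string& actualId,
    bool freshlyPulled)
{
  // No ID requested, or the IDs agree: the image can be used as is.
  if (requestedId.empty() || requestedId == actualId) {
    return Action::USE_IMAGE;
  }

  // Mismatch against a cached image: the store can re-pull.
  // Mismatch against a newly pulled image: fail the request.
  return freshlyPulled ? Action::FAIL : Action::REPULL;
}
```

The two-step outcome keeps a stale cache from silently winning, while still failing loudly when even a fresh pull does not produce the image the user asked for.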





[jira] [Commented] (MESOS-5227) Implement HTTP Docker Executor that uses the Executor Library

2016-06-26 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15350409#comment-15350409
 ] 

Jay Guo commented on MESOS-5227:


It would be great if you could put a few words on each review link to summarize.

> Implement HTTP Docker Executor that uses the Executor Library
> -
>
> Key: MESOS-5227
> URL: https://issues.apache.org/jira/browse/MESOS-5227
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Assignee: Yong Tang
>
> Similar to what we did with the HTTP command executor in MESOS-3558, we should 
> have an HTTP Docker executor that can speak the v1 Executor API.





[jira] [Assigned] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-22 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3727:
--

Assignee: (was: Jay Guo)

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos version 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> I observed the issue when trying to execute the mesos-master executable as a 
> non-root user: since mesos-init-wrapper grants no permissions to non-root 
> users, it was not executed and mesos-master did not start.
> Should we make these file permissions consistent for the executable and the init script? 





[jira] [Comment Edited] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158251#comment-15158251
 ] 

Jay Guo edited comment on MESOS-3727 at 2/23/16 4:11 AM:
-

We have just confirmed that the problem persists in *0.27.0-0.2.190.ubuntu1404*. 
We should modify the permissions of the following files in the release:

/usr/bin/mesos-init-wrapper   770 --> 775
/etc/default/mesos            640 --> 644
/etc/default/mesos-master     640 --> 644
/etc/default/mesos-slave      640 --> 644

However, where is the Mesos release maintained?


was (Author: guoger):
We have just confirmed that the problem persists in *0.27.0-0.2.190.ubuntu1404*. 
We should modify the permissions of the following files in the release:
/usr/bin/mesos-init-wrapper   770 --> 775
/etc/default/mesos            640 --> 644
/etc/default/mesos-master     640 --> 644
/etc/default/mesos-slave      640 --> 644

However, where is the Mesos release maintained?

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Assignee: Jay Guo
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos version 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> I observed the issue when trying to execute the mesos-master executable as a 
> non-root user: since mesos-init-wrapper grants no permissions to non-root 
> users, it was not executed and mesos-master did not start.
> Should we make these file permissions consistent for the executable and the init script? 





[jira] [Commented] (MESOS-3481) Add const accessor to Master flags

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158302#comment-15158302
 ] 

Jay Guo commented on MESOS-3481:


We have submitted a patch for review: https://reviews.apache.org/r/43868/

IBM community pair: Jay Guo & Zhou Xing

> Add const accessor to Master flags
> --
>
> Key: MESOS-3481
> URL: https://issues.apache.org/jira/browse/MESOS-3481
> Project: Mesos
>  Issue Type: Task
>Reporter: Joseph Wu
>Assignee: zhou xing
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> It would make sense to have an accessor to the master's flags, especially for 
> tests.
> For example, see [this 
> test|https://github.com/apache/mesos/blob/2876b8c918814347dd56f6f87d461e414a90650a/src/tests/master_maintenance_tests.cpp#L1231-L1235].
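
For illustration, a const accessor of the kind requested here could look like the sketch below. The types are simplified stand-ins, not the real Mesos classes: `Flags` with a single `workDir` field and this stripped-down `Master` are assumptions made only to keep the example self-contained.

```cpp
#include <string>

// Simplified stand-in for the master's flags; the field is illustrative only.
struct Flags
{
  std::string workDir;
};

// Simplified stand-in for the master.
class Master
{
public:
  explicit Master(const Flags& flags) : flags_(flags) {}

  // Read-only access: callers such as tests can inspect the flags the master
  // was started with, but cannot modify them through this reference.
  const Flags& flags() const { return flags_; }

private:
  const Flags flags_;
};
```

Returning a `const` reference avoids copying the flags and makes accidental mutation from test code a compile-time error.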





[jira] [Commented] (MESOS-4580) Consider returning `202` (Accepted) for /reserve and related endpoints

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158385#comment-15158385
 ] 

Jay Guo commented on MESOS-4580:


Hi, we found this bug interesting. Can we proceed to confirm it as accepted and 
contribute? We are quite new to this community and still need to get familiar 
with the work processes. Thanks.

/IBM Pair: Jay Guo & Zhou Xing

> Consider returning `202` (Accepted) for /reserve and related endpoints
> --
>
> Key: MESOS-4580
> URL: https://issues.apache.org/jira/browse/MESOS-4580
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> We currently return {{200}} (OK) when a POST to {{/reserve}}, {{/unreserve}}, 
> {{/create-volumes}}, and {{/destroy-volumes}} is validated successfully. This 
> is misleading, because the underlying operation is still dispatched 
> asynchronously and might subsequently fail. It would be more accurate to 
> return {{202}} (Accepted) instead.





[jira] [Assigned] (MESOS-4580) Consider returning `202` (Accepted) for /reserve and related endpoints

2016-02-22 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-4580:
--

Assignee: Jay Guo

> Consider returning `202` (Accepted) for /reserve and related endpoints
> --
>
> Key: MESOS-4580
> URL: https://issues.apache.org/jira/browse/MESOS-4580
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: Jay Guo
>  Labels: mesosphere
>
> We currently return {{200}} (OK) when a POST to {{/reserve}}, {{/unreserve}}, 
> {{/create-volumes}}, and {{/destroy-volumes}} is validated successfully. This 
> is misleading, because the underlying operation is still dispatched 
> asynchronously and might subsequently fail. It would be more accurate to 
> return {{202}} (Accepted) instead.





[jira] [Commented] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-22 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158251#comment-15158251
 ] 

Jay Guo commented on MESOS-3727:


We have just confirmed that the problem persists in *0.27.0-0.2.190.ubuntu1404*. 
We should modify the permissions of the following files in the release:
/usr/bin/mesos-init-wrapper   770 --> 775
/etc/default/mesos            640 --> 644
/etc/default/mesos-master     640 --> 644
/etc/default/mesos-slave      640 --> 644

However, where is the Mesos release maintained?
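
To make the effect of these mode changes concrete, the sketch below checks the "other" permission bits of each mode. It assumes a POSIX system (for `<sys/stat.h>`), and the helper names `othersCanRead` and `othersCanExecute` are made up for this example.

```cpp
#include <sys/stat.h>

// True if users who are neither the owner nor in the file's group may read
// a file with the given mode.
bool othersCanRead(mode_t mode)
{
  return (mode & S_IROTH) != 0;
}

// True if users who are neither the owner nor in the file's group may execute
// a file with the given mode.
bool othersCanExecute(mode_t mode)
{
  return (mode & S_IXOTH) != 0;
}
```

With mode 770 both checks fail for a user outside the `root` group, which matches the reported behavior; 775 and 644 set the corresponding "other" bits.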

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Assignee: Jay Guo
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos version 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> I observed the issue when trying to execute the mesos-master executable as a 
> non-root user: since mesos-init-wrapper grants no permissions to non-root 
> users, it was not executed and mesos-master did not start.
> Should we make these file permissions consistent for the executable and the init script? 





[jira] [Assigned] (MESOS-3727) File permission inconsistency for mesos-master executable and mesos-init-wrapper.

2016-02-14 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3727:
--

Assignee: Jay Guo

> File permission inconsistency for mesos-master executable and 
> mesos-init-wrapper.
> -
>
> Key: MESOS-3727
> URL: https://issues.apache.org/jira/browse/MESOS-3727
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Sarjeet Singh
>Assignee: Jay Guo
>Priority: Trivial
>
> There seems to be a file permission inconsistency between the mesos-master 
> executable and the mesos-init-wrapper script in Mesos version 0.25.
> node-1:~# dpkg -l | grep mesos
> ii  mesos   0.25.0-0.2.70.ubuntu1404
> node-1:~# ls -ld /usr/sbin/mesos-master
> -rwxr-xr-x 1 root root 289173 Oct 12 14:07 /usr/sbin/mesos-master
> node-1:~# ls -ld /usr/bin/mesos-init-wrapper
> -rwxrwx--- 1 root root 5202 Oct  1 11:17 /usr/bin/mesos-init-wrapper
> I observed the issue when trying to execute the mesos-master executable as a 
> non-root user: since mesos-init-wrapper grants no permissions to non-root 
> users, it was not executed and mesos-master did not start.
> Should we make these file permissions consistent for the executable and the init script? 





[jira] [Assigned] (MESOS-3481) Add const accessor to Master flags

2016-02-21 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3481:
--

Assignee: Jay Guo  (was: zhou xing)

> Add const accessor to Master flags
> --
>
> Key: MESOS-3481
> URL: https://issues.apache.org/jira/browse/MESOS-3481
> Project: Mesos
>  Issue Type: Task
>Reporter: Joseph Wu
>Assignee: Jay Guo
>Priority: Trivial
>  Labels: mesosphere, newbie
>
> It would make sense to have an accessor to the master's flags, especially for 
> tests.
> For example, see [this 
> test|https://github.com/apache/mesos/blob/2876b8c918814347dd56f6f87d461e414a90650a/src/tests/master_maintenance_tests.cpp#L1231-L1235].





[jira] [Created] (MESOS-4930) Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI

2016-03-14 Thread Jay Guo (JIRA)
Jay Guo created MESOS-4930:
--

 Summary: Update example frameworks in Mesos codebase to assign 
proper TaskId in order to be sorted correctly in WebUI
 Key: MESOS-4930
 URL: https://issues.apache.org/jira/browse/MESOS-4930
 Project: Mesos
  Issue Type: Improvement
  Components: framework, webui
Reporter: Jay Guo
Priority: Trivial


Frameworks should assign TaskIds with a fixed number of digits, so that the 
WebUI, which sorts TaskIds lexically, displays them in the correct order.

For instance, `1`, `2`, `10`, `11` are sorted as `1`, `10`, `11`, `2`, whereas 
`001`, `002`, `010`, `011` are sorted in ascending order.

/src/examples/long_lived_framework.cpp should be updated.
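
A small sketch of the idea, assuming non-negative task counters and a width chosen up front; `paddedTaskId` and `lexicalSort` are hypothetical helpers and not code from long_lived_framework.cpp:

```cpp
#include <algorithm>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Format a non-negative task counter as a fixed-width, zero-padded TaskId.
std::string paddedTaskId(int id, int width)
{
  std::ostringstream out;
  out << std::setw(width) << std::setfill('0') << id;
  return out.str();
}

// Sort IDs lexically (plain string comparison), as the WebUI is described
// as doing.
std::vector<std::string> lexicalSort(std::vector<std::string> ids)
{
  std::sort(ids.begin(), ids.end());
  return ids;
}
```

Note that the padding only helps while the counter fits in `width` digits; once it grows past that, the IDs have different lengths again and the lexical order breaks.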





[jira] [Created] (MESOS-4929) WebUI does not correctly sort tasks when their TaskIds are of different length

2016-03-14 Thread Jay Guo (JIRA)
Jay Guo created MESOS-4929:
--

 Summary: WebUI does not correctly sort tasks when their TaskIds 
are of different length
 Key: MESOS-4929
 URL: https://issues.apache.org/jira/browse/MESOS-4929
 Project: Mesos
  Issue Type: Bug
  Components: webui
Affects Versions: 0.27.1
 Environment: Safari, Firefox, Chrome
Reporter: Jay Guo
Priority: Trivial


On the completed tasks page, tasks whose TaskIds differ in length are not 
displayed in the correct order when sorted by TaskId.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4929) WebUI does not correctly sort tasks when their TaskIds are of different length

2016-03-14 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-4929:
---
Attachment: screenshot-1.png

> WebUI does not correctly sort tasks when their TaskIds are of different length
> --
>
> Key: MESOS-4929
> URL: https://issues.apache.org/jira/browse/MESOS-4929
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.27.1
> Environment: Safari, Firefox, Chrome
>Reporter: Jay Guo
>Priority: Trivial
> Attachments: taskid.png
>
>
> On the completed tasks page, tasks whose TaskIds differ in length are not 
> displayed in the correct order when sorted by TaskId.





[jira] [Updated] (MESOS-4929) WebUI does not correctly sort tasks when their TaskIds are of different length

2016-03-14 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-4929:
---
Attachment: taskid.png

> WebUI does not correctly sort tasks when their TaskIds are of different length
> --
>
> Key: MESOS-4929
> URL: https://issues.apache.org/jira/browse/MESOS-4929
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.27.1
> Environment: Safari, Firefox, Chrome
>Reporter: Jay Guo
>Priority: Trivial
> Attachments: taskid.png
>
>
> On the completed tasks page, tasks whose TaskIds differ in length are not 
> displayed in the correct order when sorted by TaskId.





[jira] [Updated] (MESOS-4929) WebUI does not correctly sort tasks when their TaskIds are of different length

2016-03-14 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-4929:
---
Attachment: (was: screenshot-1.png)

> WebUI does not correctly sort tasks when their TaskIds are of different length
> --
>
> Key: MESOS-4929
> URL: https://issues.apache.org/jira/browse/MESOS-4929
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.27.1
> Environment: Safari, Firefox, Chrome
>Reporter: Jay Guo
>Priority: Trivial
> Attachments: taskid.png
>
>
> On the completed tasks page, tasks whose TaskIds differ in length are not 
> displayed in the correct order when sorted by TaskId.





[jira] [Commented] (MESOS-3057) Mesos web ui sorting by Id results in non-intuitive order.

2016-03-14 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192937#comment-15192937
 ] 

Jay Guo commented on MESOS-3057:


[~haosd...@gmail.com] Then should we at least fix this in example frameworks 
that come with Mesos? We came across this issue while trying 
long-lived-framework in /src/examples/long_lived_framework.cpp

> Mesos web ui sorting by Id results in non-intuitive order.
> --
>
> Key: MESOS-3057
> URL: https://issues.apache.org/jira/browse/MESOS-3057
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Cody Roseborough
>Priority: Trivial
>  Labels: newbie
> Attachments: web ui sorted by ids.png
>
>
> In the Mesos web UI, sorting tasks by ID results in a non-intuitive order. For 
> example, with IDs task_0 through task_200 sorted ascending, you get task_0, 
> task_1, task_10, task_100... task_109, task_11, task_110, etc. This also 
> happens if you use plain numbers as IDs. 
> It seems like it should be sorted using natural sort order.
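
As a rough illustration of what "natural sort order" means here, the comparator below treats runs of digits as numbers and compares everything else character by character. It is a sketch only: the web UI is not written in C++, and `naturalLess` is a hypothetical name. IDs that differ only in leading zeros (e.g. `01` and `1`) are treated as equivalent.

```cpp
#include <cctype>
#include <string>

// Returns true if `a` sorts before `b` in natural order.
bool naturalLess(const std::string& a, const std::string& b)
{
  auto isDigit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };

  std::size_t i = 0, j = 0;
  while (i < a.size() && j < b.size()) {
    if (isDigit(a[i]) && isDigit(b[j])) {
      // Skip leading zeros, then find the end of each digit run.
      while (i < a.size() && a[i] == '0') ++i;
      while (j < b.size() && b[j] == '0') ++j;
      std::size_t ie = i, je = j;
      while (ie < a.size() && isDigit(a[ie])) ++ie;
      while (je < b.size() && isDigit(b[je])) ++je;

      // Without leading zeros, a longer digit run is a larger number.
      if (ie - i != je - j) return ie - i < je - j;

      // Same length: the numbers compare like their digit strings.
      int cmp = a.compare(i, ie - i, b, j, je - j);
      if (cmp != 0) return cmp < 0;

      i = ie;
      j = je;
    } else {
      if (a[i] != b[j]) return a[i] < b[j];
      ++i;
      ++j;
    }
  }

  // Equal so far: the string with characters left over sorts later.
  return a.size() - i < b.size() - j;
}
```

Passing this comparator to `std::sort` would order task_2 before task_10 and task_11 before task_109, which is the order the reporter expected.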





[jira] [Commented] (MESOS-3057) Mesos web ui sorting by Id results in non-intuitive order.

2016-03-14 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192989#comment-15192989
 ] 

Jay Guo commented on MESOS-3057:


OK, I created another ticket https://issues.apache.org/jira/browse/MESOS-4930 
instead of rewriting history here. I could work on it as well if anybody could 
shepherd it.

> Mesos web ui sorting by Id results in non-intuitive order.
> --
>
> Key: MESOS-3057
> URL: https://issues.apache.org/jira/browse/MESOS-3057
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Cody Roseborough
>Priority: Trivial
>  Labels: newbie
> Attachments: web ui sorted by ids.png
>
>
> In the Mesos web UI, sorting tasks by ID results in a non-intuitive order. For 
> example, with IDs task_0 through task_200 sorted ascending, you get task_0, 
> task_1, task_10, task_100... task_109, task_11, task_110, etc. This also 
> happens if you use plain numbers as IDs. 
> It seems like it should be sorted using natural sort order.





[jira] [Commented] (MESOS-4930) Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI

2016-03-15 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194826#comment-15194826
 ] 

Jay Guo commented on MESOS-4930:


review board link: https://reviews.apache.org/r/44836/

> Update example frameworks in Mesos codebase to assign proper TaskId in order 
> to be sorted correctly in WebUI
> 
>
> Key: MESOS-4930
> URL: https://issues.apache.org/jira/browse/MESOS-4930
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, webui
>Reporter: Jay Guo
>Assignee: Jay Guo
>Priority: Trivial
>
> Frameworks should assign TaskIds with a fixed number of digits, so that the 
> WebUI, which sorts TaskIds lexically, displays them in the correct order.
> For instance, `1`, `2`, `10`, `11` are sorted as `1`, `10`, `11`, `2`, whereas 
> `001`, `002`, `010`, `011` are sorted in ascending order.
> /src/examples/long_lived_framework.cpp should be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.

2016-03-15 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194899#comment-15194899
 ] 

Jay Guo commented on MESOS-4891:


[~jieyu] Should this endpoint be added directly to the agent, or under /monitor/containers?

> Add a '/containers' endpoint to the agent to list all the active containers.
> 
>
> Key: MESOS-4891
> URL: https://issues.apache.org/jira/browse/MESOS-4891
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>Assignee: Jay Guo
>
> This endpoint will be similar to /monitor/statistics.json endpoint, but it'll 
> also contain the 'container_status' about the container (see ContainerStatus 
> in mesos.proto). We'll eventually deprecate the /monitor/statistics.json 
> endpoint.





[jira] [Assigned] (MESOS-4930) Update example frameworks in Mesos codebase to assign proper TaskId in order to be sorted correctly in WebUI

2016-03-15 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-4930:
--

Assignee: Jay Guo

> Update example frameworks in Mesos codebase to assign proper TaskId in order 
> to be sorted correctly in WebUI
> 
>
> Key: MESOS-4930
> URL: https://issues.apache.org/jira/browse/MESOS-4930
> Project: Mesos
>  Issue Type: Improvement
>  Components: framework, webui
>Reporter: Jay Guo
>Assignee: Jay Guo
>Priority: Trivial
>
> Frameworks should assign TaskIds with a fixed number of digits, so that the 
> WebUI, which sorts TaskIds lexically, displays them in the correct order.
> For instance, `1`, `2`, `10`, `11` are sorted as `1`, `10`, `11`, `2`, whereas 
> `001`, `002`, `010`, `011` are sorted in ascending order.
> /src/examples/long_lived_framework.cpp should be updated.





[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-12 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236912#comment-15236912
 ] 

Jay Guo commented on MESOS-3781:


OK, will do next time. Thanks!

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.

2016-04-10 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234422#comment-15234422
 ] 

Jay Guo commented on MESOS-4891:


Docs updated and patch submitted! Please take a look here: 
https://reviews.apache.org/r/45014/

Thanks!!

> Add a '/containers' endpoint to the agent to list all the active containers.
> 
>
> Key: MESOS-4891
> URL: https://issues.apache.org/jira/browse/MESOS-4891
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Jie Yu
>Assignee: Jay Guo
>  Labels: mesosphere
>
> This endpoint will be similar to /monitor/statistics.json endpoint, but it'll 
> also contain the 'container_status' about the container (see ContainerStatus 
> in mesos.proto). We'll eventually deprecate the /monitor/statistics.json 
> endpoint.





[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-11 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234590#comment-15234590
 ] 

Jay Guo commented on MESOS-3781:


Hi, I updated the patch (https://reviews.apache.org/r/45200/); please take a look. 
BTW, how do I change a story to reviewable? I'll do it next time. Thanks.

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.

2016-04-11 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234598#comment-15234598
 ] 

Jay Guo commented on MESOS-4891:


gr8, thx!

> Add a '/containers' endpoint to the agent to list all the active containers.
> 
>
> Key: MESOS-4891
> URL: https://issues.apache.org/jira/browse/MESOS-4891
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Jie Yu
>Assignee: Jay Guo
>  Labels: mesosphere
> Attachments: screenshot.png
>
>
> This endpoint will be similar to /monitor/statistics.json endpoint, but it'll 
> also contain the 'container_status' about the container (see ContainerStatus 
> in mesos.proto). We'll eventually deprecate the /monitor/statistics.json 
> endpoint.





[jira] [Commented] (MESOS-5038) Added a any mechanism for futures

2016-04-05 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227588#comment-15227588
 ] 

Jay Guo commented on MESOS-5038:


BTW, would it be nice to have an await() that handles futures with a number of 
different types?

> Added a any mechanism for futures
> -
>
> Key: MESOS-5038
> URL: https://issues.apache.org/jira/browse/MESOS-5038
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: haosdent
>Assignee: haosdent
>
> Now we already have {{collect}} and {{await}} mechanisms which would wait for 
> a list of {{Future}}. However, we would like to return immediately if any of 
> the list of {{Future}} complete instead of wait for the whole list finished 
> in {{collect}}. The interface of this any mechanism could be
> {code}
> template 
> Future any(const std::list& futures);
> {code}
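
The "return as soon as any future completes" idea quoted above can be sketched with standard C++ futures. This is only a minimal sketch under stated assumptions, not the libprocess API: `anyReady` is a hypothetical helper, it polls instead of registering callbacks, and it assumes a non-empty list of futures that are backed by promises or already-running tasks.

```cpp
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper (not part of libprocess): blocks until at least one
// future in `futures` is ready and returns the index of the first ready one
// found. Assumes `futures` is non-empty and that no future was created with
// std::launch::deferred (such a future never reports `ready` here).
template <typename T>
std::size_t anyReady(const std::vector<std::shared_future<T>>& futures)
{
  while (true) {
    for (std::size_t i = 0; i < futures.size(); ++i) {
      // A zero timeout turns wait_for() into a non-blocking readiness check.
      if (futures[i].wait_for(std::chrono::seconds(0)) ==
          std::future_status::ready) {
        return i;
      }
    }

    // Back off briefly so the polling loop does not spin at full speed.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}
```

A callback-based version would avoid the polling delay; the loop is used here only to keep the sketch short.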





[jira] [Commented] (MESOS-4891) Add a '/containers' endpoint to the agent to list all the active containers.

2016-04-11 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234544#comment-15234544
 ] 

Jay Guo commented on MESOS-4891:


How do I put a story into Reviewable?

> Add a '/containers' endpoint to the agent to list all the active containers.
> 
>
> Key: MESOS-4891
> URL: https://issues.apache.org/jira/browse/MESOS-4891
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Jie Yu
>Assignee: Jay Guo
>  Labels: mesosphere
>
> This endpoint will be similar to /monitor/statistics.json endpoint, but it'll 
> also contain the 'container_status' about the container (see ContainerStatus 
> in mesos.proto). We'll eventually deprecate the /monitor/statistics.json 
> endpoint.





[jira] [Assigned] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-03-21 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3781:
--

Assignee: Jay Guo

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-03-21 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15203895#comment-15203895
 ] 

Jay Guo commented on MESOS-3781:


As I pick up this ticket, I just want to confirm the actual requirements. Are 
we going to duplicate the following flags with the keyword {{agent}}?
In {{src/master/flags.hpp}}
* slave_reregister_timeout
* recovery_slave_removal_limit
* slave_removal_rate_limit
* authenticate_slaves
* slave_ping_timeout
* max_slave_ping_timeouts
* max_executors_per_slave

In {{src/slave/flags.hpp}}
* slave_subsystems

[~darroyo]
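
One way to picture "duplicating" a flag is to let two names resolve to the same stored value. The following is a minimal sketch under that assumption; `FlagTable` is a hypothetical class and not the flags API used by Mesos, and the `agent_*` name in the usage note is assumed by analogy with the `slave_*` flags listed above.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical flag table: every flag has one value slot that can be reached
// through its primary name or through any of its aliases.
class FlagTable
{
public:
  // Registers `name` and each entry of `aliases` as keys for one shared slot.
  void add(const std::string& name, const std::vector<std::string>& aliases)
  {
    const std::size_t slot = values.size();
    values.emplace_back();
    index[name] = slot;
    for (const std::string& alias : aliases) {
      index[alias] = slot;
    }
  }

  // Sets the value through any registered name; returns false if unknown.
  bool set(const std::string& name, const std::string& value)
  {
    auto it = index.find(name);
    if (it == index.end()) {
      return false;
    }
    values[it->second] = value;
    return true;
  }

  // Returns the value for any registered name, or nullopt if the name is
  // unknown or the flag has not been set.
  std::optional<std::string> get(const std::string& name) const
  {
    auto it = index.find(name);
    if (it == index.end()) {
      return std::nullopt;
    }
    return values[it->second];
  }

private:
  std::map<std::string, std::size_t> index;
  std::vector<std::optional<std::string>> values;
};
```

With `add("agent_ping_timeout", {"slave_ping_timeout"})`, a value set through the old name is visible through the new one, so existing command lines keep working during the deprecation period.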

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Updated] (MESOS-4580) Consider returning `202` (Accepted) for /reserve and related endpoints

2016-03-25 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-4580:
---
Assignee: zhou xing  (was: Jay Guo)

> Consider returning `202` (Accepted) for /reserve and related endpoints
> --
>
> Key: MESOS-4580
> URL: https://issues.apache.org/jira/browse/MESOS-4580
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Neil Conway
>Assignee: zhou xing
>  Labels: mesosphere
>
> We currently return {{200}} (OK) when a POST to {{/reserve}}, {{/unreserve}}, 
> {{/create-volumes}}, and {{/destroy-volumes}} is validated successfully. This 
> is misleading, because the underlying operation is still dispatched 
> asynchronously and might subsequently fail. It would be more accurate to 
> return {{202}} (Accepted) instead.





[jira] [Commented] (MESOS-3782) Replace Master/Slave Terminology Phase I - Add duplicate binaries (or create symlinks)

2016-03-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212734#comment-15212734
 ] 

Jay Guo commented on MESOS-3782:


Long story short, the epic is to replace the keyword `slave` with `agent` 
throughout the project over several deprecation phases.

This ticket belongs to the epic here: 
https://issues.apache.org/jira/browse/MESOS-1478
And there's a design doc there. Please take a look.

> Replace Master/Slave Terminology Phase I - Add duplicate binaries (or create 
> symlinks)
> --
>
> Key: MESOS-3782
> URL: https://issues.apache.org/jira/browse/MESOS-3782
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: zhou xing
>






[jira] [Commented] (MESOS-5038) Added a any mechanism for futures

2016-03-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213808#comment-15213808
 ] 

Jay Guo commented on MESOS-5038:


What's the use case for this one?

> Added a any mechanism for futures
> -
>
> Key: MESOS-5038
> URL: https://issues.apache.org/jira/browse/MESOS-5038
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: haosdent
>Assignee: haosdent
>
> Now we already have {{collect}} and {{await}} mechanisms which would wait for 
> a list of {{Future}}. However, we would like to return immediately if any of 
> the list of {{Future}} complete instead of wait for the whole list finished 
> in {{collect}}. The interface of this any mechanism could be
> {code}
> template 
> Future any(const std::list& futures);
> {code}





[jira] [Commented] (MESOS-5038) Added a any mechanism for futures

2016-03-28 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213864#comment-15213864
 ] 

Jay Guo commented on MESOS-5038:


[~haosd...@gmail.com] Do you mind giving an example of where we might use this 
in the current codebase?

> Added a any mechanism for futures
> -
>
> Key: MESOS-5038
> URL: https://issues.apache.org/jira/browse/MESOS-5038
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: haosdent
>Assignee: haosdent
>
> Now we already have {{collect}} and {{await}} mechanisms which would wait for 
> a list of {{Future}}. However, we would like to return immediately if any of 
> the list of {{Future}} complete instead of wait for the whole list finished 
> in {{collect}}. The interface of this any mechanism could be
> {code}
> template 
> Future any(const std::list& futures);
> {code}





[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-03-23 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208018#comment-15208018
 ] 

Jay Guo commented on MESOS-3781:


[~vinodkone] we submitted a patch here: https://reviews.apache.org/r/45200/ 

please review. Thx

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-3782) Replace Master/Slave Terminology Phase I - Add duplicate binaries (or create symlinks)

2016-03-31 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15221165#comment-15221165
 ] 

Jay Guo commented on MESOS-3782:


Or we could copy it twice and change the name in the second copy.

> Replace Master/Slave Terminology Phase I - Add duplicate binaries (or create 
> symlinks)
> --
>
> Key: MESOS-3782
> URL: https://issues.apache.org/jira/browse/MESOS-3782
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: zhou xing
>






[jira] [Commented] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-04-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259728#comment-15259728
 ] 

Jay Guo commented on MESOS-3784:


I took a look at this ticket and here are some findings:
1. The root command {{mesos}} collects scripts whose names match {{mesos-*}} in 
the same directory and treats them as subcommands. As long as we keep the 
mesos-slave binary and its wrapper script, {{slave}} appears among the 
subcommands unless we explicitly filter it out in the {{mesos}} executable. I 
would suggest adding a deprecation warning when mesos-slave is used.

2. A subcommand's usage info is basically its flag descriptions, which are 
tightly coupled with https://issues.apache.org/jira/browse/MESOS-3781, so I 
suggest we solve them there.

3. The Python utils _mesos-cat_, _mesos-ps_, _mesos-tail_ and _mesos-scp_ do 
contain a few hardcoded "slave" strings. However, I wonder whether we are still 
keeping these utils. The PyPI page states that mesos.cli is deprecated in favor 
of the DCOS CLI: https://pypi.python.org/pypi/mesos.cli  For one thing, these 
utils assume a ZooKeeper setup and resolve the master from it, so they are not 
useful in a standalone setup. For another, there are some bugs in the utils, 
such as referencing in {{finally}} a variable that is assigned in {{try}} 
without a default value. Is someone maintaining these scripts? Should we fix 
them systematically?

[~vinodkone] ideas?
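
To make finding 1 concrete, here is a minimal sketch of discovering subcommands from file names with the `mesos-` prefix and marking some of them as deprecated. It is an illustration only: the real {{mesos}} wrapper is a script, and `Subcommand` and `discover` are hypothetical names introduced here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one discovered subcommand.
struct Subcommand
{
  std::string name;  // File name with the "mesos-" prefix stripped.
  bool deprecated;   // True if a deprecation warning should be printed.
};

// Turns every file name of the form "mesos-<name>" into a subcommand and
// flags those whose <name> is listed in `deprecatedNames`. Files without the
// prefix, and the bare name "mesos-", are ignored.
std::vector<Subcommand> discover(
    const std::vector<std::string>& files,
    const std::set<std::string>& deprecatedNames)
{
  const std::string prefix = "mesos-";
  std::vector<Subcommand> result;

  for (const std::string& file : files) {
    if (file.size() > prefix.size() &&
        file.compare(0, prefix.size(), prefix) == 0) {
      const std::string name = file.substr(prefix.size());
      result.push_back({name, deprecatedNames.count(name) > 0});
    }
  }

  return result;
}
```

Keeping `slave` in the list but flagging it lets the wrapper print a warning instead of silently hiding a command that users may still rely on.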

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>






[jira] [Assigned] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-04-27 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-3784:
--

Assignee: Jay Guo

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-26 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259440#comment-15259440
 ] 

Jay Guo commented on MESOS-5269:


Agreed. However, the WebUI currently relies on JSON attributes returned by the 
/metrics/snapshot endpoint, e.g. {{slave/uptime_secs}}. We have two options for 
duplicating the returned attributes:
1. Duplicate the metric itself. However, this can make {{slave/XXX: valueX}} 
and {{agent/YYY: valueY}} inconsistent, since metric->value() is called twice.
2. Duplicate the field in the JSON before returning it. This simply manipulates 
the JSON string before it is returned. However, that code lives in libprocess, 
and having this kind of operation there is quite awkward.

Ideas?
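
Option 2 can be pictured as a post-processing step on the snapshot. The sketch below works on a plain name-to-value map rather than on a JSON string, and `withAgentAliases` is a hypothetical helper, not libprocess code; the `agent/` key prefix is assumed by analogy with `slave/`.

```cpp
#include <map>
#include <string>

// Returns a copy of `snapshot` in which every key starting with "slave/" is
// also present under "agent/" with the same value. Because the value is
// copied rather than re-evaluated, both keys always agree. An existing
// "agent/..." key is left untouched.
std::map<std::string, double> withAgentAliases(
    const std::map<std::string, double>& snapshot)
{
  const std::string from = "slave/";
  const std::string to = "agent/";

  std::map<std::string, double> result = snapshot;
  for (const auto& entry : snapshot) {
    if (entry.first.compare(0, from.size(), from) == 0) {
      result.emplace(to + entry.first.substr(from.size()), entry.second);
    }
  }

  return result;
}
```

This avoids the inconsistency of option 1, at the cost of placing terminology-specific logic wherever the snapshot is assembled.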

> Replace Master/Slave Terminology Phase I - Update Metrics
> -
>
> Key: MESOS-5269
> URL: https://issues.apache.org/jira/browse/MESOS-5269
> Project: Mesos
>  Issue Type: Task
>Reporter: Jay Guo
>
>   process::metrics::Gauge slaves_connected;
>   process::metrics::Gauge slaves_disconnected;
>   process::metrics::Gauge slaves_active;
>   process::metrics::Gauge slaves_inactive;
>   process::metrics::Counter messages_register_slave;
>   process::metrics::Counter messages_reregister_slave;
>   process::metrics::Counter messages_unregister_slave;
>   process::metrics::Counter messages_update_slave;
>   process::metrics::Counter recovery_slave_removals;
>   process::metrics::Counter slave_registrations;
>   process::metrics::Counter slave_reregistrations;
>   process::metrics::Counter slave_removals;
>   process::metrics::Counter slave_removals_reason_unhealthy;
>   process::metrics::Counter slave_removals_reason_unregistered;
>   process::metrics::Counter slave_removals_reason_registered;
>   process::metrics::Counter slave_shutdowns_scheduled;
>   process::metrics::Counter slave_shutdowns_completed;
>   process::metrics::Counter slave_shutdowns_canceled;





[jira] [Commented] (MESOS-3783) Replace Master/Slave Terminology Phase I - Update documentation

2016-04-27 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259771#comment-15259771
 ] 

Jay Guo commented on MESOS-3783:


As I started replacing 'slave' in the docs, I found it really awkward to leave 
them in an inconsistent state, since some docs are blocked by actual code changes.

Firstly, I don't think we should have both terms mixed in a single doc.
Secondly, a lot of docs refer to flags, metrics, endpoints, and public 
interfaces. If we continue with the replacement now, we end up with a 
significant part of the docs using 'slave' and the rest using 'agent'.

Therefore, I would suggest doing this last, or at least after the majority 
of the code changes are done.

[~vinodkone]

> Replace Master/Slave Terminology Phase I - Update documentation 
> 
>
> Key: MESOS-3783
> URL: https://issues.apache.org/jira/browse/MESOS-3783
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-24 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5269:
--

 Summary: Replace Master/Slave Terminology Phase I - Update Metrics
 Key: MESOS-5269
 URL: https://issues.apache.org/jira/browse/MESOS-5269
 Project: Mesos
  Issue Type: Task
Reporter: Jay Guo


  process::metrics::Gauge slaves_connected;
  process::metrics::Gauge slaves_disconnected;
  process::metrics::Gauge slaves_active;
  process::metrics::Gauge slaves_inactive;
  process::metrics::Counter messages_register_slave;
  process::metrics::Counter messages_reregister_slave;
  process::metrics::Counter messages_unregister_slave;
  process::metrics::Counter messages_update_slave;
  process::metrics::Counter recovery_slave_removals;
  process::metrics::Counter slave_registrations;
  process::metrics::Counter slave_reregistrations;
  process::metrics::Counter slave_removals;
  process::metrics::Counter slave_removals_reason_unhealthy;
  process::metrics::Counter slave_removals_reason_unregistered;
  process::metrics::Counter slave_removals_reason_registered;
  process::metrics::Counter slave_shutdowns_scheduled;
  process::metrics::Counter slave_shutdowns_completed;
  process::metrics::Counter slave_shutdowns_canceled;





[jira] [Created] (MESOS-5270) Replace Master/Slave Terminology Phase I - Duplicate slave field in JSON responses.

2016-04-24 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5270:
--

 Summary: Replace Master/Slave Terminology Phase I - Duplicate 
slave field in JSON responses.
 Key: MESOS-5270
 URL: https://issues.apache.org/jira/browse/MESOS-5270
 Project: Mesos
  Issue Type: Task
Reporter: Jay Guo








[jira] [Commented] (MESOS-3783) Replace Master/Slave Terminology Phase I - Update documentation

2016-04-24 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255805#comment-15255805
 ] 

Jay Guo commented on MESOS-3783:


This should be the very last one to do since it refers to endpoints, metrics, 
etc.

> Replace Master/Slave Terminology Phase I - Update documentation 
> 
>
> Key: MESOS-3783
> URL: https://issues.apache.org/jira/browse/MESOS-3783
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Updated] (MESOS-3783) Replace Master/Slave Terminology Phase I - Update documentation

2016-04-24 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-3783:
---
Assignee: Jay Guo

> Replace Master/Slave Terminology Phase I - Update documentation 
> 
>
> Key: MESOS-3783
> URL: https://issues.apache.org/jira/browse/MESOS-3783
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255944#comment-15255944
 ] 

Jay Guo commented on MESOS-5269:


If we take a similar approach to the multi-named flags, we could also have 
multi-named metrics, which take a vector of strings during construction and 
treat them as aliases for the same metric.

e.g.
{code}
Gauge(const std::vector<std::string>& names,
      const Deferred<Future<double>()>& f)
  : Metric(names, None()), data(new Data(f)) {}
{code}

> Replace Master/Slave Terminology Phase I - Update Metrics
> -
>
> Key: MESOS-5269
> URL: https://issues.apache.org/jira/browse/MESOS-5269
> Project: Mesos
>  Issue Type: Task
>Reporter: Jay Guo
>
>   process::metrics::Gauge slaves_connected;
>   process::metrics::Gauge slaves_disconnected;
>   process::metrics::Gauge slaves_active;
>   process::metrics::Gauge slaves_inactive;
>   process::metrics::Counter messages_register_slave;
>   process::metrics::Counter messages_reregister_slave;
>   process::metrics::Counter messages_unregister_slave;
>   process::metrics::Counter messages_update_slave;
>   process::metrics::Counter recovery_slave_removals;
>   process::metrics::Counter slave_registrations;
>   process::metrics::Counter slave_reregistrations;
>   process::metrics::Counter slave_removals;
>   process::metrics::Counter slave_removals_reason_unhealthy;
>   process::metrics::Counter slave_removals_reason_unregistered;
>   process::metrics::Counter slave_removals_reason_registered;
>   process::metrics::Counter slave_shutdowns_scheduled;
>   process::metrics::Counter slave_shutdowns_completed;
>   process::metrics::Counter slave_shutdowns_canceled;





[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256081#comment-15256081
 ] 

Jay Guo commented on MESOS-5269:


[~vinodkone]

> Replace Master/Slave Terminology Phase I - Update Metrics
> -
>
> Key: MESOS-5269
> URL: https://issues.apache.org/jira/browse/MESOS-5269
> Project: Mesos
>  Issue Type: Task
>Reporter: Jay Guo
>
>   process::metrics::Gauge slaves_connected;
>   process::metrics::Gauge slaves_disconnected;
>   process::metrics::Gauge slaves_active;
>   process::metrics::Gauge slaves_inactive;
>   process::metrics::Counter messages_register_slave;
>   process::metrics::Counter messages_reregister_slave;
>   process::metrics::Counter messages_unregister_slave;
>   process::metrics::Counter messages_update_slave;
>   process::metrics::Counter recovery_slave_removals;
>   process::metrics::Counter slave_registrations;
>   process::metrics::Counter slave_reregistrations;
>   process::metrics::Counter slave_removals;
>   process::metrics::Counter slave_removals_reason_unhealthy;
>   process::metrics::Counter slave_removals_reason_unregistered;
>   process::metrics::Counter slave_removals_reason_registered;
>   process::metrics::Counter slave_shutdowns_scheduled;
>   process::metrics::Counter slave_shutdowns_completed;
>   process::metrics::Counter slave_shutdowns_canceled;





[jira] [Commented] (MESOS-5269) Replace Master/Slave Terminology Phase I - Update Metrics

2016-04-25 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256080#comment-15256080
 ] 

Jay Guo commented on MESOS-5269:


Since {{hashmap<std::string, Owned<Metric>> metrics}} expects a _string_ 
(_metric.name_) as the key to avoid duplicated metrics, we probably need 
something like this:
{code}
Gauge(const std::string& primaryName,
      const std::vector<std::string>& aliases = None(),
      const Deferred<Future<double>()>& f)
  : Metric(primaryName, aliases, None()), data(new Data(f)) {}
{code}

Let me know what you think.
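
The primary-name-plus-aliases idea can be sketched outside libprocess as follows. `AliasedGauge` and `Registry` are hypothetical stand-ins for the real metric classes, and the metric names in the usage note are assumptions. The registry is keyed by the primary name only, rejects any name that is already taken, and evaluates each gauge once per snapshot so that all of its names report the same value.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical gauge with one primary name and any number of aliases.
class AliasedGauge
{
public:
  AliasedGauge(
      std::string primaryName,
      std::vector<std::string> aliases,
      std::function<double()> f)
    : primaryName_(std::move(primaryName)),
      aliases_(std::move(aliases)),
      f_(std::move(f)) {}

  const std::string& primaryName() const { return primaryName_; }
  const std::vector<std::string>& aliases() const { return aliases_; }
  double value() const { return f_(); }

private:
  std::string primaryName_;
  std::vector<std::string> aliases_;
  std::function<double()> f_;
};

// Hypothetical registry keyed by primary name.
class Registry
{
public:
  // Returns false if the primary name or any alias is already in use.
  bool add(const std::shared_ptr<AliasedGauge>& gauge)
  {
    if (taken(gauge->primaryName())) {
      return false;
    }
    for (const std::string& alias : gauge->aliases()) {
      if (taken(alias)) {
        return false;
      }
    }
    gauges[gauge->primaryName()] = gauge;
    return true;
  }

  // Evaluates each gauge exactly once and reports the value under the
  // primary name and under every alias.
  std::map<std::string, double> snapshot() const
  {
    std::map<std::string, double> result;
    for (const auto& entry : gauges) {
      const double value = entry.second->value();
      result[entry.first] = value;
      for (const std::string& alias : entry.second->aliases()) {
        result[alias] = value;
      }
    }
    return result;
  }

private:
  // True if `name` is a primary name or an alias of a registered gauge.
  bool taken(const std::string& name) const
  {
    for (const auto& entry : gauges) {
      if (entry.first == name) {
        return true;
      }
      for (const std::string& alias : entry.second->aliases()) {
        if (alias == name) {
          return true;
        }
      }
    }
    return false;
  }

  std::map<std::string, std::shared_ptr<AliasedGauge>> gauges;
};
```

For example, a gauge registered as `master/agents_connected` with the alias `master/slaves_connected` would appear under both keys with an identical value, which sidesteps the double evaluation discussed earlier in this thread.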

> Replace Master/Slave Terminology Phase I - Update Metrics
> -
>
> Key: MESOS-5269
> URL: https://issues.apache.org/jira/browse/MESOS-5269
> Project: Mesos
>  Issue Type: Task
>Reporter: Jay Guo
>
>   process::metrics::Gauge slaves_connected;
>   process::metrics::Gauge slaves_disconnected;
>   process::metrics::Gauge slaves_active;
>   process::metrics::Gauge slaves_inactive;
>   process::metrics::Counter messages_register_slave;
>   process::metrics::Counter messages_reregister_slave;
>   process::metrics::Counter messages_unregister_slave;
>   process::metrics::Counter messages_update_slave;
>   process::metrics::Counter recovery_slave_removals;
>   process::metrics::Counter slave_registrations;
>   process::metrics::Counter slave_reregistrations;
>   process::metrics::Counter slave_removals;
>   process::metrics::Counter slave_removals_reason_unhealthy;
>   process::metrics::Counter slave_removals_reason_unregistered;
>   process::metrics::Counter slave_removals_reason_registered;
>   process::metrics::Counter slave_shutdowns_scheduled;
>   process::metrics::Counter slave_shutdowns_completed;
>   process::metrics::Counter slave_shutdowns_canceled;





[jira] [Commented] (MESOS-1806) Substituting etcd for Zookeeper

2016-04-26 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15257657#comment-15257657
 ] 

Jay Guo commented on MESOS-1806:


We should also update the Mesos documentation to reflect the modularization: 
http://mesos.apache.org/documentation/latest/high-availability/

According to the doc, the contender/detector is still tightly coupled with 
ZooKeeper.

> Substituting etcd for Zookeeper
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Task
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)





[jira] [Assigned] (MESOS-5406) Validate ACLs on creating an instance of local authorizer.

2016-05-18 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5406:
--

Assignee: Jay Guo

> Validate ACLs on creating an instance of local authorizer.
> --
>
> Key: MESOS-5406
> URL: https://issues.apache.org/jira/browse/MESOS-5406
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>Assignee: Jay Guo
>  Labels: mesosphere, security
>
> Some combinations of ACLs are not allowed, for example, specifying both 
> {{SetQuota}} and {{UpdateQuota}}. We should capture such issues and error out 
> early. 
> This ticket aims to add as many validations as possible to a dedicated 
> {{validate()}} routine, instead of having them implicitly in the codebase.





[jira] [Commented] (MESOS-5406) Validate ACLs on creating an instance of local authorizer.

2016-05-23 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295989#comment-15295989
 ] 

Jay Guo commented on MESOS-5406:


Just want to make sure I understand it correctly: this story is to catch 
contradictory ACLs while creating the authorizer, in addition to `SetQuota` and 
`UpdateQuota`. For example, the following test case should pass (both NONE and 
ANY for the same principal):
{code}
// Should fail to create an authorizer with ACLs that specify
// both NONE and ANY for the same principal.
TYPED_TEST(AuthorizationTest, ContradictoryACLs)
{
  ACLs acls;

  {
    mesos::ACL::UpdateQuota* acl = acls.add_update_quotas();
    acl->mutable_principals()->add_values("foo");
    acl->mutable_roles()->set_type(mesos::ACL::Entity::ANY);
  }

  {
    mesos::ACL::UpdateQuota* acl = acls.add_update_quotas();
    acl->mutable_principals()->add_values("foo");
    acl->mutable_roles()->set_type(mesos::ACL::Entity::NONE);
  }

  Try<Authorizer*> create = TypeParam::create(parameterize(acls));
  ASSERT_ERROR(create);
}
{code}
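
The check that such a test exercises could look roughly like the sketch below. It is a simplified model and not the Mesos authorizer: `Acl`, `EntityType` and `contradictions` are hypothetical names, and each ACL is reduced to a single principal plus an ANY or NONE role entity.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the protobuf types used in the test above.
enum class EntityType { ANY, NONE };

struct Acl
{
  std::string principal;
  EntityType roles;
};

// Returns, in sorted order, the principals that appear with both ANY and
// NONE. An empty result means no contradiction of this kind was found.
std::vector<std::string> contradictions(const std::vector<Acl>& acls)
{
  // For each principal: first = saw ANY, second = saw NONE.
  std::map<std::string, std::pair<bool, bool>> seen;

  for (const Acl& acl : acls) {
    std::pair<bool, bool>& flags = seen[acl.principal];
    if (acl.roles == EntityType::ANY) {
      flags.first = true;
    } else {
      flags.second = true;
    }
  }

  std::vector<std::string> result;
  for (const auto& entry : seen) {
    if (entry.second.first && entry.second.second) {
      result.push_back(entry.first);
    }
  }

  return result;
}
```

A `validate()` routine along the lines proposed in the ticket could return an error whenever this list is non-empty, so the problem surfaces at authorizer creation rather than at request time.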

> Validate ACLs on creating an instance of local authorizer.
> --
>
> Key: MESOS-5406
> URL: https://issues.apache.org/jira/browse/MESOS-5406
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rukletsov
>Assignee: Jay Guo
>  Labels: mesosphere, security
>
> Some combinations of ACLs are not allowed, for example, specifying both 
> {{SetQuota}} and {{UpdateQuota}}. We should capture such issues and error out 
> early. 
> This ticket aims to add as many validations as possible to a dedicated 
> {{validate()}} routine, instead of having them implicitly in the codebase.





[jira] [Commented] (MESOS-5406) Validate ACLs on creating an instance of local authorizer.

2016-05-24 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15298130#comment-15298130
 ] 

Jay Guo commented on MESOS-5406:


Some more thoughts:
# Should we sort ACLs and apply some mechanism like longest-prefix match in a 
routing table, instead of relying on the order in which the user specifies them?
# Should we also aggregate ACLs for a given action? I saw this TODO in the 
codebase: TODO(vinod): Do aggregation of ACLs when possible.

> Validate ACLs on creating an instance of local authorizer.
> --
>
> Key: MESOS-5406
> URL: https://issues.apache.org/jira/browse/MESOS-5406
> Project: Mesos
>  Issue Type: Improvement
>  Components: security
>Reporter: Alexander Rukletsov
>Assignee: Jay Guo
>  Labels: mesosphere, security
>
> Some combinations of ACLs are not allowed, for example, specifying both 
> {{SetQuota}} and {{UpdateQuota}}. We should capture such issues and error out 
> early. 
> This ticket aims to add as many validations as possible to a dedicated 
> {{validate()}} routine, instead of having them implicitly in the codebase.





[jira] [Updated] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-05-10 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-3784:
---
Shepherd: Vinod Kone

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-3784) Replace Master/Slave Terminology Phase I - Update mesos-cli

2016-05-10 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279416#comment-15279416
 ] 

Jay Guo commented on MESOS-3784:


reviewable at: https://reviews.apache.org/r/47217/

> Replace Master/Slave Terminology Phase I - Update mesos-cli 
> 
>
> Key: MESOS-3784
> URL: https://issues.apache.org/jira/browse/MESOS-3784
> Project: Mesos
>  Issue Type: Task
>  Components: cli
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>Priority: Trivial
>






[jira] [Commented] (MESOS-1806) Etcd-based master contender/detector module

2016-05-11 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281232#comment-15281232
 ] 

Jay Guo commented on MESOS-1806:


We created a repo to temporarily host this module. Your comments and reviews 
are highly appreciated.
https://github.com/guoger/mesos-etcd-module

> Etcd-based master contender/detector module
> ---
>
> Key: MESOS-1806
> URL: https://issues.apache.org/jira/browse/MESOS-1806
> Project: Mesos
>  Issue Type: Epic
>  Components: leader election
>Reporter: Ed Ropple
>Assignee: Shuai Lin
>Priority: Minor
>
>eropple: Could you also file a new JIRA for Mesos to drop ZK 
> in favor of etcd or ReplicatedLog? Would love to get some momentum going on 
> that one.
> --
> Consider it filed. =)





[jira] [Commented] (MESOS-5366) Update documentation to include contender/detector module

2016-05-12 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281267#comment-15281267
 ] 

Jay Guo commented on MESOS-5366:


Reviewable at: https://reviews.apache.org/r/47292/

> Update documentation to include contender/detector module
> -
>
> Key: MESOS-5366
> URL: https://issues.apache.org/jira/browse/MESOS-5366
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Jay Guo
>Assignee: Jay Guo
>Priority: Minor
>
> Since contender and detector are modularized, the documentation should be 
> updated to reflect this change as well.





[jira] [Commented] (MESOS-4434) Install 3rdparty package boost, glog, protobuf and picojson when installing Mesos

2016-05-11 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281227#comment-15281227
 ] 

Jay Guo commented on MESOS-4434:


This definitely eases the compilation of modules. It would be good to have the 
`--enable-install-module-dependencies` flag reflected in the documentation as well.

> Install 3rdparty package boost, glog, protobuf and picojson when installing 
> Mesos
> -
>
> Key: MESOS-4434
> URL: https://issues.apache.org/jira/browse/MESOS-4434
> Project: Mesos
>  Issue Type: Bug
>  Components: build, modules
>Reporter: Kapil Arya
>Assignee: Kapil Arya
>  Labels: mesosphere
> Fix For: 0.29.0
>
>
> Mesos modules depend on having these packages installed with the exact 
> version as Mesos was compiled with.





[jira] [Updated] (MESOS-5366) Update documentation to include contender/detector module

2016-05-12 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-5366:
---
Shepherd: Kapil Arya

> Update documentation to include contender/detector module
> -
>
> Key: MESOS-5366
> URL: https://issues.apache.org/jira/browse/MESOS-5366
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Jay Guo
>Assignee: Jay Guo
>Priority: Minor
>
> Since contender and detector are modularized, the documentation should be 
> updated to reflect this change as well.





[jira] [Created] (MESOS-5366) Update documentation to include contender/detector module

2016-05-11 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5366:
--

 Summary: Update documentation to include contender/detector module
 Key: MESOS-5366
 URL: https://issues.apache.org/jira/browse/MESOS-5366
 Project: Mesos
  Issue Type: Documentation
Reporter: Jay Guo
Assignee: Jay Guo
Priority: Minor


Since contender and detector are modularized, the documentation should be 
updated to reflect this change as well.





[jira] [Created] (MESOS-5407) Slave/Agent rename: diagrams in docs

2016-05-17 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5407:
--

 Summary: Slave/Agent rename: diagrams in docs
 Key: MESOS-5407
 URL: https://issues.apache.org/jira/browse/MESOS-5407
 Project: Mesos
  Issue Type: Bug
  Components: documentation
Reporter: Jay Guo
Priority: Minor


Rename 'slave' in diagrams





[jira] [Commented] (MESOS-5407) Slave/Agent rename: diagrams in docs

2016-05-17 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287988#comment-15287988
 ] 

Jay Guo commented on MESOS-5407:


[~vinodkone] Who should we talk to in order to get the original files for these 
images?

> Slave/Agent rename: diagrams in docs
> 
>
> Key: MESOS-5407
> URL: https://issues.apache.org/jira/browse/MESOS-5407
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Jay Guo
>Priority: Minor
>
> Rename 'slave' in diagrams





[jira] [Updated] (MESOS-3781) Replace Master/Slave Terminology Phase I - Rename flag names and deprecate old ones

2016-05-17 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-3781:
---
Summary: Replace Master/Slave Terminology Phase I - Rename flag names and 
deprecate old ones  (was: Replace Master/Slave Terminology Phase I - Add 
duplicate agent flags )

> Replace Master/Slave Terminology Phase I - Rename flag names and deprecate 
> old ones
> ---
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Rename flag names and deprecate old ones

2016-05-17 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287995#comment-15287995
 ] 

Jay Guo commented on MESOS-3781:


here we go: https://reviews.apache.org/r/47507/

> Replace Master/Slave Terminology Phase I - Rename flag names and deprecate 
> old ones
> ---
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Issue Comment Deleted] (MESOS-3781) Replace Master/Slave Terminology Phase I - Rename flag names and deprecate old ones

2016-05-17 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-3781:
---
Comment: was deleted

(was: here we go: https://reviews.apache.org/r/47507/)

> Replace Master/Slave Terminology Phase I - Rename flag names and deprecate 
> old ones
> ---
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Rename flag names and deprecate old ones

2016-05-17 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287996#comment-15287996
 ] 

Jay Guo commented on MESOS-3781:


here we go: https://reviews.apache.org/r/47507/

> Replace Master/Slave Terminology Phase I - Rename flag names and deprecate 
> old ones
> ---
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Assigned] (MESOS-5106) Improve test_http_framework so it can load master detector from modules

2016-05-11 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo reassigned MESOS-5106:
--

Assignee: zhou xing

> Improve test_http_framework so it can load master detector from modules
> ---
>
> Key: MESOS-5106
> URL: https://issues.apache.org/jira/browse/MESOS-5106
> Project: Mesos
>  Issue Type: Task
>Reporter: Shuai Lin
>Assignee: zhou xing
>
> I'm planning to restart the work on [MESOS-1806] (etcd contender/detector) 
> based on [MESOS-4610]. One thing I need to address first: when writing a 
> script test, I need a framework that can use a master detector loaded from a 
> module. The best way to do this seems to be adding {{--modules}} and 
> {{--master_detector}} flags to {{test_http_framework.cpp}} so we can reuse 
> it in tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3783) Replace Master/Slave Terminology Phase I - Update documentation

2016-05-04 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270212#comment-15270212
 ] 

Jay Guo commented on MESOS-3783:


Note: replace text in diagrams

> Replace Master/Slave Terminology Phase I - Update documentation 
> 
>
> Key: MESOS-3783
> URL: https://issues.apache.org/jira/browse/MESOS-3783
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-4689) Design doc for v1 Operator API

2016-04-20 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249375#comment-15249375
 ] 

Jay Guo commented on MESOS-4689:


Hi, is there a link to the doc? Thx!

> Design doc for v1 Operator API
> --
>
> Key: MESOS-4689
> URL: https://issues.apache.org/jira/browse/MESOS-4689
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Vinod Kone
>Assignee: Kevin Klues
>
> We need to design how the v1 operator API (all the HTTP endpoints exposed by 
> master/agent that are not for scheduler/executor interactions) looks and 
> works.





[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-19 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249314#comment-15249314
 ] 

Jay Guo commented on MESOS-3781:


OK, let's revisit the deprecation implementation when it's done. Thx!

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Comment Edited] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-18 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245223#comment-15245223
 ] 

Jay Guo edited comment on MESOS-3781 at 4/18/16 7:42 AM:
-

Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings based on the name actually being used. (Add the check logic to the 
*Flag.load* lambda? It captures *DeprecatedNames*, takes the *name* actually 
used to load the value, and generates a warning if *name* is in 
DeprecatedNames.) Something like this:
{code}
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
{code}
3. Add duplicate names to all applicable flags

My concerns:
1. Why have both _Name_ and _deprecatedName_ structs, since we only need to know 
whether a name is deprecated? Also, I don't see any instance that has *multiple* 
deprecated names, so why a vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest using a _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.
4. If we are renaming original flag names, we ought to rename them in the 
codebase where they are being used. Is it within the scope of this ticket?

Thanks!


was (Author: guoger):
Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings by determining actual name being used. (Add check logic to *Flag.load* 
lambda? It takes *DeprecatedNames* in capture, as well as the *name* used to 
actual load the value, and generate warnings if *name* falls into 
DeprecatedNames) Something like this:
{code}
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
{code}
3. Add duplicate names to all applicable flags

My concerns:
1. Why both _Name_ and _deprecatedName_ structs? Since we only need to know 
whether it is deprecated. Also, I don't see any instance that has *multiple* 
deprecated names, so why vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest to use _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.

Thanks!

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-18 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245223#comment-15245223
 ] 

Jay Guo commented on MESOS-3781:


Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings based on the name actually being used. (Add the check logic to the 
__Flag.load__ lambda? It captures the __DeprecatedName__ struct, takes the 
_name_ actually used to load the value, and generates a warning if _name_ is 
in DeprecatedName.) Something like this:
```cpp
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
```
3. Add duplicate names to all applicable flags

My concerns:
1. Why have both _Name_ and _deprecatedName_ structs, since we only need to know 
whether a name is deprecated? Also, I don't see any instance that has __multiple__ 
deprecated names, so why a vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest using a _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.

Thanks!
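
The set-based lookup suggested in concern 2 could be sketched as follows. This 
is a minimal, self-contained illustration and not stout's actual FlagsBase API: 
`MultiNameFlag`, `loadFlag`, and the flag names in the usage note are 
hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a flag that accepts several names.
// This is not the stout `Flag` struct.
struct MultiNameFlag
{
  std::string name;                       // Preferred name.
  std::set<std::string> deprecatedNames;  // Old aliases that still load.
  std::string value;
};


// Loads `value` into `flag` if `usedName` is the preferred name or one of
// the deprecated aliases. A warning is recorded only for deprecated aliases.
// Returns false if `usedName` does not refer to this flag at all.
bool loadFlag(
    MultiNameFlag& flag,
    const std::string& usedName,
    const std::string& value,
    std::vector<std::string>& warnings)
{
  const bool deprecated = flag.deprecatedNames.count(usedName) > 0;

  if (usedName != flag.name && !deprecated) {
    return false;
  }

  if (deprecated) {
    warnings.push_back(
        "Flag '" + usedName + "' is deprecated; use '" + flag.name + "'");
  }

  flag.value = value;
  return true;
}
```

For example, with a flag whose preferred name is `agent_ping_timeout` and whose 
only deprecated alias is `slave_ping_timeout`, loading through the alias sets 
the value and records one warning, while loading through the preferred name 
records none.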

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Comment Edited] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-18 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245223#comment-15245223
 ] 

Jay Guo edited comment on MESOS-3781 at 4/18/16 7:12 AM:
-

Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings based on the name actually being used. (Add the check logic to the 
*Flag.load* lambda? It captures *DeprecatedNames*, takes the *name* actually 
used to load the value, and generates a warning if *name* is in 
DeprecatedNames.) Something like this:
{code}
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
{code}
3. Add duplicate names to all applicable flags

My concerns:
1. Why have both _Name_ and _deprecatedName_ structs, since we only need to know 
whether a name is deprecated? Also, I don't see any instance that has *multiple* 
deprecated names, so why a vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest using a _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.

Thanks!


was (Author: guoger):
Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings by determining actual name being used. (Add check logic to 
__Flag.load__ lambda? It takes __DeprecatedName__ struct in capture, as well as 
the _name_ used to actual load the value, and generate warnings if _name_ falls 
into DeprecatedName) Something like this:
{code}
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
{code}
3. Add duplicate names to all applicable flags

My concerns:
1. Why both _Name_ and _deprecatedName_ structs? Since we only need to know 
whether it is deprecated. Also, I don't see any instance that has __multiple__ 
deprecated names, so why vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest to use _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.

Thanks!

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Comment Edited] (MESOS-3781) Replace Master/Slave Terminology Phase I - Add duplicate agent flags

2016-04-18 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245223#comment-15245223
 ] 

Jay Guo edited comment on MESOS-3781 at 4/18/16 7:10 AM:
-

Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings based on the name actually being used. (Add the check logic to the 
__Flag.load__ lambda? It captures the __DeprecatedName__ struct, takes the 
_name_ actually used to load the value, and generates a warning if _name_ is 
in DeprecatedName.) Something like this:
{code}
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
{code}
3. Add duplicate names to all applicable flags

My concerns:
1. Why have both _Name_ and _deprecatedName_ structs, since we only need to know 
whether a name is deprecated? Also, I don't see any instance that has __multiple__ 
deprecated names, so why a vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest using a _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.

Thanks!


was (Author: guoger):
Here's what I understand from your comments:
1. We should enable multi-named flags in FlagsBase
2. While loading flag values from cmd/env in FlagsBase::load(), it generates 
warnings by determining actual name being used. (Add check logic to 
__Flag.load__ lambda? It takes __DeprecatedName__ struct in capture, as well as 
the _name_ used to actual load the value, and generate warnings if _name_ falls 
into DeprecatedName) Something like this:
```cpp
// Sketch only: assumes deprecatedNames is a std::set<std::string>.
flag.load = [t1, deprecatedNames](
    FlagsBase*, const std::string& name, const std::string& value)
    -> Try<Nothing> {
  ...
  if (deprecatedNames.count(name) > 0) { deprecationWarning(name); }
  ...
};
```
3. Add duplicate names to all applicable flags

My concerns:
1. Why both _Name_ and _deprecatedName_ structs? Since we only need to know 
whether it is deprecated. Also, I don't see any instance that has __multiple__ 
deprecated names, so why vector of structs?
2. If the sole purpose of having this vector of structs is to search for 
deprecated names, I suggest to use _set_ instead.
3. Are we overengineering this? 'slave' flags will eventually be removed, along 
with deprecatedNames. Nevertheless, I like the idea of having multi-name flags.

Thanks!

> Replace Master/Slave Terminology Phase I - Add duplicate agent flags 
> -
>
> Key: MESOS-3781
> URL: https://issues.apache.org/jira/browse/MESOS-3781
> Project: Mesos
>  Issue Type: Task
>Reporter: Diana Arroyo
>Assignee: Jay Guo
>






[jira] [Commented] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2016-07-30 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400933#comment-15400933
 ] 

Jay Guo commented on MESOS-5186:


Good to know! Unfortunately, we are 'existing users' of proto2, for whom 
migrating to proto3 is 'not recommended'. It would be a bit hard to do, IMO.

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We're working on integrating TensorFlow (https://www.tensorflow.org) with 
> Mesos. Both require {{protobuf}}. The Python package 
> {{mesos.interface}} requires {{protobuf>=2.6.1,<3}}, but {{tensorflow}} 
> requires {{protobuf>=3.0.0}}. Although protobuf 3.x is not compatible with 
> protobuf 2.x, we modified the {{setup.py}} 
> (https://github.com/apache/mesos/blob/66cddaf/src/python/interface/setup.py.in#L29)
> from {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1,<3' 
> ],}} to {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1' ],}}
> and it works fine. Would you please consider supporting protobuf 3.x officially 
> in the next release? Maybe just removing the {{,<3}} restriction is enough.
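
To illustrate why the {{,<3}} bound matters, here is a simplified sketch of how 
the two requirement strings treat protobuf 3.0.0. It compares dotted numeric 
versions only and is not a full implementation of Python's version-specifier 
rules; `parseVersion`, `compareVersions`, `allowedByOriginal`, and 
`allowedByRelaxed` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a dotted numeric version such as "2.6.1" into {2, 6, 1}.
std::vector<int> parseVersion(const std::string& version)
{
  std::vector<int> parts;
  std::stringstream stream(version);
  std::string part;
  while (std::getline(stream, part, '.')) {
    parts.push_back(std::stoi(part));
  }
  return parts;
}


// Returns -1, 0, or 1 if `a` is lower than, equal to, or higher than `b`.
// Missing trailing components count as 0, so "3" equals "3.0.0".
int compareVersions(const std::string& a, const std::string& b)
{
  const std::vector<int> left = parseVersion(a);
  const std::vector<int> right = parseVersion(b);
  const std::size_t length = std::max(left.size(), right.size());

  for (std::size_t i = 0; i < length; ++i) {
    const int l = i < left.size() ? left[i] : 0;
    const int r = i < right.size() ? right[i] : 0;
    if (l != r) {
      return l < r ? -1 : 1;
    }
  }
  return 0;
}


// Models 'protobuf>=2.6.1,<3', the original requirement.
bool allowedByOriginal(const std::string& version)
{
  return compareVersions(version, "2.6.1") >= 0 &&
         compareVersions(version, "3") < 0;
}


// Models 'protobuf>=2.6.1', the requirement after dropping the upper bound.
bool allowedByRelaxed(const std::string& version)
{
  return compareVersions(version, "2.6.1") >= 0;
}
```

Under this model, 3.0.0 is rejected by the original requirement and accepted by 
the relaxed one, which is why both packages can only be installed together once 
the upper bound is dropped.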





[jira] [Comment Edited] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2016-07-30 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400933#comment-15400933
 ] 

Jay Guo edited comment on MESOS-5186 at 7/31/16 3:11 AM:
-

Good to know! Unfortunately, we are 'existing users' of proto2, for whom 
migrating to proto3 is 'not recommended'. It would be a bit hard to do, IMO, 
but I guess we could raise that for Mesos 2.0?


was (Author: guoger):
Good to know! Unfortunately we are 'existing users' of proto2 and 'not 
recommended' to migrate to proto3. It's a bit hard to do that IMO.

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We're working on integrating TensorFlow (https://www.tensorflow.org) with 
> Mesos. Both require {{protobuf}}. The Python package 
> {{mesos.interface}} requires {{protobuf>=2.6.1,<3}}, but {{tensorflow}} 
> requires {{protobuf>=3.0.0}}. Although protobuf 3.x is not compatible with 
> protobuf 2.x, we modified the {{setup.py}} 
> (https://github.com/apache/mesos/blob/66cddaf/src/python/interface/setup.py.in#L29)
> from {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1,<3' 
> ],}} to {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1' ],}}
> and it works fine. Would you please consider supporting protobuf 3.x officially 
> in the next release? Maybe just removing the {{,<3}} restriction is enough.





[jira] [Updated] (MESOS-5829) Mesos should be able to consume module for replicated_log

2016-07-10 Thread Jay Guo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Guo updated MESOS-5829:
---
External issue ID:   (was: https://issues.apache.org/jira/browse/MESOS-5828)

> Mesos should be able to consume module for replicated_log
> -
>
> Key: MESOS-5829
> URL: https://issues.apache.org/jira/browse/MESOS-5829
> Project: Mesos
>  Issue Type: Bug
>  Components: modules, replicated log
>Reporter: Jay Guo
>Assignee: Jay Guo
>
> Currently {{--quorum}} is hardcoded to 1 if no *zk* is provided, on the 
> assumption of standalone mode. However, this assumption does not hold when 
> using master contender and detector modules.





[jira] [Created] (MESOS-5828) Modularize Network in replicated_log

2016-07-10 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5828:
--

 Summary: Modularize Network in replicated_log
 Key: MESOS-5828
 URL: https://issues.apache.org/jira/browse/MESOS-5828
 Project: Mesos
  Issue Type: Bug
  Components: replicated log
Reporter: Jay Guo
Assignee: Jay Guo


Currently replicated_log relies on ZooKeeper for coordinator election. This is 
done through the network abstraction _ZookeeperNetwork_. We need to modularize 
this part in order to enable replicated_log when using master 
contender/detector modules.





[jira] [Created] (MESOS-5829) Mesos should be able to consume module for replicated_log

2016-07-10 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5829:
--

 Summary: Mesos should be able to consume module for replicated_log
 Key: MESOS-5829
 URL: https://issues.apache.org/jira/browse/MESOS-5829
 Project: Mesos
  Issue Type: Bug
  Components: modules, replicated log
Reporter: Jay Guo
Assignee: Jay Guo


Currently {{--quorum}} is hardcoded to 1 if no *zk* is provided, on the 
assumption of standalone mode. However, this assumption does not hold when 
using master contender and detector modules.





[jira] [Commented] (MESOS-3505) Support specifying Docker image by Image ID.

2016-07-08 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367475#comment-15367475
 ] 

Jay Guo commented on MESOS-3505:


ping [~gyliu] [~jieyu] [~xujyan]

> Support specifying Docker image by Image ID.
> 
>
> Key: MESOS-3505
> URL: https://issues.apache.org/jira/browse/MESOS-3505
> Project: Mesos
>  Issue Type: Story
>Reporter: Yan Xu
>Assignee: Jay Guo
>  Labels: mesosphere
>
> A common way to specify a Docker image with the docker engine is through 
> {{repo:tag}}, which is convenient and sufficient for most people in most 
> scenarios. However this combination is neither precise nor immutable.
> For this reason, when an image with a given {{repo:tag}} is already cached 
> locally on an agent host and a task requiring this {{repo:tag}} arrives, the 
> task may use an image that differs from the one the user intended.
> Docker CLI already supports referring to an image by {{repo@id}}, where the 
> ID can have two forms:
> * v1 Image ID
> * digest
> The native Mesos provisioner should support the same for Docker images. IMO 
> it's fine if image discovery by ID is not supported, so that {{repo:tag}} must 
> still be specified (although it looks like the [v2 
> registry|http://docs.docker.com/registry/spec/api/] does support it), but the 
> user should be able to optionally specify an image ID that is matched against 
> the cached / newly pulled image. If the ID doesn't match the cached image, the 
> store can re-pull it; if the ID doesn't match the newly pulled image 
> (manifest), the provisioner can fail the request rather than let the user 
> unknowingly run the task on the wrong image.
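
The matching behaviour proposed above can be sketched as two checks, one 
against the cache and one against a freshly pulled image. This is only an 
illustration under assumptions: `Decision`, `checkCached`, and `checkPulled` 
are hypothetical names, not part of the Mesos provisioner, and an empty string 
stands for "no ID given" or "nothing cached".

```cpp
#include <string>

// Hypothetical outcomes when resolving a Docker image request.
enum class Decision
{
  USE_CACHED,  // The cached image satisfies the request.
  PULL,        // Nothing usable is cached; pull (or re-pull) the image.
  USE_PULLED,  // The newly pulled image satisfies the request.
  FAIL         // The newly pulled image does not match the requested ID.
};


// Checks the locally cached image for a given repo:tag.
// An empty `requestedId` means the user specified only repo:tag.
// An empty `cachedId` means no image is cached for that repo:tag.
Decision checkCached(
    const std::string& requestedId,
    const std::string& cachedId)
{
  if (cachedId.empty()) {
    return Decision::PULL;
  }

  if (requestedId.empty() || requestedId == cachedId) {
    return Decision::USE_CACHED;
  }

  // The cached image differs from the requested ID, so re-pull.
  return Decision::PULL;
}


// Checks the image obtained by a pull against the requested ID.
Decision checkPulled(
    const std::string& requestedId,
    const std::string& pulledId)
{
  if (requestedId.empty() || requestedId == pulledId) {
    return Decision::USE_PULLED;
  }

  // Fail instead of silently running the task on the wrong image.
  return Decision::FAIL;
}
```

A caller would run `checkCached` first and call `checkPulled` only when the 
first check returns `Decision::PULL`.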





[jira] [Commented] (MESOS-5828) Modularize Network in replicated_log

2016-08-05 Thread Jay Guo (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15409176#comment-15409176
 ] 

Jay Guo commented on MESOS-5828:


Updated patch chain summary:

||Reviews||Summary||
|https://reviews.apache.org/r/50837|Fixed minor code style.|
|https://reviews.apache.org/r/50491|Added PIDGroup to libprocess.|
|https://reviews.apache.org/r/50492|Switched replicated log to use PIDGroup.|
|https://reviews.apache.org/r/50490|Separated ZooKeeper PIDGroup implementation into its own cpp/hpp.|
|https://reviews.apache.org/r/50493|Added `base` to PIDGroup.|
|https://reviews.apache.org/r/50494|Remove `base` from ZooKeeperPIDGroup.|
|https://reviews.apache.org/r/50495|Added PIDGroup module struct.|
|https://reviews.apache.org/r/50496|Added static `createPIDGroup` method to LogProcess.|
|https://reviews.apache.org/r/50497|Added new constructors in Log and LogProcess.|
|https://reviews.apache.org/r/50498|Added --pid_group flag in master.|
|https://reviews.apache.org/r/50499|Added logic in master/main.cpp to use pid_group module.|
|https://reviews.apache.org/r/50838|Updated modules documentation to reflect PIDGroup module.|

> Modularize Network in replicated_log
> 
>
> Key: MESOS-5828
> URL: https://issues.apache.org/jira/browse/MESOS-5828
> Project: Mesos
>  Issue Type: Bug
>  Components: replicated log
>Reporter: Jay Guo
>Assignee: Jay Guo
>
> Currently replicated_log relies on ZooKeeper for coordinator election. This 
> is done through the network abstraction _ZookeeperNetwork_. We need to 
> modularize this part in order to enable replicated_log when using master 
> contender/detector modules.





[jira] [Created] (MESOS-5960) Design doc for supporting seccomp in Mesos container

2016-08-02 Thread Jay Guo (JIRA)
Jay Guo created MESOS-5960:
--

 Summary: Design doc for supporting seccomp in Mesos container
 Key: MESOS-5960
 URL: https://issues.apache.org/jira/browse/MESOS-5960
 Project: Mesos
  Issue Type: Bug
Reporter: Jay Guo







