[jira] [Created] (MESOS-9889) Master CPU high due to unexpected foreachkey behaviour in Master::__reregisterSlave

2019-07-12 Thread haosdent (JIRA)
haosdent created MESOS-9889:
---

 Summary: Master CPU high due to unexpected foreachkey behaviour in 
Master::__reregisterSlave
 Key: MESOS-9889
 URL: https://issues.apache.org/jira/browse/MESOS-9889
 Project: Mesos
  Issue Type: Bug
Reporter: haosdent


At 
https://github.com/apache/mesos/blob/9932550e9632e7fbb9a45b217793c7f508f57001/src/master/master.cpp#L7707-L7708

{code}
void Master::__reregisterSlave(
  ...
  foreachkey (FrameworkID frameworkId,
              slaves.unreachableTasks.at(slaveInfo.id())) {
    ...
    foreach (TaskID taskId,
             slaves.unreachableTasks.at(slaveInfo.id()).get(frameworkId)) {
{code}

Our case: during network flapping, 3-4 agents re-register, and the master's CPU 
gets pegged at 100%; it cannot process any requests during that period.
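The blow-up can be reproduced with a plain std::multimap, which behaves like the Mesos container here: iterating the container's pairs (what {{foreachkey}} expands to) visits each key once per value, so combining that with an inner loop over the key's values makes the work quadratic. A minimal standalone sketch (hypothetical names; std::multimap stands in for the Mesos multihashmap):

```cpp
#include <cassert>
#include <map>
#include <string>

// Counts inner-loop iterations for 'n' tasks under a single framework.
// 'deduplicateKeys' switches between the buggy pair-based iteration
// (what foreachkey expands to today) and iterating distinct keys.
long innerIterations(int n, bool deduplicateKeys) {
  std::multimap<std::string, int> tasks;
  for (int i = 0; i < n; ++i) {
    tasks.insert({"framework-1", i});  // n values under one key.
  }

  long count = 0;
  if (deduplicateKeys) {
    // The fix: visit each distinct key exactly once.
    for (auto it = tasks.begin(); it != tasks.end();
         it = tasks.upper_bound(it->first)) {
      count += tasks.count(it->first);  // inner loop over the key's values
    }
  } else {
    // The bug: pair iteration sees the key once per value, and the
    // inner loop then re-scans all of that key's values each time.
    for (const auto& pair : tasks) {
      count += tasks.count(pair.first);
    }
  }
  return count;
}
```

With n = 100 the buggy path performs 10,000 inner iterations versus 100 for the fixed path.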

After the following change:
{code}
-foreachkey (FrameworkID frameworkId,
-   slaves.unreachableTasks.at(slaveInfo.id())) {
+foreach (FrameworkID frameworkId,
+   slaves.unreachableTasks.at(slaveInfo.id()).keys()) {
{code}

The problem is gone.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (MESOS-5037) foreachkey behaviour is not expected in multimap

2019-07-12 Thread haosdent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884055#comment-16884055
 ] 

haosdent commented on MESOS-5037:
-

[~bmahler] Created it at https://issues.apache.org/jira/browse/MESOS-9889

> foreachkey behaviour is not expected in multimap
> 
>
> Key: MESOS-5037
> URL: https://issues.apache.org/jira/browse/MESOS-5037
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: haosdent
>Priority: Major
>  Labels: foundations, stout
>
> Currently the {{foreachkey}} implementation is 
> {code}
> #define foreachkey(VAR, COL)\
>   foreachpair (VAR, __foreach__::ignore, COL)
> {code}
> This works for most structures. But in a multimap, one key may map to multiple 
> values, which means there are multiple pairs with the same key. So when calling 
> {{foreachkey}}, the key is visited once per duplicate pair during iteration. My 
> idea to solve this is to prefer calling {{foreach}} on {{(COL).keys()}} if a 
> {{keys()}} method exists in {{COL}}.
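The duplicate-key behaviour is easy to see with std::multimap (a sketch; std::multimap has no keys() method, so a std::set stands in for the proposed deduplication):

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Keys seen when iterating a multimap's pairs, which is what the
// current foreachkey macro effectively does.
std::vector<int> keysViaPairs(const std::multimap<int, char>& m) {
  std::vector<int> keys;
  for (const auto& pair : m) {
    keys.push_back(pair.first);  // the key repeats once per value
  }
  return keys;
}

// Keys seen under the proposed approach: each distinct key once.
std::vector<int> keysDeduplicated(const std::multimap<int, char>& m) {
  std::set<int> distinct;
  for (const auto& pair : m) {
    distinct.insert(pair.first);
  }
  return std::vector<int>(distinct.begin(), distinct.end());
}
```

For {{{1,'a'},{1,'b'},{2,'c'}}} the pair-based iteration yields three keys (1, 1, 2) while the deduplicated iteration yields two (1, 2).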





[jira] [Commented] (MESOS-5037) foreachkey behaviour is not expected in multimap

2019-07-12 Thread haosdent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884051#comment-16884051
 ] 

haosdent commented on MESOS-5037:
-

[~bmahler] No problem.

> foreachkey behaviour is not expected in multimap
> 
>
> Key: MESOS-5037
> URL: https://issues.apache.org/jira/browse/MESOS-5037
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: haosdent
>Priority: Major
>  Labels: foundations, stout
>





[jira] [Commented] (MESOS-5037) foreachkey behaviour is not expected in multimap

2019-07-12 Thread haosdent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884045#comment-16884045
 ] 

haosdent commented on MESOS-5037:
-

After the following change:
{code}
-foreachkey (FrameworkID frameworkId,
-   slaves.unreachableTasks.at(slaveInfo.id())) {
+foreach (FrameworkID frameworkId,
+   slaves.unreachableTasks.at(slaveInfo.id()).keys()) {
{code}

The problem is gone.

> foreachkey behaviour is not expected in multimap
> 
>
> Key: MESOS-5037
> URL: https://issues.apache.org/jira/browse/MESOS-5037
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: haosdent
>Priority: Major
>  Labels: foundations, stout
>





[jira] [Commented] (MESOS-5037) foreachkey behaviour is not expected in multimap

2019-07-12 Thread haosdent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884044#comment-16884044
 ] 

haosdent commented on MESOS-5037:
-

[~bmahler] Sure, it is 
https://github.com/apache/mesos/blob/master/src/master/master.cpp#L7707-L7708

{code}
void Master::__reregisterSlave(
  ...
  foreachkey (FrameworkID frameworkId,
              slaves.unreachableTasks.at(slaveInfo.id())) {
    ...
    foreach (TaskID taskId,
             slaves.unreachableTasks.at(slaveInfo.id()).get(frameworkId)) {
{code}

Our case: during network flapping, 3-4 agents re-register, and the master's CPU 
gets pegged at 100%; it cannot process any requests during that period.

> foreachkey behaviour is not expected in multimap
> 
>
> Key: MESOS-5037
> URL: https://issues.apache.org/jira/browse/MESOS-5037
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: haosdent
>Priority: Major
>  Labels: foundations, stout
>





[jira] [Commented] (MESOS-5037) foreachkey behaviour is not expected in multimap

2019-07-12 Thread haosdent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16883939#comment-16883939
 ] 

haosdent commented on MESOS-5037:
-

[~bmahler] Sorry for the delay. The context: we have an agent with 1,000 tasks, 
and when that agent re-registers, it triggers 1,000,000 rounds of this loop. I 
had a flame graph from last week, but it was lost during the incident.
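The arithmetic follows directly from the duplicate-key iteration: with 1,000 unreachable tasks under one framework, the pair-based {{foreachkey}} runs its body once per task, and each visit re-scans the framework's 1,000 tasks in the nested {{foreach}}. A trivial sketch of the count:

```cpp
#include <cassert>

// Rounds executed by the nested loop when foreachkey iterates pairs:
// the outer body fires once per task (not once per framework), and
// the inner foreach scans all of the framework's tasks on every visit.
long loopRounds(long tasksPerFramework) {
  return tasksPerFramework * tasksPerFramework;
}
```

So 1,000 tasks yields 1,000 × 1,000 = 1,000,000 rounds.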

> foreachkey behaviour is not expected in multimap
> 
>
> Key: MESOS-5037
> URL: https://issues.apache.org/jira/browse/MESOS-5037
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: haosdent
>Priority: Major
>  Labels: foundations, stout
>





[jira] [Commented] (MESOS-5037) foreachkey behaviour is not expected in multimap

2019-07-07 Thread haosdent (JIRA)


[ 
https://issues.apache.org/jira/browse/MESOS-5037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879902#comment-16879902
 ] 

haosdent commented on MESOS-5037:
-

Recently we found a high-CPU-load case when a Mesos agent re-registers. It is 
caused by this bug as well.
{code}
-foreachkey (FrameworkID frameworkId,
-   slaves.unreachableTasks.at(slaveInfo.id())) {
+foreach (FrameworkID frameworkId,
+   slaves.unreachableTasks.at(slaveInfo.id()).keys()) {
{code}

> foreachkey behaviour is not expected in multimap
> 
>
> Key: MESOS-5037
> URL: https://issues.apache.org/jira/browse/MESOS-5037
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: haosdent
>Priority: Major
>  Labels: stout
>





[jira] [Commented] (MESOS-4992) sandbox uri does not work outside mesos http server

2017-06-20 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055300#comment-16055300
 ] 

haosdent commented on MESOS-4992:
-

[~skonto] Sorry for the delay. This has been fixed.

> sandbox uri does not work outside mesos http server
> ---
>
> Key: MESOS-4992
> URL: https://issues.apache.org/jira/browse/MESOS-4992
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 0.27.1
>Reporter: Stavros Kontopoulos
>Assignee: haosdent
>  Labels: mesosphere
> Fix For: 1.4.0
>
>
> The sandbox URI of a framework does not work if I just copy-paste it into the 
> browser.
> For example the following sandbox uri:
> http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/frameworks/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009/executors/driver-20160321155016-0001/browse
> should redirect to:
> http://172.17.0.1:5050/#/slaves/50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0/browse?path=%2Ftmp%2Fmesos%2Fslaves%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-S0%2Fframeworks%2F50f87c73-79ef-4f2a-95f0-b2b4062b2de6-0009%2Fexecutors%2Fdriver-20160321155016-0001%2Fruns%2F60533483-31fb-4353-987d-f3393911cc80
> yet it fails with the message:
> "Failed to find slaves.
> Navigate to the slave's sandbox via the Mesos UI."
> and redirects to:
> http://172.17.0.1:5050/#/
> It is an issue for me because I'm working on expanding the Mesos Spark UI with 
> the sandbox URI. The other option is to get the slave info, parse the JSON 
> there, and extract the executor paths, which is not so straightforward or 
> elegant. Moreover, I don't see the runs/container_id in the Mesos proto API. I 
> guess this is hidden info; it is the piece of info needed to rewrite the URI 
> without redirection.





[jira] [Commented] (MESOS-7458) webui display of framework resources is confusing

2017-06-04 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036517#comment-16036517
 ] 

haosdent commented on MESOS-7458:
-

Hi [~vinodkone] [~neilc], sorry, some unexpected things happened in the past 
weeks; let me post a patch for this.

> webui display of framework resources is confusing
> -
>
> Key: MESOS-7458
> URL: https://issues.apache.org/jira/browse/MESOS-7458
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: Neil Conway
>Assignee: haosdent
>  Labels: mesosphere
> Attachments: Screen Shot 2017-05-04 at 11.15.12 AM.png, Screen Shot 
> 2017-05-04 at 11.15.25 AM.png
>
>
> In the webui, the list of frameworks displays the {{used_resources}} for each 
> framework. When you click on the framework to access the per-framework page, 
> the resources displayed are the *total* resources (the {{resources}} key in 
> state.json, which is {{used_resources}} + {{offered_resources}}). This is 
> confusing in situations when the offered resources are very different from 
> the used resources.





[jira] [Updated] (MESOS-7468) Could not copy the sandbox path on WebUI

2017-05-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7468:

Description: 
I would get 

{code}
 var  lib  mesos  slaves  08879b43-58c9-4db7-a93e-4873e35c8144-S1  frameworks  
1c092dff-e6d2-4537-a872-52752929ea7e-  executors  
test-copy.cfd4d72a-3397-11e7-8e73-02426ed45ffc  runs  
3d8e16cb-f5c7-4580-952d-1a230943e154
{code}

when I select text in the web UI.

It is because Bootstrap defines the breadcrumb separator as 

{code}
.breadcrumb > li + li:before {
  content: "/";
}
{code}

So the "/" separators are not included when the text is selected and copied, 
because CSS-generated {{content}} is not part of the DOM text.

  was:
I would get 

{code}
 var  lib  mesos  slaves  08879b43-58c9-4db7-a93e-4873e35c8144-S1  frameworks  
1c092dff-e6d2-4537-a872-52752929ea7e-  executors  
test-copy.cfd4d72a-3397-11e7-8e73-02426ed45ffc  runs  
3d8e16cb-f5c7-4580-952d-1a230943e154
{code}

when I select texts in webui.


> Could not copy the sandbox path on WebUI 
> -
>
> Key: MESOS-7468
> URL: https://issues.apache.org/jira/browse/MESOS-7468
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> I would get 
> {code}
>  var  lib  mesos  slaves  08879b43-58c9-4db7-a93e-4873e35c8144-S1  frameworks 
>  1c092dff-e6d2-4537-a872-52752929ea7e-  executors  
> test-copy.cfd4d72a-3397-11e7-8e73-02426ed45ffc  runs  
> 3d8e16cb-f5c7-4580-952d-1a230943e154
> {code}
> when I select text in the web UI.
> It is because the definition of breadcrumb in bootstrap is 
> {code}
> .breadcrumb > li + li:before {
> content: "/";
> }
> {code}
> So the "/" is not included when selecting and copying the text.





[jira] [Updated] (MESOS-7468) Could not copy the sandbox path on WebUI

2017-05-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7468:

Description: 
I would get 

{code}
 var  lib  mesos  slaves  08879b43-58c9-4db7-a93e-4873e35c8144-S1  frameworks  
1c092dff-e6d2-4537-a872-52752929ea7e-  executors  
test-copy.cfd4d72a-3397-11e7-8e73-02426ed45ffc  runs  
3d8e16cb-f5c7-4580-952d-1a230943e154
{code}

when I select text in the web UI.

> Could not copy the sandbox path on WebUI 
> -
>
> Key: MESOS-7468
> URL: https://issues.apache.org/jira/browse/MESOS-7468
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>
> I would get 
> {code}
>  var  lib  mesos  slaves  08879b43-58c9-4db7-a93e-4873e35c8144-S1  frameworks 
>  1c092dff-e6d2-4537-a872-52752929ea7e-  executors  
> test-copy.cfd4d72a-3397-11e7-8e73-02426ed45ffc  runs  
> 3d8e16cb-f5c7-4580-952d-1a230943e154
> {code}
> when I select text in the web UI.





[jira] [Created] (MESOS-7468) Could not copy the sandbox path on WebUI

2017-05-07 Thread haosdent (JIRA)
haosdent created MESOS-7468:
---

 Summary: Could not copy the sandbox path on WebUI 
 Key: MESOS-7468
 URL: https://issues.apache.org/jira/browse/MESOS-7468
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: haosdent
Assignee: haosdent
Priority: Minor








[jira] [Created] (MESOS-7456) Compilation error on recent glibc in cgroups device subsystem

2017-05-03 Thread haosdent (JIRA)
haosdent created MESOS-7456:
---

 Summary: Compilation error on recent glibc in cgroups device 
subsystem
 Key: MESOS-7456
 URL: https://issues.apache.org/jira/browse/MESOS-7456
 Project: Mesos
  Issue Type: Bug
Reporter: haosdent
Assignee: Zhongbo Tian
 Fix For: 1.3.0


Got a compile error on Arch Linux:

{code}
../../src/slave/containerizer/mesos/isolators/cgroups/subsystems/devices.cpp:116:13:
 error: In the GNU C Library, "major" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "major", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "major", you should undefine it after including <sys/types.h>. [-Werror]
   entry.selector.major = major(device.get());
{code}
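A sketch of the fix (Linux-specific, assuming glibc >= 2.25): the major()/minor()/makedev() macros now live in <sys/sysmacros.h>, so including that header directly silences the error:

```cpp
#include <cassert>
#include <sys/types.h>      // dev_t
#include <sys/sysmacros.h>  // major(), minor(), makedev() on modern glibc

// Extract the major number of a device, with <sys/sysmacros.h>
// included directly so glibc >= 2.25 does not emit the error above.
unsigned int deviceMajor(dev_t dev) {
  return major(dev);
}
```

For example, makedev(8, 1) (typically /dev/sda1) has major number 8.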





[jira] [Updated] (MESOS-7454) Document how to use `cgroups/devices`

2017-05-03 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7454:

Labels: documentation  (was: )

> Document how to use `cgroups/devices`
> -
>
> Key: MESOS-7454
> URL: https://issues.apache.org/jira/browse/MESOS-7454
> Project: Mesos
>  Issue Type: Bug
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
>  Labels: documentation
>






[jira] [Created] (MESOS-7454) Document how to use `cgroups/devices`

2017-05-03 Thread haosdent (JIRA)
haosdent created MESOS-7454:
---

 Summary: Document how to use `cgroups/devices`
 Key: MESOS-7454
 URL: https://issues.apache.org/jira/browse/MESOS-7454
 Project: Mesos
  Issue Type: Bug
Reporter: haosdent
Assignee: haosdent
Priority: Minor








[jira] [Commented] (MESOS-6791) Allow specifying device whitelist entries in the cgroups devices subsystem

2017-05-03 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994322#comment-15994322
 ] 

haosdent commented on MESOS-6791:
-

{code}
commit e51e0ec89fb0a8f6d76b5ddd02c41bf53fed9154
Author: Zhongbo Tian 
Date:   Wed May 3 14:00:37 2017 +0800

Fixed compilation error on recent glibc for device whitelist.

Review: https://reviews.apache.org/r/58944/
{code}

Hi [~lidejia], could you help double-check this again?

> Allow specifying device whitelist entries in the cgroups devices subsystem
> --
>
> Key: MESOS-6791
> URL: https://issues.apache.org/jira/browse/MESOS-6791
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: haosdent
>Assignee: Zhongbo Tian
>  Labels: cgroups
> Fix For: 1.3.0
>
>






[jira] [Updated] (MESOS-7453) glyphicons-halflings-regular.woff2 is missing in WebUI

2017-05-02 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7453:

Fix Version/s: 1.2.1

> glyphicons-halflings-regular.woff2 is missing in WebUI
> --
>
> Key: MESOS-7453
> URL: https://issues.apache.org/jira/browse/MESOS-7453
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>Assignee: haosdent
>Priority: Minor
> Fix For: 1.2.1, 1.3.0
>
>






[jira] [Created] (MESOS-7453) glyphicons-halflings-regular.woff2 is missing in WebUI

2017-05-02 Thread haosdent (JIRA)
haosdent created MESOS-7453:
---

 Summary: glyphicons-halflings-regular.woff2 is missing in WebUI
 Key: MESOS-7453
 URL: https://issues.apache.org/jira/browse/MESOS-7453
 Project: Mesos
  Issue Type: Bug
  Components: webui
Reporter: haosdent
Assignee: haosdent
Priority: Minor








[jira] [Assigned] (MESOS-6440) "Catch up" the webui to features that have been added.

2017-05-01 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6440:
---

Assignee: haosdent

> "Catch up" the webui to features that have been added.
> --
>
> Key: MESOS-6440
> URL: https://issues.apache.org/jira/browse/MESOS-6440
> Project: Mesos
>  Issue Type: Epic
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: haosdent
>
> Going forward, we'd like to ensure that all features that are added include 
> the appropriate changes to the webui.
> Over time there have been some features that have been developed that have 
> not been reflected in the webui. The purpose of this epic is to collect these 
> and have an effort to catch up the webui to reflect the current state of 
> functionality.
> E.g. reservations / volumes are not visible in the UI
> E.g. framework capabilities are not visible in the UI





[jira] [Assigned] (MESOS-6442) Display volumes in the agent page in the webui.

2017-05-01 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6442:
---

Assignee: haosdent

> Display volumes in the agent page in the webui.
> ---
>
> Key: MESOS-6442
> URL: https://issues.apache.org/jira/browse/MESOS-6442
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: haosdent
>
> We currently do not display the volumes present on the agent in the webui. It 
> would be nice to see this information.





[jira] [Assigned] (MESOS-6441) Display reservations in the agent page in the webui.

2017-05-01 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-6441:
---

Assignee: haosdent

> Display reservations in the agent page in the webui.
> 
>
> Key: MESOS-6441
> URL: https://issues.apache.org/jira/browse/MESOS-6441
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: haosdent
>
> We currently do not display the reservations present on an agent in the 
> webui. It would be nice to see this information.
> It would also be nice to update the resource statistics tables to make the 
> distinction between unreserved and reserved resources. E.g.
> Reserved:
> Used, Allocated, Available and Total
> Unreserved:
> Used, Allocated, Available and Total





[jira] [Commented] (MESOS-6791) Allow specifying device whitelist entries in the cgroups devices subsystem

2017-04-30 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15990263#comment-15990263
 ] 

haosdent commented on MESOS-6791:
-

Thanks [~mcypark], it has been committed.

> Allow specifying device whitelist entries in the cgroups devices subsystem
> --
>
> Key: MESOS-6791
> URL: https://issues.apache.org/jira/browse/MESOS-6791
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: haosdent
>Assignee: Zhongbo Tian
>  Labels: cgroups
>






[jira] [Commented] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2017-04-24 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981595#comment-15981595
 ] 

haosdent commented on MESOS-6134:
-

Hi [~neilc], since we refactored the Docker args in 
https://issues.apache.org/jira/browse/MESOS-6808, there are a lot of conflicts 
when cherry-picking this patch to 1.2.x and 1.1.x. Do you think it is OK to 
remove 1.2.1 and 1.1.2 from the fix versions?

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
> Fix For: 1.3.0
>
>
> MESOS-2154 only partially fixed the CFS quota support in Docker 
> Containerizer: that fix only works for custom executor.
> This tracks the fix for command executor so we can declare this is complete.





[jira] [Commented] (MESOS-6791) Allow specifying device whitelist entries in the cgroups devices subsystem

2017-04-17 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972022#comment-15972022
 ] 

haosdent commented on MESOS-6791:
-

I have reverted this patch since we need to change something at the API level as well.
{code}
commit 3398c95b0cbdf37a7ad8078fdbdb79e020e305ca
Author: Haosdent Huang 
Date:   Tue Apr 18 10:09:23 2017 +0800

Revert "Allowed whitelist additional devices in cgroups devices subsystem."

This reverts commit ff9ed0c831c347204d065c5f39e5c8bb86f38514.
{code}

> Allow specifying device whitelist entries in the cgroups devices subsystem
> --
>
> Key: MESOS-6791
> URL: https://issues.apache.org/jira/browse/MESOS-6791
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups
> Fix For: 1.3.0
>
>






[jira] [Commented] (MESOS-7210) HTTP health check doesn't work when mesos runs with --docker_mesos_image

2017-04-17 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972021#comment-15972021
 ] 

haosdent commented on MESOS-7210:
-

Hi [~adam-mesos], thanks a lot; this has been backported to 1.2.x and 1.1.x.

> HTTP health check doesn't work when mesos runs with --docker_mesos_image
> 
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
> Fix For: 1.1.2, 1.2.1, 1.3.0
>
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and an example Marathon job that uses MESOS_HTTP health checks like:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>"type": "DOCKER",
>"docker": {
>  "image": "python:alpine",
>  "network": "BRIDGE",
>  "portMappings": [
> { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>  ]
>}
>  },
>  "env": {
>"SERVICE_NAME" : "python"
>  },
>  "healthChecks": [
>{
>  "path": "/",
>  "portIndex": 0,
>  "protocol": "MESOS_HTTP",
>  "gracePeriodSeconds": 30,
>  "intervalSeconds": 10,
>  "timeoutSeconds": 30,
>  "maxConsecutiveFailures": 3
>}
>  ]
> }
> {code}
> I see the errors like:
> {code}
> F0306 07:41:58.84429335 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> It looks like the docker_mesos_image option causes the newly started Mesos 
> executor not to use the "pid: host" option its parent container was started 
> with, but to get its own PID namespace (so regardless of whether the parent 
> container was started with "pid: host", it will never be able to find the PID).





[jira] [Updated] (MESOS-7210) HTTP health check doesn't work when mesos runs with --docker_mesos_image

2017-04-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7210:

Fix Version/s: 1.2.1
   1.1.2

> HTTP health check doesn't work when mesos runs with --docker_mesos_image
> 
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
> Fix For: 1.1.2, 1.2.1, 1.3.0
>
>
> When running mesos-slave with option "docker_mesos_image" like:
> {code}
> --master=zk://standalone:2181/mesos  --containerizers=docker,mesos  
> --executor_registration_timeout=5mins  --hostname=standalone  --ip=0.0.0.0  
> --docker_stop_timeout=5secs  --gc_delay=1days  
> --docker_socket=/var/run/docker.sock  --no-systemd_enable_support  
> --work_dir=/tmp/mesos  --docker_mesos_image=panteras/paas-in-a-box:0.4.0
> {code}
> from the container that was started with option "pid: host" like:
> {code}
>   net:host
>   privileged: true
>   pid:host
> {code}
> and example marathon job, that use MESOS_HTTP checks like:
> {code}
> {
>  "id": "python-example-stable",
>  "cmd": "python3 -m http.server 8080",
>  "mem": 16,
>  "cpus": 0.1,
>  "instances": 2,
>  "container": {
>"type": "DOCKER",
>"docker": {
>  "image": "python:alpine",
>  "network": "BRIDGE",
>  "portMappings": [
> { "containerPort": 8080, "hostPort": 0, "protocol": "tcp" }
>  ]
>}
>  },
>  "env": {
>"SERVICE_NAME" : "python"
>  },
>  "healthChecks": [
>{
>  "path": "/",
>  "portIndex": 0,
>  "protocol": "MESOS_HTTP",
>  "gracePeriodSeconds": 30,
>  "intervalSeconds": 10,
>  "timeoutSeconds": 30,
>  "maxConsecutiveFailures": 3
>}
>  ]
> }
> {code}
> I see the errors like:
> {code}
> F0306 07:41:58.84429335 health_checker.cpp:94] Failed to enter the net 
> namespace of task (pid: '13527'): Pid 13527 does not exist
> *** Check failure stack trace: ***
> @ 0x7f51770b0c1d  google::LogMessage::Fail()
> @ 0x7f51770b29d0  google::LogMessage::SendToLog()
> @ 0x7f51770b0803  google::LogMessage::Flush()
> @ 0x7f51770b33f9  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f517647ce46  
> _ZNSt17_Function_handlerIFivEZN5mesos8internal6health14cloneWithSetnsERKSt8functionIS0_E6OptionIiERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaISG_EEEUlvE_E9_M_invokeERKSt9_Any_data
> @ 0x7f517647bf2b  mesos::internal::health::cloneWithSetns()
> @ 0x7f517648374b  std::_Function_handler<>::_M_invoke()
> @ 0x7f5177068167  process::internal::cloneChild()
> @ 0x7f5177065c32  process::subprocess()
> @ 0x7f5176481a9d  
> mesos::internal::health::HealthCheckerProcess::_httpHealthCheck()
> @ 0x7f51764831f7  
> mesos::internal::health::HealthCheckerProcess::_healthCheck()
> @ 0x7f517701f38c  process::ProcessBase::visit()
> @ 0x7f517702c8b3  process::ProcessManager::resume()
> @ 0x7f517702fb77  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f51754ddc80  (unknown)
> @ 0x7f5174cf06ba  start_thread
> @ 0x7f5174a2682d  (unknown)
> I0306 07:41:59.077986 9 health_checker.cpp:199] Ignoring failure as 
> health check still in grace period
> {code}
> Looks like the docker_mesos_image option causes the newly started mesos task 
> to not share the "pid: host" namespace the parent container was started with; 
> instead it gets its own PID namespace (so whether or not the parent container 
> was started with "pid: host", the health checker will never be able to find the PID)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7210) HTTP health check doesn't work when mesos runs with --docker_mesos_image

2017-04-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7210:

Summary: HTTP health check doesn't work when mesos runs with 
--docker_mesos_image  (was: MESOS HTTP checks doesn't work when mesos runs with 
--docker_mesos_image)

> HTTP health check doesn't work when mesos runs with --docker_mesos_image
> 
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
> Fix For: 1.3.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image

2017-04-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7210:

Summary: MESOS HTTP checks doesn't work when mesos runs with 
--docker_mesos_image  (was: MESOS HTTP checks doesn't work when mesos runs with 
--docker_mesos_image ( pid namespace mismatch ))

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image
> 
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
> Fix For: 1.3.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-16 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7210:

Fix Version/s: 1.3.0

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
> Fix For: 1.3.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6791) Allow to specify the device whitelist entries in cgroup devices subsystem

2017-04-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6791:

Story Points: 1

> Allow to specify the device whitelist entries in cgroup devices subsystem
> --
>
> Key: MESOS-6791
> URL: https://issues.apache.org/jira/browse/MESOS-6791
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6791) Allow to specify the device whitelist entries in cgroup devices subsystem

2017-04-07 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961114#comment-15961114
 ] 

haosdent commented on MESOS-6791:
-

Hi [~gilbert], 1~2 points.

> Allow to specify the device whitelist entries in cgroup devices subsystem
> --
>
> Key: MESOS-6791
> URL: https://issues.apache.org/jira/browse/MESOS-6791
> Project: Mesos
>  Issue Type: Task
>  Components: cgroups
>Reporter: haosdent
>Assignee: haosdent
>  Labels: cgroups
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7210:
---

Assignee: Deshi Xiao  (was: haosdent)

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-04-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7210:

Shepherd: haosdent

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: Deshi Xiao
>Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-20 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15934001#comment-15934001
 ] 

haosdent commented on MESOS-7210:
-

Thanks a lot for [~sielaq]'s and [~alexr]'s help. Let me try to fix this.

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7210) MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( pid namespace mismatch )

2017-03-20 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7210:
---

Assignee: haosdent  (was: Gastón Kleiman)

> MESOS HTTP checks doesn't work when mesos runs with --docker_mesos_image ( 
> pid namespace mismatch )
> ---
>
> Key: MESOS-7210
> URL: https://issues.apache.org/jira/browse/MESOS-7210
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 1.1.0
> Environment: Ubuntu 16.04.02
> Docker version 1.13.1
> mesos 1.1.0, runs from container
> docker containers  spawned by marathon 1.4.1
>Reporter: Wojciech Sielski
>Assignee: haosdent
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6443) Display maintenance information in the webui.

2017-03-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931096#comment-15931096
 ] 

haosdent commented on MESOS-6443:
-

Hi [~pawan.ufl], this is fixed in MESOS-7261.

> Display maintenance information in the webui.
> -
>
> Key: MESOS-6443
> URL: https://issues.apache.org/jira/browse/MESOS-6443
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Tomasz Janiszewski
>Assignee: Tomasz Janiszewski
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: mesos_webui_maintenance_schedule.png
>
>
> Add new tab with Maintenance schedule.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7261) maintenance.html is missing during packaging

2017-03-18 Thread haosdent (JIRA)
haosdent created MESOS-7261:
---

 Summary: maintenance.html is missing during packaging
 Key: MESOS-7261
 URL: https://issues.apache.org/jira/browse/MESOS-7261
 Project: Mesos
  Issue Type: Bug
  Components: webui
Affects Versions: 1.2.0
Reporter: haosdent
Assignee: haosdent
 Fix For: 1.2.1, 1.3.0






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (MESOS-6995) Update the webui to reflect hierarchical roles.

2017-03-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931082#comment-15931082
 ] 

haosdent edited comment on MESOS-6995 at 3/18/17 6:17 AM:
--

Hi [~guoger] [~bmahler], do you think it would be better to use a tree structure 
to show the hierarchical roles?


was (Author: haosd...@gmail.com):
[~guoger][~bmahler] Do you think is it better to use a tree structure to show 
the hierarchical roles?

> Update the webui to reflect hierarchical roles.
> ---
>
> Key: MESOS-6995
> URL: https://issues.apache.org/jira/browse/MESOS-6995
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Jay Guo
>
> It may not need any changes, but we should confirm that the new role format 
> for hierarchical roles is correctly displayed in the webui.
> In addition, we can add a roles tab that shows the summary information 
> (shares, weights, quotas). For now, we don't need to make any of this 
> clickable (e.g. to see the tasks / frameworks under the role).





[jira] [Commented] (MESOS-6995) Update the webui to reflect hierarchical roles.

2017-03-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931082#comment-15931082
 ] 

haosdent commented on MESOS-6995:
-

[~guoger] [~bmahler], do you think it would be better to use a tree structure 
to show the hierarchical roles?

> Update the webui to reflect hierarchical roles.
> ---
>
> Key: MESOS-6995
> URL: https://issues.apache.org/jira/browse/MESOS-6995
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Jay Guo
>
> It may not need any changes, but we should confirm that the new role format 
> for hierarchical roles is correctly displayed in the webui.
> In addition, we can add a roles tab that shows the summary information 
> (shares, weights, quotas). For now, we don't need to make any of this 
> clickable (e.g. to see the tasks / frameworks under the role).





[jira] [Commented] (MESOS-6995) Update the webui to reflect hierarchical roles.

2017-03-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15931080#comment-15931080
 ] 

haosdent commented on MESOS-6995:
-

{code}
commit c05eab4f4dfc87555cd131c82d9792cb71a796f4
Author: Benjamin Mahler 
Date:   Sat Mar 18 14:02:23 2017 +0800

Introduced a Roles tab in the webui.

Initially, this includes the weight, number of frameworks involved
with the role, and the resource allocation. Longer term this should
include the quota information and the revocable resources.

Review: https://reviews.apache.org/r/57622/
{code}

> Update the webui to reflect hierarchical roles.
> ---
>
> Key: MESOS-6995
> URL: https://issues.apache.org/jira/browse/MESOS-6995
> Project: Mesos
>  Issue Type: Task
>  Components: webui
>Reporter: Benjamin Mahler
>Assignee: Jay Guo
>
> It may not need any changes, but we should confirm that the new role format 
> for hierarchical roles is correctly displayed in the webui.
> In addition, we can add a roles tab that shows the summary information 
> (shares, weights, quotas). For now, we don't need to make any of this 
> clickable (e.g. to see the tasks / frameworks under the role).





[jira] [Commented] (MESOS-6443) Display maintenance information in the webui.

2017-03-17 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15930450#comment-15930450
 ] 

haosdent commented on MESOS-6443:
-

Hi [~pawan.ufl], there is a bug where maintenance.html goes missing during 
packaging. I will create a separate ticket to cherry-pick the fix to 1.2.1; 
sorry for the inconvenience.

> Display maintenance information in the webui.
> -
>
> Key: MESOS-6443
> URL: https://issues.apache.org/jira/browse/MESOS-6443
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Tomasz Janiszewski
>Assignee: Tomasz Janiszewski
>Priority: Minor
> Fix For: 1.2.0
>
> Attachments: mesos_webui_maintenance_schedule.png
>
>
> Add new tab with Maintenance schedule.





[jira] [Assigned] (MESOS-7219) Mesos UI to display agent attribute details

2017-03-07 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7219:
---

Assignee: haosdent

> Mesos UI to display agent attribute details
> ---
>
> Key: MESOS-7219
> URL: https://issues.apache.org/jira/browse/MESOS-7219
> Project: Mesos
>  Issue Type: Wish
>  Components: agent
>Affects Versions: 1.1.0
> Environment: All
>Reporter: Rory A. Savage
>Assignee: haosdent
>  Labels: mesos
> Fix For: 1.1.1
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> It would be helpful that the Mesos UI (server:5050) would display mesos agent 
> attributes when viewing all of the agents under the Agents view.  When 
> dealing with a large number of agents it's sometimes hard to distinguish 
> which agents are which if you are not relying on DNS naming conventions.





[jira] [Comment Edited] (MESOS-6480) Support for docker live-restore option in Mesos

2017-03-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898649#comment-15898649
 ] 

haosdent edited comment on MESOS-6480 at 3/7/17 2:59 AM:
-

As checked, every docker command fails while {{--live-restore}} is enabled and 
the daemon is stopped with {{service docker stop}}, including {{docker logs}}, 
no matter which log driver we use. After chatting with [~jieyu], the possible 
ways to resolve this are:

1. 
* {{docker run -d}} to start the program.
* {{docker logs --since xxx --follow}} to read the log.
* If {{docker logs}} fails, check whether {{/proc/$taskPid}} exists; if the 
task process is still alive, keep retrying {{docker logs}} until 
{{/proc/$taskPid}} disappears or {{docker logs}} succeeds again.

The problem with this approach is that it is a bit tricky to pick the 
timestamp for {{docker logs --since}}, and some log lines may be lost.

2. 
* Read {{/run/docker/libcontainerd/$container_id/init-stdout}} and 
{{/run/docker/libcontainerd/$container_id/init-stderr}} directly. This is 
tricky as well, because it depends on the internal implementation of docker 
across versions, and it does not allow multiple consumers: if we read these 
files directly, other consumers of {{docker logs}} will not see the log lines 
we consumed.

In short, I don't think we have a perfect solution for this problem unless we 
accept some log loss.


was (Author: haosd...@gmail.com):
As check, all docker command would fail when use {{--live-store}} and {{service 
docker stop}}, include {{docker log}} no matter which log-driver we use. After 
chat with Jie Yu, The possible way to resolve this is 

1. 
* {{docker run -d}} to start the program
* {{docker log --since xxx --follow}} to read the log
* If {{docker log}} failed, check if {{/proc/$taskPid}} exist, if the task 
process still exist, keep retry {{docker log}} util {{/proc/$taskPid}} 
disappear or {{docker log}} success again.

The problem of this way is it is a bit tricky to find the timestamp parameter 
in {{docker log --since}}. And some logs may miss

2. 
* Read the {{/run/docker/libcontainerd/$container_id/init-stdout}} and 
{{/run/docker/libcontainerd/$container_id/init-stderr}} directly. This is 
tricky as well. Because it depends on the implementation of docker accross 
different versions. And it don't allow multiple consumers, which mean if we 
read this file directly, other consumers on {{docker log}} would not see the 
log we got from this file.

In a short word, I think we don't have a perfect solution for this problem 
unless we allow some log missing.

> Support for docker live-restore option in Mesos
> ---
>
> Key: MESOS-6480
> URL: https://issues.apache.org/jira/browse/MESOS-6480
> Project: Mesos
>  Issue Type: Task
>Reporter: Milind Chawre
>
> Docker-1.12 supports live-restore option which keeps containers alive during 
> docker daemon downtime https://docs.docker.com/engine/admin/live-restore/
> I tried to use this option in my Mesos setup And  observed this :
> 1. On mesos worker node stop docker daemon.
> 2. After some time start the docker daemon. All the containers running on 
> that are still visible using "docker ps". This is an expected behaviour of 
> live-restore option.
> 3. When I check mesos and marathon UI. It shows no Active tasks running on 
> that node. The containers which are still running on that node are now 
> scheduled on different mesos nodes, which is not right since I can see the 
> containers in "docker ps" output because of live-restore option.





[jira] [Commented] (MESOS-6480) Support for docker live-restore option in Mesos

2017-03-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898649#comment-15898649
 ] 

haosdent commented on MESOS-6480:
-

As checked, every docker command fails while {{--live-restore}} is enabled and 
the daemon is stopped with {{service docker stop}}, including {{docker logs}}, 
no matter which log driver we use. After chatting with Jie Yu, the possible 
ways to resolve this are:

1. 
* {{docker run -d}} to start the program.
* {{docker logs --since xxx --follow}} to read the log.
* If {{docker logs}} fails, check whether {{/proc/$taskPid}} exists; if the 
task process is still alive, keep retrying {{docker logs}} until 
{{/proc/$taskPid}} disappears or {{docker logs}} succeeds again.

The problem with this approach is that it is a bit tricky to pick the 
timestamp for {{docker logs --since}}, and some log lines may be lost.

2. 
* Read {{/run/docker/libcontainerd/$container_id/init-stdout}} and 
{{/run/docker/libcontainerd/$container_id/init-stderr}} directly. This is 
tricky as well, because it depends on the internal implementation of docker 
across versions, and it does not allow multiple consumers: if we read these 
files directly, other consumers of {{docker logs}} will not see the log lines 
we consumed.

In short, I don't think we have a perfect solution for this problem unless we 
accept some log loss.

> Support for docker live-restore option in Mesos
> ---
>
> Key: MESOS-6480
> URL: https://issues.apache.org/jira/browse/MESOS-6480
> Project: Mesos
>  Issue Type: Task
>Reporter: Milind Chawre
>
> Docker-1.12 supports live-restore option which keeps containers alive during 
> docker daemon downtime https://docs.docker.com/engine/admin/live-restore/
> I tried to use this option in my Mesos setup And  observed this :
> 1. On mesos worker node stop docker daemon.
> 2. After some time start the docker daemon. All the containers running on 
> that are still visible using "docker ps". This is an expected behaviour of 
> live-restore option.
> 3. When I check mesos and marathon UI. It shows no Active tasks running on 
> that node. The containers which are still running on that node are now 
> scheduled on different mesos nodes, which is not right since I can see the 
> containers in "docker ps" output because of live-restore option.





[jira] [Commented] (MESOS-5900) Support Unix domain socket connections in libprocess

2017-03-05 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896720#comment-15896720
 ] 

haosdent commented on MESOS-5900:
-

The patch is https://reviews.apache.org/r/53460/

> Support Unix domain socket connections in libprocess
> 
>
> Key: MESOS-5900
> URL: https://issues.apache.org/jira/browse/MESOS-5900
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Neil Conway
>Assignee: Benjamin Hindman
>  Labels: mesosphere
> Fix For: 1.2.0
>
>
> We should consider allowing two programs on the same host using libprocess to 
> communicate via Unix domain sockets rather than TCP. This has a few 
> advantages:
> * Security: remote hosts cannot connect to the Unix socket. Domain sockets 
> also offer additional support for 
> [authentication|https://docs.fedoraproject.org/en-US/Fedora_Security_Team/1/html/Defensive_Coding/sect-Defensive_Coding-Authentication-UNIX_Domain.html].
> * Performance: domain sockets are marginally faster than localhost TCP.





[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-27 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885613#comment-15885613
 ] 

haosdent commented on MESOS-7183:
-

We could change this to {{LOG(ERROR)}} followed by {{EXIT(-1)}}, which would 
not generate a core dump.

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> {code}
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
> 

[jira] [Commented] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-27 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885612#comment-15885612
 ] 

haosdent commented on MESOS-7183:
-

It is because we use a fatal log here:
{code}
  // This effectively aborts the health check.
  LOG(FATAL) << "Failed to enter the " << ns << " namespace of "
 << "task (pid: '" << taskPid.get() << "'): "
 << setns.error();
{code}

> Always get coredump by add a health check on docker container app
> -
>
> Key: MESOS-7183
> URL: https://issues.apache.org/jira/browse/MESOS-7183
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Deshi Xiao
> Attachments: stderr
>
>
> see the sandbox's stderr log:
> I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
> I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
> f2aeab4d-b224-479c-869d-121daa0c12cb-S0
> I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
> unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
> 33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
> MESOS_SANDBOX=/mnt/mesos/sandbox -e 
> MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  -v /home:/data:rw -v 
> /var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
>  --net bridge --label=APP_ID=wordpress --label=USER=nmg 
> --label=CLUSTER=nmgtest --label=SLOT=0 --label=APP=wordpress4 -p 
> 31000:8080/tcp --name 
> mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
>  wordpress
> WordPress not found in /var/www/html - copying now...
> Complete! WordPress has been successfully copied to /var/www/html
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> MySQL Connection Error: (2002) Connection refused
> W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
> W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
> times consecutively: HTTP health check failed: curl returned exited with 
> status 7: curl: (7) 

[jira] [Updated] (MESOS-7183) Always get coredump by add a health check on docker container app

2017-02-27 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7183:

Description: 
see the sandbox's stderr log:
{code}
I0227 09:20:02.624827 22345 exec.cpp:162] Version: 1.1.0
I0227 09:20:02.651790 22347 exec.cpp:237] Executor registered on agent 
f2aeab4d-b224-479c-869d-121daa0c12cb-S0
I0227 09:20:02.656651 22347 docker.cpp:811] Running docker -H 
unix:///var/run/docker.sock run --privileged --cpu-shares 2048 --memory 
33554432 -e WORDPRESS_DB_HOST=192.168.1.210 -e WORDPRESS_DB_PASSWORD=root -e 
MESOS_SANDBOX=/mnt/mesos/sandbox -e 
MESOS_CONTAINER_NAME=mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
 -v /home:/data:rw -v 
/var/lib/mesos/slaves/f2aeab4d-b224-479c-869d-121daa0c12cb-S0/frameworks/67b3106e-fe2b-4eaa-8dcc-51653d027738-0001/executors/0-wordpress4-nmg-nmgtest-55ba456bf6eb4e979610f5ec1fb23980/runs/8f6de3ab-0e85-434a-a099-d16f9654a10c:/mnt/mesos/sandbox
 --net bridge --label=APP_ID=wordpress --label=USER=nmg --label=CLUSTER=nmgtest 
--label=SLOT=0 --label=APP=wordpress4 -p 31000:8080/tcp --name 
mesos-f2aeab4d-b224-479c-869d-121daa0c12cb-S0.8f6de3ab-0e85-434a-a099-d16f9654a10c
 wordpress
WordPress not found in /var/www/html - copying now...
Complete! WordPress has been successfully copied to /var/www/html

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:18.425110 22353 health_checker.cpp:205] Health check failed 1 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:19.535784 22347 health_checker.cpp:205] Health check failed 2 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:20.646812 22350 health_checker.cpp:205] Health check failed 3 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:21.758222 22353 health_checker.cpp:205] Health check failed 4 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:22.773813 22349 health_checker.cpp:205] Health check failed 5 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:23.883586 22352 health_checker.cpp:205] Health check failed 6 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:24.994628 22350 health_checker.cpp:205] Health check failed 7 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:26.106149 22352 health_checker.cpp:205] Health check failed 8 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:27.218143 22351 health_checker.cpp:205] Health check failed 9 times 
consecutively: HTTP health check failed: curl returned exited with status 7: 
curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:28.329988 22350 health_checker.cpp:205] Health check failed 10 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:29.440842 22348 health_checker.cpp:205] Health check failed 11 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
W0227 09:20:30.554321 22348 health_checker.cpp:205] Health check failed 12 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused
W0227 09:20:31.664559 22347 health_checker.cpp:205] Health check failed 13 
times consecutively: HTTP health check failed: curl returned exited with status 
7: curl: (7) Failed connect to 127.0.0.1:8080; Connection refused

MySQL Connection Error: (2002) Connection refused
F0227 09:20:32.666734 22601 health_checker.cpp:94] Failed to enter the net 
namespace of task (pid: '22392'): Pid 22392 does not exist
*** Check failure stack trace: ***
@ 0x7f9b33f07862  

[jira] [Commented] (MESOS-7151) Some stdout and stderr was disappeared. And some stdout can not fully display in webui.

2017-02-23 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15880814#comment-15880814
 ] 

haosdent commented on MESOS-7151:
-

[~mark1982] Sorry, I could not identify the problem from your logs. Would you 
mind using TeamViewer or Chrome Remote Desktop to troubleshoot the problem? My 
Hangouts is haosd...@gmail.com 

> Some stdout and stderr was disappeared. And some stdout can not fully display 
> in webui.
> ---
>
> Key: MESOS-7151
> URL: https://issues.apache.org/jira/browse/MESOS-7151
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.0.1
>Reporter: mark1982
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png, taskid=1292.png
>
>






[jira] [Commented] (MESOS-7151) Some stdout and stderr was disappeared. And some stdout can not fully display in webui.

2017-02-21 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875575#comment-15875575
 ] 

haosdent commented on MESOS-7151:
-

Is the file you showed in screenshot-6 stderr or stdout?

> Some stdout and stderr was disappeared. And some stdout can not fully display 
> in webui.
> ---
>
> Key: MESOS-7151
> URL: https://issues.apache.org/jira/browse/MESOS-7151
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.0.1
>Reporter: mark1982
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png, screenshot-6.png
>
>






[jira] [Commented] (MESOS-7151) Some stdout and stderr was disappeared. And some stdout can not fully display in webui.

2017-02-21 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875556#comment-15875556
 ] 

haosdent commented on MESOS-7151:
-

Would you mind opening "Right click" -> "Inspect" -> "Console" in Chrome and 
sharing any errors shown there?

> Some stdout and stderr was disappeared. And some stdout can not fully display 
> in webui.
> ---
>
> Key: MESOS-7151
> URL: https://issues.apache.org/jira/browse/MESOS-7151
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.0.1
>Reporter: mark1982
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png, screenshot-5.png
>
>






[jira] [Commented] (MESOS-7151) Some stdout and stderr was disappeared. And some stdout can not fully display in webui.

2017-02-20 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875481#comment-15875481
 ] 

haosdent commented on MESOS-7151:
-

It sounds like {{pailer}} has a problem fetching the log. Do you see any 
errors in the Chrome console?

> Some stdout and stderr was disappeared. And some stdout can not fully display 
> in webui.
> ---
>
> Key: MESOS-7151
> URL: https://issues.apache.org/jira/browse/MESOS-7151
> Project: Mesos
>  Issue Type: Bug
>  Components: agent
>Affects Versions: 1.0.1
>Reporter: mark1982
>






[jira] [Created] (MESOS-7148) Compare the performance of the replicated log after upgrade to leveldb 1.19

2017-02-20 Thread haosdent (JIRA)
haosdent created MESOS-7148:
---

 Summary: Compare the performance of the replicated log after 
upgrade to leveldb 1.19
 Key: MESOS-7148
 URL: https://issues.apache.org/jira/browse/MESOS-7148
 Project: Mesos
  Issue Type: Task
Reporter: haosdent
Assignee: Tomasz Janiszewski


We need to use {{./mesos-log benchmark}} to benchmark the replicated log, or 
add a new benchmark test to automate this.





[jira] [Assigned] (MESOS-7146) OSX broken due to wrong configuration of LevelDB after update.

2017-02-20 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7146:
---

Assignee: Tomasz Janiszewski  (was: haosdent)

> OSX broken due to wrong configuration of LevelDB after update.
> --
>
> Key: MESOS-7146
> URL: https://issues.apache.org/jira/browse/MESOS-7146
> Project: Mesos
>  Issue Type: Bug
>Reporter: Alexander Rojas
>Assignee: Tomasz Janiszewski
>Priority: Blocker
>  Labels: mesosphere
>
> The commit 
> [74878e255bb099029dde2a03e0b1d22fecf16000|https://reviews.apache.org/r/51053/]
>  broke the build in OS-X. On a first run it will break with the following 
> message:
> {noformat}
> checking if clang supports -c -o file.o... yes
> checking for poll.h... yes
> checking sys/select.h usability... checking if clang supports -fno-rtti 
> -fno-exceptions... yes
> checking if clang supports -c -o file.o... yes
> (cached) yes
> checking for clang option to produce PIC... checking whether the clang linker 
> (/usr/bin/ld) supports shared libraries... -fno-common -DPIC
> checking if clang PIC flag -fno-common -DPIC works... yes
> checking dynamic linker characteristics... mkdir out-shared
> clang++ -stdlib=libc++ -nostdinc++ 
> -I/usr/local/opt/llvm@3.8/lib/llvm-3.8/include/c++/v1 
> -Wno-deprecated-declarations  -fvisibility-inlines-hidden -fcolor-diagnostics 
> -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ -DGTEST_USE_OWN_TR1
> _TUPLE=1 -DGTEST_LANG_CXX11 -I. -I./include -std=c++0x  -DOS_MACOSX 
> -DLEVELDB_PLATFORM_POSIX -DLEVELDB_ATOMIC_PRESENT -stdlib=libc++ -nostdinc++ 
> -I/usr/local/opt/llvm@3.8/lib/llvm-3.8/include/c++/v1 
> -Wno-deprecated-declarations  -fvisibil
> ity-inlines-hidden -fcolor-diagnostics -Wno-unused-local-typedef -std=c++11 
> -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -fPIC -fPIC -c 
> db/db_bench.cc -o out-shared/db/db_bench.o
> yes
> checking sys/select.h presence... error: unable to open output file 
> 'out-shared/db/db_bench.o': 'No such file or directory'
> 1 error generated.
> make[4]: *** [out-shared/db/db_bench.o] Error 1
> make[3]: *** [leveldb-1.19/out-static/libleveldb.a] Error 2
> make[3]: *** Waiting for unfinished jobs
> {noformat}
> If one re-runs the make command, it fails with the following error:
> {noformat}
> clang -I. -I./include -DOS_MACOSX -DLEVELDB_PLATFORM_POSIX 
> -DLEVELDB_ATOMIC_PRESENT -stdlib=libc++ -nostdinc++ 
> -I/usr/local/opt/llvm@3.8/lib/llvm-3.8/include/c++/v1 
> -Wno-deprecated-declarations  -fvisibility-inlines-hidden -fcolor-diagnostics 
> -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ 
> -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -fPIC -c db/c_test.c -o 
> out-static/db/c_test.o
> error: invalid argument '-std=c++11' not allowed with 'C/ObjC'
> {noformat}
> This one indicates that the C compiler is using CXXFLAGS instead of CFLAGS or 
> that CFLAGS are being wrongly generated. Running the make command a third time 
> throws the following output:
> {noformat}
> clang -I. -I./include -DOS_MACOSX -DLEVELDB_PLATFORM_POSIX 
> -DLEVELDB_ATOMIC_PRESENT -stdlib=libc++ -nostdinc++ 
> -I/usr/local/opt/llvm@3.8/lib/llvm-3.8/include/c++/v1 
> -Wno-deprecated-declarations  -fvisibility-inlines-hidden -fcolor-diagno$
> tics -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ 
> -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -fPIC -c db/c_test.c -o 
> out-static/db/c_test.o
> clang++ -L/usr/local/opt/subversion/lib -L/usr/local/opt/openssl/lib 
> -L/usr/local/opt/libevent/lib -L/usr/local/opt/apr/libexec/lib 
> -L/usr/local/opt/llvm/lib -Wl,-rpath,/usr/local/opt/llvm/lib  
> -fcolor-diagnostics  -stdlib=libc++ -nostdi$
> c++ -I/usr/local/opt/llvm@3.8/lib/llvm-3.8/include/c++/v1 
> -Wno-deprecated-declarations  -fvisibility-inlines-hidden -fcolor-diagnostics 
> -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ 
> -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -I$
>  -I./include -std=c++0x  -DOS_MACOSX -DLEVELDB_PLATFORM_POSIX 
> -DLEVELDB_ATOMIC_PRESENT -stdlib=libc++ -nostdinc++ 
> -I/usr/local/opt/llvm@3.8/lib/llvm-3.8/include/c++/v1 
> -Wno-deprecated-declarations  -fvisibility-inlines-hidden -fcolor-dia$
> nostics -Wno-unused-local-typedef -std=c++11 -stdlib=libc++ 
> -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -fPIC db/corruption_test.cc 
> out-static/db/builder.o out-static/db/c.o out-static/db/db_impl.o 
> out-static/db/db_iter.o out-static/d$
> /dbformat.o out-static/db/dumpfile.o out-static/db/filename.o 
> out-static/db/log_reader.o out-static/db/log_writer.o 
> out-static/db/memtable.o out-static/db/repair.o out-static/db/table_cache.o 
> out-static/db/version_edit.o out-static/db/ve$
> sion_set.o out-static/db/write_batch.o out-static/table/block.o 
> out-static/table/block_builder.o out-static/table/filter_block.o 
> 

[jira] [Commented] (MESOS-7146) OSX broken due to wrong configuration of LevelDB after update.

2017-02-20 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874750#comment-15874750
 ] 

haosdent commented on MESOS-7146:
-

According to the error message, it looks like C++ flags are not allowed when 
compiling C files. [~janisz] had a patch for this before; let me verify whether 
his patch works.


[jira] [Assigned] (MESOS-7146) OSX broken due to wrong configuration of LevelDB after update.

2017-02-20 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7146:
---

Assignee: haosdent


[jira] [Commented] (MESOS-6988) WebUI redirect doesn't work with stats from /metrics/snapshot

2017-02-19 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15874014#comment-15874014
 ] 

haosdent commented on MESOS-6988:
-

The difference between {{/master/state}} and {{/metrics/snapshot}} is that 
{{/master/state}} is handled by {{Master}} while {{/metrics/snapshot}} is 
handled by {{MetricsProcess}}. We do redirection in {{Master}}, but we cannot 
do redirection for {{/metrics/snapshot}}. (We could add a new endpoint under 
the master, such as {{/master/metrics_snapshot}}, so it could support 
redirection.)

So [~xujyan], you prefer to remove the fallback, right? I think we could do 
that.
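Since {{/metrics/snapshot}} never redirects, a client-side workaround is to resolve the leading master from the {{leader}} pid in a {{/master/state}} response and query that host directly. A minimal sketch (the fetch part is illustrative only, and {{requests}} is an assumed dependency):

```python
def leader_address(state):
    """Extract host:port from the 'leader' pid in a /master/state response."""
    pid = state["leader"]  # e.g. "master@10.0.0.1:5050"
    return pid.split("@", 1)[1]

# Illustrative use (any master in the cluster can serve /master/state):
#   state = requests.get("http://any-master:5050/master/state").json()
#   snapshot = requests.get(
#       "http://%s/metrics/snapshot" % leader_address(state)).json()
```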

> WebUI redirect doesn't work with stats from /metrics/snapshot
> -
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2017-02-18 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15873091#comment-15873091
 ] 

haosdent commented on MESOS-6134:
-

{code}
commit 346cc8dd528a28a6e1f1cbdb4c95b8bdea2f6070
Author: Zhitao Li 
Date:   Sat Feb 18 18:22:20 2017 +0800

Made docker executor be aware of the `cgroups_enable_cfs` flag.

This fixes the cpu CFS quota setting for docker executor by ensuring
`--cpu-quota` flag when performing `docker run`.

Review: https://reviews.apache.org/r/51052/
{code}
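The fix above passes a CFS quota to {{docker run}} via {{--cpu-quota}}. A minimal sketch of the cpus-to-quota translation, assuming Docker's default 100 ms CFS period (treating this as Mesos's exact rounding is an assumption):

```python
CFS_PERIOD_US = 100000  # Docker's default --cpu-period (100 ms), in microseconds

def cpu_quota_us(cpus):
    """CFS quota in microseconds for a fractional CPU allocation."""
    return int(cpus * CFS_PERIOD_US)

# docker run --cpu-quota=50000 ... caps the container at half a CPU per
# period, i.e. cpu_quota_us(0.5).
```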

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> MESOS-2154 only partially fixed CFS quota support in the Docker 
> Containerizer: that fix only works for custom executors.
> This tracks the fix for the command executor so we can declare this complete.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-17 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-5186:

Shepherd: haosdent  (was: Anand Mazumdar)

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Anthony Sottile
>  Labels: protobuf, python
>
> We're working on integrating TensorFlow (https://www.tensorflow.org) with 
> Mesos. Both require {{protobuf}}. The Python package 
> {{mesos.interface}} requires {{protobuf>=2.6.1,<3}}, but {{tensorflow}} 
> requires {{protobuf>=3.0.0}}. Although protobuf 3.x is not compatible with 
> protobuf 2.x, we modified the {{setup.py}} 
> (https://github.com/apache/mesos/blob/66cddaf/src/python/interface/setup.py.in#L29)
> from {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1,<3' 
> ],}} to {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1' ],}}
> and it works fine. Would you please consider supporting protobuf 3.x 
> officially in the next release? Perhaps just removing the {{,<3}} restriction 
> is enough.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (MESOS-7140) Provide a default framework implementation

2017-02-17 Thread haosdent (JIRA)
haosdent created MESOS-7140:
---

 Summary: Provide a default framework implementation
 Key: MESOS-7140
 URL: https://issues.apache.org/jira/browse/MESOS-7140
 Project: Mesos
  Issue Type: Wish
Reporter: haosdent
Priority: Minor


We now have a lot of example frameworks for different features. It would be 
nice to provide a minimal default framework in Mesos that demonstrates all of 
these features, and it could absorb the functionality of mesos-execute as well.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7049) CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest is broken on Fedora 25.

2017-02-15 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7049:

Shepherd: haosdent

> CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest is broken on 
> Fedora 25.
> ---
>
> Key: MESOS-7049
> URL: https://issues.apache.org/jira/browse/MESOS-7049
> Project: Mesos
>  Issue Type: Bug
>  Components: isolation, tests
>Reporter: James Peach
>Assignee: James Peach
>
> *Test output:*
> {noformat}
> [==] Running 1 test from 1 test case.
> [--] Global test environment set-up.
> [--] 1 test from CgroupsAnyHierarchyWithPerfEventTest
> [ RUN  ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
> ../../src/tests/containerizer/cgroups_tests.cpp:1020: Failure
> (statistics).failure(): Failed to parse perf sample: Failed to parse perf 
> sample line '6186960975,,cycles,mesos_test,2000511515,100.00,3.093,GHz': 
> Unexpected number of fields
> ../../src/tests/containerizer/cgroups_tests.cpp:193: Failure
> (cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup 
> '/sys/fs/cgroup/perf_event/mesos_test': Device or resource busy
> [  FAILED  ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest 
> (2123 ms)
> [--] 1 test from CgroupsAnyHierarchyWithPerfEventTest (2123 ms total)
> [--] Global test environment tear-down
> ../../src/tests/environment.cpp:836: Failure
> Failed
> Tests completed with child processes remaining:
> -+- 20455 /home/jpeach/upstream/mesos/build/src/.libs/mesos-tests --verbose 
> --gtest_filter=CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
>  \--- 20500 /home/jpeach/upstream/mesos/build/src/.libs/mesos-tests --verbose 
> --gtest_filter=CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
> [==] 1 test from 1 test case ran. (2141 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_PERF_PerfTest
> {noformat}
> *Software versions:*
> {noformat}
> [jpeach@jpeach src]$ uname -a
> Linux jpeach.apple.com 4.9.6-200.fc25.x86_64 #1 SMP Thu Jan 26 10:17:45 UTC 
> 2017 x86_64 x86_64 x86_64 GNU/Linux
> [jpeach@jpeach src]$ perf -v
> perf version 4.9.6.200.fc25.x86_64.g51a0
> [jpeach@jpeach src]$ cat /etc/os-release
> NAME=Fedora
> VERSION="25 (Workstation Edition)"
> ID=fedora
> VERSION_ID=25
> PRETTY_NAME="Fedora 25 (Workstation Edition)"
> ANSI_COLOR="0;34"
> CPE_NAME="cpe:/o:fedoraproject:fedora:25"
> HOME_URL="https://fedoraproject.org/"
> BUG_REPORT_URL="https://bugzilla.redhat.com/"
> REDHAT_BUGZILLA_PRODUCT="Fedora"
> REDHAT_BUGZILLA_PRODUCT_VERSION=25
> REDHAT_SUPPORT_PRODUCT="Fedora"
> REDHAT_SUPPORT_PRODUCT_VERSION=25
> PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy
> VARIANT="Workstation Edition"
> VARIANT_ID=workstation
> {noformat}
> The test then fails to clean up, leaving stale processes and cgroups.
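The parse failure above ("Unexpected number of fields") happens because newer perf releases append extra metric columns to {{perf stat -x,}} output. A lenient parser can take only the leading fields it needs and ignore trailing ones; the field order below is read off the sample line in the test output, and whether every perf build emits exactly this order is an assumption:

```python
def parse_perf_line(line):
    """Parse the leading fields of a 'perf stat -x,' CSV line leniently."""
    fields = line.split(",")
    if len(fields) < 4:
        raise ValueError("too few fields: %r" % line)
    value, unit, event, cgroup = fields[:4]
    # Trailing fields (running time, percentage, derived metrics) ignored.
    return {"value": value, "unit": unit, "event": event, "cgroup": cgroup}

sample = "6186960975,,cycles,mesos_test,2000511515,100.00,3.093,GHz"
# parse_perf_line(sample)["event"] -> "cycles"
```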



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6988) WebUI redirect doesn't work with stats from /metrics/snapshot

2017-02-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867010#comment-15867010
 ] 

haosdent commented on MESOS-6988:
-

Does this cause any problems in your cluster?

> WebUI redirect doesn't work with stats from /metrics/snapshot
> -
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6988) WebUI redirect doesn't work with stats from /metrics/snapshot

2017-02-14 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867009#comment-15867009
 ] 

haosdent commented on MESOS-6988:
-

Hi [~xujyan], yep, it would fall back to the current (following) master's 
address. But it would be updated to the leading master once {{$scope.state}} 
contains the leading master information, which we query periodically.

> WebUI redirect doesn't work with stats from /metrics/snapshot
> -
>
> Key: MESOS-6988
> URL: https://issues.apache.org/jira/browse/MESOS-6988
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Yan Xu
>Assignee: haosdent
>
> The issue described in MESOS-6446 is still not fixed in 1.1.0. (Especially 
> for non-leading masters)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7094) Slave not displaying correctly in the Mesos Web UI

2017-02-09 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860411#comment-15860411
 ] 

haosdent commented on MESOS-7094:
-

[~anandmazumdar] As chatted with [~dumont.devops] on Slack, it appears to be a 
network issue in [~dumont.devops]'s environment; let me close this.

{code}
haosdent huang [10:43 PM] 
so you could ‘curl http://fe12.intern.dumontnet.de:5051/slave(1)/state’ in 
command line?

[10:43]  
how about open ‘http://fe12.intern.dumontnet.de:5051/slave(1)/state’  in 
browser directly?

[10:43]  
what’s its response


deem APP [10:44 PM] 
works fine from commandline. but times out in browser... i'm smelling some 
network related issues here :/

haosdent huang [10:45 PM] 
do you use any proxy and it cause the differences between console and browser?


deem APP [10:45 PM] 
no proxy

[10:45]  
but i'm in a different network here than the mesos hosts are

[10:46]  
so it seems like it's not a bug on mesos side but with the network in the 
office here

[10:46]  
allthough there could be a error message in web ui instead of just displaying 
the master properties
{code}

> Slave not displaying correctly in the Mesos Web UI
> --
>
> Key: MESOS-7094
> URL: https://issues.apache.org/jira/browse/MESOS-7094
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.28.1
>Reporter: DuMont DevOps
>Priority: Minor
>  Labels: webui
> Attachments: mesos-slaves-chrome-console.png, mesos-webui.png
>
>
> We're currently experiencing issues with the Mesos web UI.
> We recently added 2 new nodes to the cluster, which are now shown in the 
> Mesos web UI.
> If we click on one of the new nodes, we get to the slave overview. Instead 
> of showing the slave's stats, we see stats for the master node (see attached 
> picture).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7094) Slave not displaying correctly in the Mesos Web UI

2017-02-09 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7094:
---

Assignee: haosdent




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7083) No master is currently leading

2017-02-08 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858360#comment-15858360
 ] 

haosdent commented on MESOS-7083:
-

[~hemanthmakaraju] could you confirm whether this is a duplicate of MESOS-6624, 
which was fixed in 1.1.1?

> No master is currently leading
> --
>
> Key: MESOS-7083
> URL: https://issues.apache.org/jira/browse/MESOS-7083
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 1.1.0
>Reporter: hemanth makaraju
>Assignee: haosdent
>
> When I open http://127.0.0.1:5050 in a web browser I see "No master is 
> currently leading", but the mesos-resolve command detects a master:
> mesos-resolve zk://172.17.0.2:2181/mesos
> I0208 11:17:33.489379 24715 zookeeper.cpp:259] A new leading master 
> (UPID=master@127.0.0.1:5050) is detected
> This is the command I used to run mesos-master:
> mesos-master --zk=zk://127.0.0.1:2181/mesos --quorum=1 
> --advertise_ip=127.0.0.1 --advertise_port=5050 --work_dir=/mesos/master



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-7083) No master is currently leading

2017-02-08 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-7083:
---

Assignee: haosdent




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6624) Master WebUI does not work on Firefox 45

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15855172#comment-15855172
 ] 

haosdent commented on MESOS-6624:
-

Backported to 1.1.1:
{code}
commit 467f0c5def23c838842598ab6720796cf17405eb
Author: Haosdent Huang 
Date:   Tue Feb 7 10:21:36 2017 +0800

Added MESOS-6624 to CHANGELOG for 1.1.1.
{code}

> Master WebUI does not work on Firefox 45
> 
>
> Key: MESOS-6624
> URL: https://issues.apache.org/jira/browse/MESOS-6624
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Adam Cecile
>Assignee: haosdent
> Fix For: 1.1.1, 1.2.0
>
>
> Hello,
> I only see the "No master leading" message, which is obviously wrong because 
> the API just works as expected. Switching to another browser makes it work 
> again.
> In Firefox console I can see the following error:
> {quote}
> SyntaxError: in strict mode code, functions may be declared only at top level 
> or immediately within another function controllers.js:845:19
> "Error: [ng:areq] 
> http://errors.angularjs.org/1.2.3/ng/areq?p0=MainCntl=not%20a%20function%2C%20got%20undefined
> A/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:6:449
> tb@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:18:250
> Oa@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:18:337
> hd/this.$gethttp://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:62:96
> Q/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:49:117
> q@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:7:359
> Q@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:48:1
> f@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:43:24
> f@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:43:1
> f@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:43:1
> y/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:42:180
> Xb/c/http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:455
> xd/this.$gethttp://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:101:35
> xd/this.$gethttp://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:101:312
> Xb/c/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:413
> d@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:30:328
> Xb/c@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:321
> Xb@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:18:30
> Rc@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:99
> @http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:199:1
> f.Callbacks/n@http://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:14779
> f.Callbacks/o.fireWith@http://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:15553
> fhttp://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:9771
> fhttp://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:14346
> "
> {quote}
> It used to work just fine with 1.0.x, and I think it matters because Firefox 
> 45 is the Debian stable browser, so there are plenty of users.
> Best regards, Adam.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6624) Master WebUI does not work on Firefox 45

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6624:

Fix Version/s: 1.1.1

> Master WebUI does not work on Firefox 45
> 
>
> Key: MESOS-6624
> URL: https://issues.apache.org/jira/browse/MESOS-6624
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Adam Cecile
>Assignee: haosdent
> Fix For: 1.1.1, 1.2.0
>
>
> Hello,
> I only see the "No master leading" message, which is obviously wrong because 
> the API works as expected. Switching to another browser makes it work 
> again.
> In Firefox console I can see the following error:
> {quote}
> SyntaxError: in strict mode code, functions may be declared only at top level 
> or immediately within another function controllers.js:845:19
> "Error: [ng:areq] 
> http://errors.angularjs.org/1.2.3/ng/areq?p0=MainCntl=not%20a%20function%2C%20got%20undefined
> A/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:6:449
> tb@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:18:250
> Oa@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:18:337
> hd/this.$gethttp://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:62:96
> Q/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:49:117
> q@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:7:359
> Q@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:48:1
> f@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:43:24
> f@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:43:1
> f@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:43:1
> y/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:42:180
> Xb/c/http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:455
> xd/this.$gethttp://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:101:35
> xd/this.$gethttp://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:101:312
> Xb/c/<@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:413
> d@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:30:328
> Xb/c@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:321
> Xb@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:18:30
> Rc@http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:17:99
> @http://zelda.service.domain.com:5050/static/js/angular-1.2.3.min.js:199:1
> f.Callbacks/n@http://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:14779
> f.Callbacks/o.fireWith@http://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:15553
> fhttp://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:9771
> fhttp://zelda.service.domain.com:5050/static/js/jquery-1.7.1.min.js:2:14346
> "
> {quote}
> It used to work just fine with 1.0.x, and I think it matters because Firefox 
> 45 is the Debian stable browser, so there are plenty of users.
> Best regards, Adam.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6624) Master WebUI does not work on Firefox 45

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6624:

Summary: Master WebUI does not work on Firefox 45  (was: Master web 
interface does not work on Firefox 45)

> Master WebUI does not work on Firefox 45
> 
>
> Key: MESOS-6624
> URL: https://issues.apache.org/jira/browse/MESOS-6624
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Adam Cecile
>Assignee: haosdent
> Fix For: 1.2.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6624) Master web interface does not work on Firefox 45

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6624:

Summary: Master web interface does not work on Firefox 45  (was: Master web 
interface does not work on Firefox 45 (angular js issue))

> Master web interface does not work on Firefox 45
> 
>
> Key: MESOS-6624
> URL: https://issues.apache.org/jira/browse/MESOS-6624
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Affects Versions: 1.1.0
>Reporter: Adam Cecile
>Assignee: haosdent
> Fix For: 1.2.0
>
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854363#comment-15854363
 ] 

haosdent commented on MESOS-5186:
-

1.2.0 will be cut this week. This is not a minor change; it is a change that 
affects the protocol and compatibility. Let's not review and ship it in a 
hurry, otherwise it could affect others' production environments. We can 
discuss whether to cherry-pick it into 1.2.1 after the 1.2.0 release.

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>  Labels: protobuf, python
>
> We're working on integrating TensorFlow (https://www.tensorflow.org) with 
> mesos. Both require {{protobuf}}. The python package 
> {{mesos.interface}} requires {{protobuf>=2.6.1,<3}}, but {{tensorflow}} 
> requires {{protobuf>=3.0.0}}. Although protobuf 3.x is not compatible with 
> protobuf 2.x, we modified the {{setup.py}} 
> (https://github.com/apache/mesos/blob/66cddaf/src/python/interface/setup.py.in#L29)
> from {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1,<3' 
> ],}} to {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1' ],}}
> and it works fine. Would you please consider supporting protobuf 3.x 
> officially in the next release? Maybe just removing the {{,<3}} restriction 
> is enough.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
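
As a concrete sketch of the change the reporter describes (hypothetical helper
names; the real file is {{src/python/interface/setup.py.in}}), the edit amounts
to dropping the {{,<3}} upper bound on the protobuf requirement:

```python
# Sketch of the install_requires edit described in the ticket (hypothetical
# helper; the shipped file is src/python/interface/setup.py.in). The only
# change is removing the ',<3' upper bound so protobuf 3.x installs are
# allowed.
ORIGINAL = ['google-common>=0.0.1', 'protobuf>=2.6.1,<3']
RELAXED = ['google-common>=0.0.1', 'protobuf>=2.6.1']


def relax_protobuf_pin(requires):
    """Drop the ',<3' upper-bound pin from any protobuf requirement string."""
    return [r.replace(',<3', '') if r.startswith('protobuf') else r
            for r in requires]


print(relax_protobuf_pin(ORIGINAL))  # ['google-common>=0.0.1', 'protobuf>=2.6.1']
```

The relaxed list still guarantees protobuf >= 2.6.1; it simply stops rejecting
3.x installs.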


[jira] [Updated] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-5186:

Labels: protobuf python  (was: protobuf)

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>  Labels: protobuf, python
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-5186:

 Flags:   (was: Patch)
  Target Version/s: 1.3.0
Labels: protobuf  (was: )
Remaining Estimate: (was: 504h)
 Original Estimate: (was: 504h)

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>  Labels: protobuf, python
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854338#comment-15854338
 ] 

haosdent commented on MESOS-5186:
-

[~asottileyelp] Thanks a lot for your answer. I tested a python scheduler 
using protobuf3 while Mesos was compiled with protobuf2, and it works for me. 
[Feng Xiao|https://github.com/xfxyjwf] mentioned at 
https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8
{quote}
Proto2 and proto3 are wire compatible. The same construct in proto2 and proto3 
will have the same binary representation. If your proto only uses features 
available in both proto2 and proto3, systems built with proto2 should be able 
to communicate with systems built with proto3 without any problem (it's also 
true vice versa).
{quote}

Since 1.2.0 will be cut shortly, how about putting this off to 1.3.0? Let's 
continue to discuss and review it after 1.2.0 is released.
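
The wire-compatibility claim quoted above can be illustrated without either 
library: the protobuf wire format (field keys and varints) is defined 
independently of proto2 vs proto3 codegen. A minimal hand-rolled sketch 
(illustrative only, not mesos.interface code):

```python
# Minimal hand-rolled encoder for a protobuf varint field, to illustrate that
# the wire format is the same regardless of proto2 vs proto3 codegen. This is
# an illustrative sketch, not part of mesos.interface.

def encode_varint(value):
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number, value):
    """Encode `field_number: value` with wire type 0 (varint)."""
    tag = (field_number << 3) | 0  # key = (field_number << 3) | wire_type
    return encode_varint(tag) + encode_varint(value)


# The canonical example from the protobuf encoding docs: field 1 = 150
# serializes to 08 96 01 under both proto2 and proto3.
assert encode_varint_field(1, 150) == b'\x08\x96\x01'
```

Because both generations produce identical bytes for the same construct, a 
protobuf3 client can talk to a protobuf2 Mesos as long as only shared features 
are used.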

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>Priority: Minor
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-5186:

Priority: Major  (was: Minor)

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-5186:

Labels:   (was: easyfix)

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>Priority: Minor
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854007#comment-15854007
 ] 

haosdent commented on MESOS-7006:
-

Yep, {{cfs_period_us}} is 100ms in Mesos as well; this is actually the Linux 
default.

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Assignee: Tomasz Janiszewski
>Priority: Minor
>
> docker 1.13 was recently released and it now has a new --cpus flag which 
> allows a user to specify how many cpus a container should have. This is much 
> simpler for users to reason about.
> mesos should switch to starting a container with --cpus instead of 
> --cpu-shares, or at least make it configurable.
> https://blog.docker.com/2017/01/cpu-management-docker-1-13/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853925#comment-15853925
 ] 

haosdent commented on MESOS-7006:
-

Refer to https://docs.docker.com/engine/reference/run/#/cpu-period-constraint ; 
it is related to CFS. {{cpus=0.5}} is equivalent to setting 
{{cpu-quota=25000}} and {{cpu-period=50000}}.
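
The conversion above is plain arithmetic: the quota is the CFS period scaled 
by the requested CPU fraction. A small sketch (assuming a 50000µs period, 
consistent with the quota of 25000 quoted for cpus=0.5):

```python
# Sketch of the --cpus to CFS quota conversion discussed above. The 50000µs
# period is an assumption taken from this thread's example (quota=25000 for
# cpus=0.5); docker fixes the period internally.

def cfs_quota(cpus, period_us):
    """Return the cpu-quota in microseconds for a fractional CPU count."""
    return int(cpus * period_us)


assert cfs_quota(0.5, 50_000) == 25_000    # the cpus=0.5 example above
assert cfs_quota(2.0, 100_000) == 200_000  # two full CPUs
```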

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Assignee: Tomasz Janiszewski
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853868#comment-15853868
 ] 

haosdent commented on MESOS-7006:
-

It should be marked as a duplicate. If {{enable_cfs}} is not set in the agent 
flags, docker should keep using {{cpu-shares}}; dropping {{cpu-shares}} in 
favor of {{cpus}} would be incorrect, because {{cpus}} imposes an absolute 
rather than a relative CPU limit on docker containers.
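
The distinction is that {{cpu-shares}} is a relative weight that only matters 
under contention, while a CFS quota is an absolute cap. A sketch of both 
mappings, using the common cgroup conventions (1024 shares per CPU, 100ms 
default period) as assumptions:

```python
# Relative vs. absolute CPU limits, as discussed above. Constants follow the
# usual cgroup conventions (1024 shares per CPU, 100 ms CFS period); this is
# an illustrative sketch, not Mesos source.
SHARES_PER_CPU = 1024    # cpu.shares weight per CPU
CFS_PERIOD_US = 100_000  # default cfs_period_us (100 ms)


def cpu_shares(cpus):
    """Relative weight: only matters when CPUs are contended."""
    return int(cpus * SHARES_PER_CPU)


def cfs_quota_us(cpus):
    """Absolute cap: the container can never exceed this CPU fraction."""
    return int(cpus * CFS_PERIOD_US)


assert cpu_shares(0.5) == 512
assert cfs_quota_us(0.5) == 50_000
```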

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Assignee: Tomasz Janiszewski
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-02-06 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-7006:

Priority: Minor  (was: Major)

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Assignee: Tomasz Janiszewski
>Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7006) Launch docker containers with --cpus instead of cpu-shares

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853789#comment-15853789
 ] 

haosdent commented on MESOS-7006:
-

Refer to https://docs.docker.com/engine/reference/run/#/cpu-period-constraint
{code}
For example, if there is 1 CPU, then --cpus=0.5 will achieve the same result as 
setting --cpu-period=50000 and --cpu-quota=25000 (50% CPU).
{code}

This issue looks like a duplicate of what [~zhitao] did before in 
https://issues.apache.org/jira/browse/MESOS-6134 . Should we mark it as a 
duplicate? 
[~jieyu][~gilbert][~janisz] 

Another option is to put this off until we have a better way to validate 
docker's version and handle the compatibility logic.

> Launch docker containers with --cpus instead of cpu-shares
> --
>
> Key: MESOS-7006
> URL: https://issues.apache.org/jira/browse/MESOS-7006
> Project: Mesos
>  Issue Type: Improvement
>Affects Versions: 1.1.0
>Reporter: Craig W
>Assignee: Tomasz Janiszewski
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-5186) mesos.interface: Allow using protobuf 3.x

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853779#comment-15853779
 ] 

haosdent commented on MESOS-5186:
-

Hi [~asottileyelp], what if the client uses the python protobuf3 library 
while Mesos uses protobuf2? Would this cause any compatibility issues?

> mesos.interface: Allow using protobuf 3.x
> -
>
> Key: MESOS-5186
> URL: https://issues.apache.org/jira/browse/MESOS-5186
> Project: Mesos
>  Issue Type: Improvement
>  Components: python api
>Reporter: Myautsai PAN
>Assignee: Yong Tang
>Priority: Minor
>  Labels: easyfix
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> We're working on integrating TensorFlow(https://www.tensorflow.org) with 
> mesos. Both the two require {{protobuf}}. The python package 
> {{mesos.interface}} requires {{protobuf>=2.6.1,<3}}, but {{tensorflow}} 
> requires {{protobuf>=3.0.0}} . Though protobuf 3.x is not compatible with 
> protobuf 2.x, but anyway we modify the {{setup.py}} 
> (https://github.com/apache/mesos/blob/66cddaf/src/python/interface/setup.py.in#L29)
> from {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1,<3' 
> ],}} to {{'install_requires': [ 'google-common>=0.0.1', 'protobuf>=2.6.1' ],}}
> It works fine. Would you please consider support protobuf 3.x officially in 
> the next release? Maybe just remove the {{,<3}} restriction is enough.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-970) Upgrade bundled leveldb to 1.19

2017-02-06 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853776#comment-15853776
 ] 

haosdent commented on MESOS-970:


[~janisz] no worries, let me test it first.

> Upgrade bundled leveldb to 1.19
> ---
>
> Key: MESOS-970
> URL: https://issues.apache.org/jira/browse/MESOS-970
> Project: Mesos
>  Issue Type: Improvement
>  Components: replicated log
>Reporter: Benjamin Mahler
>Assignee: Tomasz Janiszewski
>
> We currently bundle leveldb 1.4, and the latest version is leveldb 1.19.
> A careful review of the fixes and changes in each release would be prudent. 
> Regression testing and performance testing would also be prudent, given the 
> replicated log is built on leveldb.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2017-02-05 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853560#comment-15853560
 ] 

haosdent commented on MESOS-6134:
-

The patch chain is almost ready, but 1.2.0 will be cut soon. Let's commit it 
after the 1.2.0 release and then backport it to the actively maintained 
branches. Thanks again for zhitao's and jieyu's work.

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>
> MESOS-2154 only partially fixed CFS quota support in the Docker 
> Containerizer: that fix only works for custom executors.
> This tracks the fix for the command executor so we can declare this complete.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6134) Port CFS quota support to Docker Containerizer using command executor.

2017-02-05 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6134:

Target Version/s: 1.3.0  (was: 1.2.0)

> Port CFS quota support to Docker Containerizer using command executor.
> --
>
> Key: MESOS-6134
> URL: https://issues.apache.org/jira/browse/MESOS-6134
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Reporter: Zhitao Li
>Assignee: Zhitao Li
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-7040) Advertising port is used by Framework to register

2017-02-03 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851957#comment-15851957
 ] 

haosdent commented on MESOS-7040:
-

Hi [~klueska], yep, I will try to resolve this together with MESOS-3901.

> Advertising port is used by Framework to register
> -
>
> Key: MESOS-7040
> URL: https://issues.apache.org/jira/browse/MESOS-7040
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, webui
>Affects Versions: 1.1.0
>Reporter: Mickaël Fortunato
>Assignee: haosdent
>
> We have 3 instances of mesos-master listening on port 1, all 3 behind a 
> proxy. The proxy listens on port 80 and proxies the requests to an HAProxy 
> that distributes them across the 3 instances. 
> Our problem is that `MESOS_PORT` is set to 1, so the Mesos UI tries to 
> call back to the proxy on port 1.
> We tried to use the advertising port, but it makes the frameworks try to 
> register on the wrong port.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-02-03 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851845#comment-15851845
 ] 

haosdent commented on MESOS-3901:
-

Hi [~anandmazumdar], we should enhance this in the WebUI. Until we resolve 
it, I suggest the above workarounds to bypass this problem, which has existed 
for as long as Mesos has had a WebUI.

> Enable Mesos to be able know when it is hosted behind a proxy with a URL 
> prefix
> ---
>
> Key: MESOS-3901
> URL: https://issues.apache.org/jira/browse/MESOS-3901
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Harpreet
>Assignee: haosdent
>  Labels: mesosphere
>
> If Mesos is run behind a proxy with a URL prefix e.g.  
> https://:/services/mesos (`/services/mesos` being the URL 
> prefix), sandboxes in mesos don't load. This happens because when
>   Mesos is accessed through a proxy at 
> https://:/services/mesos, Mesos tries to request slave state 
> from 
> https://:/slave/20151110-232502-218431498-5050-1234-S1/slave(1)/state.json?jsonp=angular.callbacks._4.
>  This URL is missing the /services/mesos path prefix, so the request fails. 
> Fixing this by rewriting URLs in the body of every response, would not be a 
> clean solution and can be error prone.
> After searching around a bit we've learned that this is apparently a common 
> issue with webapps, because there is no standard specification for making 
> them aware of their base URL path. Some will allow you to specify a base path 
> in configuration[1], others will respect an X-Forwarded-Path header if a 
> proxy provides it[2], and others don't handle this at all. 
> It would be great to have explicit support in for this in Mesos.
> [1] 
> http://search.cpan.org/~bobtfish/Catalyst-TraitFor-Request-ProxyBase-0.05/lib/Catalyst/TraitFor/Request/ProxyBase.pm
> [2] https://github.com/mattkenney/feedsquish/blob/master/rupta.py#L94



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-02-03 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851845#comment-15851845
 ] 

haosdent edited comment on MESOS-3901 at 2/3/17 6:02 PM:
-

Hi [~anandmazumdar], we should enhance this in the WebUI. Before we resolve 
this, I suggest the above workarounds to bypass this problem, which has existed 
for as long as Mesos has had a WebUI.


was (Author: haosd...@gmail.com):
hi, [~anandmazumdar] We should enhancement this in the WebUI. Because we 
resolve this, I suggest the above workarounds to bypass this problem which 
exists since Mesos has WebUI.

> Enable Mesos to be able know when it is hosted behind a proxy with a URL 
> prefix
> ---
>
> Key: MESOS-3901
> URL: https://issues.apache.org/jira/browse/MESOS-3901
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Harpreet
>Assignee: haosdent
>  Labels: mesosphere
>
> If Mesos is run behind a proxy with a URL prefix e.g.  
> https://:/services/mesos (`/services/mesos` being the URL 
> prefix), sandboxes in mesos don't load. This happens because when
>   Mesos is accessed through a proxy at 
> https://:/services/mesos, Mesos tries to request slave state 
> from 
> https://:/slave/20151110-232502-218431498-5050-1234-S1/slave(1)/state.json?jsonp=angular.callbacks._4.
>  This URL is missing the /services/mesos path prefix, so the request fails. 
> Fixing this by rewriting URLs in the body of every response, would not be a 
> clean solution and can be error prone.
> After searching around a bit we've learned that this is apparently a common 
> issue with webapps, because there is no standard specification for making 
> them aware of their base URL path. Some will allow you to specify a base path 
> in configuration[1], others will respect an X-Forwarded-Path header if a 
> proxy provides it[2], and others don't handle this at all. 
> It would be great to have explicit support in for this in Mesos.
> [1] 
> http://search.cpan.org/~bobtfish/Catalyst-TraitFor-Request-ProxyBase-0.05/lib/Catalyst/TraitFor/Request/ProxyBase.pm
> [2] https://github.com/mattkenney/feedsquish/blob/master/rupta.py#L94



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-02-02 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851128#comment-15851128
 ] 

haosdent edited comment on MESOS-3901 at 2/3/17 6:16 AM:
-

[~dddpaul] OK, I think you may try https://github.com/dcos/adminrouter/ to 
address your problem. cc [~zanes]


was (Author: haosd...@gmail.com):
[~dddpaul] OK, I think you may try https://github.com/dcos/adminrouter/ to 
address your problem.

> Enable Mesos to be able know when it is hosted behind a proxy with a URL 
> prefix
> ---
>
> Key: MESOS-3901
> URL: https://issues.apache.org/jira/browse/MESOS-3901
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Harpreet
>Assignee: haosdent
>  Labels: mesosphere
>
> If Mesos is run behind a proxy with a URL prefix e.g.  
> https://:/services/mesos (`/services/mesos` being the URL 
> prefix), sandboxes in mesos don't load. This happens because when
>   Mesos is accessed through a proxy at 
> https://:/services/mesos, Mesos tries to request slave state 
> from 
> https://:/slave/20151110-232502-218431498-5050-1234-S1/slave(1)/state.json?jsonp=angular.callbacks._4.
>  This URL is missing the /services/mesos path prefix, so the request fails. 
> Fixing this by rewriting URLs in the body of every response, would not be a 
> clean solution and can be error prone.
> After searching around a bit we've learned that this is apparently a common 
> issue with webapps, because there is no standard specification for making 
> them aware of their base URL path. Some will allow you to specify a base path 
> in configuration[1], others will respect an X-Forwarded-Path header if a 
> proxy provides it[2], and others don't handle this at all. 
> It would be great to have explicit support in for this in Mesos.
> [1] 
> http://search.cpan.org/~bobtfish/Catalyst-TraitFor-Request-ProxyBase-0.05/lib/Catalyst/TraitFor/Request/ProxyBase.pm
> [2] https://github.com/mattkenney/feedsquish/blob/master/rupta.py#L94



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-02-02 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851128#comment-15851128
 ] 

haosdent commented on MESOS-3901:
-

[~dddpaul] OK, I think you may try https://github.com/dcos/adminrouter/ to 
address your problem.

> Enable Mesos to be able know when it is hosted behind a proxy with a URL 
> prefix
> ---
>
> Key: MESOS-3901
> URL: https://issues.apache.org/jira/browse/MESOS-3901
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Harpreet
>Assignee: haosdent
>  Labels: mesosphere
>
> If Mesos is run behind a proxy with a URL prefix e.g.  
> https://:/services/mesos (`/services/mesos` being the URL 
> prefix), sandboxes in mesos don't load. This happens because when
>   Mesos is accessed through a proxy at 
> https://:/services/mesos, Mesos tries to request slave state 
> from 
> https://:/slave/20151110-232502-218431498-5050-1234-S1/slave(1)/state.json?jsonp=angular.callbacks._4.
>  This URL is missing the /services/mesos path prefix, so the request fails. 
> Fixing this by rewriting URLs in the body of every response, would not be a 
> clean solution and can be error prone.
> After searching around a bit we've learned that this is apparently a common 
> issue with webapps, because there is no standard specification for making 
> them aware of their base URL path. Some will allow you to specify a base path 
> in configuration[1], others will respect an X-Forwarded-Path header if a 
> proxy provides it[2], and others don't handle this at all. 
> It would be great to have explicit support in for this in Mesos.
> [1] 
> http://search.cpan.org/~bobtfish/Catalyst-TraitFor-Request-ProxyBase-0.05/lib/Catalyst/TraitFor/Request/ProxyBase.pm
> [2] https://github.com/mattkenney/feedsquish/blob/master/rupta.py#L94



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (MESOS-3901) Enable Mesos to be able know when it is hosted behind a proxy with a URL prefix

2017-02-02 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent reassigned MESOS-3901:
---

Assignee: haosdent

> Enable Mesos to be able know when it is hosted behind a proxy with a URL 
> prefix
> ---
>
> Key: MESOS-3901
> URL: https://issues.apache.org/jira/browse/MESOS-3901
> Project: Mesos
>  Issue Type: Improvement
>  Components: webui
>Reporter: Harpreet
>Assignee: haosdent
>  Labels: mesosphere
>
> If Mesos is run behind a proxy with a URL prefix e.g.  
> https://:/services/mesos (`/services/mesos` being the URL 
> prefix), sandboxes in mesos don't load. This happens because when
>   Mesos is accessed through a proxy at 
> https://:/services/mesos, Mesos tries to request slave state 
> from 
> https://:/slave/20151110-232502-218431498-5050-1234-S1/slave(1)/state.json?jsonp=angular.callbacks._4.
>  This URL is missing the /services/mesos path prefix, so the request fails. 
> Fixing this by rewriting URLs in the body of every response, would not be a 
> clean solution and can be error prone.
> After searching around a bit we've learned that this is apparently a common 
> issue with webapps, because there is no standard specification for making 
> them aware of their base URL path. Some will allow you to specify a base path 
> in configuration[1], others will respect an X-Forwarded-Path header if a 
> proxy provides it[2], and others don't handle this at all. 
> It would be great to have explicit support in for this in Mesos.
> [1] 
> http://search.cpan.org/~bobtfish/Catalyst-TraitFor-Request-ProxyBase-0.05/lib/Catalyst/TraitFor/Request/ProxyBase.pm
> [2] https://github.com/mattkenney/feedsquish/blob/master/rupta.py#L94



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] (MESOS-7040) Advertising port is used by Framework to register

2017-01-31 Thread haosdent (JIRA)

haosdent assigned an issue to haosdent

Mesos / MESOS-7040
Advertising port is used by Framework to register

Change By: haosdent
Assignee: haosdent

This message was sent by Atlassian JIRA (v6.3.15#6346-sha1:dbc023d)



[jira] [Commented] (MESOS-6517) Health checking only on 127.0.0.1 is limiting.

2017-01-27 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843094#comment-15843094
 ] 

haosdent commented on MESOS-6517:
-

[~alexr][~avinash.mesos] Got it, thanks a lot for your nice explanation.

> Health checking only on 127.0.0.1 is limiting.
> --
>
> Key: MESOS-6517
> URL: https://issues.apache.org/jira/browse/MESOS-6517
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> As of Mesos 1.1.0, HTTP and TCP health checks always use 127.0.0.1 as the 
> target IP. This is not configurable. As a result, tasks should listen on all 
> interfaces if they want to support HTTP and TCP health checks. However, there 
> might be some cases where tasks or containers will end up binding to a 
> specific IP address. 
> To make health checking more robust we can:
> * look at all interfaces in a given network namespace and do health check on 
> all the IP addresses;
> * allow users to specify the IP to health check;
> * deduce the target IP from task's discovery information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6517) Health checking only on 127.0.0.1 is limiting.

2017-01-26 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839523#comment-15839523
 ] 

haosdent commented on MESOS-6517:
-

Hi [~alexr] [~avinash.mesos] [~gkleiman] [~jieyu], should we add an {{ip}} or 
{{hostname}} field to {{HTTPCheckInfo}} to address this?

Refer to the discussion at 
http://search-hadoop.com/m/Mesos/0Vlr6jCHiaMC2pm1?subj=Re+customized+IP+for+health+check
 
It looks like this ticket is invalid. 

> Health checking only on 127.0.0.1 is limiting.
> --
>
> Key: MESOS-6517
> URL: https://issues.apache.org/jira/browse/MESOS-6517
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Alexander Rukletsov
>  Labels: health-check, mesosphere
>
> As of Mesos 1.1.0, HTTP and TCP health checks always use 127.0.0.1 as the 
> target IP. This is not configurable. As a result, tasks should listen on all 
> interfaces if they want to support HTTP and TCP health checks. However, there 
> might be some cases where tasks or containers will end up binding to a 
> specific IP address. 
> To make health checking more robust we can:
> * look at all interfaces in a given network namespace and do health check on 
> all the IP addresses;
> * allow users to specify the IP to health check;
> * deduce the target IP from task's discovery information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4705) Linux 'perf' parsing logic may fail when OS distribution has perf backports.

2017-01-26 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839508#comment-15839508
 ] 

haosdent commented on MESOS-4705:
-

[~bmahler] It looks like recent perf versions have more fields.

https://github.com/torvalds/linux/blob/v4.9/tools/perf/util/stat-shadow.c#L528
https://github.com/torvalds/linux/blob/v4.9/tools/perf/builtin-stat.c#L1149

Should we create a new ticket for this, since this one is marked "resolved"?
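The field-count mismatch is easy to see directly. A minimal check (the sample line is the one from this ticket's error message; the expected count of 6 matches the value,unit,event,cgroup,running,ratio format described below, while newer perf versions emit additional fields):

```shell
# Count the comma-separated fields in the perf line from the ticket's
# error message. Older `perf stat -x,` output has 6 fields here
# (value,unit,event,cgroup,running,ratio); newer perf emits more, which
# breaks a parser that assumes a fixed field count.
line='25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00'
echo "$line" | awk -F',' '{ print NF }'   # prints 6
```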

> Linux 'perf' parsing logic may fail when OS distribution has perf backports.
> 
>
> Key: MESOS-4705
> URL: https://issues.apache.org/jira/browse/MESOS-4705
> Project: Mesos
>  Issue Type: Bug
>  Components: cgroups, isolation
>Affects Versions: 0.27.1
>Reporter: Fan Du
>Assignee: Fan Du
> Fix For: 0.26.2, 0.27.3, 0.28.2, 1.0.0
>
>
> When sampling a container with the perf event isolator on CentOS 7 with 
> kernel 3.10.0-123.el7.x86_64, the slave complained with the error spew below:
> {code}
> E0218 16:32:00.591181  8376 perf_event.cpp:408] Failed to get perf sample: 
> Failed to parse perf sample: Failed to parse perf sample line 
> '25871993253,,cycles,mesos/5f23ffca-87ed-4ff6-84f2-6ec3d4098ab8,10059827422,100.00':
>  Unexpected number of fields
> {code}
> It's caused by the current perf format [assumption | 
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob;f=src/linux/perf.cpp;h=1c113a2b3f57877e132bbd65e01fb2f045132128;hb=HEAD#l430]
>  for kernel versions below 3.12.
> On the 3.10.0-123.el7.x86_64 kernel, the format has 6 tokens, as below:
> value,unit,event,cgroup,running,ratio
> A local modification fixed this error on my test bed; please review this 
> ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6790) Wrong task started time in webui

2017-01-26 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839423#comment-15839423
 ] 

haosdent commented on MESOS-6790:
-

Ping [~janisz], do you think this approach is acceptable?

> Wrong task started time in webui
> 
>
> Key: MESOS-6790
> URL: https://issues.apache.org/jira/browse/MESOS-6790
> Project: Mesos
>  Issue Type: Bug
>  Components: webui
>Reporter: haosdent
>Assignee: Tomasz Janiszewski
>  Labels: health-check, webui
>
> Reported by [~janisz]
> {quote}
> Hi
> When a task has Mesos health checks enabled, the start time shown in the UI
> can be wrong. This happens because the UI assumes that the first status is
> the task start [0]. That is not always true, because Mesos keeps only the
> most recent task statuses [1], so when a health check updates the task
> status it can override the task start time displayed in the webui.
> Best
> Tomek
> [0]
> https://github.com/apache/mesos/blob/master/src/webui/master/static/js/controllers.js#L140
> [1]
> https://github.com/apache/mesos/blob/f2adc8a95afda943f6a10e771aad64300da19047/src/common/protobuf_utils.cpp#L263-L265
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6933) Executor does not respect grace period

2017-01-26 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839421#comment-15839421
 ] 

haosdent commented on MESOS-6933:
-

[~klueska][~janisz] This is an {{sh}} problem rather than a Mesos bug, because 
{{/bin/sh}} doesn't forward signals to its child processes. 

Docker has the same problem with stopping containers gracefully when commands 
are launched via {{sh}}; refer to 
https://www.ctl.io/developers/blog/post/gracefully-stopping-docker-containers/ 
for the details.

So the correct way to implement graceful exit in Docker, Mesos, and other 
applications is to avoid using {{sh}}. More precisely, users should set 
{{CommandInfo.shell}} to false and use the {{exec}} form to launch tasks if 
they would like the task to exit gracefully. Make sense? 
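A tiny demonstration of the point above (plain {{sh}}, not Mesos code): with the {{exec}} form the shell replaces itself with the command, so the pid that receives a signal is the command itself; without {{exec}} the command runs as a child with a different pid, and a SIGTERM delivered to the shell's pid is never forwarded to it.

```shell
# With `exec`, the inner command keeps the outer shell's pid, so a signal
# sent to that pid hits the command directly: prints the same pid twice.
sh -c 'echo $$; exec sh -c "echo \$\$"'

# Without `exec`, the inner command is forked as a child with its own pid,
# so a SIGTERM sent to the outer sh would not reach it: prints two
# different pids. (The trailing `:` stops the shell from implicitly
# exec-ing the last command of the -c string, an optimization some
# shells perform.)
sh -c 'echo $$; sh -c "echo \$\$"; :'
```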

> Executor does not respect grace period
> --
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Tomasz Janiszewski
>
> The Mesos command executor tries to support the kill grace period with 
> escalation, but unfortunately it does not work. It launches {{command}} by 
> wrapping it in {{sh -c}}, which makes the process tree look like this:
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [ 
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
> ./bin/offer-i18n -e prod -p $PORT0 
>  \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This causes {{sh}} to exit immediately, and the executor with it, while the 
> wrapped {{command}} might need more time to finish. The executor then thinks 
> the command exited gracefully, so it won't 
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
>  to SIGKILL.
> This causes leaks when the POSIX containerizer is used, because a command 
> that ignores SIGTERM is reparented to init and never gets killed. Using a 
> pid namespace only masks the problem, because the hanging process is killed 
> before it can shut down gracefully.
> The fix is to send SIGTERM only to the children of {{sh}}. {{sh}} will exit 
> when all of its child processes finish; if they don't, they will be killed 
> by escalation to SIGKILL.
> All versions from 0.20 are affected.
> This test should pass 
> [src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
> [Mailing list 
> thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6933) Executor does not respect grace period

2017-01-26 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6933:

Description: 
Mesos Command Executor try to support grace period with escalate but 
unfortunately it does not work. It launches {{command}} by wrapping it in {{sh 
-c}} this cause process tree to look like this

{code}
Received killTask
Shutting down
Sending SIGTERM to process tree at pid 18
Sent SIGTERM to the following process trees:
[ 
-+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
./bin/offer-i18n -e prod -p $PORT0 
 \--- 19 command...
]
Command terminated with signal Terminated (pid: 18)
{code}

This cause {{sh}} to immediately close and so executor, while wrapped 
{{command}} might need some more time to finish. Finally, executor thinks 
command executed gracefully so it won't 
[escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
 to SIGKILL.

This cause leaks when POSIX containerizer is used because if command ignores 
SIGTERM it will be attached to initialize and never get killed. Using 
pid/namespace only masks the problem because hanging process is captured before 
it can gracefully shutdown.

Fix for this is to sent SIGTERM only to {{sh}} children. {{sh}} will exit when 
all children processes finish. If not they will be killed by escalation to 
SIGKILL.

All versions from 0.20 are affected.

This test should pass 
[src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
[Mailing list 
thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]

  was:
Mesos Command Executor try to support grace period with escalate but 
unfortunately it does not work. It launches {{command}} by wrapping it in {{sh 
-c}} this cause process tree to look like this

{code}
Received killTask
Shutting down
Sending SIGTERM to process tree at pid 18
Sent SIGTERM to the following process trees:
[ 
-+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
./bin/offer-i18n -e prod -p $PORT0 
 \--- 19 command...
]
Command terminated with signal Terminated (pid: 18)
{code}

This cause {{sh}} to immediately close and so executor, while wrapped 
{{command}} might need some more time to finish. Finally, executor thinks 
command executed gracefully so it won't 
[escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
 to SIGKILL.

This cause leaks when POSIX contenerizer is used because if command ignores 
SIGTERM it will be attached to init and never get killed. Using pid/namespace 
only masks the problem because hanging process is cpatured before it can 
gracefully shutdown.

Fix for this is to sent SIGTERM only to {{sh}} children. {{sh}} will exit when 
all sub processes finish. If not they will be killed by escalation to SIGKILL.

All versions from: 0.20 are affected.

This test should pass 
[src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
[Mailing list 
thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]


> Executor does not respect grace period
> --
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Tomasz Janiszewski
>
> Mesos Command Executor try to support grace period with escalate but 
> unfortunately it does not work. It launches {{command}} by wrapping it in 
> {{sh -c}} this cause process tree to look like this
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [ 
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
> ./bin/offer-i18n -e prod -p $PORT0 
>  \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This cause {{sh}} to immediately close and so executor, while wrapped 
> {{command}} might need some more time to finish. Finally, executor thinks 
> command executed gracefully so it won't 
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
>  to SIGKILL.
> This cause leaks when POSIX containerizer is used because if command ignores 
> SIGTERM it will be attached to initialize and never get killed. Using 
> pid/namespace only masks the problem because hanging process is captured 
> before it can gracefully shutdown.
> Fix for this is to sent SIGTERM only to {{sh}} children. {{sh}} will exit 
> when all children processes finish. If not they will be killed by escalation 
> to 

[jira] [Updated] (MESOS-6933) Executor does not respect grace period

2017-01-26 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-6933:

Description: 
Mesos Command Executor try to support grace period with escalate but 
unfortunately it does not work. It launches {{command}} by wrapping it in {{sh 
-c}} this cause process tree to look like this

{code}
Received killTask
Shutting down
Sending SIGTERM to process tree at pid 18
Sent SIGTERM to the following process trees:
[ 
-+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
./bin/offer-i18n -e prod -p $PORT0 
 \--- 19 command...
]
Command terminated with signal Terminated (pid: 18)
{code}

This cause {{sh}} to immediately close and so executor, while wrapped 
{{command}} might need some more time to finish. Finally, executor thinks 
command executed gracefully so it won't 
[escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
 to SIGKILL.

This cause leaks when POSIX contenerizer is used because if command ignores 
SIGTERM it will be attached to init and never get killed. Using pid/namespace 
only masks the problem because hanging process is cpatured before it can 
gracefully shutdown.

Fix for this is to sent SIGTERM only to {{sh}} children. {{sh}} will exit when 
all sub processes finish. If not they will be killed by escalation to SIGKILL.

All versions from: 0.20 are affected.

This test should pass 
[src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
[Mailing list 
thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]

  was:
Mesos Defult Executor try to support grace period with escalate but 
unfortunately it does not work. It launches {{command}} by wrapping it in {{sh 
-c}} this cause process tree to look like this

{code}
Received killTask
Shutting down
Sending SIGTERM to process tree at pid 18
Sent SIGTERM to the following process trees:
[ 
-+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
./bin/offer-i18n -e prod -p $PORT0 
 \--- 19 command...
]
Command terminated with signal Terminated (pid: 18)
{code}

This cause {{sh}} to immediately close and so executor, while wrapped 
{{command}} might need some more time to finish. Finally, executor thinks 
command executed gracefully so it won't 
[escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
 to SIGKILL.

This cause leaks when POSIX contenerizer is used because if command ignores 
SIGTERM it will be attached to init and never get killed. Using pid/namespace 
only masks the problem because hanging process is cpatured before it can 
gracefully shutdown.

Fix for this is to sent SIGTERM only to {{sh}} children. {{sh}} will exit when 
all sub processes finish. If not they will be killed by escalation to SIGKILL.

All versions from: 0.20 are affected.

This test should pass 
[src/tests/command_executor_tests.cpp:342|https://github.com/apache/mesos/blob/2c856178b59593ff8068ea8d6c6593943c33008c/src/tests/command_executor_tests.cpp#L342-L343]
[Mailing list 
thread|https://lists.apache.org/thread.html/1025dca0cf4418aee50b14330711500af864f08b53eb82d10cd5c04c@%3Cuser.mesos.apache.org%3E]


> Executor does not respect grace period
> --
>
> Key: MESOS-6933
> URL: https://issues.apache.org/jira/browse/MESOS-6933
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Reporter: Tomasz Janiszewski
>
> Mesos Command Executor try to support grace period with escalate but 
> unfortunately it does not work. It launches {{command}} by wrapping it in 
> {{sh -c}} this cause process tree to look like this
> {code}
> Received killTask
> Shutting down
> Sending SIGTERM to process tree at pid 18
> Sent SIGTERM to the following process trees:
> [ 
> -+- 18 sh -c cd offer-i18n-0.1.24 && LD_PRELOAD=../librealresources.so 
> ./bin/offer-i18n -e prod -p $PORT0 
>  \--- 19 command...
> ]
> Command terminated with signal Terminated (pid: 18)
> {code}
> This cause {{sh}} to immediately close and so executor, while wrapped 
> {{command}} might need some more time to finish. Finally, executor thinks 
> command executed gracefully so it won't 
> [escalate|https://github.com/apache/mesos/blob/1.1.0/src/launcher/executor.cpp#L695]
>  to SIGKILL.
> This cause leaks when POSIX contenerizer is used because if command ignores 
> SIGTERM it will be attached to init and never get killed. Using pid/namespace 
> only masks the problem because hanging process is cpatured before it can 
> gracefully shutdown.
> Fix for this is to sent SIGTERM only to {{sh}} children. {{sh}} will exit 
> when all sub processes finish. If not they will be killed by escalation to 
> SIGKILL.
> All versions 
