[jira] [Created] (MESOS-7920) Mesos Agents Crash If Loss Connection to ZK
Miguel Bernadin created MESOS-7920: -- Summary: Mesos Agents Crash If Loss Connection to ZK Key: MESOS-7920 URL: https://issues.apache.org/jira/browse/MESOS-7920 Project: Mesos Issue Type: Bug Components: agent Affects Versions: 1.3.1 Reporter: Miguel Bernadin Assignee: Vinod Kone There is an issue whereby Mesos agents die when they lose access to the ZooKeeper quorum: the {{dcos-mesos-slave.service}} main process exits with code=killed, status=6/ABRT. _*Mesos Agents Exiting with Loss of ZK Connectivity*_ {code:java} mesos-slave[12971]: 2017-08-09 20:12:44,698:12971(0x7fd2161d3700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=leader.mesos:2181 sessionTimeout=1 watcher=0x7fd22188d250 sessionId=0 sessionPasswd= con mesos-slave[12971]: 2017-08-09 20:12:44,713:12971(0x7fd2161d3700):ZOO_ERROR@getaddrs@599: getaddrinfo: No such file or directory mesos-slave[12971]: F0809 20:12:44.713604 12988 zookeeper.cpp:132] Failed to create ZooKeeper, zookeeper_init: No such file or directory [2] mesos-slave[12971]: *** Check failure stack trace: *** mesos-slave[12971]: @ 0x7fd22075a9fd google::LogMessage::Fail() mesos-slave[12971]: @ 0x7fd22075c89d google::LogMessage::SendToLog() mesos-slave[12971]: @ 0x7fd22075a5ec google::LogMessage::Flush() mesos-slave[12971]: @ 0x7fd22075a7f9 google::LogMessage::~LogMessage() mesos-slave[12971]: @ 0x7fd22075b76e google::ErrnoLogMessage::~ErrnoLogMessage() mesos-slave[12971]: @ 0x7fd22188daf3 ZooKeeperProcess::initialize() mesos-slave[12971]: @ 0x7fd221c424c1 process::ProcessManager::resume() mesos-slave[12971]: @ 0x7fd221c42777 _ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv mesos-slave[12971]: @ 0x7fd2201fad73 (unknown) mesos-slave[12971]: @ 0x7fd21fcfb52c (unknown) mesos-slave[12971]: @ 0x7fd21fa391dd (unknown) systemd[1]: dcos-mesos-slave.service: Main process exited, code=killed, status=6/ABRT 
systemd[1]: dcos-mesos-slave.service: Unit entered failed state. systemd[1]: dcos-mesos-slave.service: Failed with result 'signal'. systemd[1]: dcos-mesos-slave.service: Service hold-off time over, scheduling restart.{code} *NEXT STEP* Determine if we can change how Mesos responds to loss of access to ZK. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
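The log shows the abort is triggered by a transient name-resolution failure (getaddrinfo on leader.mesos), which zookeeper_init surfaces and the agent treats as fatal. One possible direction for the *NEXT STEP* above is to retry resolution with backoff instead of crashing. The sketch below is illustrative only; the function name and retry policy are assumptions, not Mesos code:

```python
import socket
import time

def resolve_with_retry(host, port, retries=5, delay=0.1,
                       resolver=socket.getaddrinfo):
    """Keep retrying name resolution with exponential backoff instead of
    treating the first getaddrinfo failure as fatal (hypothetical policy,
    not the Mesos implementation)."""
    attempt = 0
    while True:
        try:
            return resolver(host, port)
        except OSError:
            attempt += 1
            if attempt >= retries:
                raise  # give up only after exhausting the retry budget
            time.sleep(delay * (2 ** (attempt - 1)))  # 0.1s, 0.2s, 0.4s, ...
```

Under this policy a brief DNS outage for leader.mesos would delay the ZK connection rather than kill the agent; only a sustained outage would still fail.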
[jira] [Assigned] (MESOS-6369) Add a column for FrameworkID when displaying tasks in the WebUI
[ https://issues.apache.org/jira/browse/MESOS-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin reassigned MESOS-6369: -- Assignee: Miguel Bernadin > Add a column for FrameworkID when displaying tasks in the WebUI > --- > > Key: MESOS-6369 > URL: https://issues.apache.org/jira/browse/MESOS-6369 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Joseph Wu >Assignee: Miguel Bernadin >Priority: Minor > Labels: mesosphere, newbie > > The Mesos Web UI home page shows a list of active/completed/orphan > tasks like this: > || ID || Name || State || Started || Host || || > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > When you start multiple frameworks, the task IDs and names shown in the UI may > be ambiguous, requiring extra clicks/investigation to disambiguate. > In the above case, to disambiguate between the two tasks with ID {{1}}, the > user would need to navigate to each sandbox and check the associated > frameworkID in the {{/browse}} view. 
> We could add a column showing the {{FrameworkID}} next to each task: > || Framework || ID || Name || State || Started || Host || || > | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > The {{FrameworkID}} s could be links to the associated framework > {code} > > {{framework.id | truncateMesosID}} > > {code} > - > This involves additions to three tables: > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L152-L157 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L199-L205 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L246-L252 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
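The {{truncateMesosID}} referenced in the snippet above is a display filter in the Mesos web UI (AngularJS); its exact behavior is not shown here. As a rough illustration of the idea, shortening a long framework ID for display while keeping its distinctive tail, a sketch might look like this (behavior and name treatment are assumptions, not taken from the Mesos source):

```python
def truncate_mesos_id(full_id, keep=8):
    """Illustrative stand-in for the web UI's truncateMesosID display
    filter: keep only the trailing, most distinctive part of a long
    framework/task ID (hypothetical policy, not the actual filter)."""
    if len(full_id) <= keep:
        return full_id
    return "..." + full_id[-keep:]
```

For the IDs in the tables above, the 0000/0001 suffix is the part that actually distinguishes two frameworks registered by the same master, which is why truncating from the front is the sensible direction.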
[jira] [Commented] (MESOS-6369) Add a column for FrameworkID when displaying tasks in the WebUI
[ https://issues.apache.org/jira/browse/MESOS-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623594#comment-15623594 ] Miguel Bernadin commented on MESOS-6369: [~kaysoky], Submitted https://reviews.apache.org/r/53324/ for review; the changes look like this with the modified field names: || Framework ID || Task ID || Task Name || State || Started || Host || || | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > Add a column for FrameworkID when displaying tasks in the WebUI > --- > > Key: MESOS-6369 > URL: https://issues.apache.org/jira/browse/MESOS-6369 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Joseph Wu >Assignee: Miguel Bernadin >Priority: Minor > Labels: mesosphere, newbie > > The Mesos Web UI home page shows a list of active/completed/orphan > tasks like this: > || ID || Name || State || Started || Host || || > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > When you start multiple frameworks, the task IDs and names shown in the UI may > be ambiguous, requiring extra clicks/investigation to disambiguate. > In the above case, to disambiguate between the two tasks with ID {{1}}, the > user would need to navigate to each sandbox and check the associated > frameworkID in the {{/browse}} view. > We could add a column showing the {{FrameworkID}} next to each task: > || Framework || ID || Name || State || Started || Host || || > | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > The {{FrameworkID}}s could be links to the associated framework > {code} > > {{framework.id | truncateMesosID}} > > {code} > - > This involves additions to three tables: > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L152-L157 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L199-L205 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L246-L252 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-6369) Add a column for FrameworkID when displaying tasks in the WebUI
[ https://issues.apache.org/jira/browse/MESOS-6369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15623594#comment-15623594 ] Miguel Bernadin edited comment on MESOS-6369 at 10/31/16 10:21 PM: --- [~kaysoky], Submitted https://reviews.apache.org/r/53324/ for review; the changes look like this with the modified field names: || Framework ID || Task ID || Task Name || State || Started || Host || || | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | was (Author: bernadinm): [~kaysoky], Sumbitted for review https://reviews.apache.org/r/53324/ to this that changes that look like this with modified field names: || Framework ID || Task ID || Task Name || State || Started || Host || || | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > Add a column for FrameworkID when displaying tasks in the WebUI > --- > > Key: MESOS-6369 > URL: https://issues.apache.org/jira/browse/MESOS-6369 > Project: Mesos > Issue Type: Improvement > Components: webui >Reporter: Joseph Wu >Assignee: Miguel Bernadin >Priority: Minor > Labels: mesosphere, newbie > > The Mesos Web UI home page shows a list of active/completed/orphan > tasks like this: > || ID || Name || State || Started || Host || || > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 1 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > | 2 | My ambiguously named task | RUNNING | 1 minute ago | 10.10.0.1 | > Sandbox | > When you start multiple frameworks, the task IDs and names shown in the UI may > be ambiguous, requiring extra clicks/investigation to disambiguate. > In the above case, to disambiguate between the two tasks with ID {{1}}, the > user would need to navigate to each sandbox and check the associated > frameworkID in the {{/browse}} view. > We could add a column showing the {{FrameworkID}} next to each task: > || Framework || ID || Name || State || Started || Host || || > | 179b5436-30ec-45e9-b324-fa5c5a1dd756- | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 1 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > | 179b5436-30ec-45e9-b324-fa5c5a1dd756-0001 | 2 | My ambiguously named task | > RUNNING | 1 minute ago | 10.10.0.1 | Sandbox | > The {{FrameworkID}}s could be links to the associated framework > {code} > > {{framework.id | truncateMesosID}} > > {code} > - > This involves additions to three tables: > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L152-L157 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L199-L205 > https://github.com/apache/mesos/blob/1.0.x/src/webui/master/static/home.html#L246-L252 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6598) Broken Link Framework Development Page
Miguel Bernadin created MESOS-6598: -- Summary: Broken Link Framework Development Page Key: MESOS-6598 URL: https://issues.apache.org/jira/browse/MESOS-6598 Project: Mesos Issue Type: Bug Components: project website Reporter: Miguel Bernadin Priority: Trivial http://mesos.apache.org/documentation/latest/app-framework-development-guide/ A link on this page is broken: Create your Framework Scheduler If you are writing a scheduler against Mesos 1.0 or newer, it is recommended to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-6598) Broken Link Framework Development Page
[ https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin reassigned MESOS-6598: -- Assignee: Miguel Bernadin > Broken Link Framework Development Page > -- > > Key: MESOS-6598 > URL: https://issues.apache.org/jira/browse/MESOS-6598 > Project: Mesos > Issue Type: Bug > Components: project website >Reporter: Miguel Bernadin >Assignee: Miguel Bernadin >Priority: Trivial > > http://mesos.apache.org/documentation/latest/app-framework-development-guide/ > The link to this page is broken: > Create your Framework Scheduler > If you are writing a scheduler against Mesos 1.0 or newer, it is recommended > to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6598) Broken Link Framework Development Page
[ https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672212#comment-15672212 ] Miguel Bernadin commented on MESOS-6598: https://reviews.apache.org/r/53832/ > Broken Link Framework Development Page > -- > > Key: MESOS-6598 > URL: https://issues.apache.org/jira/browse/MESOS-6598 > Project: Mesos > Issue Type: Bug > Components: project website >Reporter: Miguel Bernadin >Assignee: Miguel Bernadin >Priority: Trivial > > http://mesos.apache.org/documentation/latest/app-framework-development-guide/ > The link to this page is broken: > Create your Framework Scheduler > If you are writing a scheduler against Mesos 1.0 or newer, it is recommended > to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6598) Broken Link Framework Development Page
[ https://issues.apache.org/jira/browse/MESOS-6598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6598: --- Assignee: Joseph Wu (was: Miguel Bernadin) > Broken Link Framework Development Page > -- > > Key: MESOS-6598 > URL: https://issues.apache.org/jira/browse/MESOS-6598 > Project: Mesos > Issue Type: Bug > Components: project website >Reporter: Miguel Bernadin >Assignee: Joseph Wu >Priority: Trivial > > http://mesos.apache.org/documentation/latest/app-framework-development-guide/ > The link to this page is broken: > Create your Framework Scheduler > If you are writing a scheduler against Mesos 1.0 or newer, it is recommended > to use the new HTTP API (BROKEN LINK) to talk to Mesos. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6841) Allow Dynamic CNI Configuration
Miguel Bernadin created MESOS-6841: -- Summary: Allow Dynamic CNI Configuration Key: MESOS-6841 URL: https://issues.apache.org/jira/browse/MESOS-6841 Project: Mesos Issue Type: Improvement Components: agent Reporter: Miguel Bernadin Priority: Minor Agents have the ability to set resources dynamically without an agent restart. Since CNI networks are configured only at agent startup, we could add dynamic CNI configuration support in the future. Creating this JIRA to track this effort. {quote} Note that the network/cni isolator learns all the available networks by looking at the CNI configuration in the --network_cni_config_dir at startup. This implies that if a new CNI network needs to be added after Agent startup, the Agent needs to be restarted. The network/cni isolator has been designed with recover capabilities and hence restarting the Agent (and therefore the network/cni isolator) will not affect container orchestration. {quote} Sourced from http://mesos.apache.org/documentation/latest/cni/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6898) Allow Master Node Maintenance Primitives
Miguel Bernadin created MESOS-6898: -- Summary: Allow Master Node Maintenance Primitives Key: MESOS-6898 URL: https://issues.apache.org/jira/browse/MESOS-6898 Project: Mesos Issue Type: Improvement Components: master Reporter: Miguel Bernadin Assignee: Gilbert Song Mesos maintenance primitives currently allow agents to exit the cluster, which is helpful for adding/removing agents dynamically when replacing nodes for compliance, upgrades, or other reasons. We see a benefit in supporting maintenance primitives for Mesos masters as well, so masters can be removed from a cluster programmatically. Creating this JIRA so we can track this work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6898) Allow Master Node Maintenance Primitives
[ https://issues.apache.org/jira/browse/MESOS-6898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6898: --- Affects Version/s: 1.2.0 > Allow Master Node Maintenance Primitives > - > > Key: MESOS-6898 > URL: https://issues.apache.org/jira/browse/MESOS-6898 > Project: Mesos > Issue Type: Improvement > Components: master >Affects Versions: 1.2.0 >Reporter: Miguel Bernadin >Assignee: Gilbert Song > > Mesos maintenance primitives currently allow agents to exit the cluster, > which is helpful for adding/removing agents dynamically when replacing nodes > for compliance, upgrades, or other reasons. We see a benefit in supporting > maintenance primitives for Mesos masters as well, so masters can be removed > from a cluster programmatically. > Creating this JIRA so we can track this work. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems
Miguel Bernadin created MESOS-3586: -- Summary: Installing Mesos 0.24.0 on multiple systems Key: MESOS-3586 URL: https://issues.apache.org/jira/browse/MESOS-3586 Project: Mesos Issue Type: Bug Affects Versions: 0.24.0 Environment: Ubuntu 14.04, 3.13.0-32 generic Reporter: Miguel Bernadin I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and software configurations. After performing ../configure, make, and make check, some servers have completed successfully and others failed on the test [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. Is there something I should check in this test? PERFORMED MAKE CHECK NODE-001 [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave 20151005-143735-2393768202-35106-27900-S0 Registered executor on svdidac038.techlabs.accenture.com Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 Forked command at 38510 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' PERFORMED MAKE CHECK NODE-002 [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave 20151005-143857-2360213770-50427-26325-S0 Registered executor on svdidac039.techlabs.accenture.com Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' Forked command at 37028 ../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure Expected: (usage.get().mem_medium_pressure_counter()) >= (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 2015-10-05 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server refused to accept the client [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message 
was sent by Atlassian JIRA (v6.3.4#6332)
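For context on the failed expectation above, {{mem_medium_pressure_counter() >= mem_critical_pressure_counter()}} encodes the assumption that a critical cgroup memory-pressure event also counts as medium pressure, so the medium counter should never fall behind the critical one. A minimal sketch of that counting invariant (hypothetical model, not the Mesos implementation):

```python
class PressureCounters:
    """Counts cgroup memory-pressure events under the assumption that a
    'critical' event implies 'medium' pressure as well, which is why the
    test expects medium >= critical (illustrative model only)."""

    def __init__(self):
        self.medium = 0
        self.critical = 0

    def on_event(self, level):
        # A critical event bumps both counters; a medium event bumps one.
        if level in ("medium", "critical"):
            self.medium += 1
        if level == "critical":
            self.critical += 1
```

The failing run reported "actual: 5 vs 6", i.e. the invariant was violated on that node, which suggests events were observed out of step rather than a simple off-by-one in the test.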
[jira] [Updated] (MESOS-3586) Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics
[ https://issues.apache.org/jira/browse/MESOS-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-3586: --- Summary: Installing Mesos 0.24.0 on multiple systems. Failed test on MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (was: Installing Mesos 0.24.0 on multiple systems) > Installing Mesos 0.24.0 on multiple systems. Failed test on > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > --- > > Key: MESOS-3586 > URL: https://issues.apache.org/jira/browse/MESOS-3586 > Project: Mesos > Issue Type: Bug >Affects Versions: 0.24.0 > Environment: Ubuntu 14.04, 3.13.0-32 generic >Reporter: Miguel Bernadin > > I am installing Mesos 0.24.0 on 4 servers which have very similar hardware and > software configurations. > After performing ../configure, make, and make check, some servers have > completed successfully and others failed on the test [ RUN ] > MemoryPressureMesosTest.CGROUPS_ROOT_Statistics. > Is there something I should check in this test? > PERFORMED MAKE CHECK NODE-001 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:37:35.585067 38479 exec.cpp:133] Version: 0.24.0 > I1005 14:37:35.593789 38497 exec.cpp:207] Executor registered on slave > 20151005-143735-2393768202-35106-27900-S0 > Registered executor on svdidac038.techlabs.accenture.com > Starting task 010b2fe9-4eac-4136-8a8a-6ce7665488b0 > Forked command at 38510 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > PERFORMED MAKE CHECK NODE-002 > [ RUN ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics > I1005 14:38:58.794112 36997 exec.cpp:133] Version: 0.24.0 > I1005 14:38:58.802851 37022 exec.cpp:207] Executor registered on slave > 20151005-143857-2360213770-50427-26325-S0 > Registered executor on svdidac039.techlabs.accenture.com > Starting task 9bb317ba-41cb-44a4-b507-d1c85ceabc28 > sh -c 'while true; do dd count=512 bs=1M if=/dev/zero of=./temp; done' > Forked command at 37028 > 
../../src/tests/containerizer/memory_pressure_tests.cpp:145: Failure > Expected: (usage.get().mem_medium_pressure_counter()) >= > (usage.get().mem_critical_pressure_counter()), actual: 5 vs 6 > 2015-10-05 > 14:39:00,130:26325(0x2af08cc78700):ZOO_ERROR@handle_socket_error_msg@1697: > Socket [127.0.0.1:37198] zk retcode=-4, errno=111(Connection refused): server > refused to accept the client > [ FAILED ] MemoryPressureMesosTest.CGROUPS_ROOT_Statistics (4303 ms) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6098) Frameworks UI shows metrics for used plus offers
Miguel Bernadin created MESOS-6098: -- Summary: Frameworks UI shows metrics for used plus offers Key: MESOS-6098 URL: https://issues.apache.org/jira/browse/MESOS-6098 Project: Mesos Issue Type: Improvement Components: webui Affects Versions: 1.0.1 Reporter: Miguel Bernadin Assignee: Miguel Bernadin Priority: Minor When a framework is receiving many offers and it is denying them, the frameworks UI will show the metrics fluctuating for mem, cpu, gpu, and disk. From a Mesos perspective, those offers are given to the framework until the framework declines them, so depending on when the Mesos UI gets updated, it has combined all the used resources and offers (that have not been accepted) to the framework, and this is reflected on the framework UI. If a framework does not implement suppressOffers(), it will continue to deny offers from Mesos, which leads to the sporadic changes of metrics on the framework UI. From the operator's perspective, the user would expect to see used resources consumed by the framework. Any offered resources can be viewed in Mesos's Offers tab instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
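The aggregation described above can be sketched as follows: the per-framework numbers shown in the UI are effectively used resources plus outstanding (not-yet-declined) offers, which is why they fluctuate as offers churn. This is a hypothetical reconstruction for illustration, not the actual web UI code:

```python
def displayed_resources(used, offered):
    """Hypothetical model of the framework UI metric: used resources
    plus outstanding offers, summed per resource name."""
    keys = set(used) | set(offered)
    return {k: used.get(k, 0) + offered.get(k, 0) for k in keys}
```

For example, a framework actually using 2 CPUs that currently holds an undeclined offer of 1 CPU and 512 MB would briefly display 3 CPUs and 512 MB, then drop back once the offer is declined.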
[jira] [Updated] (MESOS-6098) Frameworks UI shows metrics for used resources plus offers
[ https://issues.apache.org/jira/browse/MESOS-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6098: --- Summary: Frameworks UI shows metrics for used resources plus offers (was: Frameworks UI shows metrics for used plus offers) > Frameworks UI shows metrics for used resources plus offers > -- > > Key: MESOS-6098 > URL: https://issues.apache.org/jira/browse/MESOS-6098 > Project: Mesos > Issue Type: Improvement > Components: webui >Affects Versions: 1.0.1 >Reporter: Miguel Bernadin >Assignee: Miguel Bernadin >Priority: Minor > > When a framework is receiving many offers and it is denying them, the > frameworks UI will show the metrics fluctuating for mem, cpu, gpu, and disk. > From a mesos perspective, those offers are given to the framework until the > framework declines them, so depending on the time the mesos UI gets updated, > it has combined all the used resources and offers (that have not been > accepted) to the framework and is reflected on the framework UI. If a > framework does not implement suppressOffers(), it will continue to deny > offers from mesos, which leads to the sporadic changes of metrics on the > framework UI. > From the operator's perspective, the user would expect to see used resources > consumed by the framework. Any offered resources can be viewed instead by > Mesos's Offers tab. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6098) Frameworks UI shows metrics for used resources plus offers
[ https://issues.apache.org/jira/browse/MESOS-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-6098: --- Assignee: Joseph Wu (was: Miguel Bernadin) > Frameworks UI shows metrics for used resources plus offers > -- > > Key: MESOS-6098 > URL: https://issues.apache.org/jira/browse/MESOS-6098 > Project: Mesos > Issue Type: Improvement > Components: webui >Affects Versions: 1.0.1 >Reporter: Miguel Bernadin >Assignee: Joseph Wu >Priority: Minor > > When a framework is receiving many offers and it is denying them, the > frameworks UI will show the metrics fluctuating for mem, cpu, gpu, and disk. > From a mesos perspective, those offers are given to the framework until the > framework declines them, so depending on the time the mesos UI gets updated, > it has combined all the used resources and offers (that have not been > accepted) to the framework and is reflected on the framework UI. If a > framework does not implement suppressOffers(), it will continue to deny > offers from mesos, which leads to the sporadic changes of metrics on the > framework UI. > From the operator's perspective, the user would expect to see used resources > consumed by the framework. Any offered resources can be viewed instead by > Mesos's Offers tab. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2982) Make Check Fails on RHEL 6
Miguel Bernadin created MESOS-2982: -- Summary: Make Check Fails on RHEL 6 Key: MESOS-2982 URL: https://issues.apache.org/jira/browse/MESOS-2982 Project: Mesos Issue Type: Bug Components: build Affects Versions: 0.22.1 Environment: Linux xxx 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux Red Hat Enterprise Linux Server release 6.6 (Santiago) Reporter: Miguel Bernadin After downloading Mesos 0.22.1 and attempting to build it, I've encountered failures in the build process below: FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611127#comment-14611127 ] Miguel Bernadin commented on MESOS-2982: After reading further, I've attempted to skip this Perf checking process since I've read that Perf is specific to the kernel version and different versions have different flags and output formats. Specifically, the code requires a kernel release >= 2.6.39 but I am running a 2.6.32 kernel: my version of perf is not currently supported and I should skip those tests. The only effect of this is that I cannot use the optional perf_event isolator. That said, I've attempted these commands below to try to skip it, but it seems to ignore my environment variables and flags: build]# GTEST_FILTER="-Perf*:-UserCgroupIsolatorTest*UserCgroup"; export GTEST_FILTER build]# make check GTEST_FILTER="$GTEST_FILTER" That failed above, then I tried this below: build]# ./bin/mesos-tests.sh --gtest_filter="-Perf*:-UserCgroupIsolatorTest*UserCgroup" > Make Check Fails on RHEL 6 > -- > > Key: MESOS-2982 > URL: https://issues.apache.org/jira/browse/MESOS-2982 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.22.1 > Environment: Linux xxx 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 > 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux > Red Hat Enterprise Linux Server release 6.6 (Santiago) >Reporter: Miguel Bernadin > > After downloading Mesos 0.22.1 and attempting to build it, I've encountered > failures in the build process below: > FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam > = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) > [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) > [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = > mesos::internal::slave::CgroupsCpushareIsolatorProcess > userdel: user 'mesos.test.unprivileged.user' does not exist > [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup > 
-bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or > directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
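The filter string above likely fails because, per the googletest documentation, {{--gtest_filter}} is split on the *first* '-' only: everything after it is the negative-pattern list, so the second leading '-' in {{-Perf*:-UserCgroupIsolatorTest*UserCgroup}} stays inside a pattern that matches no real test name, and the cgroup tests still run. A Python sketch of this matching behavior (my reading of the gtest docs, not its actual source):

```python
import fnmatch

def gtest_filter_match(name, filt):
    """Sketch of googletest's --gtest_filter semantics:
    'POS1:POS2-NEG1:NEG2'. Only the FIRST '-' separates positive from
    negative patterns; an empty positive section means '*'."""
    pos, dash, neg = filt.partition('-')
    pos_pats = [p for p in pos.split(':') if p] or ['*']
    neg_pats = [p for p in neg.split(':') if p] if dash else []
    return (any(fnmatch.fnmatchcase(name, p) for p in pos_pats) and
            not any(fnmatch.fnmatchcase(name, p) for p in neg_pats))
```

Under this reading, `--gtest_filter="-Perf*:UserCgroupIsolatorTest*UserCgroup"` (a single leading dash covering the whole colon-separated negative list) should skip both groups of tests when passed to `./bin/mesos-tests.sh`.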
[jira] [Updated] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin updated MESOS-2982: --- Description: After downloading Mesos 0.22.1 and attempting to build it, I've encountered failures in the build process below: FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory was: After downloading Mesos 22.1 and attemted to build it, I've encountered failrues on the build process below: FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where TypeParam = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = mesos::internal::slave::CgroupsCpushareIsolatorProcess userdel: user 'mesos.test.unprivileged.user' does not exist [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or directory > Make Check Fails on RHEL 6 > -- > > Key: MESOS-2982 > URL: https://issues.apache.org/jira/browse/MESOS-2982 > Project: Mesos > Issue Type: Bug > Components: build >Affects Versions: 0.22.1 > Environment: Linux xxx 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 > 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux > Red Hat Enterprise Linux Server release 6.6 (Santiago) >Reporter: Miguel Bernadin > > After downloading Mesos 0.22.1 and attempting to build it, I've encountered > failures in the build process below: > FAILED ] UserCgroupIsolatorTest/0.ROOT_CGROUPS_UserCgroup, where 
TypeParam > = mesos::internal::slave::CgroupsMemIsolatorProcess (149 ms) > [--] 1 test from UserCgroupIsolatorTest/0 (149 ms total) > [--] 1 test from UserCgroupIsolatorTest/1, where TypeParam = > mesos::internal::slave::CgroupsCpushareIsolatorProcess > userdel: user 'mesos.test.unprivileged.user' does not exist > [ RUN ] UserCgroupIsolatorTest/1.ROOT_CGROUPS_UserCgroup > -bash: /sys/fs/cgroup/cpuacct/mesos/container/cgroup.procs: No such file or > directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611191#comment-14611191 ] Miguel Bernadin commented on MESOS-2982: Thanks a lot Ian! The build seems to have completed and returned the messages below. Do you know if any of them are of concern?

{code}
2015-07-01 16:39:23,342:90964(0x7f33f7fff700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:52417] zk retcode=-4, errno=111(Connection refused): server refused to accept the client
[ OK ] Strict/RegistrarTest.abort/1 (1021 ms)
[--] 16 tests from Strict/RegistrarTest (27443 ms total)
[--] Global test environment tear-down
[==] 601 tests from 101 test cases ran. (745589 ms total)
[ PASSED ] 570 tests.
[ FAILED ] 31 tests, listed below:
[ FAILED ] CgroupsAnyHierarchyWithPerfEventTest.ROOT_CGROUPS_Perf
[ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverStatusUpdateManager, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconnectExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverUnregisteredExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverTerminatedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RecoverCompletedExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.CleanupExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RemoveNonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.NonCheckpointingFramework, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.NonCheckpointingSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.KillTask, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.Reboot, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.GCExecutor, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ShutdownSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ShutdownSlaveSIGUSR1, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RegisterDisconnectedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconcileKillTask, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconcileShutdownFramework, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.ReconcileTasksMissingFromSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.SchedulerFailover, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.PartitionedSlave, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.MasterFailover, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.MultipleFrameworks, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.MultipleSlaves, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] SlaveRecoveryTest/0.RestartBeforeContainerizerLaunch, where TypeParam = mesos::internal::slave::MesosContainerizer
[ FAILED ] MesosContainerizerSlaveRecoveryTest.ResourceStatistics
[ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PerfRollForward
[ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceForward
[ FAILED ] MesosContainerizerSlaveRecoveryTest.CGROUPS_ROOT_PidNamespaceBackward
[ FAILED ] ExamplesTest.JavaFramework
31 FAILED TESTS
YOU HAVE 9 DISABLED TESTS
[root@build]#
{code}
[jira] [Commented] (MESOS-2982) Make Check Fails on RHEL 6
[ https://issues.apache.org/jira/browse/MESOS-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611216#comment-14611216 ] Miguel Bernadin commented on MESOS-2982: Thanks again. Yes, here is the test output below, as requested. I will be able to update the kernel to something more recent; I will do so and provide an update here.

{code}
MESOS_VERBOSE=1 ./bin/mesos-tests.sh --gtest_filter="SlaveRecoveryTest/0.RecoverSlaveState"
Source directory: /usr/local/mesos-0.22.1
Build directory: /usr/local/mesos-0.22.1/build
-
We cannot run any cgroups tests that require mounting hierarchies because you have the following hierarchies mounted: /cgroup/blkio, /cgroup/cpu, /cgroup/cpuacct, /cgroup/cpuset, /cgroup/devices, /cgroup/freezer, /cgroup/memory, /cgroup/net_cls, /tmp/mesos_test_cgroup/perf_event
We'll disable the CgroupsNoHierarchyTest test fixture for now.
-
Note: Google Test filter = SlaveRecoveryTest/0.RecoverSlaveState-CgroupsNoHierarchyTest.ROOT_CGROUPS_NOHIERARCHY_MountUnmountHierarchy:SlaveCount/Registrar_BENCHMARK_Test.performance/0:SlaveCount/Registrar_BENCHMARK_Test.performance/1:SlaveCount/Registrar_BENCHMARK_Test.performance/2:SlaveCount/Registrar_BENCHMARK_Test.performance/3
[==] Running 1 test from 1 test case.
[--] Global test environment set-up.
[--] 1 test from SlaveRecoveryTest/0, where TypeParam = mesos::internal::slave::MesosContainerizer
../../src/tests/mesos.cpp:501: Failure
(cgroups::cleanup(hierarchy)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[ RUN ] SlaveRecoveryTest/0.RecoverSlaveState
Using temporary directory '/tmp/SlaveRecoveryTest_0_RecoverSlaveState_InpqgG'
../../src/tests/mesos.cpp:562: Failure
cgroups::mount(hierarchy, subsystem): 'freezer' is already attached to another hierarchy
-
We cannot run any cgroups tests that require a hierarchy with subsystem 'freezer' because we failed to find an existing hierarchy or create a new one (tried '/tmp/mesos_test_cgroup/freezer'). You can either remove all existing hierarchies, or disable this test case (i.e., --gtest_filter=-SlaveRecoveryTest/0.*).
-
../../src/tests/mesos.cpp:598: Failure
(cgroups::destroy(hierarchy, cgroup)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer (7 ms)
../../src/tests/mesos.cpp:519: Failure
(cgroups::cleanup(hierarchy)).failure(): Failed to remove cgroup '/tmp/mesos_test_cgroup/perf_event/mesos_test': Device or resource busy
[--] 1 test from SlaveRecoveryTest/0 (7 ms total)
[--] Global test environment tear-down
[==] 1 test from 1 test case ran. (32 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] SlaveRecoveryTest/0.RecoverSlaveState, where TypeParam = mesos::internal::slave::MesosContainerizer
1 FAILED TEST
YOU HAVE 9 DISABLED TESTS
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
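The repeated "Device or resource busy" failures above usually mean tasks are still attached somewhere under the test hierarchy, so the cgroup directory cannot be removed yet. As a rough illustration (not part of Mesos; the hierarchy path below is just the one from the log), a small Python sketch can enumerate the PIDs still listed in cgroup.procs files so they can be killed or moved before retrying the cleanup:

```python
import os

def attached_pids(hierarchy):
    """Collect PIDs still attached anywhere under a cgroup hierarchy.

    rmdir on a cgroup fails with EBUSY while any task is listed in a
    cgroup.procs file below it, so these PIDs must exit (or be moved
    to another cgroup) before cleanup can succeed.
    """
    pids = set()
    for root, _dirs, files in os.walk(hierarchy):
        if "cgroup.procs" in files:
            with open(os.path.join(root, "cgroup.procs")) as f:
                pids.update(int(line) for line in f if line.strip())
    return sorted(pids)

# Hypothetical usage against the hierarchy from the log:
#   attached_pids("/tmp/mesos_test_cgroup/perf_event/mesos_test")
```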
[jira] [Created] (MESOS-7170) Allow for custom filters on Mesos APIs
Miguel Bernadin created MESOS-7170: -- Summary: Allow for custom filters on Mesos APIs Key: MESOS-7170 URL: https://issues.apache.org/jira/browse/MESOS-7170 Project: Mesos Issue Type: Improvement Components: HTTP API Reporter: Miguel Bernadin Assignee: Gilbert Song Priority: Minor For the tasks.json API and others like state.json, the data that the Mesos master sends is quite lengthy on larger clusters. It would be good to provide filters in the API so that Mesos can send only the RUNNING tasks in the cluster and do less work. Creating this JIRA so we can have intelligent filters that pick what data to send on the server side, rather than filtering it out on the client side. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
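Until such server-side filters exist, clients have to download the full response and filter locally, which is exactly the work this ticket wants to push to the master. A minimal client-side sketch (hypothetical helper, assuming a parsed tasks.json-style response) of the RUNNING-only filter being requested:

```python
def running_tasks(state):
    """Return only tasks in TASK_RUNNING from a parsed tasks.json-style
    response. Today this filtering happens client-side, after the master
    has already serialized and sent every task in the cluster."""
    return [t for t in state.get("tasks", []) if t.get("state") == "TASK_RUNNING"]
```

A server-side equivalent (e.g. a hypothetical `?state=TASK_RUNNING` query parameter) would avoid serializing the non-running tasks at all.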
[jira] [Created] (MESOS-7171) Mesos Containerizer Change Size of SHM
Miguel Bernadin created MESOS-7171: -- Summary: Mesos Containerizer Change Size of SHM Key: MESOS-7171 URL: https://issues.apache.org/jira/browse/MESOS-7171 Project: Mesos Issue Type: Improvement Reporter: Miguel Bernadin Priority: Minor

I would like the ability to adjust the size of the shared memory device, just as this can be done on Docker. For example, on Docker you can specify how much space to allocate as a parameter in the app definition in Marathon:

{code}
"parameters": [
  {
    "key": "shm-size",
    "value": "256mb"
  }
]
{code}

As you can see below, here is an example of a running container and how much space is available on disk, reflecting this change.

Modified parameter container (app definition):
{code}
{
  "id": "/ubuntu-withshm",
  "cmd": "sleep 1000\n",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "ubuntu",
      "network": "HOST",
      "privileged": false,
      "parameters": [
        {
          "key": "shm-size",
          "value": "256mb"
        }
      ],
      "forcePullImage": false
    }
  },
  "portDefinitions": [
    {
      "port": 10005,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
{code}

Modified parameter container (df output):
{code}
core@ip-10-0-0-19 ~ $ docker exec -it a818cf2277a5 bash
root@ip-10-0-0-19:/# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
overlay      37G  2.0G    33G    6%  /
tmpfs       7.4G     0   7.4G    0%  /dev
tmpfs       7.4G     0   7.4G    0%  /sys/fs/cgroup
/dev/xvdb    37G  2.0G    33G    6%  /etc/hostname
shm         256M     0   256M    0%  /dev/shm
{code}

Standard container (app definition):
{code}
{
  "id": "/ubuntu-withoutshm",
  "cmd": "sleep 1",
  "cpus": 1,
  "mem": 128,
  "disk": 0,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "ubuntu",
      "network": "HOST",
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "portDefinitions": [
    {
      "port": 10006,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
{code}

Standard container (df output):
{code}
root@ip-10-0-0-19:/# exit
exit
core@ip-10-0-0-19 ~ $ docker exec -it c85433062e78 bash
root@ip-10-0-0-19:/# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
overlay      37G  2.0G    33G    6%  /
tmpfs       7.4G     0   7.4G    0%  /dev
tmpfs       7.4G     0   7.4G    0%  /sys/fs/cgroup
/dev/xvdb    37G  2.0G    33G    6%  /etc/hostname
shm          64M     0    64M    0%  /dev/shm
{code}

How can this be done with the Mesos containerizer?

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
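The parameters list shown above is plain JSON, so tooling can inject it into existing app definitions. A hedged Python sketch (hypothetical helper, not part of any Marathon API) that adds Docker's shm-size parameter to an app definition without mutating the original:

```python
import copy

def with_shm_size(app, size="256mb"):
    """Return a copy of a Marathon app definition with Docker's
    shm-size parameter appended (Docker containerizer only; the Mesos
    containerizer has no equivalent knob, which is what this ticket asks for)."""
    app = copy.deepcopy(app)  # leave the caller's definition untouched
    params = app["container"]["docker"].setdefault("parameters", [])
    params.append({"key": "shm-size", "value": size})
    return app
```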
[jira] [Assigned] (MESOS-7171) Mesos Containerizer Change Size of SHM
[ https://issues.apache.org/jira/browse/MESOS-7171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miguel Bernadin reassigned MESOS-7171: -- Assignee: Joseph Wu

> Mesos Containerizer Change Size of SHM
> --
>
> Key: MESOS-7171
> URL: https://issues.apache.org/jira/browse/MESOS-7171
> Project: Mesos
> Issue Type: Improvement
> Reporter: Miguel Bernadin
> Assignee: Joseph Wu
> Priority: Minor

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (MESOS-7474) Mesos Fetcher Cache Doesn't Retry when Missed
Miguel Bernadin created MESOS-7474: -- Summary: Mesos Fetcher Cache Doesn't Retry when Missed Key: MESOS-7474 URL: https://issues.apache.org/jira/browse/MESOS-7474 Project: Mesos Issue Type: Bug Components: fetcher Affects Versions: 1.2.0 Reporter: Miguel Bernadin Assignee: Joseph Wu

The Mesos fetcher doesn't retry when the cache is missed. It needs the ability to pull from the source when the cache fetch fails.

{code}
I0421 15:52:53.022902 32751 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/","items":[{"action":"RETRIEVE_FROM_CACHE","cache_filename":")","uri":{"cache":true,"executable":false,"extract":true,"value":"https:\/\/\/"}}],"sandbox_directory":"\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/"}
I0421 15:52:53.024926 32751 fetcher.cpp:409] Fetching URI '"https:\/\/\/"
I0421 15:52:53.024942 32751 fetcher.cpp:306] Fetching from cache
I0421 15:52:53.024958 32751 fetcher.cpp:84] Extracting with command: tar -C "\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/' -xf '/tmp/mesos/fetch/slaves/f3feeab8-a2fe-4ac1-afeb-ec7bd4ce7b0d-S29/c1-docker-hub.tar.gz'
tar: /"https:\/\/\/": Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
Failed to fetch '"https:\/\/\/"': Failed to extract: command tar -C '"\/var\/lib\/mesos\/slave\/slaves\/\/frameworks\\/executors\/name\/runs\/' -xf '/tmp/mesos/fetch/slaves/"' exited with status: 512
{code}

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
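The requested behavior is a fallback: on a cache miss or cache failure, fetch from the original URI instead of failing the task. A minimal sketch of that control flow (hypothetical helper; the two fetch callables stand in for the real fetcher's cache and source paths, which this ticket asks the C++ fetcher to implement):

```python
def fetch_with_fallback(uri, fetch_from_cache, fetch_from_source):
    """Try the fetcher cache first; on any cache failure, fall back to
    fetching the original URI instead of aborting the task (the behavior
    requested in MESOS-7474). Both fetchers are caller-supplied callables."""
    try:
        return fetch_from_cache(uri)
    except Exception:
        # Cache miss or corrupt cache entry: go back to the source.
        return fetch_from_source(uri)
```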
[jira] [Created] (MESOS-9233) Mesos Fetcher fails for Windows with Linux tars
Miguel Bernadin created MESOS-9233: -- Summary: Mesos Fetcher fails for Windows with Linux tars Key: MESOS-9233 URL: https://issues.apache.org/jira/browse/MESOS-9233 Project: Mesos Issue Type: Bug Components: fetcher Reporter: Miguel Bernadin

When trying to extract a Linux tar on Windows, tar fails with the command below:

{code}
tar xf server-jre-8u162-linux-x64.tar
jdk1.8.0_162/man/ja: Can't create '?\\C:\\Users\\andrew\\Downloads\\jdk1.8.0_162\\man\\ja'
jdk1.8.0_162/jre/lib/amd64/server/libjsig.so: Can't create '?\\C:\\Users\\andrew\\Downloads\\jdk1.8.0_162\\jre\\lib\\amd64\\server\\libjsig.so'
tar.exe: Error exit delayed from previous errors.
{code}

[~andschwa] has found that someone has attempted to get this to work for Windows, which should resolve this problem for Mesos: https://github.com/libarchive/libarchive/pull/1030

Marathon app def to reproduce as well:

{code:java}
{
  "id": "/sleep",
  "backoffFactor": 1.15,
  "backoffSeconds": 1,
  "cmd": "powershell -c start-sleep 999",
  "container": {
    "type": "MESOS",
    "volumes": []
  },
  "cpus": 0.1,
  "disk": 0,
  "fetch": [
    {
      "uri": "https://downloads.mesosphere.com/java/server-jre-8u162-linux-x64.tar.gz",
      "extract": true,
      "executable": false,
      "cache": false
    }
  ],
  "instances": 1,
  "maxLaunchDelaySeconds": 3600,
  "mem": 128,
  "gpus": 0,
  "networks": [
    {
      "mode": "host"
    }
  ],
  "portDefinitions": [],
  "requirePorts": false,
  "upgradeStrategy": {
    "maximumOverCapacity": 1,
    "minimumHealthCapacity": 1
  },
  "killSelection": "YOUNGEST_FIRST",
  "unreachableStrategy": {
    "inactiveAfterSeconds": 0,
    "expungeAfterSeconds": 0
  },
  "healthChecks": [],
  "constraints": []
}
{code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
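As a stopgap while the libarchive fix lands, an extraction path that does not depend on tar.exe's path handling avoids the "Can't create" errors shown above. A hedged sketch (not how the Mesos fetcher actually extracts) using Python's standard tarfile module, which resolves member names itself and works the same on Windows and Linux:

```python
import tarfile

def extract_tar(archive_path, dest):
    """Extract a (possibly Linux-built) tar archive portably.
    tarfile interprets member names in pure Python rather than shelling
    out to a platform tar, sidestepping the Windows path-translation
    issue reported against tar.exe/libarchive."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(dest)
```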