[jira] [Created] (MESOS-8139) Upgrade protobuf to 3.4.x.

2017-10-26 Thread Benjamin Mahler (JIRA)
Benjamin Mahler created MESOS-8139:
--

 Summary: Upgrade protobuf to 3.4.x.
 Key: MESOS-8139
 URL: https://issues.apache.org/jira/browse/MESOS-8139
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Mahler


The 3.4.x release includes move support:
https://github.com/google/protobuf/releases/tag/v3.4.0

This will provide some performance improvements for us, and will allow us to 
start using move semantics for messages.
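
As a rough illustration of what move support buys us, the sketch below moves a message-like object into a queue and checks that its heap buffer is handed over rather than copied. {{FakeMessage}} and {{Queue}} are hypothetical stand-ins made up for this example; they are not the protobuf API, and real generated messages may behave differently in detail.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generated message type. It owns
// heap-allocated fields, much like a real message would.
struct FakeMessage
{
  std::string payload;
  std::vector<int> values;
};

// Takes its argument by value so that callers can choose between
// copying and moving.
class Queue
{
public:
  void push(FakeMessage message)
  {
    messages.push_back(std::move(message));
  }

  const FakeMessage& back() const
  {
    assert(!messages.empty());
    return messages.back();
  }

private:
  std::vector<FakeMessage> messages;
};

// Returns true if pushing with std::move reused the original heap
// buffer instead of allocating a new one and copying into it.
bool moveReusesBuffer()
{
  FakeMessage message;
  message.payload = std::string(1024, 'x'); // Too large for SSO.
  const char* before = message.payload.data();

  Queue queue;
  queue.push(std::move(message));

  return queue.back().payload.data() == before;
}
```

Passing by value and moving at the call site is only one possible convention; it is used here because it keeps the copy/move decision visible to the caller.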



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221487#comment-16221487
 ] 

Yan Xu commented on MESOS-8138:
---

{quote}
the master realizes the disconnection when it tries to write to the pipe immediately
{quote}

How? You mean the 
[following|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8178-L8179]
 will execute {{Self::exited}} immediately?

{code:title=}
  http.closed()
    .onAny(defer(self(), &Self::exited, framework->id(), http));
{code}

It looks like it won't, because {{http.closed()}} internally tracks a 
{{process::http::Pipe::Writer writer}} 
[object|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.hpp#L303]
 that is instantiated 
[here|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/http.cpp#L843]
 and is not connected to the broken socket at all once the HttpProxy has been 
terminated. 

Right?

> Master can fail to detect HTTP framework disconnection if it disconnects very 
> fast
> --
>
> Key: MESOS-8138
> URL: https://issues.apache.org/jira/browse/MESOS-8138
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, master
>Reporter: Yan Xu
>
> What we've observed is that if the framework disconnects before the master 
> actor processes the initial subscribe request, the master would [set up an 
> exited 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
>  that never gets triggered.
> It looks like it's because when the socket closes and libprocess terminates 
> the HttpProxy for this socket, [the pipe reader for this proxy is not 
> set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
>  
> Later when the master [sets up the 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
>  it would be a noop in this regard.





[jira] [Commented] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Anand Mazumdar (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221446#comment-16221446
 ] 

Anand Mazumdar commented on MESOS-8138:
---

Hmm, doesn't the master realize the disconnection when it tries to write to the 
pipe immediately with the {{SUBSCRIBED}} event, or am I missing something?

> Master can fail to detect HTTP framework disconnection if it disconnects very 
> fast
> --
>
> Key: MESOS-8138
> URL: https://issues.apache.org/jira/browse/MESOS-8138
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, master
>Reporter: Yan Xu
>
> What we've observed is that if the framework disconnects before the master 
> actor processes the initial subscribe request, the master would [set up an 
> exited 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
>  that never gets triggered.
> It looks like it's because when the socket closes and libprocess terminates 
> the HttpProxy for this socket, [the pipe reader for this proxy is not 
> set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
>  
> Later when the master [sets up the 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
>  it would be a noop in this regard.





[jira] [Updated] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Xu updated MESOS-8138:
--
Description: 
What we've observed is that if the framework disconnects before the master 
actor processes the initial subscribe request, the master would [set up an 
exited 
callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
 that never gets triggered.

It looks like it's because when the socket closes and libprocess terminates the 
HttpProxy for this socket, [the pipe reader for this proxy is not 
set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
 

Later when the master [sets up the 
callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
 it would be a noop in this regard.

  was:
What we've observed is that if the framework disconnects before the master 
actor processes the request, the master would [set up an exited 
callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
 that never gets triggered.

It looks like it's because when the socket closes and libprocess terminates the 
HttpProxy for this socket, [the pipe reader for this proxy is not 
set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
 

Later when the master [sets up the 
callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
 it would be a noop in this regard.


> Master can fail to detect HTTP framework disconnection if it disconnects very 
> fast
> --
>
> Key: MESOS-8138
> URL: https://issues.apache.org/jira/browse/MESOS-8138
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, master
>Reporter: Yan Xu
>
> What we've observed is that if the framework disconnects before the master 
> actor processes the initial subscribe request, the master would [set up an 
> exited 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
>  that never gets triggered.
> It looks like it's because when the socket closes and libprocess terminates 
> the HttpProxy for this socket, [the pipe reader for this proxy is not 
> set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
>  
> Later when the master [sets up the 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
>  it would be a noop in this regard.





[jira] [Commented] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16221369#comment-16221369
 ] 

Yan Xu commented on MESOS-8138:
---

/cc [~anandmazumdar] who implemented MESOS-2294.

> Master can fail to detect HTTP framework disconnection if it disconnects very 
> fast
> --
>
> Key: MESOS-8138
> URL: https://issues.apache.org/jira/browse/MESOS-8138
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API, master
>Reporter: Yan Xu
>
> What we've observed is that if the framework disconnects before the master 
> actor processes the request, the master would [set up an exited 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
>  that never gets triggered.
> It looks like it's because when the socket closes and libprocess terminates 
> the HttpProxy for this socket, [the pipe reader for this proxy is not 
> set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
>  
> Later when the master [sets up the 
> callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
>  it would be a noop in this regard.





[jira] [Created] (MESOS-8138) Master can fail to detect HTTP framework disconnection if it disconnects very fast

2017-10-26 Thread Yan Xu (JIRA)
Yan Xu created MESOS-8138:
-

 Summary: Master can fail to detect HTTP framework disconnection if 
it disconnects very fast
 Key: MESOS-8138
 URL: https://issues.apache.org/jira/browse/MESOS-8138
 Project: Mesos
  Issue Type: Bug
  Components: HTTP API, master
Reporter: Yan Xu


What we've observed is that if the framework disconnects before the master 
actor processes the request, the master would [set up an exited 
callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179]
 that never gets triggered.

It looks like it's because when the socket closes and libprocess terminates the 
HttpProxy for this socket, [the pipe reader for this proxy is not 
set|https://github.com/apache/mesos/blob/f599839bb854c7aff3d610e49f7e5465d7fe9341/3rdparty/libprocess/src/process.cpp#L1515-L1518].
 

Later when the master [sets up the 
callback|https://github.com/apache/mesos/blob/f26ffcee0a359a644968feca1ec91243401f589a/src/master/master.cpp#L8179],
 it would be a noop in this regard.
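
The suspected sequence can be sketched with a toy model. Everything below ({{PipeState}}, {{Writer}}, {{Reader}}, {{Proxy}}, {{disconnectDetected}}) is a hypothetical simplification written for this ticket, not the libprocess or master code; it only assumes that the proxy is the one component that closes the pipe reader when the socket goes away.

```cpp
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// State shared by both ends of a simplified pipe.
struct PipeState
{
  bool readerClosed = false;
  std::vector<std::function<void()>> onReaderClosed;
};

struct Writer
{
  std::shared_ptr<PipeState> state;

  // Runs `callback` once the reader end is closed, or right away if
  // it already is.
  void closed(std::function<void()> callback)
  {
    if (state->readerClosed) {
      callback();
    } else {
      state->onReaderClosed.push_back(std::move(callback));
    }
  }
};

struct Reader
{
  std::shared_ptr<PipeState> state;

  void close()
  {
    if (state->readerClosed) {
      return;
    }
    state->readerClosed = true;
    for (auto& callback : state->onReaderClosed) {
      callback();
    }
    state->onReaderClosed.clear();
  }
};

// Stand-in for the per-socket proxy: it closes the reader it was
// given when the socket closes, and ignores readers handed to it
// after it has terminated.
struct Proxy
{
  std::shared_ptr<Reader> reader;
  bool terminated = false;

  void handle(std::shared_ptr<Reader> r)
  {
    if (!terminated) {
      reader = std::move(r);
    }
  }

  void socketClosed()
  {
    terminated = true;
    if (reader) {
      reader->close();
    }
  }
};

// Returns whether the "exited" callback fired for the given ordering.
bool disconnectDetected(bool socketClosesBeforeSubscribe)
{
  auto state = std::make_shared<PipeState>();
  Writer writer{state};
  auto reader = std::make_shared<Reader>(Reader{state});
  Proxy proxy;
  bool exited = false;

  if (socketClosesBeforeSubscribe) {
    proxy.socketClosed(); // No reader is set yet, so nothing is closed.
  }

  // The master processes the subscribe request and sets up the callback.
  proxy.handle(reader);
  writer.closed([&exited]() { exited = true; });

  if (!socketClosesBeforeSubscribe) {
    proxy.socketClosed();
  }

  return exited;
}
```

In this model {{disconnectDetected(false)}} fires the callback, while {{disconnectDetected(true)}} leaves it registered forever, which matches the behavior described above.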





[jira] [Commented] (MESOS-5881) Semantics of `os::symlink` differ across POSIX and Windows

2017-10-26 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220888#comment-16220888
 ] 

Andrew Schwartzmeyer commented on MESOS-5881:
-

Yup, verified per 
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365460(v=vs.85).aspx

> Symbolic links can point to a non-existent target.
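
For comparison, the POSIX side of this can be checked with a small sketch. The function name and the temporary directory template are made up for illustration; this uses plain POSIX calls, not stout's {{os::symlink}}.

```cpp
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#include <string>

// Creates a symlink to a target that does not exist inside a fresh
// temporary directory. Returns true if the link itself was created
// while its target cannot be resolved. POSIX only.
bool canCreateDanglingSymlink()
{
  char directory[] = "/tmp/dangling-XXXXXX";
  if (::mkdtemp(directory) == nullptr) {
    return false;
  }

  const std::string target = std::string(directory) + "/missing";
  const std::string link = std::string(directory) + "/link";

  const bool created = ::symlink(target.c_str(), link.c_str()) == 0;

  struct stat s;

  // lstat() inspects the link itself; stat() follows it to the target.
  const bool linkExists =
    ::lstat(link.c_str(), &s) == 0 && S_ISLNK(s.st_mode);
  const bool targetMissing = ::stat(link.c_str(), &s) != 0;

  // Clean up regardless of the outcome.
  ::unlink(link.c_str());
  ::rmdir(directory);

  return created && linkExists && targetMissing;
}
```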

> Semantics of `os::symlink` differ across POSIX and Windows
> --
>
> Key: MESOS-5881
> URL: https://issues.apache.org/jira/browse/MESOS-5881
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Alex Clemmer
>Assignee: Li Li
>  Labels: mesosphere, stout, windows
>
> This issue causes the following tests to fail on Windows:
> * RmdirTest.RemoveDirectoryWithNoTargetSymbolicLink
> * OsTest.Realpath
> On most POSIX implementations, it is possible to create a symlink with a 
> target that does not exist. On Windows, attempting to create a symlink 
> pointing to a target that does not exist will cause a runtime failure.





[jira] [Commented] (MESOS-7604) SlaveTest.ExecutorReregistrationTimeoutFlag aborts on Windows

2017-10-26 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-7604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220880#comment-16220880
 ] 

Andrew Schwartzmeyer commented on MESOS-7604:
-

Oh, I guess I didn't add: the problem is {{realpath}} not resolving correctly.

> SlaveTest.ExecutorReregistrationTimeoutFlag aborts on Windows
> -
>
> Key: MESOS-7604
> URL: https://issues.apache.org/jira/browse/MESOS-7604
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.4.0
> Environment: Windows
>Reporter: Joseph Wu
>Assignee: Andrew Schwartzmeyer
>  Labels: mesosphere, windows
>
> {code}
> [ RUN  ] SlaveTest.ExecutorReregistrationTimeoutFlag
> rk ae9679b1-67c9-4db6-8187-0641b0e929d2-
> I0601 23:53:23.488337  2748 master.cpp:1156] Master terminating
> I0601 23:53:23.492337  2728 hierarchical.cpp:579] Removed agent 
> ae9679b1-67c9-4db6-8187-0641b0e929d2-S0
> I0601 23:53:23.530340  1512 cluster.cpp:162] Creating default 'local' 
> authorizer
> I0601 23:53:23.544342  2728 master.cpp:436] Master 
> f07f4fdd-cd91-4d62-bf33-169b20d02020 (ip-172-20-128-1.ec2.internal) started 
> on 172.20.128.1:51241
> I0601 23:53:23.545341  2728 master.cpp:438] Flags at startup: --acls="" 
> --agent_ping_timeout="15secs" --agent_reregister_timeout="10mins" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate_agents="false" --authenticate_frameworks="false" 
> --authenticate_http_frameworks="true" --authenticate_http_readonly="true" 
> --authenticate_http_readwrite="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="C:\temp\FWZORI\credentials" 
> --filter_gpu_resources="true" --framework_sorter="drf" --help="false" 
> --hostname_lookup="true" --http_authenticators="basic" 
> --http_framework_authenticators="basic" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_agent_ping_timeouts="5" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" 
> --max_unreachable_tasks_per_framework="1000" --port="5050" --quiet="false" 
> --recovery_agent_removal_limit="100%" --registry="in_memory" 
> --registry_fetch_timeout="1mins" --registry_gc_interval="15mins" 
> --registry_max_agent_age="2weeks" --registry_max_agent_count="102400" 
> --registry_store_timeout="100secs" --registry_strict="false" 
> --root_submissions="true" --user_sorter="drf" --version="false" 
> --webui_dir="/webui" --work_dir="C:\temp\FWZORI\master" 
> --zk_session_timeout="10secs"
> I0601 23:53:23.550338  2728 master.cpp:515] Master only allowing 
> authenticated HTTP frameworks to register
> I0601 23:53:23.550338  2728 credentials.hpp:37] Loading credentials for 
> authentication from 'C:\temp\FWZORI\credentials'
> I0601 23:53:23.552338  2728 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readonly'
> I0601 23:53:23.553339  2728 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-readwrite'
> I0601 23:53:23.554340  2728 http.cpp:975] Creating default 'basic' HTTP 
> authenticator for realm 'mesos-master-scheduler'
> I0601 23:53:23.555341  2728 master.cpp:640] Authorization enabled
> I0601 23:53:23.570340  2124 master.cpp:2159] Elected as the leading master!
> I0601 23:53:23.570340  2124 master.cpp:1698] Recovering from registrar
> I0601 23:53:23.573341  1920 registrar.cpp:389] Successfully fetched the 
> registry (0B) in 0ns
> I0601 23:53:23.573341  1920 registrar.cpp:493] Applied 1 operations in 0ns; 
> attempting to update the registry
> I0601 23:53:23.575342  1920 registrar.cpp:550] Successfully updated the 
> registry in 0ns
> I0601 23:53:23.576344  1920 registrar.cpp:422] Successfully recovered 
> registrar
> I0601 23:53:23.577342  2728 master.cpp:1797] Recovered 0 agents from the 
> registry (167B); allowing 10mins for agents to re-register
> I0601 23:53:23.595341  1512 containerizer.cpp:230] Using isolation: 
> windows/cpu,filesystem/windows,environment_secret
> I0601 23:53:23.596343  1512 provisioner.cpp:255] Using default backend 'copy'
> I0601 23:53:23.626343  3976 slave.cpp:248] Mesos agent started on 
> (133)@172.20.128.1:51241
> I0601 23:53:23.627342  3976 slave.cpp:249] Flags at startup: 
> --appc_simple_discovery_uri_prefix="http://; 
> --appc_store_dir="C:\temp\kglZbS\store\appc" 
> --authenticate_http_readonly="true" --authenticate_http_readwrite="true" 
> --authenticatee="crammd5" --authentication_backoff_factor="1secs" 
> --authorizer="local" --container_disk_watch_interval="15secs" 
> --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" 
> --docker="docker" --docker_kill_orphans="true" 
> --docker_registry="https://registry-1.docker.io; --docker_remove_delay="6hrs" 
> 

[jira] [Commented] (MESOS-6671) External 3rdparty deps are not built with the configured compiler in cmake build

2017-10-26 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220845#comment-16220845
 ] 

Andrew Schwartzmeyer commented on MESOS-6671:
-

Looks like there's a review here: https://reviews.apache.org/r/63033/

> External 3rdparty deps are not built with the configured compiler in cmake 
> build
> 
>
> Key: MESOS-6671
> URL: https://issues.apache.org/jira/browse/MESOS-6671
> Project: Mesos
>  Issue Type: Bug
>  Components: cmake
>Reporter: Benjamin Bannier
>Assignee: Andrew Schwartzmeyer
>  Labels: cmake
>
> CMake permits users to change the compiler for, e.g., C++ source files by 
> setting {{CMAKE_CXX_COMPILER}}. Additionally, to use compiler wrappers like 
> ccache or distcc, modern cmake versions allow specifying the wrapper by 
> setting, e.g., {{CMAKE_CXX_COMPILER_LAUNCHER}}.
> The current Mesos cmake build ignores both of these variables when building 
> external 3rdparty autotools-based dependencies, and the only way to override 
> the compiler there would be to set, e.g., {{CXX='ccache clang++'}} at {{make}} 
> time. This is undesirable as it is too easy to introduce inconsistent 
> compiler settings.





[jira] [Commented] (MESOS-5820) Investigate porting master; develop time estimates

2017-10-26 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220834#comment-16220834
 ] 

Andrew Schwartzmeyer commented on MESOS-5820:
-

The only real blocker is Windows support for leveldb, which is now happening.

Windows support: https://github.com/google/leveldb/issues/519

CMake support: https://github.com/google/leveldb/issues/466 

> Investigate porting master; develop time estimates
> --
>
> Key: MESOS-5820
> URL: https://issues.apache.org/jira/browse/MESOS-5820
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: master, mesosphere
>






[jira] [Created] (MESOS-8137) Mesos agent can hang during startup.

2017-10-26 Thread Jie Yu (JIRA)
Jie Yu created MESOS-8137:
-

 Summary: Mesos agent can hang during startup.
 Key: MESOS-8137
 URL: https://issues.apache.org/jira/browse/MESOS-8137
 Project: Mesos
  Issue Type: Bug
Affects Versions: 1.4.0
Reporter: Jie Yu


Environment:
Linux dcos-agentdisks-as1-1100-2 4.11.0-1011-azure #11-Ubuntu SMP Tue Sep 19 
19:03:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

{noformat}
#0  __lll_lock_wait_private () at 
../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1  0x7f132b856f7b in __malloc_fork_lock_parent () at arena.c:155
#2  0x7f132b89f5da in __libc_fork () at ../sysdeps/nptl/fork.c:131
#3  0x7f132b842350 in _IO_new_proc_open (fp=fp@entry=0xf1282b84e0, 
command=command@entry=0xf1282b6ea8 "logrotate --help > /dev/null", 
mode=, mode@entry=0xf1275fb0f2 "r")
at iopopen.c:180
#4  0x7f132b84265c in _IO_new_popen (command=0xf1282b6ea8 
"logrotate --help > /dev/null", mode=0xf1275fb0f2 "r") at iopopen.c:296
#5  0x00f1275e622a in Try os::shell<>(std::string 
const&) ()
#6  0x7f130fdbae37 in mesos::journald::flags::Flags()::{lambda(std::string 
const&)#2}::operator()(std::string const&) const (value=..., 
__closure=)
at /pkg/src/mesos-modules/journald/lib_journald.hpp:153
#7  void flags::FlagsBase::add(std::string mesos::journald::flags::*, flags::Name const&, 
Option const&, std::string const&, char 
const (*) [10], mesos::journald::flags::basic_string()::{lambda(std::string 
const&)#2})::{lambda(flags::FlagsBase const&)#3}::operator()(flags::FlagsBase 
const) const (base=..., __closure=) at 
/opt/mesosphere/active/mesos/include/stout/flags/flags.hpp:399
#8  std::_Function_handler

[jira] [Issue Comment Deleted] (MESOS-6690) Wire up resource control API to Windows Job objects API

2017-10-26 Thread Andrew Schwartzmeyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Schwartzmeyer updated MESOS-6690:

Comment: was deleted

(was: Review chain here: https://reviews.apache.org/r/63279/)

> Wire up resource control API to Windows Job objects API
> ---
>
> Key: MESOS-6690
> URL: https://issues.apache.org/jira/browse/MESOS-6690
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft, windows-mvp
>
> The Windows version of the containerizer actually launches tasks in the Job 
> object API, which means that it should be possible to wire up the Mesos 
> resource constraint API and use those constraints when we launch a task.





[jira] [Commented] (MESOS-6690) Wire up resource control API to Windows Job objects API

2017-10-26 Thread Andrew Schwartzmeyer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220740#comment-16220740
 ] 

Andrew Schwartzmeyer commented on MESOS-6690:
-

Review chain here: https://reviews.apache.org/r/63279/

> Wire up resource control API to Windows Job objects API
> ---
>
> Key: MESOS-6690
> URL: https://issues.apache.org/jira/browse/MESOS-6690
> Project: Mesos
>  Issue Type: Bug
>  Components: agent, containerization
>Reporter: Alex Clemmer
>Assignee: Andrew Schwartzmeyer
>  Labels: microsoft, windows-mvp
>
> The Windows version of the containerizer actually launches tasks in the Job 
> object API, which means that it should be possible to wire up the Mesos 
> resource constraint API and use those constraints when we launch a task.





[jira] [Updated] (MESOS-8136) Update XFS isolator tests to handle TASK_STARTING.

2017-10-26 Thread James Peach (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Peach updated MESOS-8136:
---
Component/s: test

> Update XFS isolator tests to handle TASK_STARTING.
> --
>
> Key: MESOS-8136
> URL: https://issues.apache.org/jira/browse/MESOS-8136
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: James Peach
>
> The XFS isolator tests are failing since the default executors started 
> sending {{TASK_STARTING}}.
> {noformat}
> 04:37:25 - [ RUN  ] ROOT_XFS_QuotaTest.DirectoryTree
> ...
> 04:37:26 - I1026 04:37:26.400321  7723 executor.cpp:477] Running 
> '/tmp/mesos-build/mesos/build/src/mesos-containerizer launch 
> '
> 04:37:26 - ../../src/tests/containerizer/xfs_quota_tests.cpp:427: Failure
> 04:37:26 -   Expected: TASK_RUNNING
> 04:37:26 - To be equal to: status1->state()
> 04:37:26 -   Which is: TASK_STARTING
> 04:37:26 - I1026 04:37:26.409204  7723 executor.cpp:650] Forked command at 
> 7730
> 04:37:26 - ../../src/tests/containerizer/xfs_quota_tests.cpp:431: Failure
> 04:37:26 -   Expected: TASK_FAILED
> 04:37:26 - To be equal to: status2->state()
> 04:37:26 -   Which is: TASK_RUNNING
> 04:37:26 - ../../src/tests/containerizer/xfs_quota_tests.cpp:436: Failure
> 04:37:26 -   Expected: "Command exited with status 1"
> 04:37:26 - To be equal to: status2->message()
> 04:37:26 -   Which is: ""
> ...
> {noformat}





[jira] [Created] (MESOS-8136) Update XFS isolator tests to handle TASK_STARTING.

2017-10-26 Thread James Peach (JIRA)
James Peach created MESOS-8136:
--

 Summary: Update XFS isolator tests to handle TASK_STARTING.
 Key: MESOS-8136
 URL: https://issues.apache.org/jira/browse/MESOS-8136
 Project: Mesos
  Issue Type: Bug
Reporter: James Peach


The XFS isolator tests are failing since the default executors started sending 
{{TASK_STARTING}}.

{noformat}
04:37:25 - [ RUN  ] ROOT_XFS_QuotaTest.DirectoryTree
...
04:37:26 - I1026 04:37:26.400321  7723 executor.cpp:477] Running 
'/tmp/mesos-build/mesos/build/src/mesos-containerizer launch 
'
04:37:26 - ../../src/tests/containerizer/xfs_quota_tests.cpp:427: Failure
04:37:26 -   Expected: TASK_RUNNING
04:37:26 - To be equal to: status1->state()
04:37:26 -   Which is: TASK_STARTING
04:37:26 - I1026 04:37:26.409204  7723 executor.cpp:650] Forked command at 7730
04:37:26 - ../../src/tests/containerizer/xfs_quota_tests.cpp:431: Failure
04:37:26 -   Expected: TASK_FAILED
04:37:26 - To be equal to: status2->state()
04:37:26 -   Which is: TASK_RUNNING
04:37:26 - ../../src/tests/containerizer/xfs_quota_tests.cpp:436: Failure
04:37:26 -   Expected: "Command exited with status 1"
04:37:26 - To be equal to: status2->message()
04:37:26 -   Which is: ""
...
{noformat}
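
The failures above are positional: the tests compare the first status update against {{TASK_RUNNING}}, but the first update is now {{TASK_STARTING}}. A minimal sketch of the mismatch and of the adjusted expectation follows. The enum and helper names are hypothetical and only mimic the shape of the test; the real fix would presumably add an expectation for the extra update in {{xfs_quota_tests.cpp}}.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, trimmed-down stand-in for the task states involved.
enum class TaskState { TASK_STARTING, TASK_RUNNING, TASK_FAILED };

// Updates seen in the log above for the failing test.
std::vector<TaskState> observedUpdates()
{
  return {TaskState::TASK_STARTING,
          TaskState::TASK_RUNNING,
          TaskState::TASK_FAILED};
}

// What the test expects today.
std::vector<TaskState> oldExpectations()
{
  return {TaskState::TASK_RUNNING, TaskState::TASK_FAILED};
}

// What the test would expect once it accounts for TASK_STARTING.
std::vector<TaskState> newExpectations()
{
  return {TaskState::TASK_STARTING,
          TaskState::TASK_RUNNING,
          TaskState::TASK_FAILED};
}

// Returns the index of the first expected update that differs from
// what was observed, or -1 if every expected update matches.
int firstMismatch(
    const std::vector<TaskState>& observed,
    const std::vector<TaskState>& expected)
{
  for (std::size_t i = 0; i < expected.size(); i++) {
    if (i >= observed.size() || observed[i] != expected[i]) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```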





[jira] [Commented] (MESOS-8130) Add placeholder handlers for offer operation feedback

2017-10-26 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16220059#comment-16220059
 ] 

Greg Mann commented on MESOS-8130:
--

Review here: https://reviews.apache.org/r/63322/

> Add placeholder handlers for offer operation feedback
> -
>
> Key: MESOS-8130
> URL: https://issues.apache.org/jira/browse/MESOS-8130
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> In order to sketch out the flow of messages necessary to facilitate offer 
> operation feedback, we should add some empty placeholder handlers to the 
> master and agent as detailed in the [offer operation feedback design 
> doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#].





[jira] [Issue Comment Deleted] (MESOS-8130) Add placeholder handlers for offer operation feedback

2017-10-26 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-8130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-8130:
-
Comment: was deleted

(was: Review here: https://reviews.apache.org/r/63322/)

> Add placeholder handlers for offer operation feedback
> -
>
> Key: MESOS-8130
> URL: https://issues.apache.org/jira/browse/MESOS-8130
> Project: Mesos
>  Issue Type: Task
>  Components: agent, master
>Reporter: Greg Mann
>Assignee: Greg Mann
>  Labels: mesosphere
>
> In order to sketch out the flow of messages necessary to facilitate offer 
> operation feedback, we should add some empty placeholder handlers to the 
> master and agent as detailed in the [offer operation feedback design 
> doc|https://docs.google.com/document/d/1GGh14SbPTItjiweSZfann4GZ6PCteNrn-1y4pxOjgcI/edit#].





[jira] [Updated] (MESOS-1553) MasterTest.KillTask is flaky

2017-10-26 Thread Alexander Rukletsov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Rukletsov updated MESOS-1553:
---
Labels: flaky flaky-test  (was: flaky)

> MasterTest.KillTask is flaky
> 
>
> Key: MESOS-1553
> URL: https://issues.apache.org/jira/browse/MESOS-1553
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kone
>  Labels: flaky, flaky-test
>
> Not entirely sure if this is the fault of the test. It looks like 
> Cluster::Slaves::shutdown() was never called, presumably because 
> Cluster::Master::shutdown() was blocked on something.
> {code}
> [ RUN  ] MasterTest.KillTask
> Using temporary directory '/tmp/MasterTest_KillTask_BYKYwN'
> I0627 13:11:56.627650  6574 leveldb.cpp:176] Opened db in 706544ns
> I0627 13:11:56.628262  6574 leveldb.cpp:183] Compacted db in 234376ns
> I0627 13:11:56.628664  6574 leveldb.cpp:198] Created db iterator in 6515ns
> I0627 13:11:56.628991  6574 leveldb.cpp:204] Seeked to beginning of db in 
> 1589ns
> I0627 13:11:56.629302  6574 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 837ns
> I0627 13:11:56.629667  6574 replica.cpp:741] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0627 13:11:56.630141  6594 recover.cpp:425] Starting replica recovery
> I0627 13:11:56.630204  6594 recover.cpp:451] Replica is in EMPTY status
> I0627 13:11:56.630439  6594 replica.cpp:638] Replica in EMPTY status received 
> a broadcasted recover request
> I0627 13:11:56.630491  6594 recover.cpp:188] Received a recover response from 
> a replica in EMPTY status
> I0627 13:11:56.630604  6594 recover.cpp:542] Updating replica status to 
> STARTING
> I0627 13:11:56.630693  6594 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 40601ns
> I0627 13:11:56.630708  6594 replica.cpp:320] Persisted replica status to 
> STARTING
> I0627 13:11:56.630744  6594 recover.cpp:451] Replica is in STARTING status
> I0627 13:11:56.630914  6594 replica.cpp:638] Replica in STARTING status 
> received a broadcasted recover request
> I0627 13:11:56.630955  6594 recover.cpp:188] Received a recover response from 
> a replica in STARTING status
> I0627 13:11:56.631019  6594 recover.cpp:542] Updating replica status to VOTING
> I0627 13:11:56.631067  6594 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 15771ns
> I0627 13:11:56.631080  6594 replica.cpp:320] Persisted replica status to 
> VOTING
> I0627 13:11:56.631100  6594 recover.cpp:556] Successfully joined the Paxos 
> group
> I0627 13:11:56.631136  6594 recover.cpp:440] Recover process terminated
> I0627 13:11:56.634690  6600 master.cpp:288] Master 
> 20140627-131156-2759502016-44870-6574 (fedora-20) started on 
> 192.168.122.164:44870
> I0627 13:11:56.634718  6600 master.cpp:325] Master only allowing 
> authenticated frameworks to register
> I0627 13:11:56.634726  6600 master.cpp:330] Master only allowing 
> authenticated slaves to register
> I0627 13:11:56.634733  6600 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/MasterTest_KillTask_BYKYwN/credentials'
> I0627 13:11:56.634809  6600 master.cpp:356] Authorization enabled
> I0627 13:11:56.635213  6600 hierarchical_allocator_process.hpp:301] 
> Initializing hierarchical allocator process with master : 
> master@192.168.122.164:44870
> I0627 13:11:56.635254  6600 master.cpp:122] No whitelist given. Advertising 
> offers for all slaves
> I0627 13:11:56.635414  6600 master.cpp:1122] The newly elected leader is 
> master@192.168.122.164:44870 with id 20140627-131156-2759502016-44870-6574
> I0627 13:11:56.635431  6600 master.cpp:1135] Elected as the leading master!
> I0627 13:11:56.635437  6600 master.cpp:953] Recovering from registrar
> I0627 13:11:56.635483  6600 registrar.cpp:313] Recovering registrar
> I0627 13:11:56.635711  6596 log.cpp:656] Attempting to start the writer
> I0627 13:11:56.635979  6596 replica.cpp:474] Replica received implicit 
> promise request with proposal 1
> I0627 13:11:56.636018  6596 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 20426ns
> I0627 13:11:56.636029  6596 replica.cpp:342] Persisted promised to 1
> I0627 13:11:56.636169  6596 coordinator.cpp:230] Coordinator attemping to 
> fill missing position
> I0627 13:11:56.636431  6596 replica.cpp:375] Replica received explicit 
> promise request for position 0 with proposal 2
> I0627 13:11:56.636467  6596 leveldb.cpp:343] Persisting action (8 bytes) to 
> leveldb took 18333ns
> I0627 13:11:56.636478  6596 replica.cpp:676] Persisted action at 0
> I0627 13:11:56.636855  6598 replica.cpp:508] Replica received write request 
> for position 0
> I0627 13:11:56.636889  6598 leveldb.cpp:438] Reading position from leveldb 
> took 14464ns
> I0627 13:11:56.636916  6598 leveldb.cpp:343] Persisting action (14