[jira] [Commented] (MESOS-8727) JSON -> protobuf conversion in stout handles duplicated keys in a map incorrectly

2018-03-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410757#comment-16410757
 ] 

Qian Zhang commented on MESOS-8727:
---

The root cause of this issue is that when {{JSON::parse()}} is called with a JSON 
string that has duplicated map keys, the value of the last occurrence of the key is kept.

> JSON -> protobuf conversion in stout handles duplicated keys in a map 
> incorrectly
> -
>
> Key: MESOS-8727
> URL: https://issues.apache.org/jira/browse/MESOS-8727
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Qian Zhang
>Priority: Major
>
> In Mesos code, we usually use the following two functions in stout to convert 
> a JSON string to a protobuf message.
>  # {{JSON::parse()}} to convert a JSON string to a JSON object (i.e., 
> {{JSON::Object}}).
>  # {{protobuf::parse()}} to convert the JSON object to a protobuf message.
> In Google protobuf, a single function achieves the same goal: 
> {{JsonStringToMessage()}}. According to [the doc of Google 
> protobuf|https://developers.google.com/protocol-buffers/docs/proto#maps], if 
> a map in a JSON string contains duplicated keys, the conversion to a protobuf 
> message may fail. For example, using {{JsonStringToMessage()}} to convert the 
> following JSON string to a protobuf message fails with an error like 
> {{int32_to_string[0]: Repeated map key: '1' is already set.}} 
> {code:java}
> "int32_to_string": {
>   "1": "value1",
>   "1": "value2"
> }
> {code}
> However, {{JSON::parse()}} and {{protobuf::parse()}} handle this case 
> differently: they succeed, and the resulting protobuf message contains only 
> one key-value pair, {{"1": "value2"}}, i.e., the first key-value pair is 
> overwritten. We should have the same behavior as {{JsonStringToMessage()}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (MESOS-8727) JSON -> protobuf conversion in stout handles duplicated keys in a map incorrectly

2018-03-22 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-8727:
-

 Summary: JSON -> protobuf conversion in stout handles duplicated 
keys in a map incorrectly
 Key: MESOS-8727
 URL: https://issues.apache.org/jira/browse/MESOS-8727
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Qian Zhang


In Mesos code, we usually use the following two functions in stout to convert a 
JSON string to a protobuf message.
 # {{JSON::parse()}} to convert a JSON string to a JSON object (i.e., 
{{JSON::Object}}).
 # {{protobuf::parse()}} to convert the JSON object to a protobuf message.

In Google protobuf, a single function achieves the same goal: 
{{JsonStringToMessage()}}. According to [the doc of Google 
protobuf|https://developers.google.com/protocol-buffers/docs/proto#maps], if 
a map in a JSON string contains duplicated keys, the conversion to a protobuf 
message may fail. For example, using {{JsonStringToMessage()}} to convert the 
following JSON string to a protobuf message fails with an error like 
{{int32_to_string[0]: Repeated map key: '1' is already set.}}

 
{code:java}
"int32_to_string": {
  "1": "value1",
  "1": "value2"
}
{code}
However, {{JSON::parse()}} and {{protobuf::parse()}} handle this case 
differently: they succeed, and the resulting protobuf message contains only 
one key-value pair, {{"1": "value2"}}, i.e., the first key-value pair is 
overwritten. We should have the same behavior as {{JsonStringToMessage()}}.
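
The difference between the two behaviors can be sketched in plain C++. This is only an illustration under stated assumptions: {{Pair}}, {{buildLastWins()}} and {{buildStrict()}} are hypothetical names, not stout or protobuf APIs, and the members of the parsed JSON object are modeled as a vector of key-value pairs in input order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the members of a parsed JSON object, in input order.
typedef std::pair<std::string, std::string> Pair;

// Last-wins: a later duplicate silently overwrites the earlier value
// (the behavior described above for the stout path).
std::map<std::string, std::string> buildLastWins(const std::vector<Pair>& in)
{
  std::map<std::string, std::string> out;
  for (const Pair& p : in) {
    out[p.first] = p.second;
  }
  return out;
}

// Strict: report the first duplicated key instead of overwriting it
// (analogous to the error quoted above for JsonStringToMessage).
// Returns true on success; on failure `error` names the offending key.
bool buildStrict(
    const std::vector<Pair>& in,
    std::map<std::string, std::string>* out,
    std::string* error)
{
  for (const Pair& p : in) {
    if (!out->insert(p).second) {
      *error = "Repeated map key: '" + p.first + "' is already set.";
      return false;
    }
  }
  return true;
}
```

With the two entries from the example above, the last-wins variant keeps only {{"1": "value2"}}, while the strict variant stops at the second entry and reports the repeated key.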

 

 





[jira] [Comment Edited] (MESOS-8530) Default executor tasks can get stuck in KILLING state

2018-03-22 Thread JIRA

[ 
https://issues.apache.org/jira/browse/MESOS-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368048#comment-16368048
 ] 

Gastón Kleiman edited comment on MESOS-8530 at 3/23/18 2:18 AM:


https://reviews.apache.org/r/65692/
https://reviews.apache.org/r/65693/
https://reviews.apache.org/r/66232/
https://reviews.apache.org/r/65694/
https://reviews.apache.org/r/66233/
https://reviews.apache.org/r/65962/
https://reviews.apache.org/r/66234/


was (Author: gkleiman):
https://reviews.apache.org/r/65692/
https://reviews.apache.org/r/65693/
https://reviews.apache.org/r/65694/
https://reviews.apache.org/r/65695/
https://reviews.apache.org/r/66123/

> Default executor tasks can get stuck in KILLING state
> -
>
> Key: MESOS-8530
> URL: https://issues.apache.org/jira/browse/MESOS-8530
> Project: Mesos
>  Issue Type: Bug
>  Components: executor
>Affects Versions: 1.2.3, 1.3.1, 1.4.1, 1.5.0
>Reporter: Gastón Kleiman
>Assignee: Gastón Kleiman
>Priority: Critical
>  Labels: default-executor, mesosphere
>
> The default executor will transition a task to {{TASK_KILLING}} and mark its 
> container as being killed before issuing the {{KILL_NESTED_CONTAINER}} call.
> If the kill call fails, the task will get stuck in {{TASK_KILLING}}, and the 
> executor won't allow retrying the kill.





[jira] [Created] (MESOS-8726) Default executor doesn't retry SIGTERM kills

2018-03-22 Thread JIRA
Gastón Kleiman created MESOS-8726:
-

 Summary: Default executor doesn't retry SIGTERM kills
 Key: MESOS-8726
 URL: https://issues.apache.org/jira/browse/MESOS-8726
 Project: Mesos
  Issue Type: Bug
  Components: executor
Reporter: Gastón Kleiman


Once https://issues.apache.org/jira/browse/MESOS-8530 is resolved, the default 
executor will retry the kill escalation (SIGKILL), but not the initial SIGTERM.

Tasks won't get stuck anymore, but this is still bad, because it could prevent 
tasks from gracefully shutting down.





[jira] [Commented] (MESOS-8725) Support deadline for tasks

2018-03-22 Thread Zhitao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410613#comment-16410613
 ] 

Zhitao Li commented on MESOS-8725:
--

[~jamesmulcahy], we actually started down that path, but ran into several 
scalability difficulties:
 * limited compute resources on the scheduler: a lot of schedulers follow the 
same design as the Mesos master and run only one active process, so tracking a 
timer per task uses up precious resources there;
 * network partition: if the master/agent is behind a network partition, the 
scheduler cannot terminate the task;
 * recovery upon scheduler restart: this was the biggest problem for us. When 
our scheduler process restarts, it needs to recover "all" running tasks from 
the database and reconstruct what to do for each task (which is also a common 
pattern among schedulers). Any additional feature introduced there makes the 
process even heavier;
 * cheaper to implement in the executor: with isolation mechanisms like `pid`, 
we expect the executor to have a longer lifecycle. Therefore, executors do not 
even need to maintain a busy thread, but can simply use a 
[Timer|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/include/process/timer.hpp]
 and terminate the task.

> Support deadline for tasks
> --
>
> Key: MESOS-8725
> URL: https://issues.apache.org/jira/browse/MESOS-8725
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Zhitao Li
>Priority: Major
>
> In our environment, we run a lot of batch jobs, some of which have a tight 
> timeline. If any task in a job runs longer than x hours, it does not make 
> sense to keep running it. 
>  
> For instance, a team would submit a job which builds a weekly index and 
> repeats every Monday. If the job does not finish before the next Monday for 
> whatever reason, there is no point in keeping any of its tasks running.
>  
> We believe that distributing deadline tracking across our cluster 
> makes more sense, as it makes the system more scalable and also makes our 
> centralized state machine simpler.
>  
> One idea I have right now is to add an *optional* *TimeInfo deadline* field to 
> TaskInfo, so that all default executors in Mesos can simply terminate the 
> task once the deadline passes and send a proper *StatusUpdate*.





[jira] [Commented] (MESOS-8725) Support deadline for tasks

2018-03-22 Thread James Mulcahy (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410505#comment-16410505
 ] 

James Mulcahy commented on MESOS-8725:
--

Is this actually simpler overall?  The framework will know the deadline for the 
task itself, and could kill the task if that expired, without any changes in 
Mesos today.  I could see an argument for decentralizing this to the agents if 
this was an "expensive" thing to check, but it seems like a relatively low 
overhead + low complexity task for a framework to track – even with say, 
millions of tasks?






[jira] [Commented] (MESOS-8714) Cleanup `containers_` hashmap once container exits

2018-03-22 Thread Andrei Budnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410015#comment-16410015
 ] 

Andrei Budnik commented on MESOS-8714:
--

The composing containerizer 
[subscribes|https://github.com/apache/mesos/blob/5b655ce062ff55cdefed119d97ad923aeeb2efb5/src/slave/containerizer/composing.cpp#L356-L357]
 to container termination after a successful launch, so we always clean up this 
hash map.
After the changes to the composing containerizer, this invariant (that we always 
clean up terminated containers) should still hold.
I think there should be only one place where we do the cleanup: 
`ComposingContainerizerProcess::_launch`.
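
A minimal sketch of that invariant, assuming a much simplified model: `Composing`, `launch()`, `terminate()` and `contains()` are hypothetical stand-ins (the real code works with libprocess futures), and the termination subscription is modeled as a stored callback.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical, simplified stand-in for the composing containerizer.
class Composing
{
public:
  // Registers the container and subscribes to its termination. The erase in
  // the callback is the only place where `containers_` is cleaned up.
  void launch(const std::string& id)
  {
    containers_[id] = [this, id]() { containers_.erase(id); };
  }

  // Models the underlying containerizer reporting a termination, regardless
  // of its cause (explicit destroy, crash, normal exit).
  void terminate(const std::string& id)
  {
    std::map<std::string, std::function<void()>>::iterator it =
      containers_.find(id);
    if (it != containers_.end()) {
      // Copy the callback first: running it erases the entry that owns it.
      std::function<void()> onTermination = it->second;
      onTermination();
    }
  }

  bool contains(const std::string& id) const
  {
    return containers_.count(id) > 0;
  }

private:
  std::map<std::string, std::function<void()>> containers_;
};
```

Because the erase lives only in the callback installed at launch time, every termination path ends up in the same cleanup code.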

> Cleanup `containers_` hashmap once container exits
> --
>
> Key: MESOS-8714
> URL: https://issues.apache.org/jira/browse/MESOS-8714
> Project: Mesos
>  Issue Type: Task
>Reporter: Andrei Budnik
>Priority: Major
>
> To clean up the `containers_` hash map in the composing containerizer, we need 
> to subscribe to the container termination event in the `_launch` method. Also, 
> it's desirable to limit the number of places where we do the cleanup.





[jira] [Commented] (MESOS-8724) G++ Warning about libc system macros `major` and `minor` prevents Mesos build

2018-03-22 Thread Benno Evers (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409916#comment-16409916
 ] 

Benno Evers commented on MESOS-8724:


One subtle thing to keep in mind: if we decide to "properly" fix it by getting 
protoc to add the correct #undef's for minor and major, we should take care 
*not* to backport the patch to older Mesos versions, since that would remove the 
previously defined function `csi::Version::gnu_dev_major()`, causing an ABI 
incompatibility for people upgrading libmesos.so.

> G++ Warning about libc system macros `major` and `minor` prevents Mesos build
> -
>
> Key: MESOS-8724
> URL: https://issues.apache.org/jira/browse/MESOS-8724
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benno Evers
>Priority: Major
>
> On Linux systems, the header `<sys/sysmacros.h>` defines three macros called 
> makedev(), major() and minor(). (See also 
> [http://man7.org/linux/man-pages/man3/makedev.3.html])
> Trying to compile Mesos using g++ 7.2.0 leads to the following warning, which 
> `-Werror` turns into an error:
> {noformat}
> ../include/csi/csi.pb.h:6042:13: error: In the GNU C Library, "minor" is 
> defined
>  by <sys/sysmacros.h>. For historical compatibility, it is
>  currently defined by <sys/types.h> as well, but we plan to
>  remove this soon. To use "minor", include <sys/sysmacros.h>
>  directly. If you did not intend to use a system-defined macro
>  "minor", you should undefine it after including <sys/types.h>. [-Werror]
>  inline ::google::protobuf::uint32 Version::minor() const {
> {noformat}
> The root cause is that csi.proto defines the following protobuf message:
> {noformat}
> message Version {
>   uint32 major = 1;  // This field is REQUIRED.
>   uint32 minor = 2;  // This field is REQUIRED.
>   uint32 patch = 3;  // This field is REQUIRED.
> }
> {noformat}
> The generated C++ header `csi.pb.h` will contain, among others, the 
> following function:
> {noformat}
> #include <string>
> // [6000 lines of code...]
> inline ::google::protobuf::uint32 Version::major() const {
>   // @@protoc_insertion_point(field_get:csi.Version.major)
>   return major_;
> }
> {noformat}
> And the recursive include structure of the header `<string>` leads to 
> `stdlib.h` as follows:
> {noformat}
> .       /usr/include/c++/7/string
> ..      /usr/include/c++/7/bits/basic_string.h
> ...     /usr/include/c++/7/ext/string_conversions.h
> ....    /usr/include/c++/7/cstdlib
> .....   /usr/include/stdlib.h
> ......  /usr/include/x86_64-linux-gnu/sys/types.h
> ....... /usr/include/x86_64-linux-gnu/sys/sysmacros.h
> {noformat}
>  





[jira] [Created] (MESOS-8725) Support deadline for tasks

2018-03-22 Thread Zhitao Li (JIRA)
Zhitao Li created MESOS-8725:


 Summary: Support deadline for tasks
 Key: MESOS-8725
 URL: https://issues.apache.org/jira/browse/MESOS-8725
 Project: Mesos
  Issue Type: Improvement
Reporter: Zhitao Li


In our environment, we run a lot of batch jobs, some of which have a tight 
timeline. If any task in a job runs longer than x hours, it does not make 
sense to keep running it. 
 
For instance, a team would submit a job which builds a weekly index and repeats 
every Monday. If the job does not finish before the next Monday for whatever 
reason, there is no point in keeping any of its tasks running.
 
We believe that distributing deadline tracking across our cluster 
makes more sense, as it makes the system more scalable and also makes our 
centralized state machine simpler.
 
One idea I have right now is to add an *optional* *TimeInfo deadline* field to 
TaskInfo, so that all default executors in Mesos can simply terminate the 
task once the deadline passes and send a proper *StatusUpdate*.
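
To make the idea concrete, here is a rough sketch of the check an executor could perform when it receives a task. Everything in it is an assumption for illustration: `Deadline` is a hypothetical mirror of the proposed optional field (expressed in nanoseconds since the epoch), and `nanosUntilDeadline()` is not an existing Mesos function.

```cpp
#include <cstdint>

// Hypothetical mirror of the proposed optional field: a deadline expressed in
// nanoseconds since the epoch, plus a flag for whether it was set at all.
struct Deadline
{
  bool isSet;
  int64_t nanoseconds;
};

// Returns how long the executor should wait before terminating the task:
// -1 if no deadline was set, 0 if the deadline has already passed.
int64_t nanosUntilDeadline(const Deadline& deadline, int64_t nowNanos)
{
  if (!deadline.isSet) {
    return -1;
  }

  return deadline.nanoseconds > nowNanos
    ? deadline.nanoseconds - nowNanos
    : 0;
}
```

The executor would arm a single timer with the returned duration, so no per-task polling is needed on the scheduler or in the executor.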





[jira] [Commented] (MESOS-8714) Cleanup `containers_` hashmap once container exits

2018-03-22 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409871#comment-16409871
 ] 

Greg Mann commented on MESOS-8714:
--

So it looks like we currently only remove container IDs from the 
{{containers_}} map when {{destroy()}} is called on a container, but not for 
other cases of container termination?






[jira] [Created] (MESOS-8724) G++ Warning about libc system macros `major` and `minor` prevents Mesos build

2018-03-22 Thread Benno Evers (JIRA)
Benno Evers created MESOS-8724:
--

 Summary: G++ Warning about libc system macros `major` and `minor` 
prevents Mesos build
 Key: MESOS-8724
 URL: https://issues.apache.org/jira/browse/MESOS-8724
 Project: Mesos
  Issue Type: Bug
Reporter: Benno Evers


On Linux systems, the header `<sys/sysmacros.h>` defines three macros called 
makedev(), major() and minor(). (See also 
http://man7.org/linux/man-pages/man3/makedev.3.html)

Trying to compile Mesos using g++ 7.2.0 leads to the following warning, which 
`-Werror` turns into an error:
{noformat}
../include/csi/csi.pb.h:6042:13: error: In the GNU C Library, "minor" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "minor", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including <sys/types.h>. [-Werror]
 inline ::google::protobuf::uint32 Version::minor() const {
{noformat}
The root cause is that csi.proto defines the following protobuf message:
{noformat}
message Version {
  uint32 major = 1;  // This field is REQUIRED.
  uint32 minor = 2;  // This field is REQUIRED.
  uint32 patch = 3;  // This field is REQUIRED.
}
{noformat}
The generated C++ header `csi.pb.h` will contain, among others, the 
following function:
{noformat}
#include <string>

// [6000 lines of code...]

inline ::google::protobuf::uint32 Version::major() const {
  // @@protoc_insertion_point(field_get:csi.Version.major)
  return major_;
}
{noformat}
And the recursive include structure of the header `<string>` leads to 
`stdlib.h` as follows:
{noformat}
.     /usr/include/c++/7/string
..    /usr/include/c++/7/bits/basic_string.h
...   /usr/include/c++/7/ext/string_conversions.h
....  /usr/include/c++/7/cstdlib
..... /usr/include/stdlib.h
{noformat}
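
One possible workaround, following the hint in the compiler message, is to undefine the macros before the accessors are declared. The sketch below is only an illustration: `Version` is a hand-written stand-in for the generated `csi::Version` class, not the actual protoc output.

```cpp
#include <stdlib.h>  // On affected glibc versions this pulls in the macros.

// Undefine the macros if they leaked in; this is a no-op where they are not
// defined in the first place.
#ifdef major
#undef major
#endif
#ifdef minor
#undef minor
#endif

// Hypothetical stand-in for the generated csi::Version accessors.
class Version
{
public:
  Version(unsigned int major, unsigned int minor)
    : major_(major), minor_(minor) {}

  unsigned int major() const { return major_; }
  unsigned int minor() const { return minor_; }

private:
  unsigned int major_;
  unsigned int minor_;
};
```

The undefine only affects the translation unit after that point, so code that needs the device-number macros would have to include `<sys/sysmacros.h>` again afterwards or use `gnu_dev_major()` and `gnu_dev_minor()` directly.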
 





[jira] [Created] (MESOS-8723) ROOT_HealthCheckUsingPersistentVolume is flaky.

2018-03-22 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-8723:
--

 Summary: ROOT_HealthCheckUsingPersistentVolume is flaky.
 Key: MESOS-8723
 URL: https://issues.apache.org/jira/browse/MESOS-8723
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
 Environment: ec2's CentOS 7 with SSL
Reporter: Alexander Rukletsov
 Attachments: ROOT_HealthCheckUsingPersistentVolume-badrun.txt

{noformat}
../../src/tests/cluster.cpp:660: Failure
Failed to wait 15secs for destroy
I0321 19:45:11.676262  8064 master.cpp:1137] Master terminating
I0321 19:45:11.676625 27242 hierarchical.cpp:609] Removed agent 
b7675b9a-d9e9-4c97-a5c2-d50fc6101301-S0
{noformat}
Full log attached.





[jira] [Commented] (MESOS-8550) Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`.

2018-03-22 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409660#comment-16409660
 ] 

Alexander Rukletsov commented on MESOS-8550:


Backport to 1.4.x:
{noformat}
commit 986894193810e271f4e15db9743bb9e1f6a24b01
Author: Benno Evers 
AuthorDate: Thu Mar 22 15:10:30 2018 +0100
Commit: Alexander Rukletsov 
CommitDate: Thu Mar 22 15:49:06 2018 +0100

Handled 'None' passed from the MasterDetector in 'Master::detect()'.

The function `MasterDetector::detect()` returns a value of type
`Future<Option<MasterInfo>>`, which, according to its documentation,
can be `None` if an election occured and no master is elected.

However, the code in `Master::detected()` was only handling the
cases of a failed future or a valid `MasterInfo` object.

*NOTE*: This commit does not add a corresponding unit test, since
that would require starting a non-leading master. For the
ZooKeeperMasterDetector, this is blocked by MESOS-2976, and an API
change to make this possible with the StandaloneMasterDetector
would add a lot of complexity to the `cluster::Master::start()`
function for a feature that is unlikely to be re-used in any other
test.

Review: https://reviews.apache.org/r/65571/
(cherry picked from commit 972f31752dd99a59903370b9ebcf078501fa8ffc)
{noformat}
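
For readers skimming the backport, the shape of the fix can be sketched as follows. This is a simplified illustration and not the actual patch: `Detection` is a hypothetical flattened model of the detector result, and `detected()` here only classifies the three cases.

```cpp
#include <string>

// Hypothetical flattened model of the result delivered by the detector.
struct Detection
{
  bool failed;         // The future failed.
  bool hasLeader;      // `Some(MasterInfo)` as opposed to `None`.
  std::string leader;  // Only meaningful if `hasLeader` is true.
};

enum Outcome { DETECTOR_FAILED, NO_LEADER, LEADER_ELECTED };

// All three cases are handled explicitly; `leader` is only read after
// `hasLeader` has been checked, which is the guard that was missing.
Outcome detected(const Detection& d, std::string* leader)
{
  if (d.failed) {
    return DETECTOR_FAILED;
  }

  if (!d.hasLeader) {
    return NO_LEADER;
  }

  *leader = d.leader;
  return LEADER_ELECTED;
}
```

The assertion in the stack trace below corresponds to the middle case: the value was dereferenced although no master had been elected.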

> Bug in `Master::detected()` leads to coredump in 
> `MasterZooKeeperTest.MasterInfoAddress`.
> -
>
> Key: MESOS-8550
> URL: https://issues.apache.org/jira/browse/MESOS-8550
> Project: Mesos
>  Issue Type: Bug
>  Components: leader election, master
>Affects Versions: 1.5.0
>Reporter: Andrei Budnik
>Assignee: Benno Evers
>Priority: Major
>  Labels: mesosphere
> Fix For: 1.4.2, 1.6.0, 1.5.1
>
> Attachments: MasterZooKeeperTest.MasterInfoAddress-badrun.txt
>
>
> {code:java}
> 15:55:17 Assertion failed: (isSome()), function get, file 
> ../../3rdparty/stout/include/stout/option.hpp, line 119.
> 15:55:17 *** Aborted at 1518018924 (unix time) try "date -d @1518018924" if 
> you are using GNU date ***
> 15:55:17 PC: @ 0x7fff4f8f2e3e __pthread_kill
> 15:55:17 *** SIGABRT (@0x7fff4f8f2e3e) received by PID 39896 (TID 
> 0x70427000) stack trace: ***
> 15:55:17 @ 0x7fff4fa24f5a _sigtramp
> 15:55:17 I0207 07:55:24.945252 4890624 group.cpp:511] ZooKeeper session 
> expired
> 15:55:17 @ 0x70425500 (unknown)
> 15:55:17 2018-02-07 07:55:24,945:39896(0x70633000):ZOO_INFO@log_env@794: 
> Client 
> environment:user.dir=/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/1mHCvU
> 15:55:17 @ 0x7fff4f84f312 abort
> 15:55:17 2018-02-07 
> 07:55:24,945:39896(0x70633000):ZOO_INFO@zookeeper_init@827: Initiating 
> client connection, host=127.0.0.1:52197 sessionTimeout=1 
> watcher=0x10d916590 sessionId=0 sessionPasswd= context=0x7fe1bda706a0 
> flags=0
> 15:55:17 @ 0x7fff4f817368 __assert_rtn
> 15:55:17 @0x10b9cff97 _ZNR6OptionIN5mesos10MasterInfoEE3getEv
> 15:55:17 @0x10bbb04b5 Option<>::operator->()
> 15:55:17 @0x10bd4514a mesos::internal::master::Master::detected()
> 15:55:17 @0x10bf54558 
> _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_6FutureI6OptionINS1_10MasterInfoSB_EEvRKNS_3PIDIT_EEMSD_FvT0_EOT1_ENKUlOS9_PNS_11ProcessBaseEE_clESM_SO_
> 15:55:17 @0x10bf54310 
> _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINS3_10MasterInfoSD_EEvRKNS1_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS1_11ProcessBaseEE_JSB_SQ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSF_DpOSS_
> 15:55:17 @0x10bf542bb 
> _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1E13invoke_expandISS_NST_5tupleIJSC_SW_EEENSZ_IJOSR_EEEJLm0ELm1DTclsr5cpp17E6invokeclsr3stdE7forwardISG_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISK_Efp0_EEclsr3stdE7forwardISN_Efp2_OSG_OSK_N5cpp1416integer_sequenceImJXspT2_SO_
> 15:55:17 @0x10bf541f3 
> _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EclIJSR_EEEDTcl13invoke_expandclL_ZNST_4moveIRSS_EEONST_16remove_referenceISG_E4typeEOSG_EdtdefpT1fEclL_ZNSZ_IRNST_5tupleIJSC_SW_ES14_S15_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOS1C_
> 15:55:17

[jira] [Commented] (MESOS-8550) Bug in `Master::detected()` leads to coredump in `MasterZooKeeperTest.MasterInfoAddress`

2018-03-22 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409630#comment-16409630
 ] 

Alexander Rukletsov commented on MESOS-8550:


Backport to 1.5.x:
{noformat}
commit 9281f922d7ec527763b3f88793b6821337f9c665
Author: Benno Evers 
AuthorDate: Thu Mar 22 15:10:30 2018 +0100
Commit: Alexander Rukletsov 
CommitDate: Thu Mar 22 15:30:27 2018 +0100

Handled 'None' passed from the MasterDetector in 'Master::detect()'.

The function `MasterDetector::detect()` returns a value of type
`Future<Option<MasterInfo>>`, which, according to its documentation,
can be `None` if an election occured and no master is elected.

However, the code in `Master::detected()` was only handling the
cases of a failed future or a valid `MasterInfo` object.

*NOTE*: This commit does not add a corresponding unit test, since
that would require starting a non-leading master. For the
ZooKeeperMasterDetector, this is blocked by MESOS-2976, and an API
change to make this possible with the StandaloneMasterDetector
would add a lot of complexity to the `cluster::Master::start()`
function for a feature that is unlikely to be re-used in any other
test.

Review: https://reviews.apache.org/r/65571/
(cherry picked from commit 972f31752dd99a59903370b9ebcf078501fa8ffc)
{noformat}

> Bug in `Master::detected()` leads to coredump in 
> `MasterZooKeeperTest.MasterInfoAddress`
> 
>
> Key: MESOS-8550
> URL: https://issues.apache.org/jira/browse/MESOS-8550
> Project: Mesos
>  Issue Type: Bug
>  Components: leader election, master
>Affects Versions: 1.5.0
>Reporter: Andrei Budnik
>Assignee: Benno Evers
>Priority: Major
>  Labels: mesosphere
> Fix For: 1.6.0, 1.5.1
>
> Attachments: MasterZooKeeperTest.MasterInfoAddress-badrun.txt
>
>
> {code:java}
> 15:55:17 Assertion failed: (isSome()), function get, file 
> ../../3rdparty/stout/include/stout/option.hpp, line 119.
> 15:55:17 *** Aborted at 1518018924 (unix time) try "date -d @1518018924" if 
> you are using GNU date ***
> 15:55:17 PC: @ 0x7fff4f8f2e3e __pthread_kill
> 15:55:17 *** SIGABRT (@0x7fff4f8f2e3e) received by PID 39896 (TID 
> 0x70427000) stack trace: ***
> 15:55:17 @ 0x7fff4fa24f5a _sigtramp
> 15:55:17 I0207 07:55:24.945252 4890624 group.cpp:511] ZooKeeper session 
> expired
> 15:55:17 @ 0x70425500 (unknown)
> 15:55:17 2018-02-07 07:55:24,945:39896(0x70633000):ZOO_INFO@log_env@794: 
> Client 
> environment:user.dir=/private/var/folders/6w/rw03zh013y38ys6cyn8qppf8gn/T/1mHCvU
> 15:55:17 @ 0x7fff4f84f312 abort
> 15:55:17 2018-02-07 
> 07:55:24,945:39896(0x70633000):ZOO_INFO@zookeeper_init@827: Initiating 
> client connection, host=127.0.0.1:52197 sessionTimeout=1 
> watcher=0x10d916590 sessionId=0 sessionPasswd= context=0x7fe1bda706a0 
> flags=0
> 15:55:17 @ 0x7fff4f817368 __assert_rtn
> 15:55:17 @0x10b9cff97 _ZNR6OptionIN5mesos10MasterInfoEE3getEv
> 15:55:17 @0x10bbb04b5 Option<>::operator->()
> 15:55:17 @0x10bd4514a mesos::internal::master::Master::detected()
> 15:55:17 @0x10bf54558 
> _ZZN7process8dispatchIN5mesos8internal6master6MasterERKNS_6FutureI6OptionINS1_10MasterInfoSB_EEvRKNS_3PIDIT_EEMSD_FvT0_EOT1_ENKUlOS9_PNS_11ProcessBaseEE_clESM_SO_
> 15:55:17 @0x10bf54310 
> _ZN5cpp176invokeIZN7process8dispatchIN5mesos8internal6master6MasterERKNS1_6FutureI6OptionINS3_10MasterInfoSD_EEvRKNS1_3PIDIT_EEMSF_FvT0_EOT1_EUlOSB_PNS1_11ProcessBaseEE_JSB_SQ_EEEDTclclsr3stdE7forwardISF_Efp_Espclsr3stdE7forwardIT0_Efp0_EEEOSF_DpOSS_
> 15:55:17 @0x10bf542bb 
> _ZN6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1E13invoke_expandISS_NST_5tupleIJSC_SW_EEENSZ_IJOSR_EEEJLm0ELm1DTclsr5cpp17E6invokeclsr3stdE7forwardISG_Efp_Espcl6expandclsr3stdE3getIXT2_EEclsr3stdE7forwardISK_Efp0_EEclsr3stdE7forwardISN_Efp2_OSG_OSK_N5cpp1416integer_sequenceImJXspT2_SO_
> 15:55:17 @0x10bf541f3 
> _ZNO6lambda8internal7PartialIZN7process8dispatchIN5mesos8internal6master6MasterERKNS2_6FutureI6OptionINS4_10MasterInfoSE_EEvRKNS2_3PIDIT_EEMSG_FvT0_EOT1_EUlOSC_PNS2_11ProcessBaseEE_JSC_NSt3__112placeholders4__phILi1EclIJSR_EEEDTcl13invoke_expandclL_ZNST_4moveIRSS_EEONST_16remove_referenceISG_E4typeEOSG_EdtdefpT1fEclL_ZNSZ_IRNST_5tupleIJSC_SW_ES14_S15_EdtdefpT10bound_argsEcvN5cpp1416integer_sequenceImJLm0ELm1_Eclsr3stdE16forward_as_tuplespclsr3stdE7forwardIT_Efp_DpOS1C_
> 15:55:17 @   

[jira] [Created] (MESOS-8722) Hard-coded timeout for authentication failures

2018-03-22 Thread Benno Evers (JIRA)
Benno Evers created MESOS-8722:
--

 Summary: Hard-coded timeout for authentication failures
 Key: MESOS-8722
 URL: https://issues.apache.org/jira/browse/MESOS-8722
 Project: Mesos
  Issue Type: Bug
Reporter: Benno Evers


In the Mesos agent there is a hard-coded 5-second timeout for any 
authentication attempt:
{noformat}
void Slave::authenticate()
{
 [...]

  delay(Seconds(5), self(), &Self::authenticationTimeout, authenticating.get());
}
{noformat}
When the network is poor, this can lead to a situation where an agent cannot 
authenticate for a long time, preventing it from re-joining the cluster.
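
One possible direction, sketched under assumptions, is to make the timeout configurable and let it grow after each failed attempt. `AuthenticationTimeouts` and `timeoutForAttempt()` are hypothetical names chosen for this illustration, not existing agent flags or functions.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical settings; the names are illustrative, not existing agent flags.
struct AuthenticationTimeouts
{
  int64_t minSeconds;  // Timeout for the first attempt.
  int64_t maxSeconds;  // Upper bound for any attempt.
};

// Doubles the timeout for each failed attempt, capped at `maxSeconds`.
int64_t timeoutForAttempt(
    const AuthenticationTimeouts& t,
    unsigned int failures)
{
  int64_t timeout = t.minSeconds;
  for (unsigned int i = 0; i < failures && timeout < t.maxSeconds; ++i) {
    timeout *= 2;
  }

  return std::min(timeout, t.maxSeconds);
}
```

With a minimum of 5 seconds the first attempt behaves like today, while later attempts on a slow network get progressively more time up to the configured cap.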





[jira] [Created] (MESOS-8721) Unnecessary cropping of agent id's in the web ui

2018-03-22 Thread Benno Evers (JIRA)
Benno Evers created MESOS-8721:
--

 Summary: Unnecessary cropping of agent id's in the web ui
 Key: MESOS-8721
 URL: https://issues.apache.org/jira/browse/MESOS-8721
 Project: Mesos
  Issue Type: Bug
Reporter: Benno Evers
 Attachments: cropped_ids.png

As seen in the attached image (captured from Firefox 59 and Mesos 1.2.3), the 
agents page of the web ui appears to be cropping agent ids even if the column 
would have enough space to display the full name.





[jira] [Created] (MESOS-8720) CSIClientTest segfaults on macOS.

2018-03-22 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8720:
---

 Summary: CSIClientTest segfaults on macOS.
 Key: MESOS-8720
 URL: https://issues.apache.org/jira/browse/MESOS-8720
 Project: Mesos
  Issue Type: Bug
  Components: storage
Affects Versions: 1.6.0
 Environment: macOS 10.13.3, LLVM 6.0.0
Reporter: Jan Schlicht


This seems to be caused by the changes introduced in commit 
{{79c21981803dafd8a5e971b98961487a69017ce9}}. On a macOS build, configured with 
{{--enable-grpc}}, all test cases in {{CSIClientTest}} segfault. Running 
{{src/mesos-tests --gtest_filter=\*CSIClientTest\*}} results in
{noformat}
[ RUN  ] Identity/CSIClientTest.Call/Client_GetSupportedVersions
mesos-tests(57309,0x7fffa0293340) malloc: *** error for object 0x10bb63b68: 
pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
*** Aborted at 1521711802 (unix time) try "date -d @1521711802" if you are 
using GNU date ***
PC: @ 0x7fff6738ce3e __pthread_kill
*** SIGABRT (@0x7fff6738ce3e) received by PID 57309 (TID 0x7fffa0293340) stack 
trace: ***
@ 0x7fff674bef5a _sigtramp
@0x0 (unknown)
@ 0x7fff672e9312 abort
@ 0x7fff673e6866 free
@0x10aec51bd grpc::CompletionQueue::CompletionQueue()
@0x10b2087a4 process::grpc::client::Runtime::Data::Data()
@0x107bd697d mesos::internal::tests::CSIClientTest::CSIClientTest()
@0x107bd68ca 
testing::internal::ParameterizedTestFactory<>::CreateTest()
@0x107c58158 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x107c57fd8 testing::TestInfo::Run()
@0x107c588c7 testing::TestCase::Run()
@0x107c612b7 testing::internal::UnitTestImpl::RunAllTests()
@0x107c60d58 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@0x107c60cc8 testing::UnitTest::Run()
@0x106afc83d main
@ 0x7fff6723d115 start
@0x2 (unknown)
Abort trap: 6
{noformat}

Increasing GLog verbosity doesn't provide more information.





[jira] [Commented] (MESOS-8718) Add the fields `ExposedPorts` and `Volumes` into Docker v1 image spec

2018-03-22 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-8718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409303#comment-16409303
 ] 

Qian Zhang commented on MESOS-8718:
---

RR: https://reviews.apache.org/r/66211/

> Add the fields `ExposedPorts` and `Volumes` into Docker v1 image spec
> -
>
> Key: MESOS-8718
> URL: https://issues.apache.org/jira/browse/MESOS-8718
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization
>Reporter: Qian Zhang
>Assignee: Qian Zhang
>Priority: Major
>
> This ticket is to address the TODO below in the 
> [docker/v1.proto|https://github.com/apache/mesos/blob/1.5.0/include/mesos/docker/v1.proto#L70:L71]:
> {code:java}
> // TODO(gilbert): Create a message including string-message
> // pair to match ExposedPorts' map (map[nat.Port]struct{}).
> {code}
> Similar to the field `ExposedPorts` mentioned in the TODO above, we should 
> also add the field `Volumes`, which is also a string-message pair.
> Once these two fields are added, we could consider building features on top 
> of them in the `docker/runtime` isolator.
>  





[jira] [Created] (MESOS-8719) Mesos compiled with `--enable-grpc` doesn't compile on non-Linux builds

2018-03-22 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-8719:
---

 Summary: Mesos compiled with `--enable-grpc` doesn't compile on 
non-Linux builds
 Key: MESOS-8719
 URL: https://issues.apache.org/jira/browse/MESOS-8719
 Project: Mesos
  Issue Type: Bug
  Components: storage
Affects Versions: 1.6.0
 Environment: macOS
Reporter: Jan Schlicht
Assignee: Jan Schlicht


Commit {{59cca968e04dee069e0df2663733b6d6f55af0da}} added 
{{examples/test_csi_plugin.cpp}} to non-Linux builds that are configured using 
the {{--enable-grpc}} flag. As {{examples/test_csi_plugin.cpp}} includes 
{{fs/linux.hpp}}, it can only compile on Linux and needs to be disabled for 
non-Linux builds.





[jira] [Created] (MESOS-8718) Add the fields `ExposedPorts` and `Volumes` into Docker v1 image spec

2018-03-22 Thread Qian Zhang (JIRA)
Qian Zhang created MESOS-8718:
-

 Summary: Add the fields `ExposedPorts` and `Volumes` into Docker 
v1 image spec
 Key: MESOS-8718
 URL: https://issues.apache.org/jira/browse/MESOS-8718
 Project: Mesos
  Issue Type: Improvement
  Components: containerization
Reporter: Qian Zhang
Assignee: Qian Zhang


This ticket is to address the TODO below in the 
[docker/v1.proto|https://github.com/apache/mesos/blob/1.5.0/include/mesos/docker/v1.proto#L70:L71]:

 
{code:java}
// TODO(gilbert): Create a message including string-message
// pair to match ExposedPorts' map (map[nat.Port]struct{}).
{code}
Similar to the field `ExposedPorts` mentioned in the TODO above, we should 
also add the field `Volumes`, which is also a string-message pair.

Once these two fields are added, we could consider building features on top of 
them in the `docker/runtime` isolator.

 


