[jira] [Comment Edited] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536873#comment-14536873 ] haosdent edited comment on MESOS-2707 at 5/9/15 8:32 PM: - I retried the command on my machine and got this error: {code} Failed to create a master detector: Failed to parse 'zh://10.172.230.69:2181/mesos' {code} My Mesos version is built from source. I think this code snippet in net.hpp {code} int error = getaddrinfo(hostname.c_str(), NULL, &hints, &result); if (error != 0) { return Error(gai_strerror(error)); } {code} already checks the hostname. was (Author: haosd...@gmail.com): I retried the command on my machine and got this error: {code} Failed to create a master detector: Failed to parse 'zh://10.172.230.69:2181/mesos' {code} My Mesos version is built from source. Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. 
{code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
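The trace above ends in net::getIP(), and the comment quotes the getaddrinfo() check from net.hpp. As a minimal sketch only (this is not the stout code; resolveIPv4 is a hypothetical helper name, and plain POSIX getaddrinfo() is assumed), a resolver can check both the error code and the result pointer before dereferencing anything:

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstring>
#include <string>

// Hypothetical helper, not part of stout or Mesos: resolves `hostname` to a
// dotted-quad IPv4 string. Returns false instead of crashing when resolution
// fails or yields no usable address.
bool resolveIPv4(const std::string& hostname, std::string* ip)
{
  struct addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  struct addrinfo* result = nullptr;
  const int error = getaddrinfo(hostname.c_str(), nullptr, &hints, &result);

  // Check the error code and the result pointer before dereferencing.
  if (error != 0 || result == nullptr || result->ai_addr == nullptr) {
    if (result != nullptr) {
      freeaddrinfo(result);
    }
    return false;
  }

  char buffer[INET_ADDRSTRLEN];
  const struct sockaddr_in* address =
    reinterpret_cast<const struct sockaddr_in*>(result->ai_addr);
  const char* text =
    inet_ntop(AF_INET, &address->sin_addr, buffer, sizeof(buffer));
  freeaddrinfo(result);

  if (text == nullptr) {
    return false;
  }

  *ip = text;
  return true;
}
```

A caller would pass the host part of the master address and treat a false return as a startup error to report, rather than continuing with an unset address.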
[jira] [Commented] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536873#comment-14536873 ] haosdent commented on MESOS-2707: - I retried the command on my machine and got this error: {code} Failed to create a master detector: Failed to parse 'zh://10.172.230.69:2181/mesos' {code} My Mesos version is built from source. Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. 
{code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-2596) Update allocator docs
[ https://issues.apache.org/jira/browse/MESOS-2596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Rukletsov updated MESOS-2596: --- Description: Once Allocator interface changes, so does the way of writing new allocators. This should be reflected in Mesos docs. The modules doc should mention how to write and use allocator modules. Configuration doc should mention the new {{--allocator}} flag. (was: Once Allocator interface changes, so does the way of writing new allocators. This should be reflected in Mesos docs. The modules doc should mention how to write and use allocator modules.) Update allocator docs - Key: MESOS-2596 URL: https://issues.apache.org/jira/browse/MESOS-2596 Project: Mesos Issue Type: Task Components: allocation, documentation, modules Reporter: Alexander Rukletsov Labels: mesosphere Once Allocator interface changes, so does the way of writing new allocators. This should be reflected in Mesos docs. The modules doc should mention how to write and use allocator modules. Configuration doc should mention the new {{--allocator}} flag. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (MESOS-2340) Publish JSON in ZK instead of serialized MasterInfo
[ https://issues.apache.org/jira/browse/MESOS-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent reassigned MESOS-2340: --- Assignee: haosdent Publish JSON in ZK instead of serialized MasterInfo --- Key: MESOS-2340 URL: https://issues.apache.org/jira/browse/MESOS-2340 Project: Mesos Issue Type: Improvement Reporter: Zameer Manji Assignee: haosdent Currently to discover the master a client needs the ZK node location and access to the MasterInfo protobuf so it can deserialize the binary blob in the node. I think it would be nice to publish JSON (like Twitter's ServerSets) so clients are not tied to protobuf to do service discovery. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
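To illustrate the proposal above, here is a minimal sketch of publishing a JSON document instead of a serialized protobuf. The record type and its fields (id, hostname, port) are assumptions chosen for illustration and are not the MasterInfo schema; the helper names are hypothetical as well.

```cpp
#include <cstdio>
#include <string>

// Illustrative record only; these fields are assumptions, not MasterInfo.
struct MasterRecord
{
  std::string id;
  std::string hostname;
  int port;
};

// Escapes the characters that must not appear raw inside a JSON string.
std::string escapeJson(const std::string& input)
{
  std::string output;
  for (const char c : input) {
    switch (c) {
      case '"': output += "\\\""; break;
      case '\\': output += "\\\\"; break;
      case '\n': output += "\\n"; break;
      case '\r': output += "\\r"; break;
      case '\t': output += "\\t"; break;
      default:
        if (static_cast<unsigned char>(c) < 0x20) {
          // Remaining control characters use the \u00XX form.
          char buffer[8];
          std::snprintf(
              buffer,
              sizeof(buffer),
              "\\u%04x",
              static_cast<unsigned int>(static_cast<unsigned char>(c)));
          output += buffer;
        } else {
          output += c;
        }
    }
  }
  return output;
}

// Builds the text a master could write to its ZooKeeper node.
std::string toJson(const MasterRecord& master)
{
  return "{\"id\":\"" + escapeJson(master.id) +
         "\",\"hostname\":\"" + escapeJson(master.hostname) +
         "\",\"port\":" + std::to_string(master.port) + "}";
}
```

A client in any language could then read the node and parse it with a standard JSON library, without needing the protobuf definition.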
[jira] [Commented] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536862#comment-14536862 ] haosdent commented on MESOS-2707: - [~oliverpp] What's the difference between your 4 slaves? Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. {code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536861#comment-14536861 ] haosdent commented on MESOS-2707: - According to the documentation at http://mesos.apache.org/documentation/latest/high-availability/ , the master URL passed to mesos-slave should start with zk://. The code says the same: {code} if (mechanism == "") { return new StandaloneMasterDetector(); } else if (strings::startsWith(mechanism, "zk://")) { Try<zookeeper::URL> url = zookeeper::URL::parse(mechanism); if (url.isError()) { return Error(url.error()); } if (url.get().path == "/") { return Error( "Expecting a (chroot) path for ZooKeeper ('/' is not supported)"); } return new ZooKeeperMasterDetector(url.get()); } else if (strings::startsWith(mechanism, "file://")) { {code} So the URL should start with zk://, and I think the segfault must have some other cause. Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. 
{code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
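The dispatch quoted in the comment above can be sketched in a self-contained form. This is an illustration under stated assumptions, not the Mesos implementation: classifyMaster and MasterKind are hypothetical names, and rejecting any unrecognized "scheme://" prefix up front is a suggested behavior rather than what Mesos 0.22.0 did.

```cpp
#include <string>

// Hypothetical classification of the --master value.
enum class MasterKind { STANDALONE, ZOOKEEPER, FILE_PATH, ADDRESS, INVALID };

bool startsWith(const std::string& s, const std::string& prefix)
{
  return s.compare(0, prefix.size(), prefix) == 0;
}

MasterKind classifyMaster(const std::string& mechanism)
{
  if (mechanism.empty()) {
    return MasterKind::STANDALONE;
  }

  if (startsWith(mechanism, "zk://")) {
    // Require a chroot path other than "/" after the host list.
    const std::string rest = mechanism.substr(5);
    const std::string::size_type slash = rest.find('/');
    if (slash == std::string::npos || slash + 1 == rest.size()) {
      return MasterKind::INVALID;
    }
    return MasterKind::ZOOKEEPER;
  }

  if (startsWith(mechanism, "file://")) {
    return MasterKind::FILE_PATH;
  }

  // Assumption: any other "scheme://" prefix (such as the zh:// typo) is
  // rejected here instead of being handed to host:port parsing.
  if (mechanism.find("://") != std::string::npos) {
    return MasterKind::INVALID;
  }

  return MasterKind::ADDRESS;
}
```

With this shape, a mistyped scheme produces a clear error before any hostname resolution is attempted.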
[jira] [Issue Comment Deleted] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-2707: Comment: was deleted (was: [~oliverpp] What's the difference between your 4 slaves?) Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. {code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537025#comment-14537025 ] Shengwu Jiang commented on MESOS-2707: -- Oh, sorry, you're right. Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. {code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MESOS-2707) Incorrect zh:// URI scheme causes Slave to SegFault
[ https://issues.apache.org/jira/browse/MESOS-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536193#comment-14536193 ] Shengwu Jiang edited comment on MESOS-2707 at 5/10/15 5:00 AM: --- Do you mean the *zh* scheme is not needed here? But why do the other nodes work fine with 'zh://' and connect to ZooKeeper correctly? {code} [root@iZ25xeqr4wiZ ~]# ps aux|grep mesos root 11904 102 1.6 1004664 129264 pts/0 Sl 12:14 0:06 mesos-slave --master=zk://10.172.230.69:2181/mesos --hostname=123.56.117.48 --containerizers=docker,mesos --quiet root 11927 0.0 0.0 6388 668 pts/0 S+ 12:14 0:00 grep mesos {code} Added: that was a stupid question, I didn't notice the typo. was (Author: oliverpp): Do you mean the *zh* scheme is not needed here? But why do the other nodes work fine with 'zh://' and connect to ZooKeeper correctly? {code} [root@iZ25xeqr4wiZ ~]# ps aux|grep mesos root 11904 102 1.6 1004664 129264 pts/0 Sl 12:14 0:06 mesos-slave --master=zk://10.172.230.69:2181/mesos --hostname=123.56.117.48 --containerizers=docker,mesos --quiet root 11927 0.0 0.0 6388 668 pts/0 S+ 12:14 0:00 grep mesos {code} Incorrect zh:// URI scheme causes Slave to SegFault --- Key: MESOS-2707 URL: https://issues.apache.org/jira/browse/MESOS-2707 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0 Environment: Linux iZ25to7d407Z 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux Mesos 0.22.0,built from sources Zookeeper 3.4.6 Reporter: Shengwu Jiang Assignee: Marco Massenzio I have 4 slave nodes with the same hardware, operating system and mesos configuration. Few minutes ago, all 4 nodes were functioning well. I tried to change the config of *master* from _10.172.230.69:5050_ to _zh://10.172.230.69:2181/mesos_ and restarted them in turn. The other three had started normally but the last one got a segmentation fault as you can see below. 
{code} [root@iZ25to7d407Z ~]# mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet [1] 1216 [root@iZ25to7d407Z ~]# *** Aborted at 1431085131 (unix time) try date -d @1431085131 if you are using GNU date *** PC: @ 0x3aede7b53c (unknown) *** SIGSEGV (@0x0) received by PID 1216 (TID 0x7f12f984b820) from PID 0; stack trace: *** @ 0x3aee20f710 (unknown) @ 0x3aede7b53c (unknown) @ 0x3aedecf630 (unknown) @ 0x7f12fce1593f net::getIP() @ 0x7f12fce507ae process::operator() @ 0x7f12fce50107 process::UPID::UPID() @ 0x7f12fc52af71 mesos::internal::MasterDetector::create() @ 0x4b1290 main @ 0x3aede1ed5d (unknown) @ 0x4b00b9 (unknown) [1]+ Segmentation fault mesos-slave --master=zh://10.172.230.69:2181/mesos --hostname=123.57.42.237 --containerizers=docker,mesos --quiet {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-2712) When trying to install mesos 0.22.0 version on Redhat Enterprise Linux 6.0 , i am getting error configure error cannot find libsvn_subr-1 headers . I tried with ./confi
Sujit created MESOS-2712: Summary: When trying to install mesos 0.22.0 version on Redhat Enterprise Linux 6.0 , i am getting error configure error cannot find libsvn_subr-1 headers . I tried with ./configure --with-svn option also but still the same. Key: MESOS-2712 URL: https://issues.apache.org/jira/browse/MESOS-2712 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.22.0 Reporter: Sujit Priority: Blocker When trying to install Mesos 0.22.0 on Red Hat Enterprise Linux 6.0, I get the configure error "cannot find libsvn_subr-1 headers". I also tried ./configure with the --with-svn option, but the result is the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2023) mesos-execute should allow setting environment variables
[ https://issues.apache.org/jira/browse/MESOS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536423#comment-14536423 ] haosdent commented on MESOS-2023: - ping [~adam-mesos] I have updated the code; could you review it again? Thank you. mesos-execute should allow setting environment variables Key: MESOS-2023 URL: https://issues.apache.org/jira/browse/MESOS-2023 Project: Mesos Issue Type: Improvement Components: cli Affects Versions: 0.20.1 Reporter: Steven Schlansker Assignee: haosdent Labels: newbie mesos-execute does not allow setting various properties of the 'CommandInfo' protobuf. Most notably, being able to set environment variables and URIs would be very useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536280#comment-14536280 ] Cody Maloney commented on MESOS-1739: - The biggest thing that came up in my old patch set was race conditions around re-registering, caused by how the Mesos registerSlave / reregisterSlave code is set up; it will probably need some structural reworking. The case that was broken in my patch set is this: a slave tries to register multiple times because it hasn't gotten a response from the master yet, and one or more of those retries aren't identical to the first because they contain different resources / attributes (the slave started re-registration, then was restarted with new attributes before the master fully processed it). The master doesn't notice and just discards them as repeats. Allow slave reconfiguration on restart -- Key: MESOS-1739 URL: https://issues.apache.org/jira/browse/MESOS-1739 Project: Mesos Issue Type: Epic Reporter: Patrick Reilly Assignee: Cody Maloney Labels: mesosphere Make it so that, either via a slave restart or an out-of-process reconfigure ping, the attributes and resources of a slave can be updated to be a superset of what they used to be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
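The race described in the comment above comes down to telling an identical retry apart from a retry that carries changed resources or attributes. A minimal sketch of that comparison follows; every name here (SlaveDescription, PendingRegistrations, RetryKind) is hypothetical, and this is not how the Mesos master tracks registrations.

```cpp
#include <map>
#include <string>

// Hypothetical summary of what a slave sends when it (re-)registers.
struct SlaveDescription
{
  std::string resources;
  std::string attributes;
};

bool operator==(const SlaveDescription& left, const SlaveDescription& right)
{
  return left.resources == right.resources &&
         left.attributes == right.attributes;
}

enum class RetryKind { FIRST_ATTEMPT, IDENTICAL_RETRY, CHANGED_RETRY };

// Remembers the last description seen per slave while registration is
// still in flight, so a differing retry is not mistaken for a repeat.
class PendingRegistrations
{
public:
  RetryKind observe(
      const std::string& slavePid,
      const SlaveDescription& description)
  {
    std::map<std::string, SlaveDescription>::iterator it =
      pending.find(slavePid);

    if (it == pending.end()) {
      pending[slavePid] = description;
      return RetryKind::FIRST_ATTEMPT;
    }

    if (it->second == description) {
      return RetryKind::IDENTICAL_RETRY; // Safe to discard as a repeat.
    }

    it->second = description; // Keep the newest description.
    return RetryKind::CHANGED_RETRY;
  }

private:
  std::map<std::string, SlaveDescription> pending;
};
```

Only an IDENTICAL_RETRY would be dropped; a CHANGED_RETRY signals that the slave was restarted with a new configuration while the earlier request was still being processed.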
[jira] [Commented] (MESOS-2539) ExamplesTest.LowLevelSchedulerLibprocess is flaky
[ https://issues.apache.org/jira/browse/MESOS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536773#comment-14536773 ] haosdent commented on MESOS-2539: - [~bmahler] I have a simple patch here: https://reviews.apache.org/r/34016/diff/ I used this command to test it on CentOS, and the problem no longer reproduces. {code} ./bin/mesos-tests.sh --gtest_filter=ExamplesTest.LowLevelSchedulerLibprocess --verbose --gtest_repeat=50 --gtest_break_on_failure {code} But I'm not sure whether my patch matches your idea; could you give some advice? Thank you very much. ExamplesTest.LowLevelSchedulerLibprocess is flaky - Key: MESOS-2539 URL: https://issues.apache.org/jira/browse/MESOS-2539 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Jie Yu Assignee: haosdent Centos6 gcc-44 sudo make check {noformat} [ RUN ] ExamplesTest.LowLevelSchedulerLibprocess 2015-03-24 19:54:54,995:5735(0x7fc007fff700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37590] zk retcode=-4, errno =111(Connection refused): server refused to accept the client *** glibc detected *** /home/jyu/workspace/mesos-dist/build/src/.libs: double free or corruption (fasttop): 0x7f7f6c003150 *** === Backtrace: = /lib64/libc.so.6(+0x75e66)[0x7f7f8b79ee66] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr114_Function_base13_Base_managerIN7process6_DeferIFPFvRK NS2_3PIDIN5mesos8internal5slave5SlaveEEEMS8_FviiEiiES9_SD_NS_12_PlaceholderILi1EEENSG_ILi2EEE10_M_destroyERNS_9_Any_dataENS_17 integral_constantIbLb0EEE+0x31)[0x7f7f8ecef16b] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr114_Function_base13_Base_managerIN7process6_DeferIFPFvRK NS2_3PIDIN5mesos8internal5slave5SlaveEEEMS8_FviiEiiES9_SD_NS_12_PlaceholderILi1EEENSG_ILi2EEE10_M_managerERNS_9_Any_dataERKSM_ NS_18_Manager_operationE+0x92)[0x7f7f8ece17c0] 
/home/jyu/workspace/mesos-dist/build/src/.libs(_ZNSt3tr114_Function_baseD1Ev+0x37)[0x45107d] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr18functionIFviiEED1Ev+0x18)[0x7f7f8ecbeb34] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr18functionIFviiEEaSIN7process6_DeferIFPFvRKNS4_3PIDIN5me sos8internal5slave5SlaveEEEMSA_FviiEiiESB_SF_NS_12_PlaceholderILi1EEENSI_ILi2N9__gnu_cxx11__enable_ifIXntsrNS_11is_integra lIT_EE5valueERS2_E6__typeESQ_+0x85)[0x7f7f8ecbebbb] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x31bb)[0x7f7f8ec8b f99] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x299)[0x7 f7f8f3bf007] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN7process8scheduleEPv+0x91)[0x7f7f8f3b3a75] /lib64/libpthread.so.0(+0x79d1)[0x7f7f8c2649d1] /lib64/libc.so.6(clone+0x6d)[0x7f7f8b8118fd] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2539) ExamplesTest.LowLevelSchedulerLibprocess is flaky
[ https://issues.apache.org/jira/browse/MESOS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536745#comment-14536745 ] haosdent commented on MESOS-2539: - I still have a question here. {code} signaledWrapper = defer(self(), &Slave::signaled, lambda::_1, lambda::_2); {code} The object returned by defer should only be available inside Slave::initialize; it should be destroyed after Slave::initialize returns. Why do we assign this temporary object to signaledWrapper, whose scope is not limited to Slave::initialize? ExamplesTest.LowLevelSchedulerLibprocess is flaky - Key: MESOS-2539 URL: https://issues.apache.org/jira/browse/MESOS-2539 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Jie Yu Assignee: haosdent Centos6 gcc-44 sudo make check {noformat} [ RUN ] ExamplesTest.LowLevelSchedulerLibprocess 2015-03-24 19:54:54,995:5735(0x7fc007fff700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37590] zk retcode=-4, errno =111(Connection refused): server refused to accept the client *** glibc detected *** /home/jyu/workspace/mesos-dist/build/src/.libs: double free or corruption (fasttop): 0x7f7f6c003150 *** === Backtrace: = /lib64/libc.so.6(+0x75e66)[0x7f7f8b79ee66] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr114_Function_base13_Base_managerIN7process6_DeferIFPFvRK NS2_3PIDIN5mesos8internal5slave5SlaveEEEMS8_FviiEiiES9_SD_NS_12_PlaceholderILi1EEENSG_ILi2EEE10_M_destroyERNS_9_Any_dataENS_17 integral_constantIbLb0EEE+0x31)[0x7f7f8ecef16b] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr114_Function_base13_Base_managerIN7process6_DeferIFPFvRK NS2_3PIDIN5mesos8internal5slave5SlaveEEEMS8_FviiEiiES9_SD_NS_12_PlaceholderILi1EEENSG_ILi2EEE10_M_managerERNS_9_Any_dataERKSM_ NS_18_Manager_operationE+0x92)[0x7f7f8ece17c0] /home/jyu/workspace/mesos-dist/build/src/.libs(_ZNSt3tr114_Function_baseD1Ev+0x37)[0x45107d] 
/home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr18functionIFviiEED1Ev+0x18)[0x7f7f8ecbeb34] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr18functionIFviiEEaSIN7process6_DeferIFPFvRKNS4_3PIDIN5me sos8internal5slave5SlaveEEEMSA_FviiEiiESB_SF_NS_12_PlaceholderILi1EEENSI_ILi2N9__gnu_cxx11__enable_ifIXntsrNS_11is_integra lIT_EE5valueERS2_E6__typeESQ_+0x85)[0x7f7f8ecbebbb] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x31bb)[0x7f7f8ec8b f99] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x299)[0x7 f7f8f3bf007] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN7process8scheduleEPv+0x91)[0x7f7f8f3b3a75] /lib64/libpthread.so.0(+0x79d1)[0x7f7f8c2649d1] /lib64/libc.so.6(clone+0x6d)[0x7f7f8b8118fd] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
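On the lifetime question raised in the comment above: in general, assigning a temporary callable to a function-object member copies or moves the callable into that member, so the stored copy outlives the temporary. A minimal sketch with plain std::function follows; it is an analogy only and says nothing about libprocess defer internals. The Handler class and its members are hypothetical.

```cpp
#include <functional>

// Hypothetical stand-in for a process that stores a callback as a member.
class Handler
{
public:
  void initialize()
  {
    const int offset = 10;

    // The lambda object is a temporary. The assignment stores its own copy
    // (including the captured `offset`) inside `wrapper`, so nothing
    // dangles once initialize() returns.
    wrapper = [offset](int a, int b) { return a + b + offset; };
  }

  int invoke(int a, int b) const
  {
    return wrapper(a, b);
  }

private:
  std::function<int(int, int)> wrapper;
};
```

Calling invoke() after initialize() has returned is therefore well defined, because the member holds the callable by value.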
[jira] [Issue Comment Deleted] (MESOS-2539) ExamplesTest.LowLevelSchedulerLibprocess is flaky
[ https://issues.apache.org/jira/browse/MESOS-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] haosdent updated MESOS-2539: Comment: was deleted (was: I still have a question here. {code} signaledWrapper = defer(self(), &Slave::signaled, lambda::_1, lambda::_2); {code} The object returned by defer should only be available inside Slave::initialize; it should be destroyed after Slave::initialize returns. Why do we assign this temporary object to signaledWrapper, whose scope is not limited to Slave::initialize?) ExamplesTest.LowLevelSchedulerLibprocess is flaky - Key: MESOS-2539 URL: https://issues.apache.org/jira/browse/MESOS-2539 Project: Mesos Issue Type: Bug Affects Versions: 0.22.0, 0.23.0 Reporter: Jie Yu Assignee: haosdent Centos6 gcc-44 sudo make check {noformat} [ RUN ] ExamplesTest.LowLevelSchedulerLibprocess 2015-03-24 19:54:54,995:5735(0x7fc007fff700):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:37590] zk retcode=-4, errno =111(Connection refused): server refused to accept the client *** glibc detected *** /home/jyu/workspace/mesos-dist/build/src/.libs: double free or corruption (fasttop): 0x7f7f6c003150 *** === Backtrace: = /lib64/libc.so.6(+0x75e66)[0x7f7f8b79ee66] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr114_Function_base13_Base_managerIN7process6_DeferIFPFvRK NS2_3PIDIN5mesos8internal5slave5SlaveEEEMS8_FviiEiiES9_SD_NS_12_PlaceholderILi1EEENSG_ILi2EEE10_M_destroyERNS_9_Any_dataENS_17 integral_constantIbLb0EEE+0x31)[0x7f7f8ecef16b] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr114_Function_base13_Base_managerIN7process6_DeferIFPFvRK NS2_3PIDIN5mesos8internal5slave5SlaveEEEMS8_FviiEiiES9_SD_NS_12_PlaceholderILi1EEENSG_ILi2EEE10_M_managerERNS_9_Any_dataERKSM_ NS_18_Manager_operationE+0x92)[0x7f7f8ece17c0] /home/jyu/workspace/mesos-dist/build/src/.libs(_ZNSt3tr114_Function_baseD1Ev+0x37)[0x45107d] 
/home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr18functionIFviiEED1Ev+0x18)[0x7f7f8ecbeb34] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZNSt3tr18functionIFviiEEaSIN7process6_DeferIFPFvRKNS4_3PIDIN5me sos8internal5slave5SlaveEEEMSA_FviiEiiESB_SF_NS_12_PlaceholderILi1EEENSI_ILi2N9__gnu_cxx11__enable_ifIXntsrNS_11is_integra lIT_EE5valueERS2_E6__typeESQ_+0x85)[0x7f7f8ecbebbb] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN5mesos8internal5slave5Slave10initializeEv+0x31bb)[0x7f7f8ec8b f99] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN7process14ProcessManager6resumeEPNS_11ProcessBaseE+0x299)[0x7 f7f8f3bf007] /home/jyu/workspace/mesos-dist/build/src/.libs/libmesos-0.23.0.so(_ZN7process8scheduleEPv+0x91)[0x7f7f8f3b3a75] /lib64/libpthread.so.0(+0x79d1)[0x7f7f8c2649d1] /lib64/libc.so.6(clone+0x6d)[0x7f7f8b8118fd] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2712) When trying to install mesos 0.22.0 version on Redhat Enterprise Linux 6.0 , i am getting error configure error cannot find libsvn_subr-1 headers . I tried with ./con
[ https://issues.apache.org/jira/browse/MESOS-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536779#comment-14536779 ] haosdent commented on MESOS-2712: - Please check out the getting-started guide at http://mesos.apache.org/gettingstarted/ to build it. I was able to build it successfully on CentOS 6.5, and I think Red Hat 6 should need similar steps. When trying to install mesos 0.22.0 version on Redhat Enterprise Linux 6.0 , i am getting error configure error cannot find libsvn_subr-1 headers . I tried with ./configure --with-svn option also but still the same. -- Key: MESOS-2712 URL: https://issues.apache.org/jira/browse/MESOS-2712 Project: Mesos Issue Type: Bug Components: general Affects Versions: 0.22.0 Reporter: Sujit Priority: Blocker When trying to install Mesos 0.22.0 on Red Hat Enterprise Linux 6.0, I get the configure error "cannot find libsvn_subr-1 headers". I also tried ./configure with the --with-svn option, but the result is the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2713) Docker resource usage
[ https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536794#comment-14536794 ] Timothy Chen commented on MESOS-2713: - At the time we did the integration we just wanted to use the existing statistics code in Mesos, and I believe it was written to not assume cgroups is present. I'm not aware that using sysctl stats will be inaccurate; it should be roughly the same as cgroups AFAIK. Docker resource usage -- Key: MESOS-2713 URL: https://issues.apache.org/jira/browse/MESOS-2713 Project: Mesos Issue Type: Bug Components: containerization, docker, isolation Affects Versions: 0.22.1 Reporter: Ian Babrou Looks like resource usage for docker containers on slaves is not very accurate (/monitor/statistics.json). For example, cpu usage is calculated by traversing the process tree and summing up cpu times. The resulting numbers are not even close to real usage, and CPU time can even decrease. What is the reason for this if you can use cgroup data directly? Reading the cgroup location from the pid of a docker container is pretty straightforward. Another similar question: what is the reason to set isolation to posix instead of cgroups by default? Looks like it suffers from the same issues as the docker containerizer (incorrect stats). More docs on this topic would be great. Posix isolation also leads to bigger CPU usage from the mesos slave process (the higher usage is posix isolation): http://i.imgur.com/jepk5m6.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-2713) Docker resource usage
[ https://issues.apache.org/jira/browse/MESOS-2713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536807#comment-14536807 ]

Ian Babrou commented on MESOS-2713:

I'm not sure if you could run docker containers without cgroups. Anyway, it would be better to use cgroups and fall back gracefully to the existing stats when they are unavailable.

Take a look:

web300 ~ # cat /sys/fs/cgroup/cpuacct/docker/944fe900f60595d37ce4db3c4c09c196be3b500c2d3e89dab59351da2c8b597d/cpuacct.stat
user 20964
system 1167

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431194945.15193,
      "mem_rss_bytes": 408150016,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 1.46,
      "cpus_system_time_secs": 0.35,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]

Now take another look, user time decreases:

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431195057.42133,
      "mem_rss_bytes": 428085248,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 4.56,
      "cpus_system_time_secs": 0.43,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]

web300 ~ # curl -s http://web300:5051/monitor/statistics.json | jq .
[
  {
    "statistics": {
      "timestamp": 1431195058.38549,
      "mem_rss_bytes": 335261696,
      "mem_limit_bytes": 2181038080,
      "cpus_user_time_secs": 0.73,
      "cpus_system_time_secs": 0.31,
      "cpus_limit": 3.6
    },
    "source": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799",
    "framework_id": "20150126-100650-3909200064-5050-1-0007",
    "executor_name": "Command Executor (Task: topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799) (Command: sh -c 'exec /sbin/m...')",
    "executor_id": "topface_prod-test_app.c80a053f-f66f-11e4-a977-56847afe9799"
  }
]

Docker resource usage

Key: MESOS-2713
URL: https://issues.apache.org/jira/browse/MESOS-2713
Project: Mesos
Issue Type: Bug
Components: containerization, docker, isolation
Affects Versions: 0.22.1
Reporter: Ian Babrou

Looks like resource usage for docker containers on slaves is not very accurate (/monitor/statistics.json). For example, CPU usage is calculated by traversing the process tree and summing up CPU times. The resulting numbers are not even close to real usage, and CPU time can even decrease. What is the reason for this if you can use cgroup data directly? Reading the cgroup location from the pid of a docker container is pretty straightforward.

Another similar question: what is the reason to set isolation to posix instead of cgroups by default? Looks like it suffers from the same issues as the docker containerizer (incorrect stats). More docs on this topic would be great.

Posix isolation also leads to bigger CPU usage from the mesos slave process (higher usage is posix isolation): http://i.imgur.com/jepk5m6.png
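The gap between the cgroup counters and the statistics.json values quoted in the comment above can be made concrete. The following is a minimal Python sketch, not Mesos code. It assumes cpuacct.stat reports user and system time in USER_HZ ticks, and that the tick rate is 100 per second on the host in question (the real value is available from os.sysconf). The helper names are hypothetical; the sample numbers are copied from the comment.

```python
import os


def parse_cpuacct_stat(text, ticks_per_sec=None):
    """Convert cpuacct.stat content ("user N" / "system N" lines) to seconds."""
    if ticks_per_sec is None:
        ticks_per_sec = os.sysconf("SC_CLK_TCK")
    seconds = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            seconds[fields[0]] = int(fields[1]) / ticks_per_sec
    return seconds


def counter_regressions(samples):
    """Return the indexes at which a cumulative counter went backwards."""
    return [i for i in range(1, len(samples)) if samples[i] < samples[i - 1]]


# cpuacct.stat values from the comment; 100 ticks/s is an assumption.
cgroup = parse_cpuacct_stat("user 20964\nsystem 1167\n", ticks_per_sec=100)
print(cgroup)

# cpus_user_time_secs from the three statistics.json polls, in order.
print(counter_regressions([1.46, 4.56, 0.73]))
```

Under the 100 ticks per second assumption, the cgroup reports roughly 209.64 s of user time, while statistics.json reports between 0.73 s and 4.56 s, and the third poll is lower than the second even though cumulative CPU time should never decrease.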
[jira] [Commented] (MESOS-2670) Update existing lambdas to meet style guide
[ https://issues.apache.org/jira/browse/MESOS-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536853#comment-14536853 ]

haosdent commented on MESOS-2670:

Following [~jvanremoortere]'s advice, replace the 'lambda::bind' expressions that match these rules:
* Binds to a static function without any side effects.
* Is self-contained (i.e. does not rely on contextual parameters).
* Does not bind in any arguments (i.e. only uses lambda::_N for arguments).
* Is only called in one place OR is so small that it is OK to repeat the code.

Patch:
https://reviews.apache.org/r/34017/
https://reviews.apache.org/r/34018/

Update existing lambdas to meet style guide

Key: MESOS-2670
URL: https://issues.apache.org/jira/browse/MESOS-2670
Project: Mesos
Issue Type: Task
Reporter: Joris Van Remoortere
Assignee: haosdent
Labels: c++11

There are already some lambdas in C++11 specific files. Modify these to meet the updated style guide.
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536242#comment-14536242 ]

Joe Smith commented on MESOS-1739:

Howdy all,

What's the status of this? This change would greatly increase flexibility for us operators!

Thanks,
Joe

Allow slave reconfiguration on restart

Key: MESOS-1739
URL: https://issues.apache.org/jira/browse/MESOS-1739
Project: Mesos
Issue Type: Epic
Reporter: Patrick Reilly
Assignee: Cody Maloney

Make it so that, either via a slave restart or an out-of-process reconfigure ping, the attributes and resources of a slave can be updated to be a superset of what they used to be.
[jira] [Updated] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam B updated MESOS-1739:

Labels: mesosphere (was: )

Allow slave reconfiguration on restart

Key: MESOS-1739
URL: https://issues.apache.org/jira/browse/MESOS-1739
Project: Mesos
Issue Type: Epic
Reporter: Patrick Reilly
Assignee: Cody Maloney
Labels: mesosphere

Make it so that, either via a slave restart or an out-of-process reconfigure ping, the attributes and resources of a slave can be updated to be a superset of what they used to be.
[jira] [Comment Edited] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536250#comment-14536250 ]

Adam B edited comment on MESOS-1739 at 5/9/15 6:21 AM:

[~cmaloney] created a design doc and a prototype, but hasn't had time to revisit it yet. Maybe somebody else should pick it up.
We agree that this is very important, along with its FrameworkInfo corollary MESOS-703, which just got some recent attention. We could try to get a phase 1 implemented in Mesos 0.23 if somebody has the time.

was (Author: adam-mesos):
[~cmaloney] created a design doc and a prototype, but hasn't had time to revisit it yet. Maybe somebody else should pick it up. We agree that this is very important, along with its FrameworkInfo corollary MESOS-703, which just got some recent attention. We could try to get a phase 1 implemented in Mesos 0.23 if somebody has the time.

Allow slave reconfiguration on restart

Key: MESOS-1739
URL: https://issues.apache.org/jira/browse/MESOS-1739
Project: Mesos
Issue Type: Epic
Reporter: Patrick Reilly
Assignee: Cody Maloney
Labels: mesosphere

Make it so that, either via a slave restart or an out-of-process reconfigure ping, the attributes and resources of a slave can be updated to be a superset of what they used to be.
[jira] [Commented] (MESOS-1739) Allow slave reconfiguration on restart
[ https://issues.apache.org/jira/browse/MESOS-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536250#comment-14536250 ]

Adam B commented on MESOS-1739:

[~cmaloney] created a design doc and a prototype, but hasn't had time to revisit it yet. Maybe somebody else should pick it up. We agree that this is very important, along with its FrameworkInfo corollary MESOS-703, which just got some recent attention. We could try to get a phase 1 implemented in Mesos 0.23 if somebody has the time.

Allow slave reconfiguration on restart

Key: MESOS-1739
URL: https://issues.apache.org/jira/browse/MESOS-1739
Project: Mesos
Issue Type: Epic
Reporter: Patrick Reilly
Assignee: Cody Maloney
Labels: mesosphere

Make it so that, either via a slave restart or an out-of-process reconfigure ping, the attributes and resources of a slave can be updated to be a superset of what they used to be.