Re: Mesos slave help
Hi Stephen,

You can see all the launch flags here:
http://mesos.apache.org/documentation/latest/configuration/
(or just run .../mesos-slave.sh --help).

If you launch it via systemd (which is actually how we run it ourselves in DCOS), you will have to configure your nodes (master/agents) via the MESOS_* environment variables.

In production you will want to use ZooKeeper as the discovery/coordination method (as you correctly did here). You can use whatever you like as the znode path, but it must be the same for all masters/agents.

If you run a test/dev configuration with multiple masters/agents on the same node, make sure to (a) configure each master on its own port (--port) and (b) point each node at a different work_dir (or you'll get confusing errors around log replicas).

(@haosdent: I'm *almost* sure the packaging is correct, but it needs the env vars to be configured properly.)

Marco Massenzio
Distributed Systems Engineer
http://codetrips.com

On Thu, Aug 6, 2015 at 4:12 AM, Stephen Knight skni...@pivotal.io wrote:

Ok, that's working if I run it like this:

    /usr/sbin/mesos-slave --master=zk://172.31.x.x:2181/mesos > /dev/null 2>&1

Thanks for your help, really appreciate it.
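As a rough illustration of the multi-agent test/dev advice above, here is a minimal Python sketch that is not taken from the thread. It assumes Mesos reads each flag from a matching MESOS_* environment variable, so that --master, --port and --work_dir become MESOS_MASTER, MESOS_PORT and MESOS_WORK_DIR. The helper name agent_environments, the base port 5051 and the base directory /var/lib/mesos are hypothetical choices for the example.

```python
# Illustrative sketch only: build MESOS_* environment settings for several
# agents sharing one host. The helper name, base port and base directory are
# hypothetical; the variable names mirror the --master, --port and
# --work_dir flags.


def agent_environments(zk_url, count, base_port=5051, base_dir="/var/lib/mesos"):
    """Return one dict of environment variables per agent."""
    if not zk_url.startswith("zk://"):
        raise ValueError("expected a zk:// URL, got %r" % zk_url)
    envs = []
    for i in range(count):
        envs.append({
            # Every agent must point at the same ZooKeeper URL and znode path.
            "MESOS_MASTER": zk_url,
            # Each agent on the same host needs its own port...
            "MESOS_PORT": str(base_port + i),
            # ...and its own work_dir.
            "MESOS_WORK_DIR": "%s/agent-%d" % (base_dir, i),
        })
    return envs


if __name__ == "__main__":
    for env in agent_environments("zk://172.31.x.x:2181/mesos", 2):
        print(" ".join("%s=%s" % kv for kv in sorted(env.items())))
```

Each resulting dictionary could be written into whatever mechanism the service manager uses to pass environment variables to the process; how that is wired up depends on the packaging and is not shown here.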
Re: Mesos slave help
Hi @Stephen,

From your slave log, I could not see any log entries for a slave restart. Are you sure you restarted the slave after the reboot?

On Thu, Aug 6, 2015 at 5:54 PM, Stephen Knight skni...@pivotal.io wrote:

Hi Klaus,

I have attached all the logs from a master and a slave. I've replicated the problem over and over again, and I'm not sure what to make of it. The first registration is fine, but if I then restart the mesos-slave service (process restart or full server restart) it never connects again. The VMs are in the same VPC on AWS with an open security group between them.

--
Best Regards,
Haosdent Huang
Re: Mesos slave help
My system doesn't support cat with systemctl for some reason, but here are the contents of /usr/lib/systemd/system/mesos-slave.service:

    [Unit]
    Description=Mesos Slave
    After=network.target
    Wants=network.target

    [Service]
    ExecStart=/usr/bin/mesos-init-wrapper slave
    KillMode=process
    Restart=always
    RestartSec=20
    LimitNOFILE=16384
    CPUAccounting=true
    MemoryAccounting=true

    [Install]
    WantedBy=multi-user.target

What are the required flags to start it manually?

On Thu, Aug 6, 2015 at 2:51 PM, haosdent haosd...@gmail.com wrote:

Or you could try systemctl cat mesos-slave.service and show us the file content.
Re: Mesos slave help
Hi Stephen,

Would you share the logs of the master and the slave?

Thanks
Klaus

On 2015-08-06 16:07, Stephen Knight wrote:

Hi,

I was wondering if anyone can help me. I have a test setup: 1 master/ZooKeeper and 2 slaves on Ubuntu 14.04. When I initialize the slaves the first time it all works and they register with the master (I can see it on x.x.x.x:5050), but when I reboot those slaves for any reason, they never re-register. Am I missing something?

Thx

--
Stephen Knight
Infrastructure Consultant
Pivotal Services @ EMC
+971 (0)56 538 2071
skni...@pivotal.io
stephen.knig...@emc.com
Pivotal.io

Notice of Confidentiality - This email message is for the sole use of the intended recipient and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
Re: Mesos slave help
From the output of systemctl status mesos-slave.service -l, I think mesos-slave is being run with incorrect flags, and the status output is the slave's help message. Could you try to start mesos-slave manually, not through systemctl?

On Thu, Aug 6, 2015 at 6:41 PM, Stephen Knight skni...@pivotal.io wrote:

systemctl gives me the following output on CentOS. The command I ran to start it was systemctl start mesos-slave.service.

    [root@ip-172-31-35-167 mesos]# systemctl status mesos-slave.service -l
    mesos-slave.service - Mesos Slave
       Loaded: loaded (/usr/lib/systemd/system/mesos-slave.service; enabled)
      Drop-In: /etc/systemd/system/mesos-slave.service.d
               └─mesos-slave-containerizers.conf
       Active: activating (auto-restart) (Result: exit-code) since Thu 2015-08-06 10:38:08 UTC; 2s ago
      Process: 1472 ExecStart=/usr/bin/mesos-init-wrapper slave (code=exited, status=1/FAILURE)
     Main PID: 1472 (code=exited, status=1/FAILURE)

    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: If strict=false, any expected errors (e.g., slave cannot recover
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: information about an executor, because the slave died right before
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: the executor registered.) during recovery are ignored and as much
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: state as possible is recovered.
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: (default: true)
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: --[no-]switch_user Whether to run tasks as the user who
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: submitted them rather than the user running
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: the slave (requires setuid permission) (default: true)
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: --[no-]version Show version and exit. (default: false)
    Aug 06 10:38:08 ip-172-31-35-167.ec2.internal mesos-slave[1483]: --work_dir=VALUE Directory path to place framework work directories

I've also run strace against it; nothing sticks out:

    strace systemctl start mesos-slave.service
    execve(/bin/systemctl, [systemctl, start, mesos-slave.service], [/* 18 vars */]) = 0
    brk(0) = 0x7f5c2af9f000
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5c2a5c6000
    access(/etc/ld.so.preload, R_OK) = -1 ENOENT (No such file or directory)
    open(/etc/ld.so.cache, O_RDONLY|O_CLOEXEC) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=20940, ...}) = 0
    mmap(NULL, 20940, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5c2a5c
    close(3) = 0
    open(/lib64/libsystemd-daemon.so.0, O_RDONLY|O_CLOEXEC) = 3
    read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\240\r\0\0\0\0\0\0..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=15216, ...}) = 0
    mmap(NULL, 2109448, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5c2a1a2000
    mprotect(0x7f5c2a1a4000, 2097152, PROT_NONE) = 0
    mmap(0x7f5c2a3a4000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f5c2a3a4000
    mmap(0x7f5c2a3a5000, 8, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5c2a3a5000
    close(3) = 0
    open(/lib64/libdbus-1.so.3, O_RDONLY|O_CLOEXEC) = 3
    read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0@x\0\0\0\0\0\0..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=304536, ...}) = 0
    mmap(NULL, 2390496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5c29f5a000
    mprotect(0x7f5c29fa, 2097152, PROT_NONE) = 0
    mmap(0x7f5c2a1a, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x46000) = 0x7f5c2a1a
    close(3) = 0
    open(/lib64/librt.so.1, O_RDONLY|O_CLOEXEC) = 3
    read(3, \177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\300\\0\0\0\0\0\0..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=44088, ...}) = 0
    mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5c2a5bf000
    mmap(NULL, 2128952, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5c29d52000
    mprotect(0x7f5c29d59000, 2093056, PROT_NONE) = 0
    mmap(0x7f5c29f58000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f5c29f58000
    close(3) = 0
    open(/lib64/libselinux.so.1, O_RDONLY|O_CLOEXEC) = 3
    read(3, \177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0\0\1\0\0\0\240d\0\0\0\0\0\0..., 832) = 832
    fstat(3, {st_mode=S_IFREG|0755, st_size=147120, ...}) = 0
    mmap(NULL, 2246784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f5c29b2d000
Re: Mesos slave help
Hi Klaus,

I have attached all the logs from a master and a slave. I've replicated the problem over and over again, and I'm not sure what to make of it. The first registration is fine, but if I then restart the mesos-slave service (process restart or full server restart) it never connects again. The VMs are in the same VPC on AWS with an open security group between them.

On Thu, Aug 6, 2015 at 12:41 PM, Klaus Ma kl...@cguru.net wrote:

Hi Stephen,

Would you share the logs of the master and the slave?

Thanks
Klaus

mesos-master.ip-172-31-35-166.ec2.internal.invalid-user.log.ERROR.20150806-094757.1193
Description: Binary data
mesos-master.ip-172-31-35-166.ec2.internal.invalid-user.log.INFO.20150806-094058.1193
Description: Binary data
mesos-master.ip-172-31-35-166.ec2.internal.invalid-user.log.WARNING.20150806-094058.1193
Description: Binary data
mesos-slave.ip-172-31-35-168.ec2.internal.invalid-user.log.INFO.20150806-09.2257
Description: Binary data
mesos-slave.ip-172-31-35-168.ec2.internal.invalid-user.log.WARNING.20150806-09.2257
Description: Binary data
Re: Mesos slave help
Or you could try systemctl cat mesos-slave.service and show us the file content.

On Thu, Aug 6, 2015 at 6:49 PM, haosdent haosd...@gmail.com wrote:

From the output of systemctl status mesos-slave.service -l, I think mesos-slave is being run with incorrect flags, and the status output is the slave's help message. Could you try to start mesos-slave manually, not through systemctl?
Re: Mesos slave help
The work_dir was set to /tmp/mesos by default. I've deleted it and tried to start the slave again. The directory is not being recreated now; I just get a continual service failure.

On Thu, Aug 6, 2015 at 2:59 PM, craig w codecr...@gmail.com wrote:

Have you tried clearing out the data in the slave's work_dir? For example, if the work dir is /var/mesos, run rm -rf /var/mesos/* and then start the slave.
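The work_dir cleanup suggested above can be sketched in Python as follows. This is only an illustration, not something posted in the thread: the function name clear_work_dir is hypothetical, the demo runs against a throwaway temporary directory with made-up contents rather than a real Mesos work_dir, and unlike the shell glob it also removes hidden files. It keeps the directory itself in place, which differs from deleting the directory outright.

```python
import os
import shutil
import tempfile


def clear_work_dir(work_dir):
    """Delete everything inside work_dir but keep the directory itself."""
    # Recreate the directory if it was removed entirely.
    os.makedirs(work_dir, exist_ok=True)
    for name in os.listdir(work_dir):
        path = os.path.join(work_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)   # real subdirectory: remove recursively
        else:
            os.remove(path)       # regular file or symlink


if __name__ == "__main__":
    # Hypothetical stand-in for a work_dir; the contents are made up.
    demo = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo, "meta", "slaves"))
    open(os.path.join(demo, "meta", "boot_id"), "w").close()
    clear_work_dir(demo)
    print(os.listdir(demo))
    os.rmdir(demo)
```

The agent would need to be stopped before running something like this and started again afterwards.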
Re: Mesos slave help
Maybe you could report an issue at https://github.com/mesosphere/mesos-deb-packaging. I'm afraid the packaging may have a problem.

On Thu, Aug 6, 2015 at 7:12 PM, Stephen Knight skni...@pivotal.io wrote:

Ok, that's working if I run it like this:

    /usr/sbin/mesos-slave --master=zk://172.31.x.x:2181/mesos > /dev/null 2>&1

Thanks for your help, really appreciate it.
Re: Mesos slave help
Hm, you need to pass your master location, for example:

/usr/sbin/mesos-slave --master=x.x.x.x:5050

If you use ZooKeeper, you need to use a format like:

/usr/sbin/mesos-slave --master=zk://host1:port1,host2:port2,.../path

On Thu, Aug 6, 2015 at 6:55 PM, Stephen Knight skni...@pivotal.io wrote:

My system doesn't support cat with systemctl for some reason, but here are the contents of /usr/lib/systemd/system/mesos-slave.service:

[Unit]
Description=Mesos Slave
After=network.target
Wants=network.target

[Service]
ExecStart=/usr/bin/mesos-init-wrapper slave
KillMode=process
Restart=always
RestartSec=20
LimitNOFILE=16384
CPUAccounting=true
MemoryAccounting=true

[Install]
WantedBy=multi-user.target

What are the required flags to start it manually?

On Thu, Aug 6, 2015 at 2:51 PM, haosdent haosd...@gmail.com wrote:

Or you could try systemctl cat mesos-slave.service and show us the file content.

On Thu, Aug 6, 2015 at 6:49 PM, haosdent haosd...@gmail.com wrote:

From this message, I think the service ran mesos-slave with incorrect flags, and the output of systemctl status mesos-slave.service -l is the slave's help message. Could you try to start mesos-slave manually, not through systemctl?
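For the manual start discussed in this thread, the two --master forms (plain host:port, or a zk:// URL listing the ZooKeeper ensemble) can be assembled with a small shell helper. This is only a sketch under assumptions: the function names zk_master_url and slave_command are hypothetical and not part of Mesos, 192.0.2.10:2181 is a placeholder address, and /var/lib/mesos is an assumed work directory. Only the --master and --work_dir flags come from the thread and the help output quoted in it. The script prints the command rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesos): joins ZooKeeper host:port pairs
# into the zk://host1:port1,host2:port2/path form that --master accepts.
# Usage: zk_master_url ZNODE_PATH HOST:PORT [HOST:PORT ...]
zk_master_url() {
    znode="$1"; shift
    hosts=""
    for hp in "$@"; do
        # Append a comma separator only when the list is non-empty.
        hosts="${hosts:+$hosts,}$hp"
    done
    # Strip one leading slash from the znode path to avoid "//".
    printf 'zk://%s/%s\n' "$hosts" "${znode#/}"
}

# Hypothetical helper: prints (does not execute) a mesos-slave command line.
# Usage: slave_command WORK_DIR ZNODE_PATH HOST:PORT [HOST:PORT ...]
slave_command() {
    work_dir="$1"; shift
    printf '/usr/sbin/mesos-slave --master=%s --work_dir=%s\n' \
        "$(zk_master_url "$@")" "$work_dir"
}

# Placeholder values; substitute your own ZooKeeper address and work dir.
slave_command /var/lib/mesos mesos 192.0.2.10:2181
```

Every master and agent must use the same znode path, so keeping the URL construction in one place makes it harder for nodes to drift apart.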
Mesos slave help
Hi,

I was wondering if anyone can help me. I have a test setup: 1 master/ZooKeeper and 2 slaves on Ubuntu 14.04. When I initialize the slaves the first time, it all works and they register with the master (I can see it on x.x.x.x:5050), but when I reboot those slaves for any reason, they never re-register. Am I missing something?

Thx

--
Stephen Knight
Infrastructure Consultant
Pivotal Services @ EMC
+971 (0)56 538 2071
skni...@pivotal.io
stephen.knig...@emc.com
Pivotal.io