[jira] [Commented] (MESOS-2842) Update FrameworkInfo.principal on framework re-registration
[ https://issues.apache.org/jira/browse/MESOS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997958#comment-15997958 ] Lei Xu commented on MESOS-2842: --- We hit the same problem this week; the logs show "Check failed: metrics->frameworks.contains(principal.get())", and I think this issue may be the root cause of our problem. > Update FrameworkInfo.principal on framework re-registration > --- > > Key: MESOS-2842 > URL: https://issues.apache.org/jira/browse/MESOS-2842 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Priority: Critical > Labels: security > > From the design doc: > This is a bit involved because ‘principal’ is used for authentication and > rate limiting. > The authentication part is straightforward because a framework with updated > ‘principal’ should authenticate with the new ‘principal’ before being allowed > to re-register. The ‘authenticated’ map already gets updated when the > framework disconnects and reconnects, so it is fine. > For rate limiting, Master::failoverFramework() needs to be changed to update > the principal in the ‘frameworks.principals’ map and also remove the metrics for > the old principal if there are no other frameworks with this principal > (similar to what we do in Master::removeFramework()). > The Master::visit() and Master::_visit() should work with the current > semantics. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (MESOS-2842) Update FrameworkInfo.principal on framework re-registration
[ https://issues.apache.org/jira/browse/MESOS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997958#comment-15997958 ] Lei Xu edited comment on MESOS-2842 at 5/5/17 8:57 AM: --- We meet the same problem this week, and the logs show us "Check failed: metrics->frameworks.contains(principal.get())", I think this issue may be the root cause of our problem. Our mesos version is 0.28.2 was (Author: brickxu): We meet the same problem this week, and the logs show us "Check failed: metrics->frameworks.contains(principal.get())", I think this issue may be the root cause of our problem. > Update FrameworkInfo.principal on framework re-registration > --- > > Key: MESOS-2842 > URL: https://issues.apache.org/jira/browse/MESOS-2842 > Project: Mesos > Issue Type: Bug >Reporter: Vinod Kone >Priority: Critical > Labels: security > > From the design doc: > This is a bit involved because ‘principal’ is used for authentication and > rate limiting. > The authentication part is straightforward because a framework with updated > ‘principal’ should authenticate with the new ‘principal’ before being allowed > to re-register. The ‘authenticated’ map already gets updated when the > framework disconnects and reconnects, so it is fine. > For rate limiting, Master:failoverFramework() needs to be changed to update > the principal in ‘frameworks.principals’ map and also remove the metrics for > the old principal if there are no other frameworks with this principal > (similar to what we do in Master::removeFramework()). > The Master::visit() and Master::_visit() should work with the current > semantics. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
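The check failure quoted above ("metrics->frameworks.contains(principal.get())") and the design-doc text point at the same bookkeeping: when a framework re-registers with a changed principal, the master has to re-key that framework's entry in the principals map and drop the old principal's metrics once no other framework uses it. The following is a minimal, self-contained sketch of that bookkeeping using plain std:: containers; the names (principals, principalMetrics, failoverFramework) are illustrative and this is not the actual Master code.

{code}
// Sketch of the rate-limiting bookkeeping described in the design doc.
// Not Mesos source; all names and types are assumptions for illustration.
#include <map>
#include <optional>
#include <string>

struct PrincipalMetrics { /* per-principal rate-limiting counters */ };

// FrameworkID -> principal (frameworks may register without a principal).
std::map<std::string, std::optional<std::string>> principals;

// principal -> metrics, kept only while at least one framework uses it.
std::map<std::string, PrincipalMetrics> principalMetrics;

void failoverFramework(const std::string& frameworkId,
                       const std::optional<std::string>& newPrincipal)
{
  const std::optional<std::string> oldPrincipal = principals[frameworkId];

  if (oldPrincipal == newPrincipal) {
    return;  // Nothing to update for rate limiting.
  }

  principals[frameworkId] = newPrincipal;

  // Remove metrics for the old principal if it is now unused,
  // mirroring what Master::removeFramework() does.
  if (oldPrincipal) {
    bool stillUsed = false;
    for (const auto& [id, principal] : principals) {
      if (principal == oldPrincipal) {
        stillUsed = true;
        break;
      }
    }
    if (!stillUsed) {
      principalMetrics.erase(*oldPrincipal);
    }
  }

  // Ensure metrics exist for the new principal.
  if (newPrincipal) {
    principalMetrics.try_emplace(*newPrincipal);
  }
}
{code}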
[jira] [Updated] (MESOS-6738) Mesos master help message gives unformatted documents.
[ https://issues.apache.org/jira/browse/MESOS-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6738: -- Priority: Trivial (was: Minor) > Mesos master help message gives unformatted documents. > -- > > Key: MESOS-6738 > URL: https://issues.apache.org/jira/browse/MESOS-6738 > Project: Mesos > Issue Type: Bug > Components: cli >Affects Versions: 1.1.0 > Environment: Mesos 1.1.0 > Ubuntu 16.04 >Reporter: Lei Xu >Priority: Trivial > Attachments: mesos_agent_help_message.png, > mesos_master_help_message.png > > > build mesos from the release tarball and running the following command: > {code} > mesos master --help > {code} > it gives unformatted docs, but the slave/agent's help message is OK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6738) Mesos master help message gives unformatted documents.
[ https://issues.apache.org/jira/browse/MESOS-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6738: -- Attachment: mesos_agent_help_message.png mesos_master_help_message.png > Mesos master help message gives unformatted documents. > -- > > Key: MESOS-6738 > URL: https://issues.apache.org/jira/browse/MESOS-6738 > Project: Mesos > Issue Type: Bug > Components: cli >Affects Versions: 1.1.0 > Environment: Mesos 1.1.0 > Ubuntu 16.04 >Reporter: Lei Xu >Priority: Minor > Attachments: mesos_agent_help_message.png, > mesos_master_help_message.png > > > build mesos from the release tarball and running the following command: > {code} > mesos master --help > {code} > it gives unformatted docs, but the slave/agent's help message is OK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6738) Mesos master help message gives unformatted documents.
[ https://issues.apache.org/jira/browse/MESOS-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6738: -- Description: build mesos from the release tarball and running the following command: {code} mesos master --help {code} it gives unformatted docs, but the slave/agent's help message is OK. was: build mesos from the release tarball and running the following command: {code} mesos master --help {code} it gives unformatted docs, but the slave/agent's help message is OK. {code} Usage: mesos-master [options] --acls=VALUE The value could be a JSON-formatted string of ACLs or a file path containing the JSON-formatted ACLs used for authorization. Path could be of the form `file:///path/to/file` or `/path/to/file`. Note that if the flag `--authorizers` is provided with a value different than `local`, the ACLs contents will be ignored. See the ACLs protobuf in acls.proto for the expected format. Example: { "register_frameworks": [ { "principals": { "type": "ANY" }, "roles": { "values": ["a"] } } ], "run_tasks": [ { "principals": { "values": ["a", "b"] }, "users": { "values": ["c"] } } ], "teardown_frameworks": [ { "principals": { "values": ["a", "b"] }, "framework_principals": { "values": ["c"] } } ], "set_quotas": [ { "principals": { "values": ["a"] }, "roles": { "values": ["a", "b"] } } ], "remove_quotas": [ { "principals": { "values": ["a"] }, "quota_principals": { "values": ["a"] }
[jira] [Created] (MESOS-6738) Mesos master help message gives unformatted documents.
Lei Xu created MESOS-6738: - Summary: Mesos master help message gives unformatted documents. Key: MESOS-6738 URL: https://issues.apache.org/jira/browse/MESOS-6738 Project: Mesos Issue Type: Bug Components: cli Affects Versions: 1.1.0 Environment: Mesos 1.1.0 Ubuntu 16.04 Reporter: Lei Xu Priority: Minor build mesos from the release tarball and running the following command: {code} mesos master --help {code} it gives unformatted docs, but the slave/agent's help message is OK. {code} Usage: mesos-master [options] --acls=VALUE The value could be a JSON-formatted string of ACLs or a file path containing the JSON-formatted ACLs used for authorization. Path could be of the form `file:///path/to/file` or `/path/to/file`. Note that if the flag `--authorizers` is provided with a value different than `local`, the ACLs contents will be ignored. See the ACLs protobuf in acls.proto for the expected format. Example: { "register_frameworks": [ { "principals": { "type": "ANY" }, "roles": { "values": ["a"] } } ], "run_tasks": [ { "principals": { "values": ["a", "b"] }, "users": { "values": ["c"] } } ], "teardown_frameworks": [ { "principals": { "values": ["a", "b"] }, "framework_principals": { "values": ["c"] } } ], "set_quotas": [ { "principals": { "values": ["a"] }, "roles": { "values": ["a", "b"] } } ], "remove_quotas": [ { "principals": { "values": ["a"] },
[jira] [Updated] (MESOS-6615) Running mesos-slave in the docker that leave many zombie process
[ https://issues.apache.org/jira/browse/MESOS-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6615: -- Component/s: containerization > Running mesos-slave in the docker that leave many zombie process > > > Key: MESOS-6615 > URL: https://issues.apache.org/jira/browse/MESOS-6615 > Project: Mesos > Issue Type: Bug > Components: containerization, slave >Affects Versions: 0.28.2 > Environment: Mesos 0.28.2 > Docker 1.12.1 >Reporter: Lei Xu >Priority: Critical > > Here are some zombie process if I run mesos-slave in the docker. > {code} > root 10547 19464 0 Oct25 ?00:00:00 [docker] > root 14505 19464 0 Oct25 ?00:00:00 [docker] > root 16069 19464 0 Oct25 ?00:00:00 [docker] > root 19962 19464 0 Oct25 ?00:00:00 [docker] > root 23346 19464 0 Oct25 ?00:00:00 [docker] > root 24544 19464 0 Oct25 ?00:00:00 [docker] > {code} > And I find the zombies come from {{mesos-slave}} process: > {code} > pstree -p -s 10547 > systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547) > {code} > The logs has been deleted by the cron job a few weeks ago, but I remember so > many {{Failed to shutdown socket with fd xx: Transport endpoint is not > connected}} in the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6615) Running mesos-slave in the docker that leave many zombie process
Lei Xu created MESOS-6615: - Summary: Running mesos-slave in the docker that leave many zombie process Key: MESOS-6615 URL: https://issues.apache.org/jira/browse/MESOS-6615 Project: Mesos Issue Type: Bug Components: slave Affects Versions: 0.28.2 Environment: Mesos 0.28.2 Docker 1.12.1 Reporter: Lei Xu Priority: Critical Here are some zombie process if I run mesos-slave in the docker. {code} root 10547 19464 0 Oct25 ?00:00:00 [docker] root 14505 19464 0 Oct25 ?00:00:00 [docker] root 16069 19464 0 Oct25 ?00:00:00 [docker] root 19962 19464 0 Oct25 ?00:00:00 [docker] root 23346 19464 0 Oct25 ?00:00:00 [docker] root 24544 19464 0 Oct25 ?00:00:00 [docker] {code} And I find the zombies come from {{mesos-slave}} process: {code} pstree -p -s 10547 systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547) {code} The logs has been deleted by the cron job a few weeks ago, but I remember so many {{Failed to shutdown socket with fd xx: Transport endpoint is not connected}} in the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
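The defunct [docker] entries above are exited docker CLI children of mesos-slave (pid 19464) that were never reaped. The sketch below only illustrates what "reaping" means for a long-running parent (or a PID 1 init inside the container): collect exited children with waitpid so they do not linger as zombies. It is not agent code (the agent normally relies on libprocess' reaper); newer Docker releases can alternatively inject a minimal init into the container (e.g. docker run --init) to do this.

{code}
// Illustrative child-reaping loop; names are made up for this sketch.
#include <sys/wait.h>
#include <cstdio>

// Call periodically (or from a SIGCHLD handler) to collect all exited children.
void reapExitedChildren()
{
  while (true) {
    int status = 0;
    pid_t pid = waitpid(-1, &status, WNOHANG);
    if (pid <= 0) {
      break;  // 0: children still running; -1: no children left (ECHILD).
    }
    std::printf("reaped child %d (exit status %d)\n", pid, WEXITSTATUS(status));
  }
}
{code}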
[jira] [Commented] (MESOS-5540) Support building with non-GNU libc
[ https://issues.apache.org/jira/browse/MESOS-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682353#comment-15682353 ] Lei Xu commented on MESOS-5540: --- join this thread, very useful issue for building mesos in the musl-libc. thanks all ! > Support building with non-GNU libc > -- > > Key: MESOS-5540 > URL: https://issues.apache.org/jira/browse/MESOS-5540 > Project: Mesos > Issue Type: Improvement > Components: general >Reporter: Neil Conway >Assignee: Neil Conway >Priority: Minor > Labels: mesosphere > Fix For: 1.0.0 > > > Some Linux distributions don't use glibc -- e.g., Alpine Linux uses musl. > Mesos currently fails to compile using musl for at least the following two > reasons: > 1. {{linux/fs.hpp}} includes {{fstab.h}}, which isn't provided by musl. > 2. various places use {{fts.h}}, which isn't provided by musl > For (1), it seems this functionality is only needed by > {{FsTest.FileSystemTableRead}}, so I think it can be safely removed. > For (2), there are standalone implementations of the FTS functions, e.g., > https://github.com/pullmoll/musl-fts/ . We could either vendor such an > implementation or require the user to install an FTS implementation as a > library (e.g., https://pkgs.alpinelinux.org/package/edge/main/x86_64/fts). If > we do the latter, we'd need to be prepared to link against {{libfts.a}} if > needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
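For context on point (2), this is roughly the fts(3) interface in question; it is a glibc/BSD extension that musl omits, which is why the options are vendoring musl-fts or linking against a separate libfts. Illustrative usage only, not a snippet from the Mesos tree.

{code}
// Recursive directory walk using fts(3); the function name is illustrative.
#include <fts.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <cstdio>

int listFilesRecursively(char* root)
{
  char* const paths[] = {root, nullptr};

  FTS* tree = fts_open(paths, FTS_NOCHDIR | FTS_PHYSICAL, nullptr);
  if (tree == nullptr) {
    return -1;
  }

  FTSENT* entry;
  while ((entry = fts_read(tree)) != nullptr) {
    if (entry->fts_info == FTS_F) {  // Regular file.
      std::printf("%s\n", entry->fts_path);
    }
  }

  return fts_close(tree);
}
{code}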
[jira] [Commented] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker
[ https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590575#comment-15590575 ] Lei Xu commented on MESOS-6410: --- Hi [~haosd...@gmail.com], It's OK now with `--privileged=true`, thanks very much. > Fail to mount persistent volume when run mesos slave in docker > -- > > Key: MESOS-6410 > URL: https://issues.apache.org/jira/browse/MESOS-6410 > Project: Mesos > Issue Type: Bug > Components: containerization, volumes >Affects Versions: 0.28.2 > Environment: Mesos 0.28.2 > Docker 1.12.1 >Reporter: Lei Xu >Priority: Critical > > Here are some error logs from the slave: > {code} > E1018 07:52:06.18692630 slave.cpp:3758] Container > 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor > 'storm_nimbus_mpubpushsmart.d > 60e9066-94ec-11e6-99ff-0242d43b0395' of framework > 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount > persistent > volume from > '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' > to '/var/lib/meso > s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs > mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp': > Operation not permitted > E1018 07:52:09.91687725 slave.cpp:3758] Container > 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor > 'storm_nimbus_mpubpushsmart.d > 60e9066-94ec-11e6-99ff-0242d43b0395' of framework > 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount > persistent > volume from > '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' > to '/var/lib/meso > s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs > mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp': > Operation not permitted > {code} > But out of the docker, the mesos slave works OK with the persistent volumes -- This message was sent by Atlassian JIRA (v6.3.4#6332)
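`--privileged=true` works here because the failing operation is a bind mount performed by the agent inside its container, and mount(2) requires CAP_SYS_ADMIN, which a default (unprivileged) Docker container drops. A minimal reproduction of that call is sketched below; the paths are placeholders, not the real layout from the log.

{code}
// Bind-mount a volume directory into a sandbox path, as the agent does for
// persistent volumes. Without CAP_SYS_ADMIN this fails with EPERM
// ("Operation not permitted"), matching the errors above. Paths are fake.
#include <sys/mount.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
  const char* volume = "/var/lib/mesos/volumes/roles/storm/some-volume";
  const char* target = "/var/lib/mesos/slaves/S45/frameworks/F/executors/E/runs/R/tmp";

  if (mount(volume, target, nullptr, MS_BIND, nullptr) != 0) {
    std::printf("mount failed: %s\n", std::strerror(errno));  // EPERM without CAP_SYS_ADMIN.
    return 1;
  }
  return 0;
}
{code}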
[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker
[ https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6410: -- Description: Here are some error logs from the slave: {quote} E1018 07:52:06.18692630 slave.cpp:3758] Container 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 'storm_nimbus_mpubpushsmart.d 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount persistent volume from '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' to '/var/lib/meso s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp': Operation not permitted E1018 07:52:09.91687725 slave.cpp:3758] Container 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 'storm_nimbus_mpubpushsmart.d 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount persistent volume from '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' to '/var/lib/meso s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp': Operation not permitted {quote} But out of the docker, the mesos slave works OK with the persistent volumes was: Here are some error logs from the slave: {quote} E1018 07:52:06.18692630 slave.cpp:3758] Container 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 'storm_nimbus_mpubpushsmart.d 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount persistent volume from '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' to '/var/lib/meso s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp': Operation not permitted E1018 07:52:09.91687725 slave.cpp:3758] Container 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 'storm_nimbus_mpubpushsmart.d 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount persistent volume from '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' to '/var/lib/meso s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp': Operation not permitted {quote} > Fail to mount persistent volume when run mesos slave in docker > -- > > Key: MESOS-6410 > URL: https://issues.apache.org/jira/browse/MESOS-6410 > Project: Mesos > Issue Type: Bug > Components: containerization, volumes >Affects Versions: 0.28.2 > Environment: Mesos 0.28.2 > Docker 1.12.1 >Reporter: Lei Xu >Priority: Critical > > Here are some error logs from the slave: > {quote} > E1018 07:52:06.18692630 slave.cpp:3758] Container > 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor > 'storm_nimbus_mpubpushsmart.d > 60e9066-94ec-11e6-99ff-0242d43b0395' of framework > 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount > persistent > volume from > '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' > to '/var/lib/meso > s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs > mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp': > Operation not permitted > E1018 07:52:09.91687725 slave.cpp:3758] Container > 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor > 'storm_nimbus_mpubpushsmart.d > 60e9066-94ec-11e6-99ff-0242d43b0395' of framework > 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount > persistent > volume from > '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' > to '/var/lib/meso > s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs > mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp': > Operation not permitted > {quote} >
[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker
[ https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6410: -- Description: Here are some error logs from the slave: {quote} E1018 07:52:06.18692630 slave.cpp:3758] Container 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 'storm_nimbus_mpubpushsmart.d 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount persistent volume from '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' to '/var/lib/meso s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp': Operation not permitted E1018 07:52:09.91687725 slave.cpp:3758] Container 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 'storm_nimbus_mpubpushsmart.d 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount persistent volume from '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' to '/var/lib/meso s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp': Operation not permitted {quote} > Fail to mount persistent volume when run mesos slave in docker > -- > > Key: MESOS-6410 > URL: https://issues.apache.org/jira/browse/MESOS-6410 > Project: Mesos > Issue Type: Bug > Components: containerization, volumes >Affects Versions: 0.28.2 > Environment: Mesos 0.28.2 > Docker 1.12.1 >Reporter: Lei Xu >Priority: Critical > > Here are some error logs from the slave: > {quote} > E1018 07:52:06.18692630 slave.cpp:3758] Container > 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor > 'storm_nimbus_mpubpushsmart.d > 60e9066-94ec-11e6-99ff-0242d43b0395' of framework > 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount > persistent > volume from > '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' > to '/var/lib/meso > s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs > mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp': > Operation not permitted > E1018 07:52:09.91687725 slave.cpp:3758] Container > 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor > 'storm_nimbus_mpubpushsmart.d > 60e9066-94ec-11e6-99ff-0242d43b0395' of framework > 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount > persistent > volume from > '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395' > to '/var/lib/meso > s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs > mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp': > Operation not permitted > {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker
Lei Xu created MESOS-6410: - Summary: Fail to mount persistent volume when run mesos slave in docker Key: MESOS-6410 URL: https://issues.apache.org/jira/browse/MESOS-6410 Project: Mesos Issue Type: Bug Components: containerization, volumes Affects Versions: 0.28.2 Environment: Mesos 0.28.2 Docker 1.12.1 Reporter: Lei Xu Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task
[ https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500642#comment-15500642 ] Lei Xu commented on MESOS-6200: --- yes, but it is a little tricky. I still hope the executor could do all the things with the `resource` field, the user focus on the soft/hard resource limit and the executor set the resource with the correct cmd options or cgroup file value. > Hope mesos support soft and hard cpu/memory resource in the task > > > Key: MESOS-6200 > URL: https://issues.apache.org/jira/browse/MESOS-6200 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization, docker, scheduler api >Affects Versions: 0.28.2 > Environment: CentOS 7 > Kernel 3.10.0-327.28.3.el7.x86_64 > Mesos 0.28.2 > Docker 1.11.2 >Reporter: Lei Xu > > The Docker executor maybe could support soft/hard resource limit to enable > more flexible resources sharing among the applications. > || || CPU || Memory || > | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap| > | soft limit| --cpu-shares | --memory-reservation| > And now the task protobuf message has only one resource struct that used to > describe the cgroup limit, and the docker executor handle is like the > following, only --memory and --cpu-shares were set: > {code} > if (resources.isSome()) { > // TODO(yifan): Support other resources (e.g. disk). > Option cpus = resources.get().cpus(); > if (cpus.isSome()) { > uint64_t cpuShare = > std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), > MIN_CPU_SHARES); > argv.push_back("--cpu-shares"); > argv.push_back(stringify(cpuShare)); > } > Option mem = resources.get().mem(); > if (mem.isSome()) { > Bytes memLimit = std::max(mem.get(), MIN_MEMORY); > argv.push_back("--memory"); > argv.push_back(stringify(memLimit.bytes())); > } > } > {code} > I hope that the executor and the protobuf message could separate the resource > to the two parts: soft and hard. Then the user could set 2 levels resource > limits for the docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
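To make the proposal concrete, here is a hedged sketch of what the Docker executor's argument construction could look like with both levels: soft CPU via --cpu-shares (as today), hard CPU via --cpu-period/--cpu-quota, soft memory via --memory-reservation, and the existing hard --memory. The softCpus/hardCpus/softMemBytes/hardMemBytes split and the helper name are hypothetical; today's Resources expose only a single cpus()/mem() value, so the protobuf change would have to come first.

{code}
// Hypothetical flag construction for a two-level (soft/hard) resource spec.
// Not the actual docker executor code.
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

constexpr double CPU_SHARES_PER_CPU = 1024;      // cgroup cpu.shares per CPU.
constexpr uint64_t CPU_CFS_PERIOD_US = 100000;   // Docker's default CFS period.

void appendResourceFlags(std::vector<std::string>& argv,
                         double softCpus,        // assumed "soft" CPU value
                         double hardCpus,        // assumed "hard" CPU value
                         uint64_t softMemBytes,  // assumed "soft" memory value
                         uint64_t hardMemBytes)  // assumed "hard" memory value
{
  // Soft CPU limit: relative weight (what the executor already sets today).
  argv.push_back("--cpu-shares");
  argv.push_back(std::to_string(static_cast<uint64_t>(
      std::max(softCpus * CPU_SHARES_PER_CPU, 2.0))));  // 2 is the cgroup minimum.

  // Hard CPU limit: CFS quota over a fixed period.
  argv.push_back("--cpu-period");
  argv.push_back(std::to_string(CPU_CFS_PERIOD_US));
  argv.push_back("--cpu-quota");
  argv.push_back(std::to_string(static_cast<uint64_t>(
      std::max(hardCpus, 0.01) * CPU_CFS_PERIOD_US)));

  // Soft memory limit: reclaimed first under memory pressure.
  argv.push_back("--memory-reservation");
  argv.push_back(std::to_string(softMemBytes));

  // Hard memory limit (what the executor already sets today).
  argv.push_back("--memory");
  argv.push_back(std::to_string(hardMemBytes));
}
{code}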
[jira] [Updated] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task
[ https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6200: -- Description: The Docker executor maybe could support soft/hard resource limit to enable more flexible resources sharing among the applications. || || CPU || Memory || | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap| | soft limit| --cpu-shares | --memory-reservation| And now the task protobuf message has only one resource struct that used to describe the cgroup limit, and the docker executor handle is like the following, only --memory and --cpu-shares were set: {code} if (resources.isSome()) { // TODO(yifan): Support other resources (e.g. disk). Option cpus = resources.get().cpus(); if (cpus.isSome()) { uint64_t cpuShare = std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES); argv.push_back("--cpu-shares"); argv.push_back(stringify(cpuShare)); } Option mem = resources.get().mem(); if (mem.isSome()) { Bytes memLimit = std::max(mem.get(), MIN_MEMORY); argv.push_back("--memory"); argv.push_back(stringify(memLimit.bytes())); } } {code} I hope that the executor and the protobuf message could separate the resource to the two parts: soft and hard. Then the user could set 2 levels resource limits for the docker. was: The Docker executor maybe could support soft/hard resource limit to enable more flexible resources sharing among the applications. || || CPU || Memory || | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap| | soft limit| --cpu-shares | --memory-reservation| And now the task protobuf message has only one resource struct that used to describe the cgroup limit, and the docker executor handle is like the following: {code} if (resources.isSome()) { // TODO(yifan): Support other resources (e.g. disk). Option cpus = resources.get().cpus(); if (cpus.isSome()) { uint64_t cpuShare = std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES); argv.push_back("--cpu-shares"); argv.push_back(stringify(cpuShare)); } Option mem = resources.get().mem(); if (mem.isSome()) { Bytes memLimit = std::max(mem.get(), MIN_MEMORY); argv.push_back("--memory"); argv.push_back(stringify(memLimit.bytes())); } } {code} I hope that the executor and the protobuf message could separate the resource to the two parts: soft and hard. Then the user could set 2 levels resource limits for the docker. > Hope mesos support soft and hard cpu/memory resource in the task > > > Key: MESOS-6200 > URL: https://issues.apache.org/jira/browse/MESOS-6200 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization, docker, scheduler api >Affects Versions: 0.28.2 > Environment: CentOS 7 > Kernel 3.10.0-327.28.3.el7.x86_64 > Mesos 0.28.2 > Docker 1.11.2 >Reporter: Lei Xu > > The Docker executor maybe could support soft/hard resource limit to enable > more flexible resources sharing among the applications. > || || CPU || Memory || > | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap| > | soft limit| --cpu-shares | --memory-reservation| > And now the task protobuf message has only one resource struct that used to > describe the cgroup limit, and the docker executor handle is like the > following, only --memory and --cpu-shares were set: > {code} > if (resources.isSome()) { > // TODO(yifan): Support other resources (e.g. disk). 
> Option cpus = resources.get().cpus(); > if (cpus.isSome()) { > uint64_t cpuShare = > std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), > MIN_CPU_SHARES); > argv.push_back("--cpu-shares"); > argv.push_back(stringify(cpuShare)); > } > Option mem = resources.get().mem(); > if (mem.isSome()) { > Bytes memLimit = std::max(mem.get(), MIN_MEMORY); > argv.push_back("--memory"); > argv.push_back(stringify(memLimit.bytes())); > } > } > {code} > I hope that the executor and the protobuf message could separate the resource > to the two parts: soft and hard. Then the user could set 2 levels resource > limits for the docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6200) Hop mesos support soft and hard cpu/memory resource in the task
[ https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6200: -- Summary: Hop mesos support soft and hard cpu/memory resource in the task (was: Hop mesos support request/limit resource in the task) > Hop mesos support soft and hard cpu/memory resource in the task > --- > > Key: MESOS-6200 > URL: https://issues.apache.org/jira/browse/MESOS-6200 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization, docker, scheduler api >Affects Versions: 0.28.2 > Environment: CentOS 7 > Kernel 3.10.0-327.28.3.el7.x86_64 > Mesos 0.28.2 > Docker 1.11.2 >Reporter: Lei Xu > > The Docker executor maybe could support soft/hard resource limit to enable > more flexible resources sharing among the applications. > || || CPU || Memory || > | hard limit| --cpu-shares| --memory & --memory-swap| > | soft limit| --cpu-period & --cpu-quota | --memory-reservation| > And now the task protobuf message has only one resource struct that used to > describe the cgroup limit, and the docker executor handle is like the > following: > {code} > if (resources.isSome()) { > // TODO(yifan): Support other resources (e.g. disk). > Option cpus = resources.get().cpus(); > if (cpus.isSome()) { > uint64_t cpuShare = > std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), > MIN_CPU_SHARES); > argv.push_back("--cpu-shares"); > argv.push_back(stringify(cpuShare)); > } > Option mem = resources.get().mem(); > if (mem.isSome()) { > Bytes memLimit = std::max(mem.get(), MIN_MEMORY); > argv.push_back("--memory"); > argv.push_back(stringify(memLimit.bytes())); > } > } > {code} > I hope that the executor and the protobuf message could separate the resource > to the two parts: soft and hard. Then the user could set 2 levels resource > limits for the docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task
[ https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6200: -- Description: The Docker executor maybe could support soft/hard resource limit to enable more flexible resources sharing among the applications. || || CPU || Memory || | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap| | soft limit| --cpu-shares | --memory-reservation| And now the task protobuf message has only one resource struct that used to describe the cgroup limit, and the docker executor handle is like the following: {code} if (resources.isSome()) { // TODO(yifan): Support other resources (e.g. disk). Option cpus = resources.get().cpus(); if (cpus.isSome()) { uint64_t cpuShare = std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES); argv.push_back("--cpu-shares"); argv.push_back(stringify(cpuShare)); } Option mem = resources.get().mem(); if (mem.isSome()) { Bytes memLimit = std::max(mem.get(), MIN_MEMORY); argv.push_back("--memory"); argv.push_back(stringify(memLimit.bytes())); } } {code} I hope that the executor and the protobuf message could separate the resource to the two parts: soft and hard. Then the user could set 2 levels resource limits for the docker. was: The Docker executor maybe could support soft/hard resource limit to enable more flexible resources sharing among the applications. || || CPU || Memory || | hard limit| --cpu-shares| --memory & --memory-swap| | soft limit| --cpu-period & --cpu-quota | --memory-reservation| And now the task protobuf message has only one resource struct that used to describe the cgroup limit, and the docker executor handle is like the following: {code} if (resources.isSome()) { // TODO(yifan): Support other resources (e.g. disk). Option cpus = resources.get().cpus(); if (cpus.isSome()) { uint64_t cpuShare = std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES); argv.push_back("--cpu-shares"); argv.push_back(stringify(cpuShare)); } Option mem = resources.get().mem(); if (mem.isSome()) { Bytes memLimit = std::max(mem.get(), MIN_MEMORY); argv.push_back("--memory"); argv.push_back(stringify(memLimit.bytes())); } } {code} I hope that the executor and the protobuf message could separate the resource to the two parts: soft and hard. Then the user could set 2 levels resource limits for the docker. > Hope mesos support soft and hard cpu/memory resource in the task > > > Key: MESOS-6200 > URL: https://issues.apache.org/jira/browse/MESOS-6200 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization, docker, scheduler api >Affects Versions: 0.28.2 > Environment: CentOS 7 > Kernel 3.10.0-327.28.3.el7.x86_64 > Mesos 0.28.2 > Docker 1.11.2 >Reporter: Lei Xu > > The Docker executor maybe could support soft/hard resource limit to enable > more flexible resources sharing among the applications. > || || CPU || Memory || > | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap| > | soft limit| --cpu-shares | --memory-reservation| > And now the task protobuf message has only one resource struct that used to > describe the cgroup limit, and the docker executor handle is like the > following: > {code} > if (resources.isSome()) { > // TODO(yifan): Support other resources (e.g. disk). 
> Option cpus = resources.get().cpus(); > if (cpus.isSome()) { > uint64_t cpuShare = > std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), > MIN_CPU_SHARES); > argv.push_back("--cpu-shares"); > argv.push_back(stringify(cpuShare)); > } > Option mem = resources.get().mem(); > if (mem.isSome()) { > Bytes memLimit = std::max(mem.get(), MIN_MEMORY); > argv.push_back("--memory"); > argv.push_back(stringify(memLimit.bytes())); > } > } > {code} > I hope that the executor and the protobuf message could separate the resource > to the two parts: soft and hard. Then the user could set 2 levels resource > limits for the docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task
[ https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lei Xu updated MESOS-6200: -- Summary: Hope mesos support soft and hard cpu/memory resource in the task (was: Hop mesos support soft and hard cpu/memory resource in the task) > Hope mesos support soft and hard cpu/memory resource in the task > > > Key: MESOS-6200 > URL: https://issues.apache.org/jira/browse/MESOS-6200 > Project: Mesos > Issue Type: Improvement > Components: cgroups, containerization, docker, scheduler api >Affects Versions: 0.28.2 > Environment: CentOS 7 > Kernel 3.10.0-327.28.3.el7.x86_64 > Mesos 0.28.2 > Docker 1.11.2 >Reporter: Lei Xu > > The Docker executor maybe could support soft/hard resource limit to enable > more flexible resources sharing among the applications. > || || CPU || Memory || > | hard limit| --cpu-shares| --memory & --memory-swap| > | soft limit| --cpu-period & --cpu-quota | --memory-reservation| > And now the task protobuf message has only one resource struct that used to > describe the cgroup limit, and the docker executor handle is like the > following: > {code} > if (resources.isSome()) { > // TODO(yifan): Support other resources (e.g. disk). > Option cpus = resources.get().cpus(); > if (cpus.isSome()) { > uint64_t cpuShare = > std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), > MIN_CPU_SHARES); > argv.push_back("--cpu-shares"); > argv.push_back(stringify(cpuShare)); > } > Option mem = resources.get().mem(); > if (mem.isSome()) { > Bytes memLimit = std::max(mem.get(), MIN_MEMORY); > argv.push_back("--memory"); > argv.push_back(stringify(memLimit.bytes())); > } > } > {code} > I hope that the executor and the protobuf message could separate the resource > to the two parts: soft and hard. Then the user could set 2 levels resource > limits for the docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-6200) Hop mesos support request/limit resource in the task
Lei Xu created MESOS-6200: - Summary: Hop mesos support request/limit resource in the task Key: MESOS-6200 URL: https://issues.apache.org/jira/browse/MESOS-6200 Project: Mesos Issue Type: Improvement Components: cgroups, containerization, docker, scheduler api Affects Versions: 0.28.2 Environment: CentOS 7 Kernel 3.10.0-327.28.3.el7.x86_64 Mesos 0.28.2 Docker 1.11.2 Reporter: Lei Xu The Docker executor maybe could support soft/hard resource limit to enable more flexible resources sharing among the applications. || || CPU || Memory || | hard limit| --cpu-shares| --memory & --memory-swap| | soft limit| --cpu-period & --cpu-quota | --memory-reservation| And now the task protobuf message has only one resource struct that used to describe the cgroup limit, and the docker executor handle is like the following: {code} if (resources.isSome()) { // TODO(yifan): Support other resources (e.g. disk). Option cpus = resources.get().cpus(); if (cpus.isSome()) { uint64_t cpuShare = std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES); argv.push_back("--cpu-shares"); argv.push_back(stringify(cpuShare)); } Option mem = resources.get().mem(); if (mem.isSome()) { Bytes memLimit = std::max(mem.get(), MIN_MEMORY); argv.push_back("--memory"); argv.push_back(stringify(memLimit.bytes())); } } {code} I hope that the executor and the protobuf message could separate the resource to the two parts: soft and hard. Then the user could set 2 levels resource limits for the docker. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3325) Running mesos-slave@0.23 in a container causes slave to be lost after a restart
[ https://issues.apache.org/jira/browse/MESOS-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411550#comment-15411550 ] Lei Xu commented on MESOS-3325: --- Hi, We hit this issue months ago, mesos agent always read boot_id from host os and re-generate the slave id and register with master, I remember here is a issue to track this, but I forget the issue id, you can give a boot id to the agent to make sure the slave id do not change when restart. > Running mesos-slave@0.23 in a container causes slave to be lost after a > restart > --- > > Key: MESOS-3325 > URL: https://issues.apache.org/jira/browse/MESOS-3325 > Project: Mesos > Issue Type: Bug > Components: slave >Affects Versions: 0.23.0 > Environment: CoreOS, Container, Docker >Reporter: Chris Fortier >Priority: Critical > > We are attempting to run mesos-slave 0.23 in a container. However it appears > that the mesos-slave agent registers as a new slave instead of > re-registering. This causes the formerly-launched tasks to continue running. > systemd unit being used: > ``` > [Unit] > Description=MesosSlave > After=docker.service dockercfg.service > Requires=docker.service dockercfg.service > [Service] > Environment=MESOS_IMAGE=mesosphere/mesos-slave:0.23.0-1.0.ubuntu1404 > Environment=ZOOKEEPER=redacted > User=core > KillMode=process > Restart=always > RestartSec=20 > TimeoutStartSec=0 > ExecStartPre=-/usr/bin/docker kill mesos_slave > ExecStartPre=-/usr/bin/docker rm mesos_slave > ExecStartPre=/usr/bin/docker pull ${MESOS_IMAGE} > ExecStart=/usr/bin/sh -c "sudo /usr/bin/docker run \ > --name=mesos_slave \ > --net=host \ > --pid=host \ > --privileged \ > -v /home/core/.dockercfg:/root/.dockercfg:ro \ > -v /sys:/sys \ > -v /usr/bin/docker:/usr/bin/docker:ro \ > -v /var/run/docker.sock:/var/run/docker.sock \ > -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \ > -v /var/lib/mesos/slave:/var/lib/mesos/slave \ > ${MESOS_IMAGE} \ > --ip=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4` \ > --attributes=zone:$(curl -s > http://169.254.169.254/latest/meta-data/placement/availability-zone)\;os:coreos > \ > --containerizers=docker,mesos \ > --executor_registration_timeout=10mins \ > --hostname=`curl -s > http://169.254.169.254/latest/meta-data/public-hostname` \ > --log_dir=/var/log/mesos \ > --master=zk://${ZOOKEEPER}/mesos \ > --work_dir=/var/lib/mesos/slave" > ExecStop=/usr/bin/docker stop mesos_slave > [Install] > WantedBy=multi-user.target > [X-Fleet] > Global=true > MachineMetadata=role=worker > ``` > ps, yes I saw the coreos-setup repo was deprecated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
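For reference, the reboot detection the comment alludes to works roughly as sketched below: the agent compares the kernel boot id against the one saved in its work_dir, and a mismatch is treated as a host reboot, after which the agent registers anew and receives a new SlaveID. The helper names are illustrative; on Linux the boot id comes from /proc/sys/kernel/random/boot_id (which is what stout's os::bootId() reads).

{code}
// Illustration only, not agent code.
#include <fstream>
#include <string>

// Read the kernel boot id for the current boot.
std::string readBootId()
{
  std::ifstream in("/proc/sys/kernel/random/boot_id");
  std::string id;
  std::getline(in, id);
  return id;
}

// If the boot id stored in the work_dir no longer matches the host's,
// the agent assumes a reboot and registers as a brand-new agent instead
// of recovering its previous SlaveID.
bool hostRebootedSince(const std::string& savedBootId)
{
  return readBootId() != savedBootId;
}
{code}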
[jira] [Commented] (MESOS-5914) mesos-docker-executor initialize many threads
[ https://issues.apache.org/jira/browse/MESOS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395209#comment-15395209 ] Lei Xu commented on MESOS-5914: --- Thanks :) > mesos-docker-executor initialize many threads > - > > Key: MESOS-5914 > URL: https://issues.apache.org/jira/browse/MESOS-5914 > Project: Mesos > Issue Type: Improvement > Components: containerization, libprocess >Affects Versions: 0.28.2 > Environment: CentOS7 > Kernel 4.6.2-1.el7.elrepo.x86_64 > Docker 1.11-2 >Reporter: Lei Xu >Priority: Minor > > I found mesos-docker-executor initialize many threads when running a docker > container task. Most of them seems not necessary. And I look up libprocess > github but found no way to reduce the threads number. > Is there any env variables to do with this ? > {code} >PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND > > 16966 root 20 0 2639056 85108 80460 S 0.3 0.0 2:42.90 > mesos-docker-ex > 16979 root 20 0 2639056 85108 80460 S 0.3 0.0 2:43.59 > mesos-docker-ex > 17012 root 20 0 2639056 85108 80460 S 0.3 0.0 2:43.14 > mesos-docker-ex > 16954 root 20 0 2639056 85108 80460 S 0.0 0.0 0:00.03 > mesos-docker-ex > 16964 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.41 > mesos-docker-ex > 16965 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.95 > mesos-docker-ex > 16967 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.96 > mesos-docker-ex > 16968 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.17 > mesos-docker-ex > 16969 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.12 > mesos-docker-ex > 16970 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.37 > mesos-docker-ex > 16971 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.92 > mesos-docker-ex > 16972 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.70 > mesos-docker-ex > 16973 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.35 > mesos-docker-ex > 16974 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.59 > mesos-docker-ex > 16975 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.54 > mesos-docker-ex > 16976 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.47 > mesos-docker-ex > 16977 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.61 > mesos-docker-ex > 16980 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.44 > mesos-docker-ex > 16982 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.33 > mesos-docker-ex > 16984 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.06 > mesos-docker-ex > 16986 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.02 > mesos-docker-ex > 16988 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.80 > mesos-docker-ex > 16990 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.05 > mesos-docker-ex > 16992 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.40 > mesos-docker-ex > 16994 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.99 > mesos-docker-ex > 16996 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.93 > mesos-docker-ex > 16998 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.94 > mesos-docker-ex > 17000 root 20 0 2639056 85108 80460 S
[jira] [Created] (MESOS-5914) mesos-docker-executor initialize many threads
Lei Xu created MESOS-5914: - Summary: mesos-docker-executor initialize many threads Key: MESOS-5914 URL: https://issues.apache.org/jira/browse/MESOS-5914 Project: Mesos Issue Type: Improvement Components: containerization, libprocess Affects Versions: 0.28.2 Environment: CentOS7 Kernel 4.6.2-1.el7.elrepo.x86_64 Docker 1.11-2 Reporter: Lei Xu Priority: Minor I found mesos-docker-executor initialize many threads when running a docker container task. Most of them seems not necessary. And I look up libprocess github but found no way to reduce the threads number. Is there any env variables to do with this ? {code} PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 16966 root 20 0 2639056 85108 80460 S 0.3 0.0 2:42.90 mesos-docker-ex 16979 root 20 0 2639056 85108 80460 S 0.3 0.0 2:43.59 mesos-docker-ex 17012 root 20 0 2639056 85108 80460 S 0.3 0.0 2:43.14 mesos-docker-ex 16954 root 20 0 2639056 85108 80460 S 0.0 0.0 0:00.03 mesos-docker-ex 16964 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.41 mesos-docker-ex 16965 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.95 mesos-docker-ex 16967 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.96 mesos-docker-ex 16968 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.17 mesos-docker-ex 16969 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.12 mesos-docker-ex 16970 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.37 mesos-docker-ex 16971 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.92 mesos-docker-ex 16972 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.70 mesos-docker-ex 16973 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.35 mesos-docker-ex 16974 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.59 mesos-docker-ex 16975 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.54 mesos-docker-ex 16976 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.47 mesos-docker-ex 16977 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.61 mesos-docker-ex 16980 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.44 mesos-docker-ex 16982 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.33 mesos-docker-ex 16984 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.06 mesos-docker-ex 16986 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.02 mesos-docker-ex 16988 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.80 mesos-docker-ex 16990 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.05 mesos-docker-ex 16992 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.40 mesos-docker-ex 16994 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.99 mesos-docker-ex 16996 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.93 mesos-docker-ex 16998 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.94 mesos-docker-ex 17000 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.79 mesos-docker-ex 17002 root 20 0 2639056 85108 80460 S 0.0 0.0 2:43.28 mesos-docker-ex 17004 root 20 0 2639056 85108 80460 S 0.0 0.0 2:42.99 mesos-docker-ex {code} -- This message
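The thread count is expected: libprocess starts a fixed pool of worker threads at initialization, roughly max(8, hardware threads) plus a few auxiliary threads, regardless of how little work mesos-docker-executor actually does. Later libprocess versions expose a LIBPROCESS_NUM_WORKER_THREADS environment variable to cap the pool, though I am not certain it is available in 0.28.x. The snippet below only computes that default-style sizing for illustration; treat the formula as an approximation, not the library's exact behavior.

{code}
// Approximate default worker-pool sizing, for illustration only.
#include <algorithm>
#include <cstdio>
#include <thread>

int main()
{
  const unsigned cores = std::max(1u, std::thread::hardware_concurrency());
  const unsigned workers = std::max(8u, cores);
  std::printf("libprocess-style worker pool size: %u\n", workers);
  return 0;
}
{code}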
[jira] [Commented] (MESOS-5544) Support running Mesos agent in a Docker container.
[ https://issues.apache.org/jira/browse/MESOS-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395029#comment-15395029 ] Lei Xu commented on MESOS-5544: --- I've containerized Mesos and it runs well without a network namespace. > Support running Mesos agent in a Docker container. > -- > > Key: MESOS-5544 > URL: https://issues.apache.org/jira/browse/MESOS-5544 > Project: Mesos > Issue Type: Improvement >Reporter: Jie Yu > > Currently, this does not work if one tries to use the Mesos containerizer. > The main problem is that we want to make sure the executor is not killed when > the agent crashes. So we have to use --pid=host so that the agent is in the host pid namespace. > But that is not sufficient: the Docker daemon will put the agent into all cgroups > available on the host. We need to make sure we migrate the executor pid out > of those cgroups so that when the agent crashes, executors are not killed. > Also, when starting the agent container, volumes need to be set up properly so > that any mounts under the agent's work_dir will be propagated back to the host > mount table. This is to make sure we can recover those mounts after the agent > restarts. This is also true for those mounts that are needed by some isolator > (e.g., network/cni isolator). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
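One of the tricky parts mentioned above, migrating the executor pid out of the cgroups the Docker daemon placed the agent in, boils down to writing the pid into a different cgroup's cgroup.procs file. The sketch below shows only that mechanism, with an illustrative cgroup v1 path; it is not the agent's actual implementation.

{code}
// Move a process into another cgroup by writing its pid to cgroup.procs.
// Path and function name are illustrative (cgroup v1 layout assumed).
#include <sys/types.h>
#include <fstream>
#include <string>

bool moveToCgroup(const std::string& cgroupDir, pid_t pid)
{
  std::ofstream procs(cgroupDir + "/cgroup.procs");
  if (!procs) {
    return false;
  }
  procs << pid << "\n";
  procs.flush();
  return procs.good();
}

// e.g. moveToCgroup("/sys/fs/cgroup/memory/mesos_executors", executorPid);
{code}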
[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID
[ https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391450#comment-15391450 ] Lei Xu commented on MESOS-5368: --- +1 > Consider introducing persistent agent ID > > > Key: MESOS-5368 > URL: https://issues.apache.org/jira/browse/MESOS-5368 > Project: Mesos > Issue Type: Improvement >Reporter: Neil Conway >Assignee: Abhishek Dasgupta > Labels: mesosphere > > Currently, agent IDs identify a single "session" by an agent: that is, an > agent receives an agent ID when it registers with the master; it reuses that > agent ID if it disconnects and successfully reregisters; if the agent shuts > down and restarts, it registers anew and receives a new agent ID. > It would be convenient to have a "persistent agent ID" that remains the same > for the duration of a given agent {{work_dir}}. This would mean that a given > persistent volume would not migrate between different agent IDs over time, > for example (see MESOS-4894). If we supported permanently removing an agent > from the cluster (i.e., the {{work_dir}} and any volumes used by the agent > will never be reused), we could use the persistent agent ID to report which > agent has been removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (MESOS-4297) Executor does not shutdown when framework teardown.
Lei Xu created MESOS-4297: - Summary: Executor does not shutdown when framework teardown. Key: MESOS-4297 URL: https://issues.apache.org/jira/browse/MESOS-4297 Project: Mesos Issue Type: Bug Components: framework Affects Versions: 0.25.0 Environment: Marathon 0.11.0 Mesos 0.25.0 Spark 1.5.2 Reporter: Lei Xu Priority: Critical We found a problem when teardown a Spark framework on Mesos, the executor could not exit and still running. {code} root 48548 48539 2 2015 ?04:28:11 /home/q/java/default/bin/java -cp /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/3 --hostname l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 20151228-163100-504125962-5050-31081-0016 root 48644 48348 0 2015 ?00:00:00 sh -c cd spark-1*; ./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 20151228-163100-504125962-5050-31081-0016 root 48645 48644 2 2015 ?04:28:45 /home/q/java/default/bin/java -cp /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler --executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 20151228-163100-504125962-5050-31081-0016 {code} This framework {{20151228-163100-504125962-5050-31081-0016}} has already teardown a few days ago, And could not find in "Frameworks" page via webui. But in the slave page, I found it still registered with slave node and run some executors. And I try to use REST API to kill the framework again, it returns {{No framework found with specified ID}}. At last I killed the Spark task and mesos executor, there is no new task started by framework, but it still on this slave and does not exit. 
{code} Frameworks ID UserNameActive TasksCPUs (Used / Allocated) Mem (Used / Allocated) …5050-31081-0016 rootwireless-m_invocation_kylin 0 / 0.6 / 192 MB Executors ID NameSource Active TasksQueued TasksCPUs (Used / Allocated) Mem (Used / Allocated) 5 Command Executor (Task: 5) (Command: sh -c 'cd spark-1*;...') 5 0 0 / 0.1 / 32 MB Sandbox 4 Command Executor (Task: 4) (Command: sh -c 'cd spark-1*;...') 4 0 0 / 0.1 / 32 MB Sandbox 3 Command Executor (Task: 3) (Command: sh -c 'cd spark-1*;...') 3 0 0 / 0.1 / 32 MB Sandbox 2 Command Executor (Task: 2) (Command: sh -c 'cd spark-1*;...') 2 0 0 / 0.1 / 32 MB Sandbox 1 Command Executor (Task: 1) (Command: sh -c 'cd spark-1*;...') 1 0 0 / 0.1 / 32 MB Sandbox 0 Command Executor (Task: 0) (Command: sh -c 'cd spark-1*;...') 0 0 0 / 0.1 / 32 MB Sandbox {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-4297) Executor does not shutdown when framework teardown.
[ https://issues.apache.org/jira/browse/MESOS-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085536#comment-15085536 ] Lei Xu commented on MESOS-4297: --- Here are some master logs from when I killed the task.
{code}
./mesos-master.WARNING:W0106 19:47:12.636579 1548 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task driver-20151230225518-0013 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 (l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:47:52.453431 1547 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task driver-20151230225518-0013 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 (l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:49:12.115389 1550 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task driver-20151230225518-0013 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 (l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:51:52.144099 1543 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task driver-20151230225518-0013 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 (l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:52:39.169888 1549 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task driver-20151230223633-0011 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051 (l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:57:12.453138 1549 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task driver-20151230225518-0013 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 (l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:02:39.168820 1545 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task driver-20151230223633-0011 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051 (l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:07:12.110839 1548 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task driver-20151230225518-0013 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 (l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:12:39.215056 1543 master.cpp:4408] Ignoring status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task driver-20151230223633-0011 of framework 20151228-163100-504125962-5050-31081-0003 from slave 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051 (l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
{code}
> Executor does not shutdown when framework teardown. > --- > > Key: MESOS-4297 > URL: https://issues.apache.org/jira/browse/MESOS-4297 > Project: Mesos > Issue Type: Bug > Components: framework >Affects Versions: 0.25.0 > Environment: Marathon 0.11.0 > Mesos 0.25.0 > Spark 1.5.2 >Reporter: Lei Xu >Priority: Critical > > We found a problem when tearing down a Spark framework on Mesos: the executor > does not exit and keeps running. > {code} > root 48548 48539 2 2015 ?04:28:11 /home/q/java/default/bin/java > -cp > /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar > -Xms8192m -Xmx8192m
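The master warnings above show status updates being dropped because the framework has already been removed, while the executor JVM keeps running on the agent. Below is a minimal sketch of one way to spot such leftovers by comparing the master's view of active frameworks with the executors still present on an agent. It assumes the /master/state.json and agent /state.json endpoints and their usual field names in Mesos ~0.25; the master URL is a hypothetical placeholder and the agent address is taken from the logs above, so both may need adjusting.
{code}
#!/usr/bin/env python3
# Hedged sketch: list executors on an agent whose framework the master no
# longer considers active. Endpoint paths and JSON field names are assumptions
# based on Mesos ~0.25; verify them against your deployment before relying on this.
import json
from urllib.request import urlopen

MASTER = "http://mesos-master.example.com:5050"  # hypothetical master address
AGENT = "http://10.90.27.71:5051"                # agent address from the logs above

def fetch(url):
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

master_state = fetch(MASTER + "/master/state.json")
agent_state = fetch(AGENT + "/state.json")

# Framework IDs the master still considers registered.
active = {f["id"] for f in master_state.get("frameworks", [])}

# Executors on the agent that belong to a framework the master has forgotten.
for framework in agent_state.get("frameworks", []):
    if framework["id"] not in active:
        for executor in framework.get("executors", []):
            print("leftover executor %s of unknown framework %s"
                  % (executor["id"], framework["id"]))
{code}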
[jira] [Commented] (MESOS-4299) Slave lives in two different cluster at the same time with different slave id
[ https://issues.apache.org/jira/browse/MESOS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085644#comment-15085644 ] Lei Xu commented on MESOS-4299: --- {{master/slaves}} response from Cluster B: {code} { "slaves": [ { "active": true, "attributes": { "apps": "logstash", "colo": "cn5", "type": "prod" }, "hostname": "l-bu128g9-10k10.ops.cn2.qunar.com", "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S5", "pid": "slave(1)@10.90.5.23:5051", "registered_time": 1451990379.49813, "reregistered_time": 1452093251.39516, "resources": { "cpus": 32, "disk": 2728919, "mem": 128126, "ports": "[8100-1, 31000-32000]" } }, {code} > Slave lives in two different cluster at the same time with different slave id > - > > Key: MESOS-4299 > URL: https://issues.apache.org/jira/browse/MESOS-4299 > Project: Mesos > Issue Type: Bug > Components: master, webui >Affects Versions: 0.25.0 > Environment: Mesos 0.25.0 >Reporter: Lei Xu > > I've migrated some nodes from Cluster A to B, and today I found these nodes > lives both in Cluster A and B, and the here is the {{/master/slaves}} > response: > {code} > { > "slaves": [ > { > "active": false, > "attributes": { > "apps": "logstash", > "colo": "cn5", > "type": "prod" > }, > "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com", > "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2", > "offered_resources": { > "cpus": 0, > "disk": 0, > "mem": 0 > }, > "pid": "slave(1)@10.90.5.19:5051", > "registered_time": 1451988622.66323, > "reserved_resources": {}, > "resources": { > "cpus": 32.0, > "disk": 2728919.0, > "mem": 128126.0, > "ports": "[8100-1, 31000-32000]" > }, > "unreserved_resources": { > "cpus": 32.0, > "disk": 2728919.0, > "mem": 128126.0, > "ports": "[8100-1, 31000-32000]" > }, > "used_resources": { > "cpus": 0, > "disk": 0, > "mem": 0 > } > }, > . > {code} > And the following is mesos slave logs: > {quote} > I0105 18:36:22.683724 6452 slave.cpp:2248] Updated checkpointed resources > from to > I0105 18:37:09.900497 6459 slave.cpp:3926] Current disk usage 0.06%. Max > allowed age: 1.798706758587755days > I0105 18:37:22.678374 6453 slave.cpp:3146] Master marked the slave as > disconnected but the slave considers itself registered! Forcing > re-registration. 
> I0105 18:37:22.678699 6453 slave.cpp:694] Re-detecting master > I0105 18:37:22.678715 6471 status_update_manager.cpp:176] Pausing sending > status updates > I0105 18:37:22.678753 6453 slave.cpp:741] Detecting new master > I0105 18:37:22.678977 6456 status_update_manager.cpp:176] Pausing sending > status updates > I0105 18:37:22.679047 6455 slave.cpp:705] New master detected at > master@10.88.169.195:5050 > I0105 18:37:22.679108 6455 slave.cpp:768] Authenticating with master > master@10.88.169.195:5050 > I0105 18:37:22.679136 6455 slave.cpp:773] Using default CRAM-MD5 > authenticatee > I0105 18:37:22.679239 6455 slave.cpp:741] Detecting new master > I0105 18:37:22.679354 6464 authenticatee.cpp:115] Creating new client SASL > connection > I0105 18:37:22.680883 6461 authenticatee.cpp:206] Received SASL > authentication mechanisms: CRAM-MD5 > I0105 18:37:22.680946 6461 authenticatee.cpp:232] Attempting to authenticate > with mechanism 'CRAM-MD5' > I0105 18:37:22.681759 6455 authenticatee.cpp:252] Received SASL > authentication step > I0105 18:37:22.682874 6454 authenticatee.cpp:292] Authentication success > I0105 18:37:22.682986 6441 slave.cpp:836] Successfully authenticated with > master master@10.88.169.195:5050 > I0105 18:37:22.684303 6454 slave.cpp:980] Re-registered with master > master@10.88.169.195:5050 > I0105 18:37:22.684455 6454 slave.cpp:1016] Forwarding total oversubscribed > resources > I0105 18:37:22.684471 6468 status_update_manager.cpp:183] Resuming sending > status updates > I0105 18:37:22.684649 6454 slave.cpp:2152] Updating framework > 20150610-204949-3299432458-5050-25057- pid to > scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708 > I0105 18:37:22.685025 6452 status_update_manager.cpp:183] Resuming sending > status updates > I0105 18:37:22.685117 6454 slave.cpp:2248] Updated checkpointed resources > from to > I0105 18:38:09.901587 6464 slave.cpp:3926] Current disk usage 0.06%. Max > allowed age: 1.798706755730266days > I0105 18:38:22.679468 6451 slave.cpp:3146] Master marked the slave as > disconnected but the slave considers itself registered!
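Since the same hostnames show up in the {{/master/slaves}} output of both clusters, one quick way to enumerate the overlap is to pull that endpoint from each master and intersect the hostnames. A rough sketch follows, assuming the JSON shape shown above; the two master URLs are illustrative placeholders (one matches the master address seen in the slave logs) and should be replaced with the real Cluster A and Cluster B masters.
{code}
#!/usr/bin/env python3
# Hedged sketch: list agents that appear in the /master/slaves endpoint of two
# different masters at the same time. Field names follow the JSON shown above;
# the master URLs are examples, not taken from any authoritative source.
import json
from urllib.request import urlopen

CLUSTER_A = "http://10.90.12.29:5050"    # example: old cluster's master
CLUSTER_B = "http://10.88.169.195:5050"  # example: new cluster's master

def slaves_by_hostname(master):
    with urlopen(master + "/master/slaves") as resp:
        state = json.loads(resp.read().decode("utf-8"))
    return {s["hostname"]: s["id"] for s in state.get("slaves", [])}

a = slaves_by_hostname(CLUSTER_A)
b = slaves_by_hostname(CLUSTER_B)

for hostname in sorted(set(a) & set(b)):
    print("%s is registered in both clusters: %s (A) vs %s (B)"
          % (hostname, a[hostname], b[hostname]))
{code}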
[jira] [Commented] (MESOS-4299) Slave lives in two different cluster at the same time with different slave id
[ https://issues.apache.org/jira/browse/MESOS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085704#comment-15085704 ] Lei Xu commented on MESOS-4299: --- update: I stop the slave and remove all files in data_dir path, and restart the slave, it still shows the same logs above. How to clear up a slave node and join the cluster as a new one ? > Slave lives in two different cluster at the same time with different slave id > - > > Key: MESOS-4299 > URL: https://issues.apache.org/jira/browse/MESOS-4299 > Project: Mesos > Issue Type: Bug > Components: master, webui >Affects Versions: 0.25.0 > Environment: Mesos 0.25.0 >Reporter: Lei Xu > > I've migrated some nodes from Cluster A to B, and today I found these nodes > lives both in Cluster A and B, and the here is the {{/master/slaves}} > response: > {code} > { > "slaves": [ > { > "active": false, > "attributes": { > "apps": "logstash", > "colo": "cn5", > "type": "prod" > }, > "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com", > "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2", > "offered_resources": { > "cpus": 0, > "disk": 0, > "mem": 0 > }, > "pid": "slave(1)@10.90.5.19:5051", > "registered_time": 1451988622.66323, > "reserved_resources": {}, > "resources": { > "cpus": 32.0, > "disk": 2728919.0, > "mem": 128126.0, > "ports": "[8100-1, 31000-32000]" > }, > "unreserved_resources": { > "cpus": 32.0, > "disk": 2728919.0, > "mem": 128126.0, > "ports": "[8100-1, 31000-32000]" > }, > "used_resources": { > "cpus": 0, > "disk": 0, > "mem": 0 > } > }, > . > {code} > And the following is mesos slave logs: > {quote} > I0105 18:36:22.683724 6452 slave.cpp:2248] Updated checkpointed resources > from to > I0105 18:37:09.900497 6459 slave.cpp:3926] Current disk usage 0.06%. Max > allowed age: 1.798706758587755days > I0105 18:37:22.678374 6453 slave.cpp:3146] Master marked the slave as > disconnected but the slave considers itself registered! Forcing > re-registration. 
> I0105 18:37:22.678699 6453 slave.cpp:694] Re-detecting master > I0105 18:37:22.678715 6471 status_update_manager.cpp:176] Pausing sending > status updates > I0105 18:37:22.678753 6453 slave.cpp:741] Detecting new master > I0105 18:37:22.678977 6456 status_update_manager.cpp:176] Pausing sending > status updates > I0105 18:37:22.679047 6455 slave.cpp:705] New master detected at > master@10.88.169.195:5050 > I0105 18:37:22.679108 6455 slave.cpp:768] Authenticating with master > master@10.88.169.195:5050 > I0105 18:37:22.679136 6455 slave.cpp:773] Using default CRAM-MD5 > authenticatee > I0105 18:37:22.679239 6455 slave.cpp:741] Detecting new master > I0105 18:37:22.679354 6464 authenticatee.cpp:115] Creating new client SASL > connection > I0105 18:37:22.680883 6461 authenticatee.cpp:206] Received SASL > authentication mechanisms: CRAM-MD5 > I0105 18:37:22.680946 6461 authenticatee.cpp:232] Attempting to authenticate > with mechanism 'CRAM-MD5' > I0105 18:37:22.681759 6455 authenticatee.cpp:252] Received SASL > authentication step > I0105 18:37:22.682874 6454 authenticatee.cpp:292] Authentication success > I0105 18:37:22.682986 6441 slave.cpp:836] Successfully authenticated with > master master@10.88.169.195:5050 > I0105 18:37:22.684303 6454 slave.cpp:980] Re-registered with master > master@10.88.169.195:5050 > I0105 18:37:22.684455 6454 slave.cpp:1016] Forwarding total oversubscribed > resources > I0105 18:37:22.684471 6468 status_update_manager.cpp:183] Resuming sending > status updates > I0105 18:37:22.684649 6454 slave.cpp:2152] Updating framework > 20150610-204949-3299432458-5050-25057- pid to > scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708 > I0105 18:37:22.685025 6452 status_update_manager.cpp:183] Resuming sending > status updates > I0105 18:37:22.685117 6454 slave.cpp:2248] Updated checkpointed resources > from to > I0105 18:38:09.901587 6464 slave.cpp:3926] Current disk usage 0.06%. Max > allowed age: 1.798706755730266days > I0105 18:38:22.679468 6451 slave.cpp:3146] Master marked the slave as > disconnected but the slave considers itself registered! Forcing > re-registration. > I0105 18:38:22.679739 6451 slave.cpp:694] Re-detecting master > I0105 18:38:22.679754 6453 status_update_manager.cpp:176] Pausing sending > status updates > I0105 18:38:22.679785 6451 slave.cpp:741] Detecting new master > I0105 18:38:22.680054 6461 slave.cpp:705] New master detected at > master@10.88.169.195:5050 > I0105 18:38:22.680106 6470
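Regarding the question above about making an agent rejoin as a brand-new node: one approach operators commonly use is to stop the agent and delete its checkpointed slave info under the work_dir before restarting, which forces registration with a fresh SlaveID. The sketch below is hedged: it assumes the default <work_dir>/meta/slaves layout, the work_dir path is only an example inferred from sandbox paths elsewhere in these reports, and it discards any checkpointed framework/executor state. Note this only changes how the agent registers; a stale entry on the old cluster's master remains until that master itself removes the agent.
{code}
#!/usr/bin/env python3
# Hedged sketch: force a stopped agent to register with a new SlaveID by
# removing its checkpointed slave info. Run only while mesos-slave is stopped.
# WORK_DIR is an example path, not confirmed for this deployment.
import shutil
from pathlib import Path

WORK_DIR = Path("/home/q/mesos/data")       # example agent work_dir
meta_slaves = WORK_DIR / "meta" / "slaves"  # checkpointed SlaveInfo and the 'latest' symlink

if meta_slaves.exists():
    shutil.rmtree(meta_slaves)
    print("removed %s; restart mesos-slave to register with a new SlaveID" % meta_slaves)
else:
    print("no checkpointed slave info found under %s" % meta_slaves)
{code}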
[jira] [Created] (MESOS-4299) Slave lives in two different cluster at the same time with different slave id
Lei Xu created MESOS-4299: - Summary: Slave lives in two different cluster at the same time with different slave id Key: MESOS-4299 URL: https://issues.apache.org/jira/browse/MESOS-4299 Project: Mesos Issue Type: Bug Components: master, webui Affects Versions: 0.25.0 Environment: Mesos 0.25.0 Reporter: Lei Xu I've migrated some nodes from Cluster A to B, and today I found these nodes lives both in Cluster A and B, and the here is the {{/master/slaves}} response: {code} { "slaves": [ { "active": false, "attributes": { "apps": "logstash", "colo": "cn5", "type": "prod" }, "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com", "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2", "offered_resources": { "cpus": 0, "disk": 0, "mem": 0 }, "pid": "slave(1)@10.90.5.19:5051", "registered_time": 1451988622.66323, "reserved_resources": {}, "resources": { "cpus": 32.0, "disk": 2728919.0, "mem": 128126.0, "ports": "[8100-1, 31000-32000]" }, "unreserved_resources": { "cpus": 32.0, "disk": 2728919.0, "mem": 128126.0, "ports": "[8100-1, 31000-32000]" }, "used_resources": { "cpus": 0, "disk": 0, "mem": 0 } }, . {code} And the following is mesos slave logs: {quote} I0105 18:36:22.683724 6452 slave.cpp:2248] Updated checkpointed resources from to I0105 18:37:09.900497 6459 slave.cpp:3926] Current disk usage 0.06%. Max allowed age: 1.798706758587755days I0105 18:37:22.678374 6453 slave.cpp:3146] Master marked the slave as disconnected but the slave considers itself registered! Forcing re-registration. I0105 18:37:22.678699 6453 slave.cpp:694] Re-detecting master I0105 18:37:22.678715 6471 status_update_manager.cpp:176] Pausing sending status updates I0105 18:37:22.678753 6453 slave.cpp:741] Detecting new master I0105 18:37:22.678977 6456 status_update_manager.cpp:176] Pausing sending status updates I0105 18:37:22.679047 6455 slave.cpp:705] New master detected at master@10.88.169.195:5050 I0105 18:37:22.679108 6455 slave.cpp:768] Authenticating with master master@10.88.169.195:5050 I0105 18:37:22.679136 6455 slave.cpp:773] Using default CRAM-MD5 authenticatee I0105 18:37:22.679239 6455 slave.cpp:741] Detecting new master I0105 18:37:22.679354 6464 authenticatee.cpp:115] Creating new client SASL connection I0105 18:37:22.680883 6461 authenticatee.cpp:206] Received SASL authentication mechanisms: CRAM-MD5 I0105 18:37:22.680946 6461 authenticatee.cpp:232] Attempting to authenticate with mechanism 'CRAM-MD5' I0105 18:37:22.681759 6455 authenticatee.cpp:252] Received SASL authentication step I0105 18:37:22.682874 6454 authenticatee.cpp:292] Authentication success I0105 18:37:22.682986 6441 slave.cpp:836] Successfully authenticated with master master@10.88.169.195:5050 I0105 18:37:22.684303 6454 slave.cpp:980] Re-registered with master master@10.88.169.195:5050 I0105 18:37:22.684455 6454 slave.cpp:1016] Forwarding total oversubscribed resources I0105 18:37:22.684471 6468 status_update_manager.cpp:183] Resuming sending status updates I0105 18:37:22.684649 6454 slave.cpp:2152] Updating framework 20150610-204949-3299432458-5050-25057- pid to scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708 I0105 18:37:22.685025 6452 status_update_manager.cpp:183] Resuming sending status updates I0105 18:37:22.685117 6454 slave.cpp:2248] Updated checkpointed resources from to I0105 18:38:09.901587 6464 slave.cpp:3926] Current disk usage 0.06%. Max allowed age: 1.798706755730266days I0105 18:38:22.679468 6451 slave.cpp:3146] Master marked the slave as disconnected but the slave considers itself registered! Forcing re-registration. 
I0105 18:38:22.679739 6451 slave.cpp:694] Re-detecting master I0105 18:38:22.679754 6453 status_update_manager.cpp:176] Pausing sending status updates I0105 18:38:22.679785 6451 slave.cpp:741] Detecting new master I0105 18:38:22.680054 6461 slave.cpp:705] New master detected at master@10.88.169.195:5050 I0105 18:38:22.680106 6470 status_update_manager.cpp:176] Pausing sending status updates I0105 18:38:22.680107 6461 slave.cpp:768] Authenticating with master master@10.88.169.195:5050 I0105 18:38:22.680197 6461 slave.cpp:773] Using default CRAM-MD5 authenticatee I0105 18:38:22.680271 6461 slave.cpp:741] Detecting new master . W0105 19:05:38.207882 6450 slave.cpp:1973] Ignoring shutdown framework message for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from master@10.90.12.29:5050 because it is not from the registered master (master@10.88.169.195:5050) W0106 09:12:38.666767 6468 slave.cpp:1973] Ignoring shutdown framework message for
[jira] [Created] (MESOS-4182) Add Qunar to the "Powered by" page.
Lei Xu created MESOS-4182: - Summary: Add Qunar to the "Powered by" page. Key: MESOS-4182 URL: https://issues.apache.org/jira/browse/MESOS-4182 Project: Mesos Issue Type: Wish Components: documentation Reporter: Lei Xu Priority: Trivial Hi, We use Mesos and Marathon to support our log analysis programs, such as ELK and Spark. It is a great resource manager, holding thousands of applications that deal with 6~8 billion lines of text per day — thanks very much! https://github.com/apache/mesos/pull/83 We'd love it if you could merge it. :) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables
[ https://issues.apache.org/jira/browse/MESOS-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986822#comment-14986822 ] Lei Xu commented on MESOS-3821: --- Cool :) > DOCKER_HOST does not work well with --executor_environment_variables > > > Key: MESOS-3821 > URL: https://issues.apache.org/jira/browse/MESOS-3821 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0 > Environment: Docker 1.7.1 > Mesos 0.25.0 >Reporter: Lei Xu >Assignee: haosdent > > Hi guys, > I found that DOCKER_HOST does not work now if I set > bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"} > but the docker executor always append > bq. -H unix:///var/run/docker.sock > on each command, it will overwrite the DOCKER_HOST in fact. > I think it is too strict now, and I could not disable it via some command > flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables
[ https://issues.apache.org/jira/browse/MESOS-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986755#comment-14986755 ] Lei Xu commented on MESOS-3821: --- Hi [~haosd...@gmail.com], you're right. I hope the user can specify the protocol and scheme in --docker_socket, for example: --docker_socket unix:///var/run/docker.sock or --docker_socket tcp://127.0.0.1:2376 > DOCKER_HOST does not work well with --executor_environment_variables > > > Key: MESOS-3821 > URL: https://issues.apache.org/jira/browse/MESOS-3821 > Project: Mesos > Issue Type: Bug > Components: docker >Affects Versions: 0.25.0 > Environment: Docker 1.7.1 > Mesos 0.25.0 >Reporter: Lei Xu > > Hi guys, > I found that DOCKER_HOST does not work now if I set > bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"} > but the docker executor always append > bq. -H unix:///var/run/docker.sock > on each command, it will overwrite the DOCKER_HOST in fact. > I think it is too strict now, and I could not disable it via some command > flags. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
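To make the proposal above concrete, here is a rough sketch of the endpoint-selection order being asked for: an explicit --docker_socket URI wins, DOCKER_HOST (e.g. set via --executor_environment_variables) is honoured next, and the local unix socket is only the final fallback. This is an illustration of the proposal, not the actual Mesos docker executor code; the function and flag handling are hypothetical.
{code}
#!/usr/bin/env python3
# Hedged illustration of the proposed --docker_socket behaviour; not taken
# from the Mesos source tree.
import os

DEFAULT_SOCKET = "unix:///var/run/docker.sock"

def docker_endpoint(docker_socket=None):
    """Return the value that would be passed to the docker CLI via -H."""
    if docker_socket:                      # explicit --docker_socket URI, e.g. tcp://127.0.0.1:2376
        return docker_socket
    if os.environ.get("DOCKER_HOST"):      # e.g. injected via --executor_environment_variables
        return os.environ["DOCKER_HOST"]
    return DEFAULT_SOCKET                  # only used when nothing else is configured

# With DOCKER_HOST=localhost:2377 in the executor environment and no explicit
# flag, the hard-coded unix socket would no longer override it:
print(docker_endpoint())                        # -> localhost:2377 when DOCKER_HOST is set
print(docker_endpoint("tcp://127.0.0.1:2376"))  # -> tcp://127.0.0.1:2376
{code}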
[jira] [Created] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables
Lei Xu created MESOS-3821: - Summary: DOCKER_HOST does not work well with --executor_environment_variables Key: MESOS-3821 URL: https://issues.apache.org/jira/browse/MESOS-3821 Project: Mesos Issue Type: Bug Components: docker Affects Versions: 0.25.0 Environment: Docker 1.7.1 Mesos 0.25.0 Reporter: Lei Xu Hi guys, I found that DOCKER_HOST does not work if I set bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"} because the docker executor always appends bq. -H unix:///var/run/docker.sock to each command, which in effect overwrites DOCKER_HOST. I think this is too strict, and I could not find a command-line flag to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)