[jira] [Commented] (MESOS-2842) Update FrameworkInfo.principal on framework re-registration

2017-05-05 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997958#comment-15997958
 ] 

Lei Xu commented on MESOS-2842:
---

We met the same problem this week, and the logs show "Check failed: 
metrics->frameworks.contains(principal.get())". I think this issue may be the 
root cause of our problem.

> Update FrameworkInfo.principal on framework re-registration
> ---
>
> Key: MESOS-2842
> URL: https://issues.apache.org/jira/browse/MESOS-2842
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Priority: Critical
>  Labels: security
>
> From the design doc:
> This is a bit involved because ‘principal’ is used for authentication and 
> rate limiting.
> The authentication part is straightforward because a framework with updated 
> ‘principal’ should authenticate with the new ‘principal’ before being allowed 
> to re-register. The ‘authenticated’ map already gets updated when the 
> framework disconnects and reconnects, so it is fine.
> For rate limiting, Master::failoverFramework() needs to be changed to update 
> the principal in ‘frameworks.principals’ map and also remove the metrics for 
> the old principal if there are no other frameworks with this principal 
> (similar to what we do in Master::removeFramework()).
> The Master::visit() and Master::_visit() should work with the current 
> semantics.
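
As a standalone illustration of the bookkeeping described above (a minimal sketch, not the Mesos source; the container names and the refcount-style "metrics" are invented for the example), re-keying a framework's principal and dropping the per-principal metrics once no framework uses the old principal anymore could look like this:

{code}
// Toy sketch (not Mesos code): track framework -> principal and a per-principal
// count standing in for the per-principal metrics. On failover with a new
// principal, move the framework over and erase the old principal's entry when
// its last framework is gone -- otherwise a later lookup like
// metrics->frameworks.contains(principal.get()) can fail.
#include <iostream>
#include <map>
#include <string>

std::map<std::string, std::string> principals;      // frameworkId -> principal
std::map<std::string, int> frameworksPerPrincipal;  // principal -> #frameworks

void failoverFramework(const std::string& frameworkId,
                       const std::string& newPrincipal)
{
  const std::string oldPrincipal = principals[frameworkId];

  principals[frameworkId] = newPrincipal;
  frameworksPerPrincipal[newPrincipal]++;

  if (--frameworksPerPrincipal[oldPrincipal] == 0) {
    frameworksPerPrincipal.erase(oldPrincipal);  // No other framework uses it.
  }
}

int main()
{
  principals["fw-1"] = "alice";
  frameworksPerPrincipal["alice"] = 1;

  failoverFramework("fw-1", "bob");

  std::cout << "fw-1 principal: " << principals["fw-1"] << std::endl;  // bob
  std::cout << "alice still tracked: "
            << (frameworksPerPrincipal.count("alice") ? "yes" : "no")
            << std::endl;  // no
  return 0;
}
{code}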



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (MESOS-2842) Update FrameworkInfo.principal on framework re-registration

2017-05-05 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997958#comment-15997958
 ] 

Lei Xu edited comment on MESOS-2842 at 5/5/17 8:57 AM:
---

We met the same problem this week, and the logs show "Check failed: 
metrics->frameworks.contains(principal.get())". I think this issue may be the 
root cause of our problem.

Our Mesos version is 0.28.2.


was (Author: brickxu):
We met the same problem this week, and the logs show "Check failed: 
metrics->frameworks.contains(principal.get())". I think this issue may be the 
root cause of our problem.

> Update FrameworkInfo.principal on framework re-registration
> ---
>
> Key: MESOS-2842
> URL: https://issues.apache.org/jira/browse/MESOS-2842
> Project: Mesos
>  Issue Type: Bug
>Reporter: Vinod Kone
>Priority: Critical
>  Labels: security
>
> From the design doc:
> This is a bit involved because ‘principal’ is used for authentication and 
> rate limiting.
> The authentication part is straightforward because a framework with updated 
> ‘principal’ should authenticate with the new ‘principal’ before being allowed 
> to re-register. The ‘authenticated’ map already gets updated when the 
> framework disconnects and reconnects, so it is fine.
> For rate limiting, Master::failoverFramework() needs to be changed to update 
> the principal in ‘frameworks.principals’ map and also remove the metrics for 
> the old principal if there are no other frameworks with this principal 
> (similar to what we do in Master::removeFramework()).
> The Master::visit() and Master::_visit() should work with the current 
> semantics.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (MESOS-6738) Mesos master help message gives unformatted documents.

2016-12-06 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6738:
--
Priority: Trivial  (was: Minor)

> Mesos master help message gives unformatted documents.
> --
>
> Key: MESOS-6738
> URL: https://issues.apache.org/jira/browse/MESOS-6738
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.1.0
> Environment: Mesos 1.1.0
> Ubuntu 16.04
>Reporter: Lei Xu
>Priority: Trivial
> Attachments: mesos_agent_help_message.png, 
> mesos_master_help_message.png
>
>
> Build Mesos from the release tarball and run the following command:
> {code}
> mesos master --help
> {code}
> It gives unformatted docs, but the slave/agent's help message is OK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6738) Mesos master help message gives unformatted documents.

2016-12-06 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6738:
--
Attachment: mesos_agent_help_message.png
mesos_master_help_message.png

> Mesos master help message gives unformatted documents.
> --
>
> Key: MESOS-6738
> URL: https://issues.apache.org/jira/browse/MESOS-6738
> Project: Mesos
>  Issue Type: Bug
>  Components: cli
>Affects Versions: 1.1.0
> Environment: Mesos 1.1.0
> Ubuntu 16.04
>Reporter: Lei Xu
>Priority: Minor
> Attachments: mesos_agent_help_message.png, 
> mesos_master_help_message.png
>
>
> Build Mesos from the release tarball and run the following command:
> {code}
> mesos master --help
> {code}
> It gives unformatted docs, but the slave/agent's help message is OK.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6738) Mesos master help message gives unformatted documents.

2016-12-06 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6738:
--
Description: 
Build Mesos from the release tarball and run the following command:

{code}
mesos master --help
{code}

It gives unformatted docs, but the slave/agent's help message is OK.

  was:
Build Mesos from the release tarball and run the following command:

{code}
mesos master --help
{code}

It gives unformatted docs, but the slave/agent's help message is OK.

{code}
Usage: mesos-master [options]

  --acls=VALUE  
 The value could be a JSON-formatted string of ACLs

 or a file path containing the JSON-formatted ACLs used

 for authorization. Path could be of the form `file:///path/to/file`

 or `/path/to/file`.

 

 Note that if the flag `--authorizers` is provided with a value

 different than `local`, the ACLs contents

 will be ignored.

 

 See the ACLs protobuf in acls.proto for the expected format.

 

 Example:

 {

   "register_frameworks": [

 {

   "principals": { "type": "ANY" },

   "roles": { "values": ["a"] }

 }

   ],

   "run_tasks": [

 {

   "principals": { "values": ["a", "b"] },

   "users": { "values": ["c"] }

 }

   ],

   "teardown_frameworks": [

 {

   "principals": { "values": ["a", "b"] },

   "framework_principals": { "values": ["c"] }

 }

   ],

   "set_quotas": [

 {

   "principals": { "values": ["a"] },

   "roles": { "values": ["a", "b"] }

 }

   ],

   "remove_quotas": [

 {

   "principals": { "values": ["a"] },

   "quota_principals": { "values": ["a"] }
  

[jira] [Created] (MESOS-6738) Mesos master help message gives unformatted documents.

2016-12-06 Thread Lei Xu (JIRA)
Lei Xu created MESOS-6738:
-

 Summary: Mesos master help message gives unformatted documents.
 Key: MESOS-6738
 URL: https://issues.apache.org/jira/browse/MESOS-6738
 Project: Mesos
  Issue Type: Bug
  Components: cli
Affects Versions: 1.1.0
 Environment: Mesos 1.1.0
Ubuntu 16.04
Reporter: Lei Xu
Priority: Minor


Build Mesos from the release tarball and run the following command:

{code}
mesos master --help
{code}

It gives unformatted docs, but the slave/agent's help message is OK.

{code}
Usage: mesos-master [options]

  --acls=VALUE  
 The value could be a JSON-formatted string of ACLs

 or a file path containing the JSON-formatted ACLs used

 for authorization. Path could be of the form `file:///path/to/file`

 or `/path/to/file`.

 

 Note that if the flag `--authorizers` is provided with a value

 different than `local`, the ACLs contents

 will be ignored.

 

 See the ACLs protobuf in acls.proto for the expected format.

 

 Example:

 {

   "register_frameworks": [

 {

   "principals": { "type": "ANY" },

   "roles": { "values": ["a"] }

 }

   ],

   "run_tasks": [

 {

   "principals": { "values": ["a", "b"] },

   "users": { "values": ["c"] }

 }

   ],

   "teardown_frameworks": [

 {

   "principals": { "values": ["a", "b"] },

   "framework_principals": { "values": ["c"] }

 }

   ],

   "set_quotas": [

 {

   "principals": { "values": ["a"] },

   "roles": { "values": ["a", "b"] }

 }

   ],

   "remove_quotas": [

 {

   "principals": { "values": ["a"] },

   

[jira] [Updated] (MESOS-6615) Running mesos-slave in the docker that leave many zombie process

2016-11-20 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6615:
--
Component/s: containerization

> Running mesos-slave in the docker that leave many zombie process
> 
>
> Key: MESOS-6615
> URL: https://issues.apache.org/jira/browse/MESOS-6615
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, slave
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2 
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>
> Here are some zombie processes left behind when I run mesos-slave in Docker.
> {code}
> root 10547 19464  0 Oct25 ?00:00:00 [docker] 
> root 14505 19464  0 Oct25 ?00:00:00 [docker] 
> root 16069 19464  0 Oct25 ?00:00:00 [docker] 
> root 19962 19464  0 Oct25 ?00:00:00 [docker] 
> root 23346 19464  0 Oct25 ?00:00:00 [docker] 
> root 24544 19464  0 Oct25 ?00:00:00 [docker] 
> {code}
> And I found that the zombies come from the {{mesos-slave}} process:
> {code}
> pstree -p -s 10547
> systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547)
> {code}
> The logs were deleted by a cron job a few weeks ago, but I remember seeing many 
> {{Failed to shutdown socket with fd xx: Transport endpoint is not connected}} 
> messages in the log.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6615) Running mesos-slave in the docker that leave many zombie process

2016-11-20 Thread Lei Xu (JIRA)
Lei Xu created MESOS-6615:
-

 Summary: Running mesos-slave in the docker that leave many zombie 
process
 Key: MESOS-6615
 URL: https://issues.apache.org/jira/browse/MESOS-6615
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.28.2
 Environment: Mesos 0.28.2 
Docker 1.12.1
Reporter: Lei Xu
Priority: Critical


Here are some zombie processes left behind when I run mesos-slave in Docker.

{code}
root 10547 19464  0 Oct25 ?00:00:00 [docker] 
root 14505 19464  0 Oct25 ?00:00:00 [docker] 
root 16069 19464  0 Oct25 ?00:00:00 [docker] 
root 19962 19464  0 Oct25 ?00:00:00 [docker] 
root 23346 19464  0 Oct25 ?00:00:00 [docker] 
root 24544 19464  0 Oct25 ?00:00:00 [docker] 
{code}

And I found that the zombies come from the {{mesos-slave}} process:

{code}
pstree -p -s 10547
systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547)
{code}

The logs were deleted by a cron job a few weeks ago, but I remember seeing many 
{{Failed to shutdown socket with fd xx: Transport endpoint is not connected}} 
messages in the log.
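
For background, zombies accumulate when a long-lived parent never calls waitpid() on its exited children; a minimal illustration of the reaping a parent such as mesos-slave (or an executor spawning {{docker}} CLI processes) has to do (an example sketch, not Mesos code):

{code}
// Minimal reaping illustration (not Mesos source). If the parent skips the
// waitpid() loop, the exited child stays a zombie ("[docker] <defunct>") until
// the parent itself exits and init adopts and reaps it.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>

int main()
{
  pid_t child = fork();
  if (child == 0) {
    // Hypothetical child command for the example.
    execlp("docker", "docker", "version", (char*) nullptr);
    _exit(127);  // Only reached if exec fails.
  }

  int status = 0;
  while (waitpid(child, &status, 0) == -1 && errno == EINTR) {
    // Retry if interrupted by a signal.
  }

  std::printf("child exited with status %d\n", WEXITSTATUS(status));
  return 0;
}
{code}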



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5540) Support building with non-GNU libc

2016-11-20 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15682353#comment-15682353
 ] 

Lei Xu commented on MESOS-5540:
---

Joining this thread; this is a very useful issue for building Mesos with musl 
libc. Thanks all!

> Support building with non-GNU libc
> --
>
> Key: MESOS-5540
> URL: https://issues.apache.org/jira/browse/MESOS-5540
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Reporter: Neil Conway
>Assignee: Neil Conway
>Priority: Minor
>  Labels: mesosphere
> Fix For: 1.0.0
>
>
> Some Linux distributions don't use glibc -- e.g., Alpine Linux uses musl. 
> Mesos currently fails to compile using musl for at least the following two 
> reasons:
> 1. {{linux/fs.hpp}} includes {{fstab.h}}, which isn't provided by musl.
> 2. various places use {{fts.h}}, which isn't provided by musl
> For (1), it seems this functionality is only needed by 
> {{FsTest.FileSystemTableRead}}, so I think it can be safely removed.
> For (2), there are standalone implementations of the FTS functions, e.g., 
> https://github.com/pullmoll/musl-fts/ . We could either vendor such an 
> implementation or require the user to install an FTS implementation as a 
> library (e.g., https://pkgs.alpinelinux.org/package/edge/main/x86_64/fts). If 
> we do the latter, we'd need to be prepared to link against {{libfts.a}} if 
> needed.
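
As a rough illustration of the two points above (a sketch only, not the actual Mesos patch), the glibc-only header can be guarded, while {{fts.h}} comes from a standalone package such as musl-fts on musl systems:

{code}
// Illustrative sketch only: compile-time handling of headers that musl lacks.
#if defined(__GLIBC__)
#include <fstab.h>   // glibc ships fstab.h; musl does not.
#endif

#include <fts.h>     // On musl this would be provided by a standalone FTS
                     // implementation (e.g. musl-fts) and linked with -lfts.

#include <cstdio>

int main()
{
#if defined(__GLIBC__)
  std::printf("built against glibc\n");
#else
  std::printf("built against a non-GNU libc\n");
#endif
  return 0;
}
{code}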



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-19 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15590575#comment-15590575
 ] 

Lei Xu commented on MESOS-6410:
---

Hi [~haosd...@gmail.com], It's OK now with `--privileged=true`, thanks very 
much.
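
For context (an illustration, not the agent's actual code path): setting up a persistent volume boils down to a bind mount, and mount(2) needs CAP_SYS_ADMIN inside the container, which is why {{--privileged=true}} (or granting that capability) makes it work. The paths below are hypothetical:

{code}
// Minimal bind-mount illustration (Linux only, not Mesos source). Without
// CAP_SYS_ADMIN in the container, mount(2) fails with EPERM, which surfaces
// as the "Operation not permitted" errors in the slave log above.
#include <sys/mount.h>
#include <cerrno>
#include <cstdio>
#include <cstring>

int main()
{
  // Hypothetical source/target paths for the example.
  const char* source = "/var/lib/mesos/volumes/roles/storm/example-volume";
  const char* target = "/var/lib/mesos/slaves/S45/frameworks/fw/executors/e/runs/r/tmp";

  if (mount(source, target, nullptr, MS_BIND, nullptr) != 0) {
    std::fprintf(stderr, "mount failed: %s\n", std::strerror(errno));
    return 1;
  }

  std::printf("bind mount succeeded\n");
  return 0;
}
{code}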

> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>
> Here are some error logs from the slave:
> {code}
> E1018 07:52:06.18692630 slave.cpp:3758] Container 
> 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
>  Operation not permitted
> E1018 07:52:09.91687725 slave.cpp:3758] Container 
> 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
>  Operation not permitted
> {code}
> But outside of Docker, the mesos slave works fine with persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6410:
--
Description: 
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}

But outside of Docker, the mesos slave works fine with persistent volumes.


  was:
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}




> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>
> Here are some error logs from the slave:
> {quote}
> E1018 07:52:06.18692630 slave.cpp:3758] Container 
> 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
>  Operation not permitted
> E1018 07:52:09.91687725 slave.cpp:3758] Container 
> 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
>  Operation not permitted
> {quote}
> 

[jira] [Updated] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6410:
--
Description: 
Here are some error logs from the slave:

{quote}
E1018 07:52:06.18692630 slave.cpp:3758] Container 
'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
 Operation not permitted
E1018 07:52:09.91687725 slave.cpp:3758] Container 
'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
'storm_nimbus_mpubpushsmart.d
60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
persistent
 volume from 
'/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
 to '/var/lib/meso
s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
 Operation not permitted
{quote}



> Fail to mount persistent volume when run mesos slave in docker
> --
>
> Key: MESOS-6410
> URL: https://issues.apache.org/jira/browse/MESOS-6410
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, volumes
>Affects Versions: 0.28.2
> Environment: Mesos 0.28.2
> Docker 1.12.1
>Reporter: Lei Xu
>Priority: Critical
>
> Here are some error logs from the slave:
> {quote}
> E1018 07:52:06.18692630 slave.cpp:3758] Container 
> 'fbfd5e46-4460-45af-bd64-e03e8664f575' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/fbfd5e46-4460-45af-bd64-e03e8664f575/tmp':
>  Operation not permitted
> E1018 07:52:09.91687725 slave.cpp:3758] Container 
> 'bb8ca08b-1cbf-450d-93e2-18a6322cb5be' for executor 
> 'storm_nimbus_mpubpushsmart.d
> 60e9066-94ec-11e6-99ff-0242d43b0395' of framework 
> 06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023 failed to start: Failed to mount 
> persistent
>  volume from 
> '/var/lib/mesos/volumes/roles/storm/storm_nimbus_mpubpushsmart#tmp#d60e4245-94ec-11e6-99ff-0242d43b0395'
>  to '/var/lib/meso
> s/slaves/06ccc047-7137-41ef-a4ac-4090b9cd9e42-S45/frameworks/06ccc047-7137-41ef-a4ac-4090b9cd9e42-0023/executors/storm_nimbus_mpubpushs
> mart.d60e9066-94ec-11e6-99ff-0242d43b0395/runs/bb8ca08b-1cbf-450d-93e2-18a6322cb5be/tmp':
>  Operation not permitted
> {quote}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6410) Fail to mount persistent volume when run mesos slave in docker

2016-10-18 Thread Lei Xu (JIRA)
Lei Xu created MESOS-6410:
-

 Summary: Fail to mount persistent volume when run mesos slave in 
docker
 Key: MESOS-6410
 URL: https://issues.apache.org/jira/browse/MESOS-6410
 Project: Mesos
  Issue Type: Bug
  Components: containerization, volumes
Affects Versions: 0.28.2
 Environment: Mesos 0.28.2
Docker 1.12.1
Reporter: Lei Xu
Priority: Critical






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2016-09-18 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15500642#comment-15500642
 ] 

Lei Xu commented on MESOS-6200:
---

Yes, but it is a little tricky. I still hope the executor could do all the work 
via the `resource` field: the user focuses on the soft/hard resource limits, and 
the executor sets them with the correct command-line options or cgroup file 
values.
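
A minimal sketch of what that could look like in the Docker executor's argument construction, assuming hypothetical soft/hard inputs (these fields do not exist in the current protobuf; the real code quoted below only sets --cpu-shares and --memory):

{code}
// Hypothetical sketch only: translating a soft/hard split into `docker run`
// flags, following the table in the description. hardCpus/softCpus and
// hardMemBytes/softMemBytes are assumed inputs from a future resource format.
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> dockerResourceArgs(
    double hardCpus, double softCpus,
    long long hardMemBytes, long long softMemBytes)
{
  std::vector<std::string> argv;

  // Hard CPU limit: CFS quota over a 100ms period.
  const long long period = 100000;  // microseconds
  argv.push_back("--cpu-period=" + std::to_string(period));
  argv.push_back("--cpu-quota=" + std::to_string((long long) (hardCpus * period)));

  // Soft CPU limit: relative shares (1024 per CPU, as in the existing executor).
  argv.push_back("--cpu-shares=" + std::to_string((long long) (softCpus * 1024)));

  // Hard memory limit and soft memory reservation.
  argv.push_back("--memory=" + std::to_string(hardMemBytes));
  argv.push_back("--memory-reservation=" + std::to_string(softMemBytes));

  return argv;
}

int main()
{
  for (const std::string& arg : dockerResourceArgs(2.0, 1.0, 1LL << 30, 512LL << 20)) {
    std::cout << arg << std::endl;
  }
  return 0;
}
{code}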

> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
> | soft limit| --cpu-shares | --memory-reservation|
> And currently the task protobuf message has only one resource struct, which is 
> used to describe the cgroup limit, and the Docker executor handles it like the 
> following; only --memory and --cpu-shares are set:
> {code}
>   if (resources.isSome()) {
> // TODO(yifan): Support other resources (e.g. disk).
> Option<double> cpus = resources.get().cpus();
> if (cpus.isSome()) {
>   uint64_t cpuShare =
> std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), 
> MIN_CPU_SHARES);
>   argv.push_back("--cpu-shares");
>   argv.push_back(stringify(cpuShare));
> }
> Option<Bytes> mem = resources.get().mem();
> if (mem.isSome()) {
>   Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>   argv.push_back("--memory");
>   argv.push_back(stringify(memLimit.bytes()));
> }
>   }
> {code}
> I hope the executor and the protobuf message could separate the resources into 
> two parts, soft and hard. Then the user could set two levels of resource limits 
> for Docker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2016-09-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6200:
--
Description: 
The Docker executor could perhaps support soft/hard resource limits to enable more 
flexible resource sharing among applications.

||  || CPU || Memory ||
| hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
| soft limit| --cpu-shares | --memory-reservation|

And currently the task protobuf message has only one resource struct, which is used 
to describe the cgroup limit, and the Docker executor handles it like the 
following; only --memory and --cpu-shares are set:

{code}
  if (resources.isSome()) {
// TODO(yifan): Support other resources (e.g. disk).
Option<double> cpus = resources.get().cpus();
if (cpus.isSome()) {
  uint64_t cpuShare =
std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(stringify(cpuShare));
}

Option<Bytes> mem = resources.get().mem();
if (mem.isSome()) {
  Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
  argv.push_back("--memory");
  argv.push_back(stringify(memLimit.bytes()));
}
  }
{code}

I hope the executor and the protobuf message could separate the resources into two 
parts, soft and hard. Then the user could set two levels of resource limits for 
Docker.

  was:
The Docker executor could perhaps support soft/hard resource limits to enable more 
flexible resource sharing among applications.

||  || CPU || Memory ||
| hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
| soft limit| --cpu-shares | --memory-reservation|

And currently the task protobuf message has only one resource struct, which is used 
to describe the cgroup limit, and the Docker executor handles it like the following:

{code}
  if (resources.isSome()) {
// TODO(yifan): Support other resources (e.g. disk).
Option<double> cpus = resources.get().cpus();
if (cpus.isSome()) {
  uint64_t cpuShare =
std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(stringify(cpuShare));
}

Option<Bytes> mem = resources.get().mem();
if (mem.isSome()) {
  Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
  argv.push_back("--memory");
  argv.push_back(stringify(memLimit.bytes()));
}
  }
{code}

I hope the executor and the protobuf message could separate the resources into two 
parts, soft and hard. Then the user could set two levels of resource limits for 
Docker.


> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
> | soft limit| --cpu-shares | --memory-reservation|
> And currently the task protobuf message has only one resource struct, which is 
> used to describe the cgroup limit, and the Docker executor handles it like the 
> following; only --memory and --cpu-shares are set:
> {code}
>   if (resources.isSome()) {
> // TODO(yifan): Support other resources (e.g. disk).
> Option<double> cpus = resources.get().cpus();
> if (cpus.isSome()) {
>   uint64_t cpuShare =
> std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), 
> MIN_CPU_SHARES);
>   argv.push_back("--cpu-shares");
>   argv.push_back(stringify(cpuShare));
> }
> Option<Bytes> mem = resources.get().mem();
> if (mem.isSome()) {
>   Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>   argv.push_back("--memory");
>   argv.push_back(stringify(memLimit.bytes()));
> }
>   }
> {code}
> I hope the executor and the protobuf message could separate the resources into 
> two parts, soft and hard. Then the user could set two levels of resource limits 
> for Docker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6200) Hop mesos support soft and hard cpu/memory resource in the task

2016-09-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6200:
--
Summary: Hop mesos support soft and hard cpu/memory resource in the task  
(was: Hop mesos support request/limit resource in the task)

> Hop mesos support soft and hard cpu/memory resource in the task
> ---
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-shares| --memory & --memory-swap|
> | soft limit| --cpu-period & --cpu-quota | --memory-reservation|
> And currently the task protobuf message has only one resource struct, which is 
> used to describe the cgroup limit, and the Docker executor handles it like the 
> following:
> {code}
>   if (resources.isSome()) {
> // TODO(yifan): Support other resources (e.g. disk).
> Option<double> cpus = resources.get().cpus();
> if (cpus.isSome()) {
>   uint64_t cpuShare =
> std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), 
> MIN_CPU_SHARES);
>   argv.push_back("--cpu-shares");
>   argv.push_back(stringify(cpuShare));
> }
> Option<Bytes> mem = resources.get().mem();
> if (mem.isSome()) {
>   Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>   argv.push_back("--memory");
>   argv.push_back(stringify(memLimit.bytes()));
> }
>   }
> {code}
> I hope the executor and the protobuf message could separate the resources into 
> two parts, soft and hard. Then the user could set two levels of resource limits 
> for Docker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2016-09-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6200:
--
Description: 
The Docker executor could perhaps support soft/hard resource limits to enable more 
flexible resource sharing among applications.

||  || CPU || Memory ||
| hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
| soft limit| --cpu-shares | --memory-reservation|

And currently the task protobuf message has only one resource struct, which is used 
to describe the cgroup limit, and the Docker executor handles it like the following:

{code}
  if (resources.isSome()) {
// TODO(yifan): Support other resources (e.g. disk).
Option<double> cpus = resources.get().cpus();
if (cpus.isSome()) {
  uint64_t cpuShare =
std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(stringify(cpuShare));
}

Option<Bytes> mem = resources.get().mem();
if (mem.isSome()) {
  Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
  argv.push_back("--memory");
  argv.push_back(stringify(memLimit.bytes()));
}
  }
{code}

I hope the executor and the protobuf message could separate the resources into two 
parts, soft and hard. Then the user could set two levels of resource limits for 
Docker.

  was:
The Docker executor could perhaps support soft/hard resource limits to enable more 
flexible resource sharing among applications.

||  || CPU || Memory ||
| hard limit| --cpu-shares| --memory & --memory-swap|
| soft limit| --cpu-period & --cpu-quota | --memory-reservation|

And currently the task protobuf message has only one resource struct, which is used 
to describe the cgroup limit, and the Docker executor handles it like the following:

{code}
  if (resources.isSome()) {
// TODO(yifan): Support other resources (e.g. disk).
Option<double> cpus = resources.get().cpus();
if (cpus.isSome()) {
  uint64_t cpuShare =
std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(stringify(cpuShare));
}

Option<Bytes> mem = resources.get().mem();
if (mem.isSome()) {
  Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
  argv.push_back("--memory");
  argv.push_back(stringify(memLimit.bytes()));
}
  }
{code}

I hope the executor and the protobuf message could separate the resources into two 
parts, soft and hard. Then the user could set two levels of resource limits for 
Docker.


> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-period & --cpu-quota | --memory & --memory-swap|
> | soft limit| --cpu-shares | --memory-reservation|
> And currently the task protobuf message has only one resource struct, which is 
> used to describe the cgroup limit, and the Docker executor handles it like the 
> following:
> {code}
>   if (resources.isSome()) {
> // TODO(yifan): Support other resources (e.g. disk).
> Option<double> cpus = resources.get().cpus();
> if (cpus.isSome()) {
>   uint64_t cpuShare =
> std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), 
> MIN_CPU_SHARES);
>   argv.push_back("--cpu-shares");
>   argv.push_back(stringify(cpuShare));
> }
> Option<Bytes> mem = resources.get().mem();
> if (mem.isSome()) {
>   Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>   argv.push_back("--memory");
>   argv.push_back(stringify(memLimit.bytes()));
> }
>   }
> {code}
> I hope the executor and the protobuf message could separate the resources into 
> two parts, soft and hard. Then the user could set two levels of resource limits 
> for Docker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-6200) Hope mesos support soft and hard cpu/memory resource in the task

2016-09-18 Thread Lei Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-6200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Xu updated MESOS-6200:
--
Summary: Hope mesos support soft and hard cpu/memory resource in the task  
(was: Hop mesos support soft and hard cpu/memory resource in the task)

> Hope mesos support soft and hard cpu/memory resource in the task
> 
>
> Key: MESOS-6200
> URL: https://issues.apache.org/jira/browse/MESOS-6200
> Project: Mesos
>  Issue Type: Improvement
>  Components: cgroups, containerization, docker, scheduler api
>Affects Versions: 0.28.2
> Environment: CentOS 7 
> Kernel 3.10.0-327.28.3.el7.x86_64
> Mesos 0.28.2
> Docker 1.11.2
>Reporter: Lei Xu
>
> The Docker executor could perhaps support soft/hard resource limits to enable 
> more flexible resource sharing among applications.
> ||  || CPU || Memory ||
> | hard limit| --cpu-shares| --memory & --memory-swap|
> | soft limit| --cpu-period & --cpu-quota | --memory-reservation|
> And currently the task protobuf message has only one resource struct, which is 
> used to describe the cgroup limit, and the Docker executor handles it like the 
> following:
> {code}
>   if (resources.isSome()) {
> // TODO(yifan): Support other resources (e.g. disk).
> Option<double> cpus = resources.get().cpus();
> if (cpus.isSome()) {
>   uint64_t cpuShare =
> std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), 
> MIN_CPU_SHARES);
>   argv.push_back("--cpu-shares");
>   argv.push_back(stringify(cpuShare));
> }
> Option<Bytes> mem = resources.get().mem();
> if (mem.isSome()) {
>   Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
>   argv.push_back("--memory");
>   argv.push_back(stringify(memLimit.bytes()));
> }
>   }
> {code}
> I hope the executor and the protobuf message could separate the resources into 
> two parts, soft and hard. Then the user could set two levels of resource limits 
> for Docker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-6200) Hop mesos support request/limit resource in the task

2016-09-18 Thread Lei Xu (JIRA)
Lei Xu created MESOS-6200:
-

 Summary: Hop mesos support request/limit resource in the task
 Key: MESOS-6200
 URL: https://issues.apache.org/jira/browse/MESOS-6200
 Project: Mesos
  Issue Type: Improvement
  Components: cgroups, containerization, docker, scheduler api
Affects Versions: 0.28.2
 Environment: CentOS 7 
Kernel 3.10.0-327.28.3.el7.x86_64
Mesos 0.28.2
Docker 1.11.2
Reporter: Lei Xu


The Docker executor could perhaps support soft/hard resource limits to enable more 
flexible resource sharing among applications.

||  || CPU || Memory ||
| hard limit| --cpu-shares| --memory & --memory-swap|
| soft limit| --cpu-period & --cpu-quota | --memory-reservation|

And currently the task protobuf message has only one resource struct, which is used 
to describe the cgroup limit, and the Docker executor handles it like the following:

{code}
  if (resources.isSome()) {
// TODO(yifan): Support other resources (e.g. disk).
Option<double> cpus = resources.get().cpus();
if (cpus.isSome()) {
  uint64_t cpuShare =
std::max((uint64_t) (CPU_SHARES_PER_CPU * cpus.get()), MIN_CPU_SHARES);
  argv.push_back("--cpu-shares");
  argv.push_back(stringify(cpuShare));
}

Option<Bytes> mem = resources.get().mem();
if (mem.isSome()) {
  Bytes memLimit = std::max(mem.get(), MIN_MEMORY);
  argv.push_back("--memory");
  argv.push_back(stringify(memLimit.bytes()));
}
  }
{code}

I hope the executor and the protobuf message could separate the resources into two 
parts, soft and hard. Then the user could set two levels of resource limits for 
Docker.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3325) Running mesos-slave@0.23 in a container causes slave to be lost after a restart

2016-08-08 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411550#comment-15411550
 ] 

Lei Xu commented on MESOS-3325:
---

Hi, we hit this issue months ago. The mesos agent always reads the boot_id from 
the host OS, regenerates the slave id, and registers with the master. I remember 
there is an issue tracking this, but I forget the issue id. You can give a boot 
id to the agent to make sure the slave id does not change on restart.
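
For illustration only (not the agent's recovery code): on Linux the boot id comes from /proc, so an agent container sharing the host's /proc sees a new value after every host reboot; reading it looks roughly like this:

{code}
// Illustration: read the kernel boot id, roughly what stout's os::bootId()
// does on Linux. The value changes on every host reboot, which is what makes
// the restarted agent look like a brand new slave.
#include <fstream>
#include <iostream>
#include <string>

int main()
{
  std::ifstream file("/proc/sys/kernel/random/boot_id");
  std::string bootId;

  if (!std::getline(file, bootId)) {
    std::cerr << "failed to read boot id" << std::endl;
    return 1;
  }

  std::cout << "boot id: " << bootId << std::endl;
  return 0;
}
{code}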

> Running mesos-slave@0.23 in a container causes slave to be lost after a 
> restart
> ---
>
> Key: MESOS-3325
> URL: https://issues.apache.org/jira/browse/MESOS-3325
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Affects Versions: 0.23.0
> Environment: CoreOS, Container, Docker
>Reporter: Chris Fortier
>Priority: Critical
>
> We are attempting to run mesos-slave 0.23 in a container. However it appears 
> that the mesos-slave agent registers as a new slave instead of 
> re-registering. This causes the formerly-launched tasks to continue running.
> systemd unit being used:
> ```
> [Unit]
> Description=MesosSlave
> After=docker.service dockercfg.service
> Requires=docker.service dockercfg.service
> [Service]
> Environment=MESOS_IMAGE=mesosphere/mesos-slave:0.23.0-1.0.ubuntu1404
> Environment=ZOOKEEPER=redacted
> User=core
> KillMode=process
> Restart=always
> RestartSec=20
> TimeoutStartSec=0
> ExecStartPre=-/usr/bin/docker kill mesos_slave
> ExecStartPre=-/usr/bin/docker rm mesos_slave
> ExecStartPre=/usr/bin/docker pull ${MESOS_IMAGE}
> ExecStart=/usr/bin/sh -c "sudo /usr/bin/docker run \
> --name=mesos_slave \
> --net=host \
> --pid=host \
> --privileged \
> -v /home/core/.dockercfg:/root/.dockercfg:ro \
> -v /sys:/sys \
> -v /usr/bin/docker:/usr/bin/docker:ro \
> -v /var/run/docker.sock:/var/run/docker.sock \
> -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
> -v /var/lib/mesos/slave:/var/lib/mesos/slave \
> ${MESOS_IMAGE} \
> --ip=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4` \
> --attributes=zone:$(curl -s 
> http://169.254.169.254/latest/meta-data/placement/availability-zone)\;os:coreos
>  \
> --containerizers=docker,mesos \
> --executor_registration_timeout=10mins \
> --hostname=`curl -s 
> http://169.254.169.254/latest/meta-data/public-hostname` \
> --log_dir=/var/log/mesos \
> --master=zk://${ZOOKEEPER}/mesos \
> --work_dir=/var/lib/mesos/slave"
> ExecStop=/usr/bin/docker stop mesos_slave
> [Install]
> WantedBy=multi-user.target
> [X-Fleet]
> Global=true
> MachineMetadata=role=worker
> ```
> ps, yes I saw the coreos-setup repo was deprecated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5914) mesos-docker-executor initialize many threads

2016-07-27 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395209#comment-15395209
 ] 

Lei Xu commented on MESOS-5914:
---

Thanks :)

> mesos-docker-executor initialize many threads
> -
>
> Key: MESOS-5914
> URL: https://issues.apache.org/jira/browse/MESOS-5914
> Project: Mesos
>  Issue Type: Improvement
>  Components: containerization, libprocess
>Affects Versions: 0.28.2
> Environment: CentOS7
> Kernel 4.6.2-1.el7.elrepo.x86_64
> Docker 1.11-2
>Reporter: Lei Xu
>Priority: Minor
>
> I found that mesos-docker-executor initializes many threads when running a 
> docker container task. Most of them seem unnecessary. I looked at the 
> libprocess source on GitHub but found no way to reduce the number of threads.
> Is there any environment variable to control this?
> {code}
>    PID USER      PR  NI   VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
>  16966 root  20   0 2639056  85108  80460 S  0.3  0.0   2:42.90 
> mesos-docker-ex
>  16979 root  20   0 2639056  85108  80460 S  0.3  0.0   2:43.59 
> mesos-docker-ex
>  17012 root  20   0 2639056  85108  80460 S  0.3  0.0   2:43.14 
> mesos-docker-ex
>  16954 root  20   0 2639056  85108  80460 S  0.0  0.0   0:00.03 
> mesos-docker-ex
>  16964 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.41 
> mesos-docker-ex
>  16965 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.95 
> mesos-docker-ex
>  16967 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.96 
> mesos-docker-ex
>  16968 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.17 
> mesos-docker-ex
>  16969 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.12 
> mesos-docker-ex
>  16970 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.37 
> mesos-docker-ex
>  16971 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.92 
> mesos-docker-ex
>  16972 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.70 
> mesos-docker-ex
>  16973 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.35 
> mesos-docker-ex
>  16974 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.59 
> mesos-docker-ex
>  16975 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.54 
> mesos-docker-ex
>  16976 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.47 
> mesos-docker-ex
>  16977 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.61 
> mesos-docker-ex
>  16980 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.44 
> mesos-docker-ex
>  16982 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.33 
> mesos-docker-ex
>  16984 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.06 
> mesos-docker-ex
>  16986 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.02 
> mesos-docker-ex
>  16988 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.80 
> mesos-docker-ex
>  16990 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.05 
> mesos-docker-ex
>  16992 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.40 
> mesos-docker-ex
>  16994 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.99 
> mesos-docker-ex
>  16996 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.93 
> mesos-docker-ex
>  16998 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.94 
> mesos-docker-ex
>  17000 root  20   0 2639056  85108  80460 S  

[jira] [Created] (MESOS-5914) mesos-docker-executor initialize many threads

2016-07-26 Thread Lei Xu (JIRA)
Lei Xu created MESOS-5914:
-

 Summary: mesos-docker-executor initialize many threads
 Key: MESOS-5914
 URL: https://issues.apache.org/jira/browse/MESOS-5914
 Project: Mesos
  Issue Type: Improvement
  Components: containerization, libprocess
Affects Versions: 0.28.2
 Environment: CentOS7
Kernel 4.6.2-1.el7.elrepo.x86_64
Docker 1.11-2
Reporter: Lei Xu
Priority: Minor


I found that mesos-docker-executor initializes many threads when running a docker 
container task. Most of them seem unnecessary. I looked at the libprocess source 
on GitHub but found no way to reduce the number of threads.

Is there any environment variable to control this?

{code}
   PID USER      PR  NI   VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
 16966 root  20   0 2639056  85108  80460 S  0.3  0.0   2:42.90 
mesos-docker-ex
 16979 root  20   0 2639056  85108  80460 S  0.3  0.0   2:43.59 
mesos-docker-ex
 17012 root  20   0 2639056  85108  80460 S  0.3  0.0   2:43.14 
mesos-docker-ex
 16954 root  20   0 2639056  85108  80460 S  0.0  0.0   0:00.03 
mesos-docker-ex
 16964 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.41 
mesos-docker-ex
 16965 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.95 
mesos-docker-ex
 16967 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.96 
mesos-docker-ex
 16968 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.17 
mesos-docker-ex
 16969 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.12 
mesos-docker-ex
 16970 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.37 
mesos-docker-ex
 16971 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.92 
mesos-docker-ex
 16972 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.70 
mesos-docker-ex
 16973 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.35 
mesos-docker-ex
 16974 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.59 
mesos-docker-ex
 16975 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.54 
mesos-docker-ex
 16976 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.47 
mesos-docker-ex
 16977 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.61 
mesos-docker-ex
 16980 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.44 
mesos-docker-ex
 16982 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.33 
mesos-docker-ex
 16984 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.06 
mesos-docker-ex
 16986 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.02 
mesos-docker-ex
 16988 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.80 
mesos-docker-ex
 16990 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.05 
mesos-docker-ex
 16992 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.40 
mesos-docker-ex
 16994 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.99 
mesos-docker-ex
 16996 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.93 
mesos-docker-ex
 16998 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.94 
mesos-docker-ex
 17000 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.79 
mesos-docker-ex
 17002 root  20   0 2639056  85108  80460 S  0.0  0.0   2:43.28 
mesos-docker-ex
 17004 root  20   0 2639056  85108  80460 S  0.0  0.0   2:42.99 
mesos-docker-ex  
{code}
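
For context, a rough sketch of why a libprocess-based binary starts a whole pool of threads even for a small executor: the worker pool is sized from the detected CPU count (an illustration only, not the libprocess source; the exact sizing rule is an assumption):

{code}
// Illustration: a worker pool sized from hardware_concurrency(), similar in
// spirit to how libprocess sizes its event-processing threads. On a big host
// this yields dozens of mostly idle threads, matching the `top` output above.
#include <algorithm>
#include <iostream>
#include <thread>
#include <vector>

int main()
{
  const unsigned int workers =
    std::max(8u, std::thread::hardware_concurrency());

  std::vector<std::thread> pool;
  for (unsigned int i = 0; i < workers; i++) {
    pool.emplace_back([] { /* wait for events; idle most of the time */ });
  }

  std::cout << "started " << pool.size() << " worker threads" << std::endl;

  for (std::thread& t : pool) {
    t.join();
  }
  return 0;
}
{code}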



--
This message 

[jira] [Commented] (MESOS-5544) Support running Mesos agent in a Docker container.

2016-07-26 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15395029#comment-15395029
 ] 

Lei Xu commented on MESOS-5544:
---

I've containerized Mesos and it runs well without a network namespace.

> Support running Mesos agent in a Docker container.
> --
>
> Key: MESOS-5544
> URL: https://issues.apache.org/jira/browse/MESOS-5544
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Jie Yu
>
> Currently, this does not work if one tries to use Mesos containerizer.
> The main problem is that we want to make sure the executor is not killed when 
> agent crashes. So we have to use --pid=host so that the agent is in the host 
> pid namespace.
> But that is not sufficient, Docker daemon will put agent into all cgroups 
> available on the host. We need to make sure we migrate the executor pid out 
> of those cgroups so that when agent crashes, executors are not killed.
> Also, when start the agent container, volumes need to be setup properly so 
> that any mounts under agent's work_dir will be propagate back to the host 
> mount table. This is to make sure we can recover those mounts after agent 
> restarts. This is also true for those mounts that are needed by some isolator 
> (e.g., network/cni isolator).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-5368) Consider introducing persistent agent ID

2016-07-25 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15391450#comment-15391450
 ] 

Lei Xu commented on MESOS-5368:
---

+1

> Consider introducing persistent agent ID
> 
>
> Key: MESOS-5368
> URL: https://issues.apache.org/jira/browse/MESOS-5368
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Abhishek Dasgupta
>  Labels: mesosphere
>
> Currently, agent IDs identify a single "session" by an agent: that is, an 
> agent receives an agent ID when it registers with the master; it reuses that 
> agent ID if it disconnects and successfully reregisters; if the agent shuts 
> down and restarts, it registers anew and receives a new agent ID.
> It would be convenient to have a "persistent agent ID" that remains the same 
> for the duration of a given agent {{work_dir}}. This would mean that a given 
> persistent volume would not migrate between different agent IDs over time, 
> for example (see MESOS-4894). If we supported permanently removing an agent 
> from the cluster (i.e., the {{work_dir}} and any volumes used by the agent 
> will never be reused), we could use the persistent agent ID to report which 
> agent has been removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4297) Executor does not shutdown when framework teardown.

2016-01-06 Thread Lei Xu (JIRA)
Lei Xu created MESOS-4297:
-

 Summary: Executor does not shutdown when framework teardown.
 Key: MESOS-4297
 URL: https://issues.apache.org/jira/browse/MESOS-4297
 Project: Mesos
  Issue Type: Bug
  Components: framework
Affects Versions: 0.25.0
 Environment: Marathon 0.11.0
Mesos 0.25.0
Spark 1.5.2
Reporter: Lei Xu
Priority: Critical


We found a problem when tearing down a Spark framework on Mesos: the executor 
could not exit and is still running.

{code}
root 48548 48539  2  2015 ?04:28:11 /home/q/java/default/bin/java 
-cp 
/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar
 -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend 
--driver-url 
akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler 
--executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/3 --hostname 
l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 
20151228-163100-504125962-5050-31081-0016
root 48644 48348  0  2015 ?00:00:00 sh -c cd spark-1*;  
./bin/spark-class org.apache.spark.executor.CoarseGrainedExecutorBackend 
--driver-url 
akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler 
--executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname 
l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 
20151228-163100-504125962-5050-31081-0016
root 48645 48644  2  2015 ?04:28:45 /home/q/java/default/bin/java 
-cp 
/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/5/runs/851073c4-d225-426b-b1b5-3d294eb76f8e/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar
 -Xms8192m -Xmx8192m org.apache.spark.executor.CoarseGrainedExecutorBackend 
--driver-url 
akka.tcp://sparkDriver@10.90.27.71:47938/user/CoarseGrainedScheduler 
--executor-id 4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/5 --hostname 
l-qosslave26.ops.cn2.qunar.com --cores 2 --app-id 
20151228-163100-504125962-5050-31081-0016
{code}

This framework {{20151228-163100-504125962-5050-31081-0016}} was already torn 
down a few days ago and cannot be found on the "Frameworks" page via the web UI. 
But on the slave page, I found it still registered with the slave node and 
running some executors.

And when I tried to use the REST API to kill the framework again, it returned {{No 
framework found with specified ID}}.

At last I killed the Spark task and the mesos executor; no new task is started 
by the framework, but it is still on this slave and does not exit.

{code}

Frameworks
ID                User  Name                         Active Tasks  CPUs (Used / Allocated)  Mem (Used / Allocated)
…5050-31081-0016  root  wireless-m_invocation_kylin  0             / 0.6                    / 192 MB


Executors
ID  Name                                                            Source  Active Tasks  Queued Tasks  CPUs (Used / Allocated)  Mem (Used / Allocated)
5   Command Executor (Task: 5) (Command: sh -c 'cd spark-1*;...')   5       0             0             / 0.1                    / 32 MB   Sandbox
4   Command Executor (Task: 4) (Command: sh -c 'cd spark-1*;...')   4       0             0             / 0.1                    / 32 MB   Sandbox
3   Command Executor (Task: 3) (Command: sh -c 'cd spark-1*;...')   3       0             0             / 0.1                    / 32 MB   Sandbox
2   Command Executor (Task: 2) (Command: sh -c 'cd spark-1*;...')   2       0             0             / 0.1                    / 32 MB   Sandbox
1   Command Executor (Task: 1) (Command: sh -c 'cd spark-1*;...')   1       0             0             / 0.1                    / 32 MB   Sandbox
0   Command Executor (Task: 0) (Command: sh -c 'cd spark-1*;...')   0       0             0             / 0.1                    / 32 MB   Sandbox
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4297) Executor does not shutdown when framework teardown.

2016-01-06 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085536#comment-15085536
 ] 

Lei Xu commented on MESOS-4297:
---

Here are some master logs from when I killed the task.

{code}
./mesos-master.WARNING:W0106 19:47:12.636579  1548 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task 
driver-20151230225518-0013 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:47:52.453431  1547 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task 
driver-20151230225518-0013 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:49:12.115389  1550 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task 
driver-20151230225518-0013 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:51:52.144099  1543 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task 
driver-20151230225518-0013 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:52:39.169888  1549 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task 
driver-20151230223633-0011 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051 
(l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 19:57:12.453138  1549 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task 
driver-20151230225518-0013 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:02:39.168820  1545 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task 
driver-20151230223633-0011 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051 
(l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:07:12.110839  1548 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: e897f8b4-358a-4bd3-a570-c5cedd5cd822) for task 
driver-20151230225518-0013 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S121 at slave(1)@10.90.27.71:5051 
(l-qosslave20.ops.cn2.qunar.com) because the framework is unknown
./mesos-master.WARNING:W0106 20:12:39.215056  1543 master.cpp:4408] Ignoring 
status update TASK_FAILED (UUID: ab05e568-f04f-42dc-bdbe-40e19b421c95) for task 
driver-20151230223633-0011 of framework 
20151228-163100-504125962-5050-31081-0003 from slave 
4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S126 at slave(1)@10.90.27.76:5051 
(l-qosslave25.ops.cn2.qunar.com) because the framework is unknown
{code}

> Executor does not shutdown when framework teardown.
> ---
>
> Key: MESOS-4297
> URL: https://issues.apache.org/jira/browse/MESOS-4297
> Project: Mesos
>  Issue Type: Bug
>  Components: framework
>Affects Versions: 0.25.0
> Environment: Marathon 0.11.0
> Mesos 0.25.0
> Spark 1.5.2
>Reporter: Lei Xu
>Priority: Critical
>
> We found a problem when tearing down a Spark framework on Mesos: the executors 
> do not exit and keep running.
> {code}
> root 48548 48539  2  2015 ?04:28:11 /home/q/java/default/bin/java 
> -cp 
> /home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/conf/:/home/q/mesos/data/slaves/4d0f0fc7-99f4-4a9a-b5d5-6c25affcb4f1-S127/frameworks/20151228-163100-504125962-5050-31081-0016/executors/3/runs/ca324f08-5be9-4457-a2a7-56f2605d6027/spark-1.5.2-bin-2.2.0/lib/spark-assembly-1.5.2-hadoop2.2.0.jar
>  -Xms8192m -Xmx8192m 

[jira] [Commented] (MESOS-4299) Slave lives in two different cluster at the same time with different slave id

2016-01-06 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085644#comment-15085644
 ] 

Lei Xu commented on MESOS-4299:
---

{{master/slaves}} response from Cluster B:

{code}
{
  "slaves": [
{
  "active": true,
  "attributes": {
"apps": "logstash",
"colo": "cn5",
"type": "prod"
  },
  "hostname": "l-bu128g9-10k10.ops.cn2.qunar.com",
  "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S5",
  "pid": "slave(1)@10.90.5.23:5051",
  "registered_time": 1451990379.49813,
  "reregistered_time": 1452093251.39516,
  "resources": {
"cpus": 32,
"disk": 2728919,
"mem": 128126,
"ports": "[8100-1, 31000-32000]"
  }
},
{code}

> Slave lives in two different cluster at the same time with different slave id
> -
>
> Key: MESOS-4299
> URL: https://issues.apache.org/jira/browse/MESOS-4299
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0
>Reporter: Lei Xu
>
> I've migrated some nodes from Cluster A to B, and today I found that these nodes 
> live in both Cluster A and B. Here is the {{/master/slaves}} 
> response:
> {code}
> {
>   "slaves": [
> {
>   "active": false,
>   "attributes": {
> "apps": "logstash",
> "colo": "cn5",
> "type": "prod"
>   },
>   "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
>   "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
>   "offered_resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   },
>   "pid": "slave(1)@10.90.5.19:5051",
>   "registered_time": 1451988622.66323,
>   "reserved_resources": {},
>   "resources": {
> "cpus": 32.0,
> "disk": 2728919.0,
> "mem": 128126.0,
> "ports": "[8100-1, 31000-32000]"
>   },
>   "unreserved_resources": {
> "cpus": 32.0,
> "disk": 2728919.0,
> "mem": 128126.0,
> "ports": "[8100-1, 31000-32000]"
>   },
>   "used_resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   }
> },
> .
> {code}
> And the following is mesos slave logs:
> {quote}
> I0105 18:36:22.683724  6452 slave.cpp:2248] Updated checkpointed resources 
> from  to
> I0105 18:37:09.900497  6459 slave.cpp:3926] Current disk usage 0.06%. Max 
> allowed age: 1.798706758587755days
> I0105 18:37:22.678374  6453 slave.cpp:3146] Master marked the slave as 
> disconnected but the slave considers itself registered! Forcing 
> re-registration.
> I0105 18:37:22.678699  6453 slave.cpp:694] Re-detecting master
> I0105 18:37:22.678715  6471 status_update_manager.cpp:176] Pausing sending 
> status updates
> I0105 18:37:22.678753  6453 slave.cpp:741] Detecting new master
> I0105 18:37:22.678977  6456 status_update_manager.cpp:176] Pausing sending 
> status updates
> I0105 18:37:22.679047  6455 slave.cpp:705] New master detected at 
> master@10.88.169.195:5050
> I0105 18:37:22.679108  6455 slave.cpp:768] Authenticating with master 
> master@10.88.169.195:5050
> I0105 18:37:22.679136  6455 slave.cpp:773] Using default CRAM-MD5 
> authenticatee
> I0105 18:37:22.679239  6455 slave.cpp:741] Detecting new master
> I0105 18:37:22.679354  6464 authenticatee.cpp:115] Creating new client SASL 
> connection
> I0105 18:37:22.680883  6461 authenticatee.cpp:206] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0105 18:37:22.680946  6461 authenticatee.cpp:232] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0105 18:37:22.681759  6455 authenticatee.cpp:252] Received SASL 
> authentication step
> I0105 18:37:22.682874  6454 authenticatee.cpp:292] Authentication success
> I0105 18:37:22.682986  6441 slave.cpp:836] Successfully authenticated with 
> master master@10.88.169.195:5050
> I0105 18:37:22.684303  6454 slave.cpp:980] Re-registered with master 
> master@10.88.169.195:5050
> I0105 18:37:22.684455  6454 slave.cpp:1016] Forwarding total oversubscribed 
> resources
> I0105 18:37:22.684471  6468 status_update_manager.cpp:183] Resuming sending 
> status updates
> I0105 18:37:22.684649  6454 slave.cpp:2152] Updating framework 
> 20150610-204949-3299432458-5050-25057- pid to 
> scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708
> I0105 18:37:22.685025  6452 status_update_manager.cpp:183] Resuming sending 
> status updates
> I0105 18:37:22.685117  6454 slave.cpp:2248] Updated checkpointed resources 
> from  to
> I0105 18:38:09.901587  6464 slave.cpp:3926] Current disk usage 0.06%. Max 
> allowed age: 1.798706755730266days
> I0105 18:38:22.679468  6451 slave.cpp:3146] Master marked the slave as 
> disconnected but the slave considers itself registered! 

[jira] [Commented] (MESOS-4299) Slave lives in two different cluster at the same time with different slave id

2016-01-06 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15085704#comment-15085704
 ] 

Lei Xu commented on MESOS-4299:
---

update:

I stopped the slave, removed all files under the data_dir path, and restarted the 
slave, but it still shows the same logs as above. How do I clean up a slave node 
so that it joins the cluster as a new one?
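
(For anyone hitting the same thing, a commonly suggested sequence to wipe an 
agent's identity so it registers with a new SlaveID is sketched below. The 
work_dir path is an assumption and should be replaced with the actual data_dir; 
this is only the usual procedure, not a confirmed fix for this report.)

{code}
# Sketch; assumes the agent's work/data dir is /var/lib/mesos -- adjust as needed.
service mesos-slave stop

# Removing the checkpointed agent metadata (the meta/slaves/latest symlink)
# makes the slave register with a brand-new SlaveID on the next start.
rm -f /var/lib/mesos/meta/slaves/latest

service mesos-slave start
{code}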

> Slave lives in two different cluster at the same time with different slave id
> -
>
> Key: MESOS-4299
> URL: https://issues.apache.org/jira/browse/MESOS-4299
> Project: Mesos
>  Issue Type: Bug
>  Components: master, webui
>Affects Versions: 0.25.0
> Environment: Mesos 0.25.0
>Reporter: Lei Xu
>
> I've migrated some nodes from Cluster A to B, and today I found that these nodes 
> live in both Cluster A and B. Here is the {{/master/slaves}} 
> response:
> {code}
> {
>   "slaves": [
> {
>   "active": false,
>   "attributes": {
> "apps": "logstash",
> "colo": "cn5",
> "type": "prod"
>   },
>   "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
>   "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
>   "offered_resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   },
>   "pid": "slave(1)@10.90.5.19:5051",
>   "registered_time": 1451988622.66323,
>   "reserved_resources": {},
>   "resources": {
> "cpus": 32.0,
> "disk": 2728919.0,
> "mem": 128126.0,
> "ports": "[8100-1, 31000-32000]"
>   },
>   "unreserved_resources": {
> "cpus": 32.0,
> "disk": 2728919.0,
> "mem": 128126.0,
> "ports": "[8100-1, 31000-32000]"
>   },
>   "used_resources": {
> "cpus": 0,
> "disk": 0,
> "mem": 0
>   }
> },
> .
> {code}
> And the following is mesos slave logs:
> {quote}
> I0105 18:36:22.683724  6452 slave.cpp:2248] Updated checkpointed resources 
> from  to
> I0105 18:37:09.900497  6459 slave.cpp:3926] Current disk usage 0.06%. Max 
> allowed age: 1.798706758587755days
> I0105 18:37:22.678374  6453 slave.cpp:3146] Master marked the slave as 
> disconnected but the slave considers itself registered! Forcing 
> re-registration.
> I0105 18:37:22.678699  6453 slave.cpp:694] Re-detecting master
> I0105 18:37:22.678715  6471 status_update_manager.cpp:176] Pausing sending 
> status updates
> I0105 18:37:22.678753  6453 slave.cpp:741] Detecting new master
> I0105 18:37:22.678977  6456 status_update_manager.cpp:176] Pausing sending 
> status updates
> I0105 18:37:22.679047  6455 slave.cpp:705] New master detected at 
> master@10.88.169.195:5050
> I0105 18:37:22.679108  6455 slave.cpp:768] Authenticating with master 
> master@10.88.169.195:5050
> I0105 18:37:22.679136  6455 slave.cpp:773] Using default CRAM-MD5 
> authenticatee
> I0105 18:37:22.679239  6455 slave.cpp:741] Detecting new master
> I0105 18:37:22.679354  6464 authenticatee.cpp:115] Creating new client SASL 
> connection
> I0105 18:37:22.680883  6461 authenticatee.cpp:206] Received SASL 
> authentication mechanisms: CRAM-MD5
> I0105 18:37:22.680946  6461 authenticatee.cpp:232] Attempting to authenticate 
> with mechanism 'CRAM-MD5'
> I0105 18:37:22.681759  6455 authenticatee.cpp:252] Received SASL 
> authentication step
> I0105 18:37:22.682874  6454 authenticatee.cpp:292] Authentication success
> I0105 18:37:22.682986  6441 slave.cpp:836] Successfully authenticated with 
> master master@10.88.169.195:5050
> I0105 18:37:22.684303  6454 slave.cpp:980] Re-registered with master 
> master@10.88.169.195:5050
> I0105 18:37:22.684455  6454 slave.cpp:1016] Forwarding total oversubscribed 
> resources
> I0105 18:37:22.684471  6468 status_update_manager.cpp:183] Resuming sending 
> status updates
> I0105 18:37:22.684649  6454 slave.cpp:2152] Updating framework 
> 20150610-204949-3299432458-5050-25057- pid to 
> scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708
> I0105 18:37:22.685025  6452 status_update_manager.cpp:183] Resuming sending 
> status updates
> I0105 18:37:22.685117  6454 slave.cpp:2248] Updated checkpointed resources 
> from  to
> I0105 18:38:09.901587  6464 slave.cpp:3926] Current disk usage 0.06%. Max 
> allowed age: 1.798706755730266days
> I0105 18:38:22.679468  6451 slave.cpp:3146] Master marked the slave as 
> disconnected but the slave considers itself registered! Forcing 
> re-registration.
> I0105 18:38:22.679739  6451 slave.cpp:694] Re-detecting master
> I0105 18:38:22.679754  6453 status_update_manager.cpp:176] Pausing sending 
> status updates
> I0105 18:38:22.679785  6451 slave.cpp:741] Detecting new master
> I0105 18:38:22.680054  6461 slave.cpp:705] New master detected at 
> master@10.88.169.195:5050
> I0105 18:38:22.680106  6470 

[jira] [Created] (MESOS-4299) Slave lives in two different cluster at the same time with different slave id

2016-01-06 Thread Lei Xu (JIRA)
Lei Xu created MESOS-4299:
-

 Summary: Slave lives in two different cluster at the same time 
with different slave id
 Key: MESOS-4299
 URL: https://issues.apache.org/jira/browse/MESOS-4299
 Project: Mesos
  Issue Type: Bug
  Components: master, webui
Affects Versions: 0.25.0
 Environment: Mesos 0.25.0
Reporter: Lei Xu


I've migrated some nodes from Cluster A to B, and today I found that these nodes 
live in both Cluster A and B. Here is the {{/master/slaves}} response:

{code}
{
  "slaves": [
{
  "active": false,
  "attributes": {
"apps": "logstash",
"colo": "cn5",
"type": "prod"
  },
  "hostname": "l-bu128g5-10k10.ops.cn2.qunar.com",
  "id": "3e7ba6b1-29fd-44e8-9be2-f72896054ac6-S2",
  "offered_resources": {
"cpus": 0,
"disk": 0,
"mem": 0
  },
  "pid": "slave(1)@10.90.5.19:5051",
  "registered_time": 1451988622.66323,
  "reserved_resources": {},
  "resources": {
"cpus": 32.0,
"disk": 2728919.0,
"mem": 128126.0,
"ports": "[8100-1, 31000-32000]"
  },
  "unreserved_resources": {
"cpus": 32.0,
"disk": 2728919.0,
"mem": 128126.0,
"ports": "[8100-1, 31000-32000]"
  },
  "used_resources": {
"cpus": 0,
"disk": 0,
"mem": 0
  }
},
.
{code}

And the following is mesos slave logs:

{quote}
I0105 18:36:22.683724  6452 slave.cpp:2248] Updated checkpointed resources from 
 to
I0105 18:37:09.900497  6459 slave.cpp:3926] Current disk usage 0.06%. Max 
allowed age: 1.798706758587755days
I0105 18:37:22.678374  6453 slave.cpp:3146] Master marked the slave as 
disconnected but the slave considers itself registered! Forcing re-registration.
I0105 18:37:22.678699  6453 slave.cpp:694] Re-detecting master
I0105 18:37:22.678715  6471 status_update_manager.cpp:176] Pausing sending 
status updates
I0105 18:37:22.678753  6453 slave.cpp:741] Detecting new master
I0105 18:37:22.678977  6456 status_update_manager.cpp:176] Pausing sending 
status updates
I0105 18:37:22.679047  6455 slave.cpp:705] New master detected at 
master@10.88.169.195:5050
I0105 18:37:22.679108  6455 slave.cpp:768] Authenticating with master 
master@10.88.169.195:5050
I0105 18:37:22.679136  6455 slave.cpp:773] Using default CRAM-MD5 authenticatee
I0105 18:37:22.679239  6455 slave.cpp:741] Detecting new master
I0105 18:37:22.679354  6464 authenticatee.cpp:115] Creating new client SASL 
connection
I0105 18:37:22.680883  6461 authenticatee.cpp:206] Received SASL authentication 
mechanisms: CRAM-MD5
I0105 18:37:22.680946  6461 authenticatee.cpp:232] Attempting to authenticate 
with mechanism 'CRAM-MD5'
I0105 18:37:22.681759  6455 authenticatee.cpp:252] Received SASL authentication 
step
I0105 18:37:22.682874  6454 authenticatee.cpp:292] Authentication success
I0105 18:37:22.682986  6441 slave.cpp:836] Successfully authenticated with 
master master@10.88.169.195:5050
I0105 18:37:22.684303  6454 slave.cpp:980] Re-registered with master 
master@10.88.169.195:5050
I0105 18:37:22.684455  6454 slave.cpp:1016] Forwarding total oversubscribed 
resources
I0105 18:37:22.684471  6468 status_update_manager.cpp:183] Resuming sending 
status updates
I0105 18:37:22.684649  6454 slave.cpp:2152] Updating framework 
20150610-204949-3299432458-5050-25057- pid to 
scheduler-1bef8172-5068-44c6-93f5-e97a3910ed79@10.88.169.195:35708
I0105 18:37:22.685025  6452 status_update_manager.cpp:183] Resuming sending 
status updates
I0105 18:37:22.685117  6454 slave.cpp:2248] Updated checkpointed resources from 
 to
I0105 18:38:09.901587  6464 slave.cpp:3926] Current disk usage 0.06%. Max 
allowed age: 1.798706755730266days
I0105 18:38:22.679468  6451 slave.cpp:3146] Master marked the slave as 
disconnected but the slave considers itself registered! Forcing re-registration.
I0105 18:38:22.679739  6451 slave.cpp:694] Re-detecting master
I0105 18:38:22.679754  6453 status_update_manager.cpp:176] Pausing sending 
status updates
I0105 18:38:22.679785  6451 slave.cpp:741] Detecting new master
I0105 18:38:22.680054  6461 slave.cpp:705] New master detected at 
master@10.88.169.195:5050
I0105 18:38:22.680106  6470 status_update_manager.cpp:176] Pausing sending 
status updates
I0105 18:38:22.680107  6461 slave.cpp:768] Authenticating with master 
master@10.88.169.195:5050
I0105 18:38:22.680197  6461 slave.cpp:773] Using default CRAM-MD5 authenticatee
I0105 18:38:22.680271  6461 slave.cpp:741] Detecting new master

.

W0105 19:05:38.207882  6450 slave.cpp:1973] Ignoring shutdown framework message 
for 3e7ba6b1-29fd-44e8-9be2-f72896054ac6-0116 from master@10.90.12.29:5050 
because it is not from the registered master (master@10.88.169.195:5050)
W0106 09:12:38.666767  6468 slave.cpp:1973] Ignoring shutdown framework message 
for 

[jira] [Created] (MESOS-4182) Add Qunar to the "Powered by" page.

2015-12-15 Thread Lei Xu (JIRA)
Lei Xu created MESOS-4182:
-

 Summary: Add Qunar to the "Powered by" page.
 Key: MESOS-4182
 URL: https://issues.apache.org/jira/browse/MESOS-4182
 Project: Mesos
  Issue Type: Wish
  Components: documentation
Reporter: Lei Xu
Priority: Trivial


Hi,

We use Mesos and Marathon to support our log analysis programs, such as ELK and 
Spark. It is a great resource manager that holds thousands of applications 
processing 6~8 billion lines of text per day. Thanks very much! 

https://github.com/apache/mesos/pull/83

We'd love it if you could merge it. :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables

2015-11-02 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986822#comment-14986822
 ] 

Lei Xu commented on MESOS-3821:
---

Cool :)

> DOCKER_HOST does not work well with --executor_environment_variables
> 
>
> Key: MESOS-3821
> URL: https://issues.apache.org/jira/browse/MESOS-3821
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0
> Environment: Docker 1.7.1
> Mesos 0.25.0
>Reporter: Lei Xu
>Assignee: haosdent
>
> Hi guys,
> I found that DOCKER_HOST does not work if I set 
> bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"}
> but the docker executor always appends 
> bq. -H unix:///var/run/docker.sock 
> to each command, which effectively overwrites DOCKER_HOST.
> I think this is too strict, and I could not disable it via any command-line 
> flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables

2015-11-02 Thread Lei Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14986755#comment-14986755
 ] 

Lei Xu commented on MESOS-3821:
---

Hi [~haosd...@gmail.com], you're right. I hope that users can specify the 
protocol and scheme in --docker_socket, like:

--docker_socket unix:///var/run/docker.sock or --docker_socket 
tcp://127.0.0.1:2376 

> DOCKER_HOST does not work well with --executor_environment_variables
> 
>
> Key: MESOS-3821
> URL: https://issues.apache.org/jira/browse/MESOS-3821
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0
> Environment: Docker 1.7.1
> Mesos 0.25.0
>Reporter: Lei Xu
>
> Hi guys,
> I found that DOCKER_HOST does not work if I set 
> bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"}
> but the docker executor always appends 
> bq. -H unix:///var/run/docker.sock 
> to each command, which effectively overwrites DOCKER_HOST.
> I think this is too strict, and I could not disable it via any command-line 
> flag.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3821) DOCKER_HOST does not work well with --executor_environment_variables

2015-11-02 Thread Lei Xu (JIRA)
Lei Xu created MESOS-3821:
-

 Summary: DOCKER_HOST does not work well with 
--executor_environment_variables
 Key: MESOS-3821
 URL: https://issues.apache.org/jira/browse/MESOS-3821
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.25.0
 Environment: Docker 1.7.1
Mesos 0.25.0
Reporter: Lei Xu


Hi guys,

I found that DOCKER_HOST does not work if I set 
bq. --executor_environment_variables={"DOCKER_HOST":"localhost:2377"}

but the docker executor always appends 

bq. -H unix:///var/run/docker.sock 

to each command, which effectively overwrites DOCKER_HOST.

I think this is too strict, and I could not disable it via any command-line flag.
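
To make the conflict concrete, here is a sketch of the two pieces involved (the 
agent command line below is illustrative, not the exact one we run):

{code}
# Agent started with the env var meant to point the executor at a TCP daemon
# (flag values are illustrative):
mesos-slave --master=zk://<zk-hosts>/mesos \
  --containerizers=docker,mesos \
  --executor_environment_variables='{"DOCKER_HOST":"localhost:2377"}'

# Yet every docker invocation issued by the executor still hard-codes the socket,
# e.g.:
docker -H unix:///var/run/docker.sock run ...
{code}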



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)