Re: make check crashes

2016-06-06 Thread Alexander Rojas
Hi Dave,

Looking at the output you give, one big thing popped into my mind: why are there
two commas between different entries? But then I realized that your
floating-point numbers are being printed as `7,2932352` instead of `7.2932352`
(note the decimal separator). My feeling is that our JSON generator is not
locale independent.

@Dave: can you please tell us about the locale on your machine?
@mpark: Can you take a look into it, since you know about the JSON serializer?
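
For anyone who wants to poke at this before a fix lands, here is a minimal
repro sketch of the suspected failure mode (illustrative only, not the actual
Mesos serializer code): C printf-style formatting honors LC_NUMERIC, so under
a comma-decimal locale such as de_DE a double renders exactly as in the dump
below, while a stream imbued with the classic "C" locale is immune.

// Minimal repro sketch -- NOT Mesos source. Assumes de_DE.UTF-8 is installed.
#include <clocale>
#include <cstdio>
#include <iostream>
#include <locale>
#include <sstream>

int main()
{
  std::setlocale(LC_ALL, "de_DE.UTF-8"); // a comma-decimal locale

  char buf[32];
  std::snprintf(buf, sizeof(buf), "%.7g", 7.2932352);
  std::cout << "snprintf: " << buf << std::endl; // "7,293235" -- invalid JSON

  std::ostringstream out;
  out.imbue(std::locale::classic()); // always uses '.' as decimal separator
  out << 7.2932352;
  std::cout << "stream:   " << out.str() << std::endl; // "7.29324"
}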


> On 29 May 2016, at 10:47, Dave Webb  wrote:
> 
> Hi, 
> I tried to setup Mesos on my Ubuntu 15.10 machine using the exact commands 
> shown on http://mesos.apache.org/gettingstarted/.
> However, "make check" always crashes at the Testcase 
> "MasterAuthorizationTest.SlaveRemoved". Up until this point, all previous 
> tests run successfully.
> 
> This is the error message I got:
> 
> ...
> [--] 13 tests from MasterAuthorizationTest
> [ RUN  ] MasterAuthorizationTest.AuthorizedTask
> [   OK ] MasterAuthorizationTest.AuthorizedTask (115 ms)
> [ RUN  ] MasterAuthorizationTest.UnauthorizedTask
> [   OK ] MasterAuthorizationTest.UnauthorizedTask (116 ms)
> [ RUN  ] MasterAuthorizationTest.KillTask
> [   OK ] MasterAuthorizationTest.KillTask (112 ms)
> [ RUN  ] MasterAuthorizationTest.SlaveRemoved
> F0529 10:39:56.559432 19749 utils.cpp:55] CHECK_SOME(parse): syntax error at 
> line 1 near: 
> ,"frameworks\/test-principal\/messages_processed":1,,"frameworks\/test-principal\/messages_received":1,,"master\/cpus_percent":0,,"master\/cpus_revocable_percent":0,,"master\/cpus_revocable_total":0,,"master\/cpus_revocable_used":0,,"master\/cpus_total":0,,"master\/cpus_used":0,,"master\/disk_percent":0,,"master\/disk_revocable_percent":0,,"master\/disk_revocable_total":0,,"master\/disk_revocable_used":0,,"master\/disk_total":0,,"master\/disk_used":0,,"master\/dropped_messages":0,,"master\/elected":1,,"master\/event_queue_dispatches":0,,"master\/event_queue_http_requests":0,,"master\/event_queue_messages":0,,"master\/frameworks_active":1,,"master\/frameworks_connected":1,,"master\/frameworks_disconnected":0,,"master\/frameworks_inactive":0,,"master\/invalid_executor_to_framework_messages":0,,"master\/invalid_framework_to_executor_messages":0,,"master\/invalid_status_update_acknowledgements":0,,"master\/invalid_status_updates":0,,"master\/mem_percent":0,,"master\/mem_revocable_percent":0,,"master\/mem_revocable_total":0,,"master\/mem_revocable_used":0,,"master\/mem_total":0,,"master\/mem_used":0,,"master\/messages_authenticate":2,,"master\/messages_deactivate_framework":0,,"master\/messages_decline_offers":0,,"master\/messages_executor_to_framework":0,,"master\/messages_exited_executor":0,,"master\/messages_framework_to_executor":0,,"master\/messages_kill_task":0,,"master\/messages_launch_tasks":1,,"master\/messages_reconcile_tasks":0,,"master\/messages_register_framework":0,,"master\/messages_register_slave":1,,"master\/messages_reregister_framework":0,,"master\/messages_reregister_slave":0,,"master\/messages_resource_request":0,,"master\/messages_revive_offers":0,,"master\/messages_status_update":0,,"master\/messages_status_update_acknowledgement":0,,"master\/messages_suppress_offers":0,,"master\/messages_unregister_framework":0,,"master\/messages_unregister_slave":1,,"master\/messages_update_slave":1,,"master\/outstanding_offers":0,,"master\/recovery_slave_removals":0,,"master\/slave_registrations":1,,"master\/slave_removals":1,,"master\/slave_removals\/reason_registered":0,,"master\/slave_removals\/reason_unhealthy":0,,"master\/slave_removals\/reason_unregistered":1,,"master\/slave_reregistrations":0,,"master\/slave_shutdowns_canceled":0,,"master\/slave_shutdowns_completed":0,,"master\/slave_shutdowns_scheduled":0,,"master\/slaves_active":0,,"master\/slaves_connected":0,,"master\/slaves_disconnected":0,,"master\/slaves_inactive":0,,"master\/task_lost\/source_master\/reason_slave_removed":1,,"master\/tasks_error":0,,"master\/tasks_failed":0,,"master\/tasks_finished":0,,"master\/tasks_killed":0,,"master\/tasks_killing":0,,"master\/tasks_lost":1,,"master\/tasks_running":0,,"master\/tasks_staging":1,,"master\/tasks_starting":0,,"master\/uptime_secs":0,084369152,"master\/valid_executor_to_framework_messages":0,,"master\/valid_framework_to_executor_messages":0,,"master\/valid_status_update_acknowledgements":0,,"master\/valid_status_updates":0,,"registrar\/queued_operations":0,,"registrar\/registry_size_bytes":136,,"registrar\/state_fetch_ms":26,628096,"registrar\/state_store_ms":6,835968,"registrar\/state_store_ms\/count":3,"registrar\/state_store_ms\/max":7,308032,"registrar\/state_store_ms\/min":6,835968,"registrar\/state_store_ms\/p50":7,234048,"registrar\/state_store_ms\/p90":7,2932352,"registrar\/state_store_ms\/p95":7,3006336,"registrar\/state_store_ms\/p99":7,30655232,"registrar\/state_store_ms\/p999":7,307884032,"registrar\/state_store_ms\/p":7,3080172032,"scheduler\/event_queue_disp
atches":0,,"scheduler\/event_queue_messag

Re: [VOTE] Release Apache Mesos 1.0.0 (rc1)

2016-06-06 Thread Robert Lacroix
Hi Vinod,

In convert.cpp we compare the major versions of the native library and the
jar. This makes upgrading frameworks unnecessarily hard, because you would
have to deploy Mesos and frameworks in lockstep.
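
For list readers without the source handy, the guard being objected to has
roughly this shape (an illustrative sketch only, not the actual convert.cpp):
the JNI bindings bail out when the jar's major version differs from the
native library's, which is what forces the lockstep deployment.

// Illustrative sketch only -- not the actual src/java/jni/convert.cpp.
#include <iostream>
#include <string>

// Compare only the major component, e.g. "0" in "0.28.2" vs "1" in "1.0.0".
bool majorVersionsMatch(const std::string& jarVersion,
                        const std::string& nativeVersion)
{
  return jarVersion.substr(0, jarVersion.find('.')) ==
         nativeVersion.substr(0, nativeVersion.find('.'));
}

int main()
{
  // A 0.28.x framework jar against a 1.0.0 libmesos would be rejected.
  std::cout << std::boolalpha
            << majorVersionsMatch("0.28.2", "1.0.0") << std::endl; // false
}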

Non-binding -1 😜, as this check isn’t strictly useful - especially given this 
is probably the last major upgrade where libmesos is even relevant.

 Robert

> On Jun 1, 2016, at 12:38 AM, Vinod Kone wrote:
> 
> Hi all,
> 
> Please vote on releasing the following candidate as Apache Mesos 1.0.0.
> 
> 
> NOTE: The voting period for this release is 3 weeks. Also, we are willing
> to make API changes before the final release. So please test it thoroughly.
> 
> 
> 1.0.0 includes the following features:
> 
>  * Scheduler and Executor v1 HTTP APIs are now considered stable.
> 
>  * [MESOS-4791] - **Experimental** support for v1 Master and Agent APIs. These
>    APIs let operators and services (monitoring, load balancers) send HTTP
>    requests to '/api/v1' endpoint on master or agent. These APIs look similar
>    to the v1 Scheduler and Executor APIs.
> 
>  * [MESOS-4828] - **Experimental** support for a new `disk/xfs' isolator
>    has been added to isolate disk resources more efficiently. Please refer
>    to docs/mesos-containerizer.md for more details.
> 
>  * [MESOS-4355] - **Experimental** support for Docker volume plugin. We added a
>    new isolator 'docker/volume' which allows users to use external volumes in
>    Mesos containerizer. Currently, the isolator interacts with the Docker
>    volume plugins using a tool called 'dvdcli'. By speaking the Docker volume
>    plugin API, most of the Docker volume plugins are supported.
> 
>  * [MESOS-4641] - **Experimental** A new network isolator, the
>    `network/cni` isolator, has been introduced in the `MesosContainerizer`. The
>    `network/cni` isolator implements the Container Network Interface (CNI)
>    specification proposed by CoreOS. With CNI the `network/cni` isolator is
>    able to allocate a network namespace to Mesos containers and attach the
>    container to different types of IP networks by invoking network drivers
>    called CNI plugins.
> 
>  * [MESOS-2948, MESOS-5403] - The authorizer interface has been refactored in
>    order to decouple the ACLs definition language from the interface.
>    It additionally includes the option of retrieving `ObjectApprover`. An
>    `ObjectApprover` can be used to synchronously check authorizations for a
>    given object and is hence useful when authorizing a large number of objects
>    and/or large objects (which need to be copied using request based
>    authorization). NOTE: This is a **breaking change** for authorizer modules.
> 
>  * [MESOS-4931] - Authorization based HTTP endpoint filtering enables operators
>    to restrict what part of the cluster state a user is authorized to see.
>    Consider for example the `/state` master endpoint: an operator can now
>    authorize users to only see a subset of the running frameworks, tasks, or
>    executors.
> 
>  * [MESOS-4909] - Tasks can now specify a kill policy. They are best-effort,
>    because machine failures or forcible terminations may occur. Currently, the
>    only available kill policy is how long to wait between graceful and forcible
>    task kill. In the future, more policies may be available (e.g. hitting an
>    HTTP endpoint, running a command, etc). Note that it is the executor's
>    responsibility to enforce kill policies. For executor-less command-based
>    tasks, the kill is performed via sending a signal to the task process:
>    SIGTERM for the graceful kill and SIGKILL for the forcible kill. For docker
>    executor-less tasks the grace period is passed to 'docker stop --time'. This
>    feature supersedes the '--docker_stop_timeout', which is now deprecated.
> 
>  * [MESOS-4908] - The task kill policy defined within 'TaskInfo' can now be
>    overridden when the scheduler kills the task. This can be used by schedulers
>    to forcefully kill a task which is already being killed, e.g. if something
>    went wrong during a graceful kill and a forcible kill is desired. Note that
>    it is the executor's responsibility to honor the 'Event.kill.kill_policy'
>    field and override the task's kill policy and kill policy from a previous
>    kill task request. To use this feature, schedulers and executors must
>    support HTTP API; use the '--http_command_executor' agent flag to ensure
>    the agent launches the
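
As a concrete illustration of the MESOS-4909/MESOS-4908 items above, here is
a sketch of how a framework might set the new kill policy on a task, assuming
the Mesos 1.0 v1 protobufs (TaskInfo.kill_policy holding a DurationInfo grace
period expressed in nanoseconds):

// Sketch under the assumptions stated above -- not taken from Mesos docs.
#include <mesos/v1/mesos.pb.h> // assumes a Mesos 1.0 build tree on the include path

int main()
{
  mesos::v1::TaskInfo task;

  // Allow 30 seconds between the graceful SIGTERM and the forcible SIGKILL.
  task.mutable_kill_policy()
      ->mutable_grace_period()
      ->set_nanoseconds(30LL * 1000 * 1000 * 1000);

  return 0;
}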

Re: Set LIBPROCESS_IP for frameworks launched with marathon

2016-06-06 Thread Radoslaw Gruchalski
Yes, because that one runs in the host network. This leads to a question: your
docker task, is it using bridge or host networking?

-- 
Best regards,
Rad

On Tue, Jun 7, 2016 at 3:21 AM +0200, "Eli Jordan" wrote:

It's important to note that if you run a task with the command executor (i.e. 
not using docker) LIBPROCESS_IP is defined, along with several other variables 
that are not defined in docker.

Thanks
Eli

On 7 Jun 2016, at 10:05, Radoslaw Gruchalski wrote:

I think the problem is that it is not known which agent the task is running on 
until the task is in the running state. Hence the master can’t pass that as an 
env variable to the task. However, I see your point. There is an agent host 
name available in the task as $HOST. Maybe it would be a good idea if Mesos 
was setting env variables named AGENT_IP_0, AGENT_IP_1 and so on for every IP 
interface on the agent, and maybe AGENT_BIND_IP if the bind IP is different 
from 0.0.0.0. OTOH, I can see how this could be considered a security issue. 
I am not sure what the implications could be.

Anybody else care to comment?

– 
Best regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.


On June 7, 2016 at 1:42:46 AM, Eli Jordan (elias.k.jor...@gmail.com) wrote: 

Thanks Radoslaw. I'm not really set on using host names, I just want a 
reliable way to start the framework. For the meantime I have gone with a 
solution similar to what you suggested. We use the /etc/default/mesos file to 
configure mesos, and it has the ip defined, so I just mounted that in the 
container and read the ip.

I would like to avoid having a dependency on the file system of the agents 
though. I'm not sure why I can't have the docker executor set the 
LIBPROCESS_IP variable in the same way the command executor does.

Thanks
Eli

On 6 Jun 2016, at 21:44, Radoslaw Gruchalski wrote:

Out of curiosity. Why are you insisting on using host names?
Say you have 1 master and 2 agents with these IPs:

- mesos-master-0: 10.100.1.10
- mesos-agent-0: 10.100.1.11
- mesos-agent-1: 10.100.1.12

Your problem is that you have no way to obtain an IP address of the agent in 
the container. Correct?
One way to overcome this problem is to create a shell file, say in 
/etc/mesos-agent.sh, with contents like:

...
AGENT_IP=10.100.1.11
…

If you’re using Marathon, you can copy that file to the sandbox using docker 
volumes:

{
  "containerPath": "/etc/mesos-agent.sh",
  "hostPath": "/etc/mesos-agent.sh",
  "mode": "RO"
}

You can now source that in the container to set the LIBPROCESS_ADVERTISE_IP.
Other applications simply use the mesos-agent-X host name. That’s without 
mesos-dns.
Things are easier with mesos-dns or consul service catalog (I prefer the 
latter).

– 
Best regards,

Radek Gruchalski

ra...@gruchalski.com
de.linkedin.com/in/radgruchalski

Confidentiality:
This communication is intended for the above-named person and may be 
confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor must 
you copy or show it to anyone; please delete/destroy and inform the sender 
immediately.

On June 6, 2016 at 1:16:07 PM, Eli Jordan (elias.k.jor...@gmail.com) wrote:


The issue refers to LIBPROCESS_IP not LIBPROCESS_HOST. I haven’t been able to 
find the LIBPROCESS_HOST variable documented anywhere.

My understanding is that the scheduler uses LIBPROCESS_IP to determine which 
network interface to bind to, and also which ip to advertise to the master, 
so that the master can send offers. There is also another variable, 
LIBPROCESS_ADVERTISE_IP. If this is defined then LIBPROCESS_IP is used to 
determine which network interface to bind to, and LIBPROCESS_ADVERTISE_IP is 
used to determine which ip to advertise to the master.

It would be great if there was a LIBPROCESS_ADVERTISE_HOST variable, then I 
could just use the $HOST variable to define this.




On 5 Jun 2016, at 10:41 pm, Sivaram Kannan wrote:

I have been using this approach from 0.23.0 to 0.28.0. It has surely been 
working (although for a different framework). Inside the docker container, 
can you see the $HOST variable defined?

The ticket you referred to says that the app definition needs to define 
LIBPROCESS_HOST=$HOST to make the framework take the proper IP - you are 
describing a different problem.

Thanks,
./Siva.



On Sun, Jun 5, 2016 at 4:30 AM, Eli Jordan wrote:

I found this issue on the mesos jira that describes the exact issue I am 
hitting.

ht

Re: Set LIBPROCESS_IP for frameworks launched with marathon

2016-06-06 Thread Eli Jordan
It's important to note that if you run a task with the command executor (i.e. 
not using docker) LIBPROCESS_IP is defined, along with several other variables 
that are not defined in docker.

Thanks
Eli

> On 7 Jun 2016, at 10:05, Radoslaw Gruchalski  wrote:
> 
> I think the problem is that it is not known which agent the task is running 
> on until the task is in the running state.
> Hence the master can’t pass that as an env variable to the task.
> However, I see your point. There is an agent host name available in the task 
> as $HOST. Maybe it would be a good idea if Mesos was setting env variables 
> named AGENT_IP_0, AGENT_IP_1 and so on for every IP interface on the agent, 
> and maybe AGENT_BIND_IP if the bind IP is different from 0.0.0.0. OTOH, I can 
> see how this could be considered a security issue. I am not sure what the 
> implications could be.
> 
> Anybody else care to comment?
> – 
> Best regards,
> 
> Radek Gruchalski
> 
> ra...@gruchalski.com
> de.linkedin.com/in/radgruchalski
> 
> Confidentiality:
> This communication is intended for the above-named person and may be 
> confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor must 
> you copy or show it to anyone; please delete/destroy and inform the sender 
> immediately.
> 
>> On June 7, 2016 at 1:42:46 AM, Eli Jordan (elias.k.jor...@gmail.com) wrote:
>> 
>> Thanks Radoslaw. I'm not really set on using host names, I just want a 
>> reliable way to start the framework. For the meantime I have gone with a 
>> solution similar to what you suggested. We use /etc/default/mesos file to 
>> configure mesos, and it has the ip defined, so I just mounted that in the 
>> container and read the ip.
>> 
>> I would like to avoid having a dependency on the file system of the agents 
>> though. I'm not sure why I can't have the docker executor set the 
>> LIBPROCESS_IP variable in the same way the command executor does.
>> 
>> Thanks
>> Eli
>> 
>> On 6 Jun 2016, at 21:44, Radoslaw Gruchalski  wrote:
>> 
>>> Out of curiosity. Why are you insisting on using host names?
>>> Say you have 1 master and 2 agents with these IPs:
>>> 
>>> - mesos-master-0: 10.100.1.10
>>> - mesos-agent-0: 10.100.1.11
>>> - mesos-agent-1: 10.100.1.12
>>> 
>>> Your problem is that you have no way to obtain an IP address of the agent 
>>> in the container. Correct?
>>> One way to overcome this problem is to create a shell file, say in 
>>> /etc/mesos-agent.sh, with contents like:
>>> 
>>> ...
>>> AGENT_IP=10.100.1.11
>>> …
>>> 
>>> If you’re using Marathon, you can copy that file to the sandbox using 
>>> docker volumes:
>>> 
>>> {
>>>   "containerPath": "/etc/mesos-agent.sh",
>>>   "hostPath": "/etc/mesos-agent.sh",
>>>   "mode": "RO"
>>> }
>>> 
>>> You can now source that in the container to set the LIBPROCESS_ADVERTISE_IP.
>>> Other applications simply use the mesos-agent-X host name. That’s without 
>>> mesos-dns.
>>> Things are easier with mesos-dns or consul service catalog (I prefer the 
>>> latter).
>>> – 
>>> Best regards,
>>> 
>>> Radek Gruchalski
>>> 
>>> ra...@gruchalski.com
>>> de.linkedin.com/in/radgruchalski
>>> 
>>> Confidentiality:
>>> This communication is intended for the above-named person and may be 
>>> confidential and/or legally privileged.
>>> If it has come to you in error you must take no action based on it, nor 
>>> must you copy or show it to anyone; please delete/destroy and inform the 
>>> sender immediately.
>>> 
 On June 6, 2016 at 1:16:07 PM, Eli Jordan (elias.k.jor...@gmail.com) wrote:
 
 The issue refers to LIBPROCESS_IP not LIBPROCESS_HOST. I haven’t been able 
 to find the LIBPROCESS_HOST variable documented anywhere.
 
 My understanding is that the scheduler uses LIBPROCESS_IP to determine 
 which network interface to bind to, and also which ip to advertise to the 
 master, so that the master can send offers. There is also another variable 
 LIBPROCESS_ADVERTISE_IP. If this is defined then LIBPROCESS_IP is used to 
 determine which network interface to bind to, and LIBPROCESS_ADVERTISE_IP 
 is used to determine which ip to advertise to the master.
 
 It would be great if there was a LIBPROCESS_ADVERTISE_HOST variable, then 
 I could just use the $HOST variable to define this.
 
> On 5 Jun 2016, at 10:41 pm, Sivaram Kannan  wrote:
> 
> 
> I have been using this approach from 0.23.0 to 0.28.0. It has surely been 
> working (although for a different framework). Inside the docker 
> container, can you see the $HOST variable defined?
> 
> The ticket you referred to says that the app definition needs to define 
> LIBPROCESS_HOST=$HOST to make the framework take the proper IP - you 
> are describing a different problem.
> 
> Thanks,
> ./Siva.
> 
>> On Sun, Jun 5, 2016 at 4:30 AM, Eli Jordan  
>> wrote:
>

Re: Mesos HA does not work (Failed to recover registrar)

2016-06-06 Thread Chengwei Yang
@Qian,

I think you're running into issues with a firewall. Did you make sure your
masters can reach each other?

FROM master A
$ telnet B 5050

I think it fails to connect.

Please make sure any firewall is shut down.

-- 
Thanks,
Chengwei

On Mon, Jun 06, 2016 at 09:06:43PM +0800, Qian Zhang wrote:
> I deleted everything in the work dir (/var/lib/mesos/master), and tried again,
> the same error still happened :-(
> 
> 
> Thanks,
> Qian Zhang
> 
> On Mon, Jun 6, 2016 at 3:03 AM, Jean Christophe “JC” Martin <
> jch.mar...@gmail.com> wrote:
> 
> Qian,
> 
> Zookeeper should be able to reach a quorum with 2, no need to start 3
> simultaneously, but there is an issue with Zookeeper related to connection
> timeouts.
> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
> In some circumstances, the timeout is higher than the sync timeout, which
> cause the leader election to fail.
> Try setting the parameter cnxtimeout in zookeeper (by default it’s 5000ms)
> to the value 500 (500ms). After doing this, leader election in ZK will be
> super fast even if a node is disconnected.
>
> JC
>
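
For anyone wanting to apply JC's tip: a sketch of one way to set that timeout,
under the assumption that in ZooKeeper 3.4.x it is read as the Java system
property zookeeper.cnxTimeout (in milliseconds) rather than from zoo.cfg, e.g.
via conf/java.env, which zkServer.sh sources on startup:

# conf/java.env -- assumption: ZooKeeper 3.4.x, where the leader-election
# connect timeout is the zookeeper.cnxTimeout system property (default 5000 ms).
SERVER_JVMFLAGS="-Dzookeeper.cnxTimeout=500"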
> > On Jun 4, 2016, at 4:34 PM, Qian Zhang  wrote:
> >
> > Thanks Vinod and Dick.
> >
> > I think my 3 ZK servers have formed a quorum, each of them has the
> > following config:
> >    $ cat conf/zoo.cfg
> >    server.1=192.168.122.132:2888:3888
> >    server.2=192.168.122.225:2888:3888
> >    server.3=192.168.122.171:2888:3888
> >    autopurge.purgeInterval=6
> >    autopurge.snapRetainCount=5
> >    initLimit=10
> >    syncLimit=5
> >    maxClientCnxns=0
> >    clientPort=2181
> >    tickTime=2000
> >    quorumListenOnAllIPs=true
> >    dataDir=/home/stack/packages/zookeeper-3.4.8/snapshot
> >    dataLogDir=/home/stack/packages/zookeeper-3.4.8/transactions
> >
> > And when I run "bin/zkServer.sh status" on each of them, I can see 
> "Mode:
> > leader" for one, and "Mode: follower" for the other two.
> >
> > I have already tried to manually start 3 masters simultaneously, and 
> here
> > is what I see in their log:
> > In 192.168.122.171(this is the first master I started):
> >    I0605 07:12:49.418721  1187 detector.cpp:152] Detected a new leader:
> > (id='25')
> >    I0605 07:12:49.419276  1186 group.cpp:698] Trying to get
> > '/mesos/log_replicas/24' in ZooKeeper
> >    I0605 07:12:49.420013  1188 group.cpp:698] Trying to get
> > '/mesos/json.info_25' in ZooKeeper
> >    I0605 07:12:49.423807  1188 zookeeper.cpp:259] A new leading master
> > (UPID=master@192.168.122.171:5050) is detected
> >    I0605 07:12:49.423841 1186 network.hpp:461] ZooKeeper group PIDs: {
> > log-replica(1)@192.168.122.171:5050 }
> >    I0605 07:12:49.424281 1187 master.cpp:1951] The newly elected leader
> > is master@192.168.122.171:5050 with id
> cdc459d4-a05f-4f99-9bf4-1ee9a91d139b
> >    I0605 07:12:49.424895  1187 master.cpp:1964] Elected as the leading
> > master!
> >
> > In 192.168.122.225 (second master I started):
> >    I0605 07:12:51.918702  2246 detector.cpp:152] Detected a new leader:
> > (id='25')
> >    I0605 07:12:51.919983  2246 group.cpp:698] Trying to get
> > '/mesos/json.info_25' in ZooKeeper
> >    I0605 07:12:51.921910  2249 network.hpp:461] ZooKeeper group PIDs: {
> > log-replica(1)@192.168.122.171:5050 }
> >    I0605 07:12:51.925721 2252 replica.cpp:673] Replica in EMPTY status
> > received a broadcasted recover request from (6)@192.168.122.225:5050
> >    I0605 07:12:51.927891  2246 zookeeper.cpp:259] A new leading master
> > (UPID=master@192.168.122.171:5050) is detected
> >    I0605 07:12:51.928444 2246 master.cpp:1951] The newly elected leader
> > is master@192.168.122.171:5050 with id
> cdc459d4-a05f-4f99-9bf4-1ee9a91d139b
> >
> > In 192.168.122.132 (last master I started):
> > I0605 07:12:53.553949 16426 detector.cpp:152] Detected a new leader:
> > (id='25')
> > I0605 07:12:53.555179 16429 group.cpp:698] Trying to get
> > '/mesos/json.info_25' in ZooKeeper
> > I0605 07:12:53.560045 16428 zookeeper.cpp:259] A new leading master
> > (UPID=master@192.168.122.171:5050) is detected
> >
> > So right after I started these 3 masters, the first one 
> (192.168.122.171)
> > was successfully elected as leader, but after 60s, 192.168.122.171 
> failed
> > with the error mentioned in my first mail, and then 192.168.122.225 was
> > elected as leader, but it failed with the same error too after another
> 60s,
> > and the same thing happened to the last one (192.168.122.132). So after
> > about 180s, all my 3 master were down.
> >
> > I tried both:
> >    sudo ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=2
> > --work_dir=/var/lib/mesos/mas

Re: Set LIBPROCESS_IP for frameworks launched with marathon

2016-06-06 Thread Radoslaw Gruchalski
I think the problem is that it is not known which agent the task is running
on until the task is in the running state.
Hence the master can’t pass that as an env variable to the task.
However, I see your point. There is an agent host name available in the
task as $HOST. Maybe it would be a good idea if Mesos was setting env
variables named AGENT_IP_0, AGENT_IP_1 and so on for every IP interface on
the agent, and maybe AGENT_BIND_IP if the bind IP is different from 0.0.0.0.
OTOH, I can see how this could be considered a security issue. I am not
sure what the implications could be.

Anybody else care to comment?

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com
de.linkedin.com/in/radgruchalski


Confidentiality: This communication is intended for the above-named person
and may be confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor
must you copy or show it to anyone; please delete/destroy and inform the
sender immediately.

On June 7, 2016 at 1:42:46 AM, Eli Jordan (elias.k.jor...@gmail.com) wrote:

Thanks Radoslaw. I'm not really set on using host names, I just want a
reliable way to start the framework. For the meantime I have gone with a
solution similar to what you suggested. We use the /etc/default/mesos file to
configure mesos, and it has the ip defined, so I just mounted that in the
container and read the ip.

I would like to avoid having a dependency on the file system of the agents
though. I'm not sure why I can't have the docker executor set the
LIBPROCESS_IP variable in the same way the command executor does.

Thanks
Eli

On 6 Jun 2016, at 21:44, Radoslaw Gruchalski  wrote:

Out of curiosity. Why are you insisting on using host names?
Say you have 1 master and 2 agents with these IPs:

- mesos-master-0: 10.100.1.10
- mesos-agent-0: 10.100.1.11
- mesos-agent-1: 10.100.1.12

Your problem is that you have no way to obtain an IP address of the agent
in the container. Correct?
One way to overcome this problem is to create a shell file, say in
/etc/mesos-agent.sh, with contents like:

...
AGENT_IP=10.100.1.11
…

If you’re using Marathon, you can copy that file to the sandbox using
docker volumes:

{
  "containerPath": "/etc/mesos-agent.sh",
  "hostPath": "/etc/mesos-agent.sh",
  "mode": "RO"
}

You can now source that in the container to set the LIBPROCESS_ADVERTISE_IP.
Other applications simply use the mesos-agent-X host name. That’s without
mesos-dns.
Things are easier with mesos-dns or consul service catalog (I prefer the
latter).
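
Putting that advice together, a minimal sketch of a Marathon app definition
that mounts the file and exports the advertise IP before starting the
framework (the app id, image, and start command are hypothetical; the volumes
stanza mirrors the snippet above):

{
  "id": "/my-framework",
  "cmd": ". /etc/mesos-agent.sh && export LIBPROCESS_ADVERTISE_IP=$AGENT_IP && ./start-framework.sh",
  "cpus": 1,
  "mem": 512,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "example/my-framework:latest",
      "network": "BRIDGE"
    },
    "volumes": [
      {
        "containerPath": "/etc/mesos-agent.sh",
        "hostPath": "/etc/mesos-agent.sh",
        "mode": "RO"
      }
    ]
  }
}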

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com
de.linkedin.com/in/radgruchalski


Confidentiality: This communication is intended for the above-named person
and may be confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor
must you copy or show it to anyone; please delete/destroy and inform the
sender immediately.

On June 6, 2016 at 1:16:07 PM, Eli Jordan (elias.k.jor...@gmail.com) wrote:

The issue refers to LIBPROCESS_IP not LIBPROCESS_HOST. I haven’t been able
to find the LIBPROCESS_HOST variable documented anywhere.

My understanding is that the scheduler uses LIBPROCESS_IP to determine
which network interface to bind to, and also which ip to advertise to the
master, so that the master can send offers. There is also another variable
LIBPROCESS_ADVERTISE_IP. If this is defined then LIBPROCESS_IP is used to
determine which network interface to bind to, and LIBPROCESS_ADVERTISE_IP
is used to determine which ip to advertise to the master.

It would be great if there was a LIBPROCESS_ADVERTISE_HOST variable, then I
could just use the $HOST variable to define this.
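
To make the bind/advertise split described above concrete, a two-line sketch
(the addresses are placeholders taken from the earlier example):

# Bind to all interfaces, but advertise a routable agent address to the master.
export LIBPROCESS_IP=0.0.0.0
export LIBPROCESS_ADVERTISE_IP=10.100.1.11   # e.g. mesos-agent-0 from above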

On 5 Jun 2016, at 10:41 pm, Sivaram Kannan  wrote:


I have been using this approach from 0.23.0 to 0.28.0. It has surely been
working (although for a different framework). Inside the docker container,
can you see the $HOST variable defined?

The ticket you referred to says that the app definition needs to define
LIBPROCESS_HOST=$HOST to make the framework take the proper IP - you are
describing a different problem.

Thanks,
./Siva.

On Sun, Jun 5, 2016 at 4:30 AM, Eli Jordan  wrote:

> I found this issue on the mesos jira that describes the exact issue I am
> hitting.
>
> https://issues.apache.org/jira/browse/MESOS-3740
>
> It doesn't appear to be resolved.
>
> Thanks
> Eli
>
> On 5 Jun 2016, at 16:46, Eli Jordan  wrote:
>
> Hmmm… that doesn’t seem to work for me. What version of mesos does this
> work in? I am running 0.27.1.
>
> When using this approach, I still get the following error when the kafka
> mesos framework is starting up.
>
> "Scheduler driver bound to loopback interface! Cannot communicate with
> remote master(s). You might want to set 'LIBPROCESS_IP' environment
> variable to use a routable IP address.”
>
> I tried setting LIBPROCESS_IP to ‘0.0.0.0’ and
> LIBPROCESS_ADVERTISE_IP=‘the public ip’ and this work

Re: Consequences of health-check timeouts?

2016-06-06 Thread Benjamin Mahler
I'll make sure this gets fixed for 1.0. Apologies for the pain, it looks
like there is a significant amount of debt in the docker containerizer /
executor.

On Wed, May 18, 2016 at 10:51 AM, Steven Schlansker <
sschlans...@opentable.com> wrote:

>
> > On May 18, 2016, at 10:44 AM, haosdent  wrote:
> >
> > >In re executor_shutdown_grace_period: how would this enable the task
> (MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
> STDOUT as captured by Mesos shows that it received signal 15 just before it
> said good-bye). My naive understanding of this grace period is that it
> simply delays the termination of the executor.
>
> I'm not 100% sure this is related or helpful, but be aware that we believe
> there is a bug in the Docker
> containerizer's handling of logs during shutdown:
>
> https://issues.apache.org/jira/browse/MESOS-5195
>
> We spent a lot of time debugging why our application was not shutting down
> as we expected,
> only to find that the real problem was that Mesos was losing all logs sent
> during
> shutdown.
>
> >
> > If you use the DockerContainerizer, Mesos uses executor_shutdown_grace_period
> > as the graceful shutdown timeout for the task as well. If you use the
> > MesosContainerizer, it would send SIGTERM(15) first. After 3 seconds, if
> > the task is still alive, Mesos would send SIGKILL(9) to the task.
> >
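
A hedged sketch of what that behavior implies for a task like the MongoDB one
discussed in this thread: the process (or a wrapper around it) has to catch
the SIGTERM itself in order to shut down cleanly before the follow-up SIGKILL
lands. The paths and flags below are hypothetical:

#!/bin/sh
# Hypothetical task wrapper: handle the SIGTERM Mesos sends first, so the
# database can stop cleanly before the forcible SIGKILL arrives.
graceful_stop() {
  echo "got SIGTERM, shutting down mongod gracefully"
  mongod --shutdown --dbpath /data/db   # assumes mongod runs with this dbpath
  exit 0
}
trap graceful_stop TERM

mongod --config /etc/mongod.conf &
wait $!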
> > >I'm not sure what the java task is. This took place on the mesos-master
> node and none of our applications runs there. It runs master, Marathon, and
> ZK. Maybe the java task is Marathon or ZK?
> >
> > Not sure about this, maybe others have similar experience on this. Were
> Marathon or Zookeeper abnormal at that time? Could you provide the logs of
> mesos-master/mesos-slave from when the accident happened as well?
> >
> >
> > On Wed, May 18, 2016 at 7:11 PM, Paul Bell  wrote:
> > Hi Haosdent,
> >
> > Thanks for your reply.
> >
> > In re executor_shutdown_grace_period: how would this enable the task
> (MongoDB) to terminate gracefully? (BTW: I am fairly certain that the mongo
> STDOUT as captured by Mesos shows that it received signal 15 just before it
> said good-bye). My naive understanding of this grace period is that it
> simply delays the termination of the executor.
> >
> > The following snippet is from /var/log/syslog. I believe it shows the
> stack trace (largely in the kernel) that led to mesos-master being blocked
> for more than 120 seconds. Please note that immediately above (before) the
> blocked mesos-master is a blocked jbd2/dm. Immediately below (after) the
> blocked mesos-master is a blocked java task. I'm not sure what the java
> task is. This took place on the mesos-master node and none of our
> applications runs there. It runs master, Marathon, and ZK. Maybe the java
> task is Marathon or ZK?
> >
> > Thanks again.
> >
> > -Paul
> > May 16 20:06:53 71 kernel: [193339.890848] INFO: task mesos-master:4013
> blocked for more than 120 seconds.
> >
> > May 16 20:06:53 71 kernel: [193339.890873]   Not tainted
> 3.13.0-32-generic #57-Ubuntu
> >
> > May 16 20:06:53 71 kernel: [193339.890889] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >
> > May 16 20:06:53 71 kernel: [193339.890912] mesos-masterD
> 88013fd94440 0  4013  1 0x
> >
> > May 16 20:06:53 71 kernel: [193339.890914]  880137429a28
> 0002 880135778000 880137429fd8
> >
> > May 16 20:06:53 71 kernel: [193339.890916]  00014440
> 00014440 880135778000 88013fd94cd8
> >
> > May 16 20:06:53 71 kernel: [193339.890918]  88013ffd34b0
> 0002 81284630 880137429aa0
> >
> > May 16 20:06:53 71 kernel: [193339.890919] Call Trace:
> >
> > May 16 20:06:53 71 kernel: [193339.890922]  [] ?
> start_this_handle+0x590/0x590
> >
> > May 16 20:06:53 71 kernel: [193339.890924]  []
> io_schedule+0x9d/0x140
> >
> > May 16 20:06:53 71 kernel: [193339.890925]  []
> sleep_on_shadow_bh+0xe/0x20
> >
> > May 16 20:06:53 71 kernel: [193339.890927]  []
> __wait_on_bit+0x62/0x90
> >
> > May 16 20:06:53 71 kernel: [193339.890929]  [] ?
> start_this_handle+0x590/0x590
> >
> > May 16 20:06:53 71 kernel: [193339.890930]  []
> out_of_line_wait_on_bit+0x77/0x90
> >
> > May 16 20:06:53 71 kernel: [193339.890932]  [] ?
> autoremove_wake_function+0x40/0x40
> >
> > May 16 20:06:53 71 kernel: [193339.890934]  [] ?
> wake_up_bit+0x25/0x30
> >
> > May 16 20:06:53 71 kernel: [193339.890936]  []
> do_get_write_access+0x2ad/0x4f0
> >
> > May 16 20:06:53 71 kernel: [193339.890938]  [] ?
> __getblk+0x2d/0x2e0
> >
> > May 16 20:06:53 71 kernel: [193339.890939]  []
> jbd2_journal_get_write_access+0x27/0x40
> >
> > May 16 20:06:53 71 kernel: [193339.890942]  []
> __ext4_journal_get_write_access+0x3b/0x80
> >
> > May 16 20:06:53 71 kernel: [193339.890946]  []
> ext4_reserve_inode_write+0x70/0xa0
> >
> > May 16 20:06:53 71 kernel: [193339.890948]  [] ?
> ext4_dirty_inode+0x40/0x60
> >
> > May 16 20:06:53 71 kernel: [193339.89094

Re: Mesos 0.24.1 on Raspberry Pi 3

2016-06-06 Thread Benjamin Mahler
Cool stuff Andrew, thanks for sharing!

On Thu, Jun 2, 2016 at 11:50 AM, Andrew Spyker 
wrote:

> FYI, based on the work others have done in the past, Netflix was able to
> get Mesos agent building and running on Raspberry Pi natively and under
> Docker containers.  Please see this blog for the information:
>
> bit.ly/TitusOnPi
>
> --
> Andrew Spyker (aspy...@netflix.com)
> Twitter:  @aspyker  Blog:  ispyker.blogspot.com
>


Re: Set LIBPROCESS_IP for frameworks launched with marathon

2016-06-06 Thread Eli Jordan
Thanks Radoslaw. I'm not really set on using host names, I just want a reliable 
way to start the framework. For the meantime I have gone with a solution 
similar to what you suggested. We use /etc/default/mesos file to configure 
mesos, and it has the ip defined, so I just mounted that in the container and 
read the ip.

I would like to avoid having a dependency on the file system of the agents 
though. I'm not sure why I can't have the docker executor set the LIBPROCESS_IP 
variable in the same way the command executor does.

Thanks
Eli

> On 6 Jun 2016, at 21:44, Radoslaw Gruchalski  wrote:
> 
> Out of curiosity. Why are you insisting on using host names?
> Say you have 1 master and 2 agents with these IPs:
> 
> - mesos-master-0: 10.100.1.10
> - mesos-agent-0: 10.100.1.11
> - mesos-agent-1: 10.100.1.12
> 
> Your problem is that you have no way to obtain an IP address of the agent in 
> the container. Correct?
> One way to overcome this problem is to create a shell file, say in 
> /etc/mesos-agent.sh, with contents like:
> 
> ...
> AGENT_IP=10.100.1.11
> …
> 
> If you’re using Marathon, you can copy that file to the sandbox using docker 
> volumes:
> 
> {
>   "containerPath": "/etc/mesos-agent.sh",
>   "hostPath": "/etc/mesos-agent.sh",
>   "mode": "RO"
> }
> 
> You can now source that in the container to set the LIBPROCESS_ADVERTISE_IP.
> Other applications simply use the mesos-agent-X host name. That’s without 
> mesos-dns.
> Things are easier with mesos-dns or consul service catalog (I prefer the 
> latter).
> – 
> Best regards,
> 
> Radek Gruchalski
> 
> ra...@gruchalski.com
> de.linkedin.com/in/radgruchalski
> 
> Confidentiality:
> This communication is intended for the above-named person and may be 
> confidential and/or legally privileged.
> If it has come to you in error you must take no action based on it, nor must 
> you copy or show it to anyone; please delete/destroy and inform the sender 
> immediately.
> 
>> On June 6, 2016 at 1:16:07 PM, Eli Jordan (elias.k.jor...@gmail.com) wrote:
>> 
>> The issue refers to LIBPROCESS_IP not LIBPROCESS_HOST. I haven’t been able 
>> to find the LIBPROCESS_HOST variable documented anywhere.
>> 
>> My understanding is that the scheduler uses LIBPROCESS_IP to determine which 
>> network interface to bind to, and also which ip to advertise to the master, 
>> so that the master can send offers. There is also another variable 
>> LIBPROCESS_ADVERTISE_IP. If this is defined then LIBPROCESS_IP is used to 
>> determine which network interface to bind to, and LIBPROCESS_ADVERTISE_IP is 
>> used to determine which ip to advertise to the master.
>> 
>> It would be great if there was a LIBPROCESS_ADVERTISE_HOST variable, then I 
>> could just use the $HOST variable to define this.
>> 
>>> On 5 Jun 2016, at 10:41 pm, Sivaram Kannan  wrote:
>>> 
>>> 
>>> I have been using this approach from 0.23.0 to 0.28.0. It has surely been 
>>> working (although for a different framework). Inside the docker container, 
>>> can you see the $HOST variable defined?
>>> 
>>> The ticket you referred to says that the app definition needs to define 
>>> LIBPROCESS_HOST=$HOST to make the framework take the proper IP - you are 
>>> describing a different problem.
>>> 
>>> Thanks,
>>> ./Siva.
>>> 
 On Sun, Jun 5, 2016 at 4:30 AM, Eli Jordan  
 wrote:
 I found this issue on the mesos jira that describes the exact issue I am 
 hitting.
 
 https://issues.apache.org/jira/browse/MESOS-3740
 
 It doesn't appear to be resolved. 
 
 Thanks
 Eli
 
 On 5 Jun 2016, at 16:46, Eli Jordan  wrote:
 
> Hmmm… that doesn’t seem to work for me. What version of mesos does this 
> work in? I am running 0.27.1.
> 
> When using this approach, I still get the following error when the kafka 
> mesos framework is starting up.
> 
> "Scheduler driver bound to loopback interface! Cannot communicate with 
> remote master(s). You might want to set 'LIBPROCESS_IP' environment 
> variable to use a routable IP address.”
> 
> I tried setting LIBPROCESS_IP to ‘0.0.0.0’ and 
> LIBPROCESS_ADVERTISE_IP=‘the public ip’ and this works. But the host 
> variations don’t seem to work. (i.e. set LIBPROCESS_IP=0.0.0.0 and 
> LIBPROCESS_ADVERTISE_HOST=$HOST)
> 
> It seems lib process doesn’t support using host names.
> 
> I think I might have to run the framework outside of docker, but I would 
> really like to avoid this. 
> 
> This problem would be solved if the docker executor was able to set the 
> same environment variables as the command executor. Is there a way to 
> make this happen?
> 
> I saw that mesos can be extended with a Hook ‘module’ to set extra 
> environment variables in docker containers. This might be a solution, but 
> seems overwrought for a simple problem.
> 
> 
>> On 5 Jun 2

Re: 0.28.2 has been released

2016-06-06 Thread Kapil Arya
Hi Craig,

I should be sending out a link to the RPM packages in a couple of days.
(Our packaging scripts had some issues and I am working on fixing them).

Kapil

On Mon, Jun 6, 2016 at 6:15 AM, craig w  wrote:

> Jie,
>
> Thanks for the updates. When will the packages be available (in particular
> RPM)?
>
> -craig
>
> On Sun, Jun 5, 2016 at 2:31 PM, Jie Yu  wrote:
>
>> Hi folks,
>>
>> I just released Mesos 0.28.2 and updated the website.
>>
>> It includes some important bug fixes. The change log can be found here:
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.2
>>
>> If you are considering using 0.28, please use 0.28.2!
>>
>> Thanks!
>> - Jie
>>
>
>
>
> --
>
> https://github.com/mindscratch
> https://www.google.com/+CraigWickesser
> https://twitter.com/mind_scratch
> https://twitter.com/craig_links
>
>


Re: Proposal: move content in Wiki to docs in code repo

2016-06-06 Thread Vinod Kone
Works for me. Some things we might miss from the wiki would be comments and
the ability to watch for updates, but I don't think many people use them.

On Mon, Jun 6, 2016 at 3:15 PM, Gilbert Song  wrote:

> +1.
>
> At least I personally rarely touch the wiki.
>
> Gilbert
>
> On Mon, Jun 6, 2016 at 11:51 AM, Zhou Z Xing  wrote:
>
>> +1
>>
>> It's a good idea to collect and gather everything in one single repo;
>> it would be easier for users to find things.
>>
>> Thanks & Best Wishes,
>>
>> Tom Xing(邢舟)
>> Emerging Technology Institute, IBM China Software Development Lab
>> --
>> IBM China Software Development Laboratory (CSDL)
>> Notes ID:Zhou Z Xing/China/IBM
>> Phone :86-10-82450442
>> e-Mail :xingz...@cn.ibm.com
>> Address :Building No.28, ZhongGuanCun Software Park, No.8 Dong Bei Wang
>> West Road, Haidian District, Beijing, P.R.China 100193
>> 地址 :中国北京市海淀区东北旺西路8号 中关村软件园28号楼 100193
>>
>>
>>
>> From: Jie Yu 
>> To: mesos , "user@mesos.apache.org" <
>> user@mesos.apache.org>
>> Date: 2016-06-06 11:29 AM
>> Subject: Proposal: move content in Wiki to docs in code repo
>> --
>>
>>
>>
>> Hi folks,
>>
>> I am proposing moving our content in Wiki (e.g., working groups, release
>> tracking, etc.) to our docs in the code repo. I personally found that wiki
>> is hard to use and there's no reviewing process for changes in the Wiki.
>> The content in Wiki historically received less attention than that in the
>> docs.
>>
>> What do you think?
>>
>> - Jie
>>
>>
>>
>>
>


Re: Proposal: move content in Wiki to docs in code repo

2016-06-06 Thread Gilbert Song
+1.

At least I personally rarely touch the wiki.

Gilbert

On Mon, Jun 6, 2016 at 11:51 AM, Zhou Z Xing  wrote:

> +1
>
> It's a good idea to collect and gather everything in one single repo; it would
> be easier for users to find things.
>
> Thanks & Best Wishes,
>
> Tom Xing(邢舟)
> Emerging Technology Institute, IBM China Software Development Lab
> --
> IBM China Software Development Laboratory (CSDL)
> Notes ID:Zhou Z Xing/China/IBM
> Phone :86-10-82450442
> e-Mail :xingz...@cn.ibm.com
> Address :Building No.28, ZhongGuanCun Software Park, No.8 Dong Bei Wang
> West Road, Haidian District, Beijing, P.R.China 100193
> 地址 :中国北京市海淀区东北旺西路8号 中关村软件园28号楼 100193
>
>
>
> From: Jie Yu 
> To: mesos , "user@mesos.apache.org" <
> user@mesos.apache.org>
> Date: 2016-06-06 11:29 AM
> Subject: Proposal: move content in Wiki to docs in code repo
> --
>
>
>
> Hi folks,
>
> I am proposing moving our content in Wiki (e.g., working groups, release
> tracking, etc.) to our docs in the code repo. I personally found that wiki
> is hard to use and there's no reviewing process for changes in the Wiki.
> The content in Wiki historically received less attention than that in the
> docs.
>
> What do you think?
>
> - Jie
>
>
>
>


Re: Proposal: move content in Wiki to docs in code repo

2016-06-06 Thread Zhou Z Xing
+1

It's a good idea to collect and gather everything in one single repo; it would
be easier for users to find things.

Thanks & Best Wishes,

Tom Xing(邢舟)
Emerging Technology Institute, IBM China Software Development Lab
--
IBM China Software Development Laboratory (CSDL)
Notes ID:Zhou Z Xing/China/IBM
Phone   :86-10-82450442
e-Mail  :xingz...@cn.ibm.com
Address :Building No.28, ZhongGuanCun Software Park, No.8 Dong Bei Wang
West Road, Haidian District, Beijing, P.R.China 100193
地址:中国北京市海淀区东北旺西路8号 中关村软件园28号楼 100193




From:   Jie Yu 
To: mesos , "user@mesos.apache.org"

Date:   2016-06-06 11:29 AM
Subject:Proposal: move content in Wiki to docs in code repo



Hi folks,

I am proposing moving our content in Wiki (e.g., working groups, release
tracking, etc.) to our docs in the code repo. I personally found that wiki
is hard to use and there's no reviewing process for changes in the Wiki.
The content in Wiki historically received less attention than that in the
docs.

What do you think?

- Jie




Re: Proposal: move content in Wiki to docs in code repo

2016-06-06 Thread Tomek Janiszewski
+1
For a long time I didn't know that the wiki exists. It's hidden from new
users and search engines - I remember only one place where it's linked in
the docs.
I can help with the move.

On Mon, Jun 6, 2016 at 20:29, Jie Yu wrote:

> Hi folks,
>
> I am proposing moving our content in Wiki (e.g., working groups, release
> tracking, etc.) to our docs in the code repo. I personally found that wiki
> is hard to use and there's no reviewing process for changes in the Wiki.
> The content in Wiki historically received less attention than that in the
> docs.
>
> What do you think?
>
> - Jie
>


Re: Proposal: move content in Wiki to docs in code repo

2016-06-06 Thread Avinash Sridharan
+1

I think moving the wiki into docs would make information about working
groups, release tracking first class citizens and help disseminate
information better.

On Mon, Jun 6, 2016 at 11:29 AM, Jie Yu  wrote:

> Hi folks,
>
> I am proposing moving our content in Wiki (e.g., working groups, release
> tracking, etc.) to our docs in the code repo. I personally found that wiki
> is hard to use and there's no reviewing process for changes in the Wiki.
> The content in Wiki historically received less attention than that in the
> docs.
>
> What do you think?
>
> - Jie
>



-- 
Avinash Sridharan, Mesosphere
+1 (323) 702 5245


Proposal: move content in Wiki to docs in code repo

2016-06-06 Thread Jie Yu
Hi folks,

I am proposing moving our content in Wiki (e.g., working groups, release
tracking, etc.) to our docs in the code repo. I personally found that wiki
is hard to use and there's no reviewing process for changes in the Wiki.
The content in Wiki historically received less attention than that in the
docs.

What do you think?

- Jie


Re: Mesos HA does not work (Failed to recover registrar)

2016-06-06 Thread haosdent
Hi, @Qian Zhang, your issue reminds me of this thread:
http://search-hadoop.com/m/0Vlr69BZgz1NlAPP1&subj=Re+Mesos+Masters+Leader+Keeps+Fluctuating
which I could not reproduce in my env. I am not sure whether your case is the
same as Stefano's or not.

On Mon, Jun 6, 2016 at 9:06 PM, Qian Zhang  wrote:

> I deleted everything in the work dir (/var/lib/mesos/master), and tried
> again, the same error still happened :-(
>
>
> Thanks,
> Qian Zhang
>
> On Mon, Jun 6, 2016 at 3:03 AM, Jean Christophe “JC” Martin <
> jch.mar...@gmail.com> wrote:
>
>> Qian,
>>
>> Zookeeper should be able to reach a quorum with 2, no need to start 3
>> simultaneously, but there is an issue with Zookeeper related to connection
>> timeouts.
>> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
>> In some circumstances, the timeout is higher than the sync timeout, which
>> cause the leader election to fail.
>> Try setting the parameter cnxtimeout in zookeeper (by default it’s
>> 5000ms) to the value 500 (500ms). After doing this, leader election in ZK
>> will be super fast even if a node is disconnected.
>>
>> JC
>>
>> > On Jun 4, 2016, at 4:34 PM, Qian Zhang  wrote:
>> >
>> > Thanks Vinod and Dick.
>> >
>> > I think my 3 ZK servers have formed a quorum, each of them has the
>> > following config:
>> >$ cat conf/zoo.cfg
>> >server.1=192.168.122.132:2888:3888
>> >server.2=192.168.122.225:2888:3888
>> >server.3=192.168.122.171:2888:3888
>> >autopurge.purgeInterval=6
>> >autopurge.snapRetainCount=5
>> >initLimit=10
>> >syncLimit=5
>> >maxClientCnxns=0
>> >clientPort=2181
>> >tickTime=2000
>> >quorumListenOnAllIPs=true
>> >dataDir=/home/stack/packages/zookeeper-3.4.8/snapshot
>> >dataLogDir=/home/stack/packages/zookeeper-3.4.8/transactions
>> >
>> > And when I run "bin/zkServer.sh status" on each of them, I can see
>> "Mode:
>> > leader" for one, and "Mode: follower" for the other two.
>> >
>> > I have already tried to manually start 3 masters simultaneously, and
>> here
>> > is what I see in their log:
>> > In 192.168.122.171(this is the first master I started):
>> >I0605 07:12:49.418721  1187 detector.cpp:152] Detected a new leader:
>> > (id='25')
>> >I0605 07:12:49.419276  1186 group.cpp:698] Trying to get
>> > '/mesos/log_replicas/24' in ZooKeeper
>> >I0605 07:12:49.420013  1188 group.cpp:698] Trying to get
>> > '/mesos/json.info_25' in ZooKeeper
>> >I0605 07:12:49.423807  1188 zookeeper.cpp:259] A new leading master
>> > (UPID=master@192.168.122.171:5050) is detected
>> >I0605 07:12:49.423841 1186 network.hpp:461] ZooKeeper group PIDs: {
>> > log-replica(1)@192.168.122.171:5050 }
>> >I0605 07:12:49.424281 1187 master.cpp:1951] The newly elected leader
>> > is master@192.168.122.171:5050 with id
>> cdc459d4-a05f-4f99-9bf4-1ee9a91d139b
>> >I0605 07:12:49.424895  1187 master.cpp:1964] Elected as the leading
>> > master!
>> >
>> > In 192.168.122.225 (second master I started):
>> >I0605 07:12:51.918702  2246 detector.cpp:152] Detected a new leader:
>> > (id='25')
>> >I0605 07:12:51.919983  2246 group.cpp:698] Trying to get
>> > '/mesos/json.info_25' in ZooKeeper
>> >I0605 07:12:51.921910  2249 network.hpp:461] ZooKeeper group PIDs: {
>> > log-replica(1)@192.168.122.171:5050 }
>> >I0605 07:12:51.925721 2252 replica.cpp:673] Replica in EMPTY status
>> > received a broadcasted recover request from (6)@192.168.122.225:5050
>> >I0605 07:12:51.927891  2246 zookeeper.cpp:259] A new leading master
>> > (UPID=master@192.168.122.171:5050) is detected
>> >I0605 07:12:51.928444 2246 master.cpp:1951] The newly elected leader
>> > is master@192.168.122.171:5050 with id
>> cdc459d4-a05f-4f99-9bf4-1ee9a91d139b
>> >
>> > In 192.168.122.132 (last master I started):
>> > I0605 07:12:53.553949 16426 detector.cpp:152] Detected a new leader:
>> > (id='25')
>> > I0605 07:12:53.555179 16429 group.cpp:698] Trying to get
>> > '/mesos/json.info_25' in ZooKeeper
>> > I0605 07:12:53.560045 16428 zookeeper.cpp:259] A new leading master
>> > (UPID=master@192.168.122.171:5050) is detected
>> >
>> > So right after I started these 3 masters, the first one
>> (192.168.122.171)
>> > was successfully elected as leader, but after 60s, 192.168.122.171
>> failed
>> > with the error mentioned in my first mail, and then 192.168.122.225 was
>> > elected as leader, but it failed with the same error too after another
>> 60s,
>> > and the same thing happened to the last one (192.168.122.132). So after
>> > about 180s, all my 3 master were down.
>> >
>> > I tried both:
>> >sudo ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=2
>> > --work_dir=/var/lib/mesos/master
>> > and
>> >sudo ./bin/mesos-master.sh --zk=zk://192.168.122.132:2181,
>> > 192.168.122.171:2181,192.168.122.225:2181/mesos --quorum=2
>> > --work_dir=/var/lib/mesos/master
>> > And I see the same error for both.
>> >
>> > 192.168.122.132, 192.168.122.225 and 192.168.122

Re: Rack awareness support for Mesos

2016-06-06 Thread Charles Allen
There are a lot of things in Mesos which require a-priori communication
between an agent and a framework in order to properly set resource usage
expectations (example: what does 1 cpu mean?). I'm not seeing how having
customizations in core mesos per "way of looking at resources" is scalable
and future-proof.

On Mon, Jun 6, 2016 at 8:48 AM Jörg Schad  wrote:

> Hi,
> thanks for your idea and design doc!
> Just a few thoughts:
> a) The scheduling part would be implemented in a framework scheduler and
> not the Mesos Core, or?
> b) As mentioned by James, this needs to be very flexible (and not
> necessarily based on network structure), afaik people are using labels on
> the agents to identify different fault domains which can then be
> interpreted by framework scheduler. Maybe it would make sense (instead of
> identifying the network structure) to come up with a common label naming
> scheme which can be understood by all/different frameworks.
>
> Looking forward to your thoughts on this!
>
> On Mon, Jun 6, 2016 at 3:27 PM, james  wrote:
>
>> Hello,
>>
>>
>> @Stephen: I guess Stephen is bringing up the 'security' aspect of who
>> gets access to the information, particularly cluster/cloud devops,
>> customers or interlopers?
>>
>>
>> @Fan: As a consultant, most of my customers either have or are planning
>> hybrid installations, where some codes run on a local cluster and others use
>> 'the cloud' for dynamic load requirements. I would think your proposed scheme
>> needs to be very flexible, both in application to a campus or Metropolitan
>> Area Network, if not massively distributed around the globe. What about
>> different resource types (racks of arm64, gpu centric hardware, DSPs, FPGAs,
>> etc.)? Hardware diversity brings many
>> benefits to the cluster/cloud capabilities.
>>
>>
>> This also begs the question of hardware management (boot/config/online)
>> of the various hardware, such as is built into coreOS. Are several
>> applications going to be supported? Standards track? Just Mesos DC/OS
>> centric?
>>
>>
>> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
>> in resources' you need to add timing (latency) data to encourage robust
>> and diversified use of this data. For HPC, this could be very valuable
>> for RDMA-abusive algorithms where memory-constrained workloads not only
>> need the knowledge of additional nearby memory resources, but also
>> the approximated (based on previously collected data) latency and bandwidth
>> constraints to use those additional resources.
>>
>>
>> Great idea. I do like it very much.
>>
>> hth,
>> James
>>
>>
>>
>> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>>
>>> Hi,
>>>
>>> This looks potentially interesting.  How does it work in a public cloud
>>> deployment scenario?  I assume you would just have to disable this
>>> feature, or not enable it?
>>>
>>> Cheers,
>>>
>>> On 06/06/16 10:17, Du, Fan wrote:
>>>
 Hi, Mesos folks

 I’ve been thinking about Mesos rack awareness support for a while,

 it’s a common interest for lots of data center applications to provide
 data locality,

 fault tolerance and better task placement. I created MESOS-5545 to track
 the story,

 and here is the initial design doc [1] to support rack awareness in
 Mesos.

 Looking forward to hear any comments from end user and other developers,

 Thanks!

 [1]:

 https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing


>>>
>>
>


Re: Rack awareness support for Mesos

2016-06-06 Thread Jörg Schad
Hi,
thanks for your idea and design doc!
Just a few thoughts:
a) The scheduling part would be implemented in a framework scheduler and
not the Mesos Core, or?
b) As mentioned by James, this needs to be very flexible (and not
necessarily based on network structure), afaik people are using labels on
the agents to identify different fault domains which can then be
interpreted by framework scheduler. Maybe it would make sense (instead of
identifying the network structure) to come up with a common label naming
scheme which can be understood by all/different frameworks.

Looking forward to your thoughts on this!
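
One concrete way to encode such a label scheme today is agent attributes,
which framework schedulers already receive in offers. A sketch (the attribute
names and values below are made up; the --attributes flag and its
'name:value;name:value' syntax are standard agent options):

# Tag an agent with its fault domain so framework schedulers can read it
# from offers. Attribute names/values here are illustrative only.
mesos-slave --master=zk://zk1:2181/mesos \
  --work_dir=/var/lib/mesos \
  --attributes='rack:rack-12;row:row-3;dc:us-east-1'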

On Mon, Jun 6, 2016 at 3:27 PM, james  wrote:

> Hello,
>
>
> @Stephen: I guess Stephen is bringing up the 'security' aspect of who
> gets access to the information, particularly cluster/cloud devops,
> customers or interlopers?
>
>
> @Fan: As a consultant, most of my customers either have or are planning
> hybrid installations, where some codes run on a local cluster and others use
> 'the cloud' for dynamic load requirements. I would think your proposed scheme
> needs to be very flexible, both in application to a campus or Metropolitan
> Area Network, if not massively distributed around the globe. What about
> different resource types (racks of arm64, gpu centric hardware, DSPs, FPGAs,
> etc.)? Hardware diversity brings many
> benefits to the cluster/cloud capabilities.
>
>
> This also begs the question of hardware management (boot/config/online)
> of the various hardware, such as is built into coreOS. Are several
> applications going to be supported? Standards track? Just Mesos DC/OS
> centric?
>
>
> TIMING DATA:: This is the main issue I see. Once you start 'vectoring
> in resources' you need to add timing (latency) data to encourage robust
> and diversified use of this data. For HPC, this could be very valuable
> for RDMA-intensive algorithms where memory-constrained workloads not only
> need the knowledge of additional nearby memory resources, but
> the approximated (based on previous data collected) latency and bandwidth
> constraints to use those additional resources.
>
>
> Great idea. I do like it very much.
>
> hth,
> James
>
>
>
> On 06/06/2016 05:06 AM, Stephen Gran wrote:
>
>> Hi,
>>
>> This looks potentially interesting.  How does it work in a public cloud
>> deployment scenario?  I assume you would just have to disable this
>> feature, or not enable it?
>>
>> Cheers,
>>
>> On 06/06/16 10:17, Du, Fan wrote:
>>
>>> Hi, Mesos folks
>>>
>>> I’ve been thinking about Mesos rack awareness support for a while,
>>>
>>> it’s a common interest for lots of data center applications to provide
>>> data locality,
>>>
>>> fault tolerance and better task placement. I created MESOS-5545 to track
>>> the story,
>>>
>>> and here is the initial design doc [1] to support rack awareness in
>>> Mesos.
>>>
>>> Looking forward to hearing any comments from end users and other developers,
>>>
>>> Thanks!
>>>
>>> [1]:
>>>
>>> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>>>
>>>
>>
>


Re: OSX 10.10.5 and mesos 0.28.1 -- 10 to 20 X difference in sleep() method compared to non mesos

2016-06-06 Thread haosdent
yes, mine is a laptop.

On Mon, Jun 6, 2016 at 11:23 PM, DiGiorgio, Mr. Rinaldo S. <
rdigior...@pace.edu> wrote:

>
> On Jun 6, 2016, at 11:08, haosdent  wrote:
>
> I use OS X 10.11.4 but I think it should not be related to this minor
> difference in OS X version.
> For settings, I disable `Enable Power Nap while plugged into a power
> adapter` in `Energy Saver`.
>
>
> I assume you have a laptop. I have retested with Power Nap set to
> off on an iMac and it did not make a difference. I will try it on a
> laptop.  The other 15 Macs are all towers. Maybe it is a clue: it works on
> laptops but not on desktops.
>
> On Mon, Jun 6, 2016 at 10:57 PM, DiGiorgio, Mr. Rinaldo S. <
> rdigior...@pace.edu> wrote:
>
>
>
> Thanks James and Haosdent,
>>
>> I built my own version of mesos 1.0 and installed it on 10.11.5 and I am
>> getting the same results as 10.10.5 with 0.28.1
>>
>> Do either of you remember what you may have set on your OSX machines to
>> default some of the desktop settings that may be causing this issue?
>>
>>
>> We are already turning off many desktop features since they are not
>> relevant for our use case.
>>
>>
>>
>> Received SUBSCRIBED event
>> Subscribed executor on
>> dhcp-adc-twvpn-3-vpnpool-10-154-101-79.vpn.oracle.com
>> Received LAUNCH event
>> Starting task test-sleep
>> sh -c 'cd /tmp && java SleepLatency'
>> Forked command at 2907
>> loop   0 delay   35 ms
>> loop   1 delay  108 ms
>> loop   2 delay  104 ms
>> loop   3 delay   57 ms
>> loop   4 delay  104 ms
>> loop   5 delay   93 ms
>> loop   6 delay   12 ms
>> loop   7 delay   17 ms
>> loop   8 delay  105 ms
>> loop   9 delay  109 ms
>>
>>
>> > On Jun 4, 2016, at 05:07, haosdent  wrote:
>> >
>> > Hi, Rinaldo. I tested your problem in my local Mesos (run on my Mac). It
>> looks normal on my side. I started it by
>> >
>> > ```
>> > mesos-execute --master="localhost:5050" --name="test-sleep"
>> --command="cd /tmp && java SleepLatency"
>> > ```
>> >
>> > ```
>> > Registered executor on localhost
>> > Starting task test-sleep
>> > sh -c 'cd /tmp && java SleepLatency'
>> > Forked command at 42480
>> > loop   0 delay   11 ms
>> > loop   1 delay   12 ms
>> > loop   2 delay   11 ms
>> > loop   3 delay   13 ms
>> > loop   4 delay   12 ms
>> > loop   5 delay   12 ms
>> > loop   6 delay   12 ms
>> > loop   7 delay   12 ms
>> > loop   8 delay   12 ms
>> > loop   9 delay   11 ms
>> > loop  10 delay   12 ms
>> > loop  11 delay   13 ms
>> > loop  12 delay   11 ms
>> > loop  13 delay   11 ms
>> > loop  14 delay   12 ms
>> > loop  15 delay   12 ms
>> > loop  16 delay   13 ms
>> > loop  17 delay   12 ms
>> > loop  18 delay   11 ms
>> > loop  19 delay   12 ms
>> > loop  20 delay   12 ms
>> > loop  21 delay   11 ms
>> > loop  22 delay   12 ms
>> > loop  23 delay   11 ms
>> > loop  24 delay   12 ms
>> > loop  25 delay   12 ms
>> > loop  26 delay   12 ms
>> > loop  27 delay   12 ms
>> > loop  28 delay   13 ms
>> > loop  29 delay   12 ms
>> > loop  30 delay   12 ms
>> > loop  31 delay   11 ms
>> > loop  32 delay   11 ms
>> > loop  33 delay   11 ms
>> > loop  34 delay   13 ms
>> > loop  35 delay   10 ms
>> > loop  36 delay   12 ms
>> > loop  37 delay   11 ms
>> > loop  38 delay   12 ms
>> > loop  39 delay   12 ms
>> > loop  40 delay   12 ms
>> > loop  41 delay   12 ms
>> > loop  42 delay   12 ms
>> > loop  43 delay   11 ms
>> > loop  44 delay   12 ms
>> > loop  45 delay   12 ms
>> > loop  46 delay   12 ms
>> > loop  47 delay   12 ms
>> > loop  48 delay   12 ms
>> > loop  49 delay   12 ms
>> > loop  50 delay   12 ms
>> > loop  51 delay   12 ms
>> > loop  52 delay   12 ms
>> > loop  53 delay   13 ms
>> > loop  54 delay   11 ms
>> > loop  55 delay   12 ms
>> > loop  56 delay   12 ms
>> > loop  57 delay   12 ms
>> > loop  58 delay   12 ms
>> > loop  59 delay   11 ms
>> > loop  60 delay   11 ms
>> > loop  61 delay   11 ms
>> > loop  62 delay   11 ms
>> > loop  63 delay   12 ms
>> > loop  64 delay   12 ms
>> > loop  65 delay   12 ms
>> > loop  66 delay   12 ms
>> > loop  67 delay   12 ms
>> > loop  68 delay   11 ms
>> > loop  69 delay   13 ms
>> > loop  70 delay   12 ms
>> > loop  71 delay   10 ms
>> > loop  72 delay   12 ms
>> > loop  73 delay   10 ms
>> > loop  74 delay   11 ms
>> > loop  75 delay   12 ms
>> > loop  76 delay   10 ms
>> > loop  77 delay   11 ms
>> > loop  78 delay   12 ms
>> > loop  79 delay   10 ms
>> > loop  80 delay   12 ms
>> > loop  81 delay   12 ms
>> > loop  82 delay   11 ms
>> > loop  83 delay   11 ms
>> > loop  84 delay   12 ms
>> > loop  85 delay   11 ms
>> > loop  86 delay   12 ms
>> > loop  87 delay   12 ms
>> > loop  88 delay   12 ms
>> > loop  89 delay   11 ms
>> > loop  90 delay   12 ms
>> > loop  91 delay   12 ms
>> > loop  92 delay   13 ms
>> > loop  93 delay   12 ms
>> > loop  94 delay   11 ms
>> > loop  95 delay   10 ms
>> > loop  96 delay   12 ms
>> > loop  97 delay   11 ms
>> > loop  98 delay   12 ms
>> > loop  99 delay   12 ms
>> > total time =  1215 ms
>> > Command exited with status 0 (pid: 42480)
>> > ```
>> >
>> > On Sat, Ju

Re: OSX 10.10.5 and mesos 0.28.1 -- 10 to 20 X difference in sleep() method compared to non mesos

2016-06-06 Thread DiGiorgio, Mr. Rinaldo S.

On Jun 6, 2016, at 11:08, haosdent <haosd...@gmail.com> wrote:

I use OS X 10.11.4 but I think it should not be related to this minor
difference in OS X version.
For settings, I disable `Enable Power Nap while plugged into a power adapter` 
in `Energy Saver`.

I assume you have a laptop. I have retested with Power Nap set to off on
an iMac and it did not make a difference. I will try it on a laptop.  The other
15 Macs are all towers. Maybe it is a clue: it works on laptops but not on
desktops.
On Mon, Jun 6, 2016 at 10:57 PM, DiGiorgio, Mr. Rinaldo S. <rdigior...@pace.edu> wrote:


Thanks James and Haosdent,

I built my own version of mesos 1.0 and installed it on 10.11.5 and I am 
getting the same results as 10.10.5 with 0.28.1

Do either of you remember what you may have set on your OSX machines to default 
some of the desktop settings that may be causing this issue?


We are already turning off many desktop features since they are not relevant 
for our use case.



Received SUBSCRIBED event
Subscribed executor on 
dhcp-adc-twvpn-3-vpnpool-10-154-101-79.vpn.oracle.com
Received LAUNCH event
Starting task test-sleep
sh -c 'cd /tmp && java SleepLatency'
Forked command at 2907
loop   0 delay   35 ms
loop   1 delay  108 ms
loop   2 delay  104 ms
loop   3 delay   57 ms
loop   4 delay  104 ms
loop   5 delay   93 ms
loop   6 delay   12 ms
loop   7 delay   17 ms
loop   8 delay  105 ms
loop   9 delay  109 ms


> On Jun 4, 2016, at 05:07, haosdent <haosd...@gmail.com> wrote:
>
> Hi, Rinaldo. I tested your problem in my local Mesos (run on my Mac). It looks
> normal on my side. I started it by
>
> ```
> mesos-execute --master="localhost:5050" --name="test-sleep" --command="cd 
> /tmp && java SleepLatency"
> ```
>
> ```
> Registered executor on localhost
> Starting task test-sleep
> sh -c 'cd /tmp && java SleepLatency'
> Forked command at 42480
> loop   0 delay   11 ms
> loop   1 delay   12 ms
> loop   2 delay   11 ms
> loop   3 delay   13 ms
> loop   4 delay   12 ms
> loop   5 delay   12 ms
> loop   6 delay   12 ms
> loop   7 delay   12 ms
> loop   8 delay   12 ms
> loop   9 delay   11 ms
> loop  10 delay   12 ms
> loop  11 delay   13 ms
> loop  12 delay   11 ms
> loop  13 delay   11 ms
> loop  14 delay   12 ms
> loop  15 delay   12 ms
> loop  16 delay   13 ms
> loop  17 delay   12 ms
> loop  18 delay   11 ms
> loop  19 delay   12 ms
> loop  20 delay   12 ms
> loop  21 delay   11 ms
> loop  22 delay   12 ms
> loop  23 delay   11 ms
> loop  24 delay   12 ms
> loop  25 delay   12 ms
> loop  26 delay   12 ms
> loop  27 delay   12 ms
> loop  28 delay   13 ms
> loop  29 delay   12 ms
> loop  30 delay   12 ms
> loop  31 delay   11 ms
> loop  32 delay   11 ms
> loop  33 delay   11 ms
> loop  34 delay   13 ms
> loop  35 delay   10 ms
> loop  36 delay   12 ms
> loop  37 delay   11 ms
> loop  38 delay   12 ms
> loop  39 delay   12 ms
> loop  40 delay   12 ms
> loop  41 delay   12 ms
> loop  42 delay   12 ms
> loop  43 delay   11 ms
> loop  44 delay   12 ms
> loop  45 delay   12 ms
> loop  46 delay   12 ms
> loop  47 delay   12 ms
> loop  48 delay   12 ms
> loop  49 delay   12 ms
> loop  50 delay   12 ms
> loop  51 delay   12 ms
> loop  52 delay   12 ms
> loop  53 delay   13 ms
> loop  54 delay   11 ms
> loop  55 delay   12 ms
> loop  56 delay   12 ms
> loop  57 delay   12 ms
> loop  58 delay   12 ms
> loop  59 delay   11 ms
> loop  60 delay   11 ms
> loop  61 delay   11 ms
> loop  62 delay   11 ms
> loop  63 delay   12 ms
> loop  64 delay   12 ms
> loop  65 delay   12 ms
> loop  66 delay   12 ms
> loop  67 delay   12 ms
> loop  68 delay   11 ms
> loop  69 delay   13 ms
> loop  70 delay   12 ms
> loop  71 delay   10 ms
> loop  72 delay   12 ms
> loop  73 delay   10 ms
> loop  74 delay   11 ms
> loop  75 delay   12 ms
> loop  76 delay   10 ms
> loop  77 delay   11 ms
> loop  78 delay   12 ms
> loop  79 delay   10 ms
> loop  80 delay   12 ms
> loop  81 delay   12 ms
> loop  82 delay   11 ms
> loop  83 delay   11 ms
> loop  84 delay   12 ms
> loop  85 delay   11 ms
> loop  86 delay   12 ms
> loop  87 delay   12 ms
> loop  88 delay   12 ms
> loop  89 delay   11 ms
> loop  90 delay   12 ms
> loop  91 delay   12 ms
> loop  92 delay   13 ms
> loop  93 delay   12 ms
> loop  94 delay   11 ms
> loop  95 delay   10 ms
> loop  96 delay   12 ms
> loop  97 delay   11 ms
> loop  98 delay   12 ms
> loop  99 delay   12 ms
> total time =  1215 ms
> Command exited with status 0 (pid: 42480)
> ```
>
> On Sat, Jun 4, 2016 at 4:11 AM, DiGiorgio, Mr. Rinaldo S. <rdigior...@pace.edu> wrote:
> Hi,
>
> We are running the following Java application and we are getting 
> unreasonable deltas in the actual amount of time slept. On Linux the results are
> as expected: 10, 11, or 12, but mostly 10 ms.  Can you suggest any changes we can
> make, or is this a known issue or a new issue to be investigated? When we run
> the same code on the same instance of OS

Re: OSX 10.10.5 and mesos 0.28.1 -- 10 to 20 X difference in sleep() method compared to non mesos

2016-06-06 Thread haosdent
I use OS X 10.11.4 but I think it should not be related to this minor
difference in OS X version.
For settings, I disable `Enable Power Nap while plugged into a power
adapter` in `Energy Saver`.

On Mon, Jun 6, 2016 at 10:57 PM, DiGiorgio, Mr. Rinaldo S. <
rdigior...@pace.edu> wrote:

> Thanks James and Haosdent,
>
> I built my own version of mesos 1.0 and installed it on 10.11.5 and I am
> getting the same results as 10.10.5 with 0.28.1
>
> Do either of you remember what you may have set on your OSX machines to
> default some of the desktop settings that may be causing this issue?
>
>
> We are already turning off many desktop features since they are not
> relevant for our use case.
>
>
>
> Received SUBSCRIBED event
> Subscribed executor on
> dhcp-adc-twvpn-3-vpnpool-10-154-101-79.vpn.oracle.com
> Received LAUNCH event
> Starting task test-sleep
> sh -c 'cd /tmp && java SleepLatency'
> Forked command at 2907
> loop   0 delay   35 ms
> loop   1 delay  108 ms
> loop   2 delay  104 ms
> loop   3 delay   57 ms
> loop   4 delay  104 ms
> loop   5 delay   93 ms
> loop   6 delay   12 ms
> loop   7 delay   17 ms
> loop   8 delay  105 ms
> loop   9 delay  109 ms
>
>
> > On Jun 4, 2016, at 05:07, haosdent  wrote:
> >
> > Hi, Rinaldo. I tested your problem in my local Mesos (run on my Mac). It
> looks normal on my side. I started it by
> >
> > ```
> > mesos-execute --master="localhost:5050" --name="test-sleep"
> --command="cd /tmp && java SleepLatency"
> > ```
> >
> > ```
> > Registered executor on localhost
> > Starting task test-sleep
> > sh -c 'cd /tmp && java SleepLatency'
> > Forked command at 42480
> > loop   0 delay   11 ms
> > loop   1 delay   12 ms
> > loop   2 delay   11 ms
> > loop   3 delay   13 ms
> > loop   4 delay   12 ms
> > loop   5 delay   12 ms
> > loop   6 delay   12 ms
> > loop   7 delay   12 ms
> > loop   8 delay   12 ms
> > loop   9 delay   11 ms
> > loop  10 delay   12 ms
> > loop  11 delay   13 ms
> > loop  12 delay   11 ms
> > loop  13 delay   11 ms
> > loop  14 delay   12 ms
> > loop  15 delay   12 ms
> > loop  16 delay   13 ms
> > loop  17 delay   12 ms
> > loop  18 delay   11 ms
> > loop  19 delay   12 ms
> > loop  20 delay   12 ms
> > loop  21 delay   11 ms
> > loop  22 delay   12 ms
> > loop  23 delay   11 ms
> > loop  24 delay   12 ms
> > loop  25 delay   12 ms
> > loop  26 delay   12 ms
> > loop  27 delay   12 ms
> > loop  28 delay   13 ms
> > loop  29 delay   12 ms
> > loop  30 delay   12 ms
> > loop  31 delay   11 ms
> > loop  32 delay   11 ms
> > loop  33 delay   11 ms
> > loop  34 delay   13 ms
> > loop  35 delay   10 ms
> > loop  36 delay   12 ms
> > loop  37 delay   11 ms
> > loop  38 delay   12 ms
> > loop  39 delay   12 ms
> > loop  40 delay   12 ms
> > loop  41 delay   12 ms
> > loop  42 delay   12 ms
> > loop  43 delay   11 ms
> > loop  44 delay   12 ms
> > loop  45 delay   12 ms
> > loop  46 delay   12 ms
> > loop  47 delay   12 ms
> > loop  48 delay   12 ms
> > loop  49 delay   12 ms
> > loop  50 delay   12 ms
> > loop  51 delay   12 ms
> > loop  52 delay   12 ms
> > loop  53 delay   13 ms
> > loop  54 delay   11 ms
> > loop  55 delay   12 ms
> > loop  56 delay   12 ms
> > loop  57 delay   12 ms
> > loop  58 delay   12 ms
> > loop  59 delay   11 ms
> > loop  60 delay   11 ms
> > loop  61 delay   11 ms
> > loop  62 delay   11 ms
> > loop  63 delay   12 ms
> > loop  64 delay   12 ms
> > loop  65 delay   12 ms
> > loop  66 delay   12 ms
> > loop  67 delay   12 ms
> > loop  68 delay   11 ms
> > loop  69 delay   13 ms
> > loop  70 delay   12 ms
> > loop  71 delay   10 ms
> > loop  72 delay   12 ms
> > loop  73 delay   10 ms
> > loop  74 delay   11 ms
> > loop  75 delay   12 ms
> > loop  76 delay   10 ms
> > loop  77 delay   11 ms
> > loop  78 delay   12 ms
> > loop  79 delay   10 ms
> > loop  80 delay   12 ms
> > loop  81 delay   12 ms
> > loop  82 delay   11 ms
> > loop  83 delay   11 ms
> > loop  84 delay   12 ms
> > loop  85 delay   11 ms
> > loop  86 delay   12 ms
> > loop  87 delay   12 ms
> > loop  88 delay   12 ms
> > loop  89 delay   11 ms
> > loop  90 delay   12 ms
> > loop  91 delay   12 ms
> > loop  92 delay   13 ms
> > loop  93 delay   12 ms
> > loop  94 delay   11 ms
> > loop  95 delay   10 ms
> > loop  96 delay   12 ms
> > loop  97 delay   11 ms
> > loop  98 delay   12 ms
> > loop  99 delay   12 ms
> > total time =  1215 ms
> > Command exited with status 0 (pid: 42480)
> > ```
> >
> > On Sat, Jun 4, 2016 at 4:11 AM, DiGiorgio, Mr. Rinaldo S. <
> rdigior...@pace.edu> wrote:
> > Hi,
> >
> > We are running the following Java application and we are getting
> unreasonable deltas in the actual amount of time slept. On Linux the results
> are as expected: 10, 11, or 12, but mostly 10 ms.  Can you suggest any changes we
> can make, or is this a known issue or a new issue to be investigated? When
> we run the same code on the same instance of OSX 10.10.5 without mesos  --
> we get the expected results.
> >
> >
> > public class SleepLatency {
> >static final int COUNT = 100;
> 

Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread haosdent
I encountered this problem before, because some Linux distributions don't
add /usr/local/lib to the default library search path.
I edited /etc/ld.so.conf to include `/usr/local/lib` and then executed
ldconfig on the agent. It works for me.
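For example (run as root on each agent; this assumes the /usr/local install
prefix used in this thread):

echo /usr/local/lib >> /etc/ld.so.conf
ldconfig
ldconfig -p | grep libmesos

The last command just verifies that the linker cache now lists the library.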

On Mon, Jun 6, 2016 at 8:47 PM, Kamil Wokitajtis 
wrote:

> I have solved this issue by adding an env variable to the Marathon app
> deployment JSON:
>
> "env": {
>   "LD_LIBRARY_PATH" : "/usr/local/lib"
> }
>
> Maybe someone could shed light on why 0.28.2 is unable to locate libmesos
> without exporting LD_LIBRARY_PATH?
>
>
>
> 2016-06-06 14:21 GMT+02:00 Kamil Wokitajtis :
>
>> For 0.28.1:
>> root@pltr-app-pl01:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
>> | grep libmesos
>> libmesos-0.28.1.so => /usr/local/lib/libmesos-0.28.1.so
>>  (0x7fbf522b6000)
>>
>> For 0.28.2, as expected, library not found:
>> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
>> | grep libmesos
>> libmesos-0.28.2.so => not found
>>
>> After exporting LD_LIBRARY_PATH, libmesos is found as expected:
>> root@pltr-app-pl02:~# export LD_LIBRARY_PATH=/usr/local/lib
>> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
>> | grep mesos
>> libmesos-0.28.2.so => /usr/local/lib/libmesos-0.28.2.so
>>  (0x7f219a8c1000)
>>
>> Looks like exporting LD_LIBRARY_PATH should help. But exporting this
>> variable in mesos startup scripts doesn't seem to solve the issue.
>>
>>
>> 2016-06-06 13:50 GMT+02:00 Guangya Liu :
>>
>>> You can check the output of `ldd mesos-docker-executor` to see
>>> whether the libmesos-xxx is in the right location.
>>>
>>> [root@dcos001 mesosphere]# ldd
>>> ./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor
>>> | grep libmesos
>>> libmesos-0.28.1.so =>
>>> /opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/
>>> libmesos-0.28.1.so (0x7f5cf1cb1000)
>>>
>>> Thanks,
>>>
>>> Guangya
>>>
>>> On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis 
>>> wrote:
>>>
 Hi,

 I have upgraded my mesos env from 0.28.1 to 0.28.2.
 On 0.28.1 everything worked just fine.
 Now agents are unable to start docker images, mesos throws:

 mesos-docker-executor: error while loading shared libraries:
 libmesos-0.28.2.so: cannot open shared object file: No such file or
 directory

 Just like for Mesos 0.28.1, where it worked, libmesos-0.28.2.so is in
 /usr/local/lib.
 There is also a symlink libmesos.so -> libmesos-0.28.2.so.
 /etc/ld.so.conf.d/libc.conf contains a /usr/local/lib entry.
 I have also tried exporting LD_LIBRARY_PATH in startup scripts, no luck.

 Thanks,
 Kamil


>>>
>>
>


-- 
Best Regards,
Haosdent Huang


Re: OSX 10.10.5 and mesos 0.28.1 -- 10 to 20 X difference in sleep() method compared to non mesos

2016-06-06 Thread DiGiorgio, Mr. Rinaldo S.
Thanks James and Haosdent,

I built my own version of mesos 1.0 and installed it on 10.11.5 and I am 
getting the same results as 10.10.5 with 0.28.1

Do either of you remember what you may have set on your OSX machines to default 
some of the desktop settings that may be causing this issue?


We are already turning off many desktop features since they are not relevant 
for our use case. 



Received SUBSCRIBED event
Subscribed executor on dhcp-adc-twvpn-3-vpnpool-10-154-101-79.vpn.oracle.com
Received LAUNCH event
Starting task test-sleep
sh -c 'cd /tmp && java SleepLatency'
Forked command at 2907
loop   0 delay   35 ms
loop   1 delay  108 ms
loop   2 delay  104 ms
loop   3 delay   57 ms
loop   4 delay  104 ms
loop   5 delay   93 ms
loop   6 delay   12 ms
loop   7 delay   17 ms
loop   8 delay  105 ms
loop   9 delay  109 ms


> On Jun 4, 2016, at 05:07, haosdent  wrote:
> 
> Hi, Rinaldo. I tested your problem in my local Mesos (run on my Mac). It looks
> normal on my side. I started it by
> 
> ```
> mesos-execute --master="localhost:5050" --name="test-sleep" --command="cd 
> /tmp && java SleepLatency"
> ```
> 
> ```
> Registered executor on localhost
> Starting task test-sleep
> sh -c 'cd /tmp && java SleepLatency'
> Forked command at 42480
> loop   0 delay   11 ms
> loop   1 delay   12 ms
> loop   2 delay   11 ms
> loop   3 delay   13 ms
> loop   4 delay   12 ms
> loop   5 delay   12 ms
> loop   6 delay   12 ms
> loop   7 delay   12 ms
> loop   8 delay   12 ms
> loop   9 delay   11 ms
> loop  10 delay   12 ms
> loop  11 delay   13 ms
> loop  12 delay   11 ms
> loop  13 delay   11 ms
> loop  14 delay   12 ms
> loop  15 delay   12 ms
> loop  16 delay   13 ms
> loop  17 delay   12 ms
> loop  18 delay   11 ms
> loop  19 delay   12 ms
> loop  20 delay   12 ms
> loop  21 delay   11 ms
> loop  22 delay   12 ms
> loop  23 delay   11 ms
> loop  24 delay   12 ms
> loop  25 delay   12 ms
> loop  26 delay   12 ms
> loop  27 delay   12 ms
> loop  28 delay   13 ms
> loop  29 delay   12 ms
> loop  30 delay   12 ms
> loop  31 delay   11 ms
> loop  32 delay   11 ms
> loop  33 delay   11 ms
> loop  34 delay   13 ms
> loop  35 delay   10 ms
> loop  36 delay   12 ms
> loop  37 delay   11 ms
> loop  38 delay   12 ms
> loop  39 delay   12 ms
> loop  40 delay   12 ms
> loop  41 delay   12 ms
> loop  42 delay   12 ms
> loop  43 delay   11 ms
> loop  44 delay   12 ms
> loop  45 delay   12 ms
> loop  46 delay   12 ms
> loop  47 delay   12 ms
> loop  48 delay   12 ms
> loop  49 delay   12 ms
> loop  50 delay   12 ms
> loop  51 delay   12 ms
> loop  52 delay   12 ms
> loop  53 delay   13 ms
> loop  54 delay   11 ms
> loop  55 delay   12 ms
> loop  56 delay   12 ms
> loop  57 delay   12 ms
> loop  58 delay   12 ms
> loop  59 delay   11 ms
> loop  60 delay   11 ms
> loop  61 delay   11 ms
> loop  62 delay   11 ms
> loop  63 delay   12 ms
> loop  64 delay   12 ms
> loop  65 delay   12 ms
> loop  66 delay   12 ms
> loop  67 delay   12 ms
> loop  68 delay   11 ms
> loop  69 delay   13 ms
> loop  70 delay   12 ms
> loop  71 delay   10 ms
> loop  72 delay   12 ms
> loop  73 delay   10 ms
> loop  74 delay   11 ms
> loop  75 delay   12 ms
> loop  76 delay   10 ms
> loop  77 delay   11 ms
> loop  78 delay   12 ms
> loop  79 delay   10 ms
> loop  80 delay   12 ms
> loop  81 delay   12 ms
> loop  82 delay   11 ms
> loop  83 delay   11 ms
> loop  84 delay   12 ms
> loop  85 delay   11 ms
> loop  86 delay   12 ms
> loop  87 delay   12 ms
> loop  88 delay   12 ms
> loop  89 delay   11 ms
> loop  90 delay   12 ms
> loop  91 delay   12 ms
> loop  92 delay   13 ms
> loop  93 delay   12 ms
> loop  94 delay   11 ms
> loop  95 delay   10 ms
> loop  96 delay   12 ms
> loop  97 delay   11 ms
> loop  98 delay   12 ms
> loop  99 delay   12 ms
> total time =  1215 ms
> Command exited with status 0 (pid: 42480)
> ```
> 
> On Sat, Jun 4, 2016 at 4:11 AM, DiGiorgio, Mr. Rinaldo S. 
>  wrote:
> Hi,
> 
> We are running the following Java application and we are getting 
> unreasonable deltas in the actual amount of time slept. On Linux the results are
> as expected: 10, 11, or 12, but mostly 10 ms.  Can you suggest any changes we can
> make, or is this a known issue or a new issue to be investigated? When we run
> the same code on the same instance of OSX 10.10.5 without mesos  -- we get 
> the expected results.
> 
> 
> public class SleepLatency {
>     static final int COUNT = 100;
>     static final long DELAY = 10L;
> 
>     public static void main(String[] args) throws Exception {
>         long tstart = System.currentTimeMillis();
>         for (int i = 0; i < COUNT; i++) {
>             long t0 = System.currentTimeMillis();
>             Thread.sleep(DELAY);
>             long t1 = System.currentTimeMillis();
>             System.out.printf("loop %3d delay %4d ms%n", i, t1 - t0);
>         }
>         long tfinish = System.currentTimeMillis();
>         System.out.printf("total time = %5d ms%n", tfinish - tstart);
>     }
> }
> 
> == OSX   RESULTS are 10 to 20 times  

Re: how to stop the mesos executor process in JVM?

2016-06-06 Thread Sharma Podila
Yao, in our Java executor, we explicitly call System.exit(0) after we have
successfully sent the last finished message. However, note that there can
be a little bit of a timing issue here. Once we send the last message, we
call an asynchronous "sleep some and exit" routine. This gives the mesos
driver a chance to send the last message successfully before the executor
JVM exits. Usually, a sleep of 2-3 secs should suffice. There may be a more
elegant way to handle this timing issue, but I haven't looked at it
recently.
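
A minimal sketch of that pattern (the class and method names here are
illustrative, not from our actual code):

```
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.Protos;

final class DelayedExit {
  // Send the final update, then exit a few seconds later so the driver
  // has a chance to deliver the message before the JVM goes away.
  static void finishAndExit(ExecutorDriver driver, Protos.TaskID taskId) {
    driver.sendStatusUpdate(Protos.TaskStatus.newBuilder()
        .setTaskId(taskId)
        .setState(Protos.TaskState.TASK_FINISHED)
        .build());
    Executors.newSingleThreadScheduledExecutor()
        .schedule(() -> System.exit(0), 3, TimeUnit.SECONDS);
  }
}
```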


On Mon, Jun 6, 2016 at 6:34 AM, Vinod Kone  wrote:

> Couple things.
>
> You need to do the business logic and status update sending in a different
> thread than synchronously in launchTask(). This is because the driver
> doesn't send messages to the agent unless the launchTask() method returns.
> See
> https://github.com/apache/mesos/blob/master/src/examples/java/TestExecutor.java
> for example.
>
> Regarding exiting the executor, driver.stop() or driver.abort() only stops
> the driver, i.e., your executor won't be able to send or receive messages
> from the agent. It is entirely up to the executor process to exit.
>
> HTH,
>
>
>
> On Mon, Jun 6, 2016 at 4:05 AM, Yao Wang  wrote:
>
>> Hi, all!
>>
>> I wrote my own executor to run code.
>>
>> I override the launchTask method like this:
>>
>>
>> 
>>
>> @Override public void launchTask(ExecutorDriver driver, Protos.TaskInfo
>> task) {
>> LOGGER.info("Executor is launching task#{}\n...", task);
>> //before launch
>> driver.sendStatusUpdate(
>>
>> Protos.TaskStatus.newBuilder().setTaskId(task.getTaskId()).setState(
>> Protos.TaskState.TASK_RUNNING).build());
>>
>> LOGGER.info("Add your business code here .. ");
>> //business code here
>>
>>
>> //after launch
>> driver.sendStatusUpdate(
>>
>> Protos.TaskStatus.newBuilder().setTaskId(task.getTaskId()).setState(Protos.TaskState.TASK_FINISHED).setData(
>> ByteString.copyFromUtf8(
>> "${taksData}")).build());
>>
>>
>>   } // end method launchTask
>> 
>>
>>
>> And I build the CommandInfo like this:
>>
>> 
>>
>>
>> String executorCommand = String.format("java -jar %s",
>> extractPath(executorJarPath));
>>
>> Protos.CommandInfo.URI.Builder executorJarURI =
>> Protos.CommandInfo.URI.newBuilder().setValue(executorJarPath); //
>> executorJarURI is local uri or hadoop
>>
>> Protos.CommandInfo.Builder commandInfoBuilder =
>> Protos.CommandInfo.newBuilder().setEnvironment(envBuilder).setValue(
>> executorCommand).addUris(executorJarURI); // executorJarURI is
>> local uri or hadoop
>>
>> long  ctms  = System.nanoTime();
>>
>> Protos.ExecutorID.Builder executorIDBuilder =
>> Protos.ExecutorID.newBuilder().setValue(new StringBuilder().append(
>>   ctms).append("-").append(task.getTaskRequestId()).toString());
>>   Protos.ExecutorInfo.Builder executorInfoBuilder =
>> Protos.ExecutorInfo.newBuilder().setExecutorId(
>>
>> executorIDBuilder).setCommand(commandInfoBuilder).setName("flexcloud-executor-2.0.1-"
>> + ctms).setSource("java");
>>
>> // TaskInfo
>> Protos.TaskInfo.Builder taskInfoBuilder =
>> Protos.TaskInfo.newBuilder().setName(task.getTaskName()).setTaskId(
>>
>> taskIDBuilder).setSlaveId(offer.getSlaveId()).setExecutor(executorInfoBuilder);
>>
>>
>> return taskInfoBuilder.build();
>> 
>>
>>
>> After running the executor with Mesos several times, I found that the
>> executor processes did not exit.
>>
>> I executed $ ps -ef | grep “java -jar” on the slave machine, which
>> shows me:
>>
>> wangyao$ ps -ef | grep "java -jar"
>>   501 20078 19302   0  3:54下午 ?? 0:15.77 /usr/bin/java -jar
>> flexcloud-executor.jar
>>   501 20154 19302   0  3:54下午 ?? 0:17.92 /usr/bin/java -jar
>> flexcloud-executor.jar
>>   501 20230 19302   0  3:54下午 ?? 0:16.13 /usr/bin/java -jar
>> flexcloud-executor.jar
>>
>> In order to stop these processes after running an executor, I first
>> tried adding "driver.stop()” or “driver.abort()” to the executor’s
>> launchTask method, but it had no effect.
>> So I added “System.exit(0)” to stop the JVM directly… and it works.
>>
>> I have doubts about this way of stopping the executor; is it the only
>> way to do it?
>>
>>
>


Re: how to stop the mesos executor process in JVM?

2016-06-06 Thread Vinod Kone
Couple things.

You need to do the business logic and status update sending in a different
thread than synchronously in launchTask(). This is because the driver
doesn't send messages to the agent unless the launchTask() method returns.
See
https://github.com/apache/mesos/blob/master/src/examples/java/TestExecutor.java
for example.
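
A minimal sketch of that shape, adapted from the launchTask() in the original
post (error handling omitted; only the threading structure matters):

```
import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.Protos;

class ThreadedLaunch {
  // Spawn a worker thread and return immediately, so the driver's own
  // thread is free to deliver the queued status updates.
  public void launchTask(final ExecutorDriver driver,
                         final Protos.TaskInfo task) {
    new Thread(() -> {
      driver.sendStatusUpdate(Protos.TaskStatus.newBuilder()
          .setTaskId(task.getTaskId())
          .setState(Protos.TaskState.TASK_RUNNING)
          .build());

      // ... business logic goes here, off the driver's thread ...

      driver.sendStatusUpdate(Protos.TaskStatus.newBuilder()
          .setTaskId(task.getTaskId())
          .setState(Protos.TaskState.TASK_FINISHED)
          .build());
    }).start();
  }
}
```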

Regarding exiting the executor, driver.stop() or driver.abort() only stops
the driver, i.e., your executor won't be able to send or receive messages
from the agent. It is entirely up to the executor process to exit.

HTH,



On Mon, Jun 6, 2016 at 4:05 AM, Yao Wang  wrote:

> Hi, all!
>
> I wrote my own executor to run code.
>
> I override the launchTask method like this:
>
>
> 
>
> @Override public void launchTask(ExecutorDriver driver, Protos.TaskInfo
> task) {
> LOGGER.info("Executor is launching task#{}\n...", task);
> //before launch
> driver.sendStatusUpdate(
>
> Protos.TaskStatus.newBuilder().setTaskId(task.getTaskId()).setState(
> Protos.TaskState.TASK_RUNNING).build());
>
> LOGGER.info("Add your business code here .. ");
> //business code here
>
>
> //after launch
> driver.sendStatusUpdate(
>
> Protos.TaskStatus.newBuilder().setTaskId(task.getTaskId()).setState(Protos.TaskState.TASK_FINISHED).setData(
> ByteString.copyFromUtf8(
> "${taksData}")).build());
>
>
>   } // end method launchTask
> 
>
>
> And I build the CommandInfo like this:
>
> 
>
>
> String executorCommand = String.format("java -jar %s",
> extractPath(executorJarPath));
>
> Protos.CommandInfo.URI.Builder executorJarURI =
> Protos.CommandInfo.URI.newBuilder().setValue(executorJarPath); //
> executorJarURI is local uri or hadoop
>
> Protos.CommandInfo.Builder commandInfoBuilder =
> Protos.CommandInfo.newBuilder().setEnvironment(envBuilder).setValue(
> executorCommand).addUris(executorJarURI); // executorJarURI is
> local uri or hadoop
>
> long  ctms  = System.nanoTime();
>
> Protos.ExecutorID.Builder executorIDBuilder =
> Protos.ExecutorID.newBuilder().setValue(new StringBuilder().append(
>   ctms).append("-").append(task.getTaskRequestId()).toString());
>   Protos.ExecutorInfo.Builder executorInfoBuilder =
> Protos.ExecutorInfo.newBuilder().setExecutorId(
>
> executorIDBuilder).setCommand(commandInfoBuilder).setName("flexcloud-executor-2.0.1-"
> + ctms).setSource("java");
>
> // TaskInfo
> Protos.TaskInfo.Builder taskInfoBuilder =
> Protos.TaskInfo.newBuilder().setName(task.getTaskName()).setTaskId(
>
> taskIDBuilder).setSlaveId(offer.getSlaveId()).setExecutor(executorInfoBuilder);
>
>
> return taskInfoBuilder.build();
> 
>
>
> After running the executor with Mesos several times, I found that the
> executor processes did not exit.
>
> I executed $ ps -ef | grep “java -jar” on the slave machine, which shows
> me:
>
> wangyao$ ps -ef | grep "java -jar"
>   501 20078 19302   0  3:54下午 ?? 0:15.77 /usr/bin/java -jar
> flexcloud-executor.jar
>   501 20154 19302   0  3:54下午 ?? 0:17.92 /usr/bin/java -jar
> flexcloud-executor.jar
>   501 20230 19302   0  3:54下午 ?? 0:16.13 /usr/bin/java -jar
> flexcloud-executor.jar
>
> In order to stop these processes after running an executor, I first
> tried adding "driver.stop()” or “driver.abort()” to the executor’s
> launchTask method, but it had no effect.
> So I added “System.exit(0)” to stop the JVM directly… and it works.
>
> I have doubts about this way of stopping the executor; is it the only
> way to do it?
>
>


Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Kamil Wokitajtis
Thanks, that was the case. I wonder why I didn't have to run ldconfig in
the first place after the 0.28.1 install.
Anyway, another lesson learnt.

2016-06-06 14:51 GMT+02:00 Stephen Gran :

> Hi,
>
> Run ldconfig after install?
>
> Cheers,
>
> On 06/06/16 13:21, Kamil Wokitajtis wrote:
> > For 0.28.1:
> > root@pltr-app-pl01:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> > | grep libmesos
> > libmesos-0.28.1.so => /usr/local/lib/libmesos-0.28.1.so (0x7fbf522b6000)
> >
> > For 0.28.2, as expected, library not found:
> > root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> > | grep libmesos
> > libmesos-0.28.2.so => not found
> >
> > After exporting LD_LIBRARY_PATH, libmesos is found as expected:
> > root@pltr-app-pl02:~# export LD_LIBRARY_PATH=/usr/local/lib
> > root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> > | grep mesos
> > libmesos-0.28.2.so => /usr/local/lib/libmesos-0.28.2.so (0x7f219a8c1000)
> >
> > Looks like exporting LD_LIBRARY_PATH should help. But exporting this
> > variable in mesos startup scripts doesn't seem to solve the issue.
> >
> >
> > 2016-06-06 13:50 GMT+02:00 Guangya Liu :
> >
> > You can check the output of `ldd mesos-docker-executor` to
> > see whether the libmesos-xxx is in the right location.
> >
> > [root@dcos001 mesosphere]# ldd
> > ./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor
> > | grep libmesos
> > libmesos-0.28.1.so =>
> > /opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/libmesos-0.28.1.so
> > (0x7f5cf1cb1000)
> >
> > Thanks,
> >
> > Guangya
> >
> > On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis <wokitaj...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have upgraded my mesos env from 0.28.1 to 0.28.2.
> > On 0.28.1 everything worked just fine.
> > Now agents are unable to start docker images, mesos throws:
> >
> > mesos-docker-executor: error while loading shared libraries:
> > libmesos-0.28.2.so: cannot open
> > shared object file: No such file or directory
> >
> > Just like for Mesos 0.28.1, where it worked, libmesos-0.28.2.so is in
> > /usr/local/lib.
> > There is also a symlink libmesos.so -> libmesos-0.28.2.so.
> > /etc/ld.so.conf.d/libc.conf contains a /usr/local/lib entry.
> > I have also tried exporting LD_LIBRARY_PATH in startup scripts,
> > no luck.
> >
> > Thanks,
> > Kamil
> >
> >
> >
>
> --
> Stephen Gran
> Senior Technical Architect
>
> picture the possibilities | piksel.com
>


Re: Mesos HA does not work (Failed to recover registrar)

2016-06-06 Thread Qian Zhang
I deleted everything in the work dir (/var/lib/mesos/master) and tried
again; the same error still happened :-(


Thanks,
Qian Zhang

On Mon, Jun 6, 2016 at 3:03 AM, Jean Christophe “JC” Martin <
jch.mar...@gmail.com> wrote:

> Qian,
>
> ZooKeeper should be able to reach a quorum with 2; no need to start 3
> simultaneously. But there is an issue with ZooKeeper related to connection
> timeouts.
> https://issues.apache.org/jira/browse/ZOOKEEPER-2164
> In some circumstances, the timeout is higher than the sync timeout, which
> causes the leader election to fail.
> Try setting the parameter cnxTimeout in ZooKeeper (by default it’s 5000ms)
> to the value 500 (500ms). After doing this, leader election in ZK will be
> super fast even if a node is disconnected.
>
> JC
>
> > On Jun 4, 2016, at 4:34 PM, Qian Zhang  wrote:
> >
> > Thanks Vinod and Dick.
> >
> > I think my 3 ZK servers have formed a quorum, each of them has the
> > following config:
> >$ cat conf/zoo.cfg
> >server.1=192.168.122.132:2888:3888
> >server.2=192.168.122.225:2888:3888
> >server.3=192.168.122.171:2888:3888
> >autopurge.purgeInterval=6
> >autopurge.snapRetainCount=5
> >initLimit=10
> >syncLimit=5
> >maxClientCnxns=0
> >clientPort=2181
> >tickTime=2000
> >quorumListenOnAllIPs=true
> >dataDir=/home/stack/packages/zookeeper-3.4.8/snapshot
> >dataLogDir=/home/stack/packages/zookeeper-3.4.8/transactions
> >
> > And when I run "bin/zkServer.sh status" on each of them, I can see "Mode:
> > leader" for one, and "Mode: follower" for the other two.
> >
> > I have already tried to manually start 3 masters simultaneously, and here
> > is what I see in their log:
> > In 192.168.122.171(this is the first master I started):
> >I0605 07:12:49.418721  1187 detector.cpp:152] Detected a new leader:
> > (id='25')
> >I0605 07:12:49.419276  1186 group.cpp:698] Trying to get
> > '/mesos/log_replicas/24' in ZooKeeper
> >I0605 07:12:49.420013  1188 group.cpp:698] Trying to get
> > '/mesos/json.info_25' in ZooKeeper
> >I0605 07:12:49.423807  1188 zookeeper.cpp:259] A new leading master
> > (UPID=master@192.168.122.171:5050) is detected
> >I0605 07:12:49.423841 1186 network.hpp:461] ZooKeeper group PIDs: {
> > log-replica(1)@192.168.122.171:5050 }
> >I0605 07:12:49.424281 1187 master.cpp:1951] The newly elected leader
> > is master@192.168.122.171:5050 with id
> cdc459d4-a05f-4f99-9bf4-1ee9a91d139b
> >I0605 07:12:49.424895  1187 master.cpp:1964] Elected as the leading
> > master!
> >
> > In 192.168.122.225 (second master I started):
> >I0605 07:12:51.918702  2246 detector.cpp:152] Detected a new leader:
> > (id='25')
> >I0605 07:12:51.919983  2246 group.cpp:698] Trying to get
> > '/mesos/json.info_25' in ZooKeeper
> >I0605 07:12:51.921910  2249 network.hpp:461] ZooKeeper group PIDs: {
> > log-replica(1)@192.168.122.171:5050 }
> >I0605 07:12:51.925721 2252 replica.cpp:673] Replica in EMPTY status
> > received a broadcasted recover request from (6)@192.168.122.225:5050
> >I0605 07:12:51.927891  2246 zookeeper.cpp:259] A new leading master
> > (UPID=master@192.168.122.171:5050) is detected
> >I0605 07:12:51.928444 2246 master.cpp:1951] The newly elected leader
> > is master@192.168.122.171:5050 with id
> cdc459d4-a05f-4f99-9bf4-1ee9a91d139b
> >
> > In 192.168.122.132 (last master I started):
> > I0605 07:12:53.553949 16426 detector.cpp:152] Detected a new leader:
> > (id='25')
> > I0605 07:12:53.555179 16429 group.cpp:698] Trying to get
> > '/mesos/json.info_25' in ZooKeeper
> > I0605 07:12:53.560045 16428 zookeeper.cpp:259] A new leading master
> (UPID=
> > master@192.168.122.171:5050) is detected
> >
> > So right after I started these 3 masters, the first one (192.168.122.171)
> > was successfully elected as leader, but after 60s, 192.168.122.171 failed
> > with the error mentioned in my first mail, and then 192.168.122.225 was
> > elected as leader, but it failed with the same error too after another
> 60s,
> > and the same thing happened to the last one (192.168.122.132). So after
> > about 180s, all my 3 master were down.
> >
> > I tried both:
> >sudo ./bin/mesos-master.sh --zk=zk://127.0.0.1:2181/mesos --quorum=2
> > --work_dir=/var/lib/mesos/master
> > and
> >sudo ./bin/mesos-master.sh --zk=zk://192.168.122.132:2181,
> > 192.168.122.171:2181,192.168.122.225:2181/mesos --quorum=2
> > --work_dir=/var/lib/mesos/master
> > And I see the same error for both.
> >
> > 192.168.122.132, 192.168.122.225 and 192.168.122.171 are 3 VMs which are
> > running on a KVM hypervisor host.
> >
> >
> >
> >
> > Thanks,
> > Qian Zhang
> >
> > On Sun, Jun 5, 2016 at 3:47 AM, Dick Davies 
> wrote:
> >
> >> You told the master it needed a quorum of 2 and it's the only one
> >> online, so it's bombing out.
> >> That's the expected behaviour.
> >>
> >> You need to start at least 2 zookeepers before it will be a functional
> >> group, same for the 

Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Stephen Gran
Hi,

Run ldconfig after install?

Cheers,

On 06/06/16 13:21, Kamil Wokitajtis wrote:
> For 0.28.1:
> root@pltr-app-pl01:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> | grep libmesos
> libmesos-0.28.1.so => /usr/local/lib/libmesos-0.28.1.so (0x7fbf522b6000)
>
> For 0.28.2, as expected, library not found:
> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> | grep libmesos
> libmesos-0.28.2.so => not found
>
> After exporting LD_LIBRARY_PATH, libmesos is found as expected:
> root@pltr-app-pl02:~# export LD_LIBRARY_PATH=/usr/local/lib
> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> | grep mesos
> libmesos-0.28.2.so => /usr/local/lib/libmesos-0.28.2.so (0x7f219a8c1000)
>
> Looks like exporting LD_LIBRARY_PATH should help. But exporting this
> variable in mesos startup scripts doesn't seem to solve the issue.
>
>
> 2016-06-06 13:50 GMT+02:00 Guangya Liu :
>
> You can check the output of `ldd mesos-docker-executor` to
> see whether the libmesos-xxx is in the right location.
>
> [root@dcos001 mesosphere]# ldd
> ./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor
> | grep libmesos
> libmesos-0.28.1.so =>
> /opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/libmesos-0.28.1.so
> (0x7f5cf1cb1000)
>
> Thanks,
>
> Guangya
>
> On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis <wokitaj...@gmail.com> wrote:
>
> Hi,
>
> I have upgraded my mesos env from 0.28.1 to 0.28.2.
> On 0.28.1 everything worked just fine.
> Now agents are unable to start docker images, mesos throws:
>
> mesos-docker-executor: error while loading shared libraries:
> libmesos-0.28.2.so: cannot open
> shared object file: No such file or directory
>
> Just like for Mesos 0.28.1, where it worked, libmesos-0.28.2.so is in
> /usr/local/lib.
> There is also a symlink libmesos.so -> libmesos-0.28.2.so.
> /etc/ld.so.conf.d/libc.conf contains a /usr/local/lib entry.
> I have also tried exporting LD_LIBRARY_PATH in startup scripts,
> no luck.
>
> Thanks,
> Kamil
>
>
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Kamil Wokitajtis
I have solved this issue by adding an env variable to the Marathon app
deployment JSON:

"env": {
  "LD_LIBRARY_PATH" : "/usr/local/lib"
}

Maybe someone could shed light on why 0.28.2 is unable to locate libmesos
without exporting LD_LIBRARY_PATH?



2016-06-06 14:21 GMT+02:00 Kamil Wokitajtis :

> For 0.28.1:
> root@pltr-app-pl01:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> | grep libmesos
> libmesos-0.28.1.so => /usr/local/lib/libmesos-0.28.1.so
>  (0x7fbf522b6000)
>
> For 0.28.2, as expected, library not found:
> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> | grep libmesos
> libmesos-0.28.2.so => not found
>
> After exporting LD_LIBRARY_PATH, libmesos is found as expected:
> root@pltr-app-pl02:~# export LD_LIBRARY_PATH=/usr/local/lib
> root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor
> | grep mesos
> libmesos-0.28.2.so => /usr/local/lib/libmesos-0.28.2.so
>  (0x7f219a8c1000)
>
> Looks like exporting LD_LIBRARY_PATH should help. But exporting this
> variable in mesos startup scripts doesn't seem to solve the issue.
>
>
> 2016-06-06 13:50 GMT+02:00 Guangya Liu :
>
>> You can check the output of `ldd mesos-docker-executor` to see whether
>> the libmesos-xxx is in the right location.
>>
>> [root@dcos001 mesosphere]# ldd
>> ./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor
>> | grep libmesos
>> libmesos-0.28.1.so =>
>> /opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/
>> libmesos-0.28.1.so (0x7f5cf1cb1000)
>>
>> Thanks,
>>
>> Guangya
>>
>> On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis 
>> wrote:
>>
>>> Hi,
>>>
>>> I have upgraded my mesos env from 0.28.1 to 0.28.2.
>>> On 0.28.1 everything worked just fine.
>>> Now agents are unable to start docker images, mesos throws:
>>>
>>> mesos-docker-executor: error while loading shared libraries:
>>> libmesos-0.28.2.so: cannot open shared object file: No such file or
>>> directory
>>>
>>> Just like for Mesos 0.28.1, where it worked, libmesos-0.28.2.so is in
>>> /usr/local/lib.
>>> There is also a symlink libmesos.so -> libmesos-0.28.2.so.
>>> /etc/ld.so.conf.d/libc.conf contains a /usr/local/lib entry.
>>> I have also tried exporting LD_LIBRARY_PATH in startup scripts, no luck.
>>>
>>> Thanks,
>>> Kamil
>>>
>>>
>>
>


Re: Rack awareness support for Mesos

2016-06-06 Thread james

Hello,


@Stephen:: I guess Stephen is bringing up the 'security' aspect of who
gets access to the information, particularly cluster/cloud devops,
customers or interlopers?



@Fan:: As a consultant, most of my customers either have  or are 
planning hybrid installations, where some codes run on a local cluster 
or using 'the cloud' for dynamic load requirements. I would think your 
proposed scheme needs to be very flexible, both in application to a 
campus or Metropolitan Area Network, if not massively distributed around 
the globe. What about different resource types (racks of arm64, GPU-centric
hardware, DSPs, FPGAs, etc.)? Hardware diversity brings many

benefits to the cluster/cloud capabilities.


This also begs the question of hardware management (boot/config/online)
of the various hardware, such as is built into coreOS. Are several 
applications going to be supported? Standards track? Just Mesos DC/OS

centric?


TIMING DATA:: This is the main issue I see. Once you start 'vectoring
in resources' you need to add timing (latency) data to encourage robust
and diversified use of this data. For HPC, this could be very
valuable for RDMA-intensive algorithms where memory-constrained workloads
not only need the knowledge of additional nearby memory resources, but
the approximated (based on previous data collected) latency and 
bandwidth constraints to use those additional resources.



Great idea. I do like it very much.

hth,
James


On 06/06/2016 05:06 AM, Stephen Gran wrote:

Hi,

This looks potentially interesting.  How does it work in a public cloud
deployment scenario?  I assume you would just have to disable this
feature, or not enable it?

Cheers,

On 06/06/16 10:17, Du, Fan wrote:

Hi, Mesos folks

I’ve been thinking about Mesos rack awareness support for a while,

it’s a common interest for lots of data center applications to provide
data locality,

fault tolerance and better task placement. I created MESOS-5545 to track
the story,

and here is the initial design doc [1] to support rack awareness in Mesos.

Looking forward to hearing any comments from end users and other developers,

Thanks!

[1]:
https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing







Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Kamil Wokitajtis
For 0.28.1:
root@pltr-app-pl01:~# ldd /usr/local/libexec/mesos/mesos-docker-executor |
grep libmesos
libmesos-0.28.1.so => /usr/local/lib/libmesos-0.28.1.so
 (0x7fbf522b6000)

For 0.28.2, as expected, library not found:
root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor |
grep libmesos
libmesos-0.28.2.so => not found

After exporting LD_LIBRARY_PATH, libmesos is found as expected:
root@pltr-app-pl02:~# export LD_LIBRARY_PATH=/usr/local/lib
root@pltr-app-pl02:~# ldd /usr/local/libexec/mesos/mesos-docker-executor |
grep mesos
libmesos-0.28.2.so => /usr/local/lib/libmesos-0.28.2.so
 (0x7f219a8c1000)

Looks like exporting LD_LIBRARY_PATH should help. But exporting this
variable in mesos startup scripts doesn't seem to solve the issue.


2016-06-06 13:50 GMT+02:00 Guangya Liu :

> You can check the output of `ldd mesos-docker-executor` to see whether
> the libmesos-xxx is in the right location.
>
> [root@dcos001 mesosphere]# ldd
> ./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor
> | grep libmesos
> libmesos-0.28.1.so =>
> /opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/
> libmesos-0.28.1.so (0x7f5cf1cb1000)
>
> Thanks,
>
> Guangya
>
> On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis 
> wrote:
>
>> Hi,
>>
>> I have upgraded my mesos env from 0.28.1 to 0.28.2.
>> On 0.28.1 everything worked just fine.
>> Now agents are unable to start docker images, mesos throws:
>>
>> mesos-docker-executor: error while loading shared libraries:
>> libmesos-0.28.2.so: cannot open shared object file: No such file or
>> directory
>>
>> Just like for Mesos 0.28.1, where it worked, libmesos-0.28.2.so is in
>> /usr/local/lib.
>> There is also a symlink libmesos.so -> libmesos-0.28.2.so.
>> /etc/ld.so.conf.d/libc.conf contains a /usr/local/lib entry.
>> I have also tried exporting LD_LIBRARY_PATH in startup scripts, no luck.
>>
>> Thanks,
>> Kamil
>>
>>
>


Re: 0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Guangya Liu
You can check the output of `ldd mesos-docker-executor` to see whether
the libmesos-xxx is in the right location.

[root@dcos001 mesosphere]# ldd
./packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/libexec/mesos/mesos-docker-executor
| grep libmesos
libmesos-0.28.1.so =>
/opt/mesosphere/packages/mesos--0335ca0d3700ea88ad8b808f3b1b84d747ed07f0/lib/
libmesos-0.28.1.so (0x7f5cf1cb1000)

Thanks,

Guangya

On Mon, Jun 6, 2016 at 6:38 PM, Kamil Wokitajtis 
wrote:

> Hi,
>
> I have upgraded my mesos env from 0.28.1 to 0.28.2.
> On 0.28.1 everything worked just fine.
> Now agents are unable to start docker images, mesos throws:
>
> mesos-docker-executor: error while loading shared libraries:
> libmesos-0.28.2.so: cannot open shared object file: No such file or
> directory
>
> Just like for Mesos 0.28.1, where it worked, libmesos-0.28.2.so is in
> /usr/local/lib.
> There is also a symlink libmesos.so -> libmesos-0.28.2.so.
> /etc/ld.so.conf.d/libc.conf contains a /usr/local/lib entry.
> I have also tried exporting LD_LIBRARY_PATH in startup scripts, no luck.
>
> Thanks,
> Kamil
>
>


Re: Set LIBPROCESS_IP for frameworks launched with marathon

2016-06-06 Thread Radoslaw Gruchalski
Out of curiosity, why are you insisting on using host names?
Say you have 1 master and 2 agents with these IPs:

- mesos-master-0: 10.100.1.10
- mesos-agent-0: 10.100.1.11
- mesos-agent-1: 10.100.1.12

Your problem is that you have no way to obtain an IP address of the agent
in the container. Correct?
One way to overcome this problem is to create a shell file, say in
/etc/mesos-agent.sh, with contents like:

...
AGENT_IP=10.100.1.11
…

If you’re using Marathon, you can copy that file to the sandbox using
docker volumes:

{
"containerPath": “/etc/mesos-agent.sh",
"hostPath": "/etc/mesos-agent.sh",
"mode": "RO"
}

You can now source that in the container to set the LIBPROCESS_ADVERTISE_IP.
Other applications simply use the mesos-agent-X host name. That’s without
mesos-dns.
Things are easier with mesos-dns or consul service catalog (I prefer the
latter).
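
For instance, the Marathon app's command could then look something like this
(the framework command itself is a placeholder):

"cmd": ". /etc/mesos-agent.sh && export LIBPROCESS_ADVERTISE_IP=$AGENT_IP && ./your-framework.sh"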

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com
de.linkedin.com/in/radgruchalski


*Confidentiality:* This communication is intended for the above-named person
and may be confidential and/or legally privileged.
If it has come to you in error you must take no action based on it, nor
must you copy or show it to anyone; please delete/destroy and inform the
sender immediately.

On June 6, 2016 at 1:16:07 PM, Eli Jordan (elias.k.jor...@gmail.com) wrote:

The issue refers to LIBPROCESS_IP not LIBPROCESS_HOST. I haven’t been able
to find the LIBPROCESS_HOST variable documented anywhere.

My understanding is that the scheduler uses LIBPROCESS_IP to determine
which network interface to bind to, and also which IP to advertise to the
master, so that the master can send offers. There is also another variable,
LIBPROCESS_ADVERTISE_IP. If this is defined, then LIBPROCESS_IP is used to
determine which network interface to bind to, and LIBPROCESS_ADVERTISE_IP
is used to determine which IP to advertise to the master.

It would be great if there was a LIBPROCESS_ADVERTISE_HOST variable, then I
could just use the $HOST variable to define this.

On 5 Jun 2016, at 10:41 pm, Sivaram Kannan  wrote:


I have been using it this way from 0.23.0 to 0.28.0, and it has definitely
been working (although for a different framework). Inside the Docker
container, can you see the $HOST variable defined?

The ticket you referred to says that the app definition needs to define
LIBPROCESS_HOST=$HOST to make the framework pick up the proper IP; you are
describing a different problem.

Thanks,
./Siva.

On Sun, Jun 5, 2016 at 4:30 AM, Eli Jordan  wrote:

> I found this issue on the mesos jira that describes the exact issue I am
> hitting.
>
> https://issues.apache.org/jira/browse/MESOS-3740
>
> It doesn't appear to be resolved.
>
> Thanks
> Eli
>
> On 5 Jun 2016, at 16:46, Eli Jordan  wrote:
>
> Hmmm… that doesn’t seem to work for me. What version of mesos does this
> work in? I am running 0.27.1.
>
> When using this approach, I still get the following error when the kafka
> mesos framework is starting up.
>
> "Scheduler driver bound to loopback interface! Cannot communicate with
> remote master(s). You might want to set 'LIBPROCESS_IP' environment
> variable to use a routable IP address.”
>
> I tried setting LIBPROCESS_IP to ‘0.0.0.0’ and
> LIBPROCESS_ADVERTISE_IP=‘the public ip’ and this works. But the host
> variations don’t seem to work. (i.e. set LIBPROCESS_IP=0.0.0.0 and
> LIBPROCESS_ADVERTISE_HOST=$HOST)
>
> It seems lib process doesn’t support using host names.
>
> I think I might have to run the framework outside of docker, but I would
> really like to avoid this.
>
> This problem would be solved if the docker executor was able to set the
> same environment variables as the command executor. Is there a way to make
> this happen?
>
> I saw that mesos can be extended with a Hook ‘module’ to set extra
> environment variables in docker containers. This might be a solution, but
> seems overwrought for a simple problem.
>
>
> On 5 Jun 2016, at 12:50 am, Sivaram Kannan  wrote:
>
>
> Hi,
>
> Can you try adding && between the LIBPROCESS_HOST variable and the actual
> command? We have been using this for some time now.
>
> "cmd": "LIBPROCESS_HOST=$HOST && ./kafka-mesos.sh ..
>
> Thanks,
> ./Siva.
>
>
> On Sat, Jun 4, 2016 at 8:34 AM, Eli Jordan 
> wrote:
>
>> Hi @haosdent
>>
>> Based on my testing, this is not the case.
>>
>> I ran a task (from marathon) without using a docker container that just
>> printed out all environment variables. i.e. while [ true ]; do env; sleep
>> 2; done
>>
>> I then run a task that executed the same command inside an alpine docker
>> image.
>>
>> When running without a docker image LIBPROCESS_IP was defined along with
>> many other variables.
>>
>> Sample output when running without docker (note LIBPROCESS_IP) is defined
>>
>> Registered executor on mesos-slave0
>> Starting task plain-test.5e5b00cc-2645-11e6-a3dd-080027aa149e
>> sh -c 'while [ true ]; do env; sleep 2; done'
>> Forked command at 16571
>> L

Re: Set LIBPROCESS_IP for frameworks launched with marathon

2016-06-06 Thread Eli Jordan
The issue refers to LIBPROCESS_IP not LIBPROCESS_HOST. I haven’t been able to 
find the LIBPROCESS_HOST variable documented anywhere.

My understanding is that the scheduler uses LIBPROCESS_IP to determine which
network interface to bind to, and also which IP to advertise to the master, so
that the master can send offers. There is also another variable,
LIBPROCESS_ADVERTISE_IP. If this is defined, then LIBPROCESS_IP is used to
determine which network interface to bind to, and LIBPROCESS_ADVERTISE_IP is
used to determine which IP to advertise to the master.
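
For what it's worth, the combination that I describe as working further down
this thread looks like this in a Marathon app definition (the advertise
address is a placeholder for the agent's routable IP):

"env": {
  "LIBPROCESS_IP": "0.0.0.0",
  "LIBPROCESS_ADVERTISE_IP": "10.100.1.11"
}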

It would be great if there was a LIBPROCESS_ADVERTISE_HOST variable, then I 
could just use the $HOST variable to define this.

> On 5 Jun 2016, at 10:41 pm, Sivaram Kannan  wrote:
> 
> 
> I have been using it this way from 0.23.0 to 0.28.0, and it has definitely
> been working (although for a different framework). Inside the Docker
> container, can you see the $HOST variable defined?
> 
> The ticket you referred to says that the app definition needs to define 
> LIBPROCESS_HOST=$HOST to make the framework pick up the proper IP; you are 
> describing a different problem.
> 
> Thanks,
> ./Siva.
> 
> On Sun, Jun 5, 2016 at 4:30 AM, Eli Jordan  wrote:
> I found this issue on the mesos jira that describes the exact issue I am 
> hitting.
> 
> https://issues.apache.org/jira/browse/MESOS-3740 
> 
> 
> It doesn't appear to be resolved. 
> 
> Thanks
> Eli
> 
> On 5 Jun 2016, at 16:46, Eli Jordan  wrote:
> 
>> Hmmm… that doesn’t seem to work for me. What version of mesos does this work 
>> in? I am running 0.27.1.
>> 
>> When using this approach, I still get the following error when the kafka 
>> mesos framework is starting up.
>> 
>> "Scheduler driver bound to loopback interface! Cannot communicate with 
>> remote master(s). You might want to set 'LIBPROCESS_IP' environment variable 
>> to use a routable IP address.”
>> 
>> I tried setting LIBPROCESS_IP to ‘0.0.0.0’ and LIBPROCESS_ADVERTISE_IP=‘the 
>> public ip’ and this works. But the host variations don’t seem to work. (i.e. 
>> set LIBPROCESS_IP=0.0.0.0 and LIBPROCESS_ADVERTISE_HOST=$HOST)
>> 
>> It seems libprocess doesn’t support using host names.
>> 
>> I think I might have to run the framework outside of docker, but I would 
>> really like to avoid this. 
>> 
>> This problem would be solved if the docker executor was able to set the same 
>> environment variables as the command executor. Is there a way to make this 
>> happen?
>> 
>> I saw that mesos can be extended with a Hook ‘module’ to set extra 
>> environment variables in docker containers. This might be a solution, but 
>> seems overwrought for a simple problem.
>> 
>> 
>>> On 5 Jun 2016, at 12:50 am, Sivaram Kannan wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> Can you try adding && between the LIBPROCESS_HOST variable and the actual 
>>> command? We have been using this for some time now.
>>> 
>>> "cmd": "LIBPROCESS_HOST=$HOST && ./kafka-mesos.sh ..
>>> 
>>> Thanks,
>>> ./Siva.
>>> 
>>> 
>>> On Sat, Jun 4, 2016 at 8:34 AM, Eli Jordan wrote:
>>> Hi @haosdent
>>> 
>>> Based on my testing, this is not the case.
>>> 
>>> I ran a task (from marathon) without using a docker container that just 
>>> printed out all environment variables. i.e. while [ true ]; do env; sleep 
>>> 2; done
>>> 
>>> I then run a task that executed the same command inside an alpine docker 
>>> image.
>>> 
>>> When running without a docker image LIBPROCESS_IP was defined along with 
>>> many other variables. 
>>> 
>>> Sample output when running without docker (note LIBPROCESS_IP is defined):
>>> 
>>> Registered executor on mesos-slave0
>>> Starting task plain-test.5e5b00cc-2645-11e6-a3dd-080027aa149e
>>> sh -c 'while [ true ]; do env; sleep 2; done'
>>> Forked command at 16571
>>> LIBPROCESS_IP=192.168.3.16
>>> MESOS_AGENT_ENDPOINT=192.168.3.16:5051 
>>> MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins
>>> HOST=mesos-slave0
>>> SHELL=/bin/sh
>>> MESOS_DIRECTORY=/var/mesos/slaves/7ad17efe-0f9e-4703-9d2e-7fb9ee03f64c-S0/frameworks/aae929c7-24a5-4463-9ae0-bc7b044973c5-/executors/plain-test.5e5b00cc-2645-11e6-a3dd-080027aa149e/runs/c9b6ef86-b37d-4e3c-b1ca-bd680aed779f
>>> PORT0=31082
>>> PORT_10001=31082
>>> LC_ALL=en_US.UTF-8
>>> … more
>>> 
>>> 
>>> Sample output when running with docker (note LIBPROCESS_IP is not defined)
>>> 
>>> --container="mesos-7ad17efe-0f9e-4703-9d2e-7fb9ee03f64c-S0.f3a94ab4-dfff-4e97-b806-f1cc501ecf42"
>>>  --docker="docker" --docker_socket="/var/run/docker.sock" --help="false" 
>>> --initialize_driver_logging="true" --launcher_dir="/usr/libexec/mesos" 
>>> --logbufsecs="0" --logging_level="INFO" 
>>> --mapped_directory="/mnt/mesos/sandbox" --quiet="false" 
>>> --sandbox_directory="/var/mesos/slaves/7ad17efe-0f9e-4703-9d2e-7fb9ee03f64c-S0/frameworks/aae929c7

0.28.2 - mesos-docker-executor - libmesos not found

2016-06-06 Thread Kamil Wokitajtis
Hi,

I have upgraded my mesos env from 0.28.1 to 0.28.2.
On 0.28.1 everything worked just fine.
Now agents are unable to start docker images, mesos throws:

mesos-docker-executor: error while loading shared libraries:
libmesos-0.28.2.so: cannot open shared object file: No such file or
directory

Just like for Mesos 0.28.1, where it works, libmesos-0.28.2.so is in
/usr/local/lib.
There is also a symlink libmesos.so -> libmesos-0.28.2.so.
/etc/ld.so.conf.d/libc.conf contains /usr/local/lib entry.
I have also tried exporting LD_LIBRARY_PATH in startup scripts, no luck.
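
For completeness, this is how I have been checking whether the dynamic linker 
cache actually sees the library (standard ldconfig commands, nothing specific 
to my boxes):

# list cached libraries matching libmesos; empty output means a stale cache
ldconfig -p | grep libmesos

# rebuild the cache after installing into /usr/local/lib
sudo ldconfig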

Thanks,
Kamil


Re: 0.28.2 has been released

2016-06-06 Thread craig w
Jie,

Thanks for the updates. When will the packages be available (in particular
RPM)?

-craig

On Sun, Jun 5, 2016 at 2:31 PM, Jie Yu  wrote:

> Hi folks,
>
> I just released Mesos 0.28.2 and updated the website.
>
> It includes some important bug fixes. The change log can be found here:
>
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=0.28.2
>
> If you are considering using 0.28, please use 0.28.2!
>
> Thanks!
> - Jie
>



-- 

https://github.com/mindscratch
https://www.google.com/+CraigWickesser
https://twitter.com/mind_scratch
https://twitter.com/craig_links


Re: Rack awareness support for Mesos

2016-06-06 Thread Stephen Gran
Hi,

This looks potentially interesting.  How does it work in a public cloud 
deployment scenario?  I assume you would just have to disable this 
feature, or not enable it?

Cheers,

On 06/06/16 10:17, Du, Fan wrote:
> Hi, Mesos folks
>
> I’ve been thinking about Mesos rack awareness support for a while;
> it’s a common interest for lots of data center applications to provide
> data locality, fault tolerance and better task placement. I created
> MESOS-5545 to track the story, and here is the initial design doc [1]
> to support rack awareness in Mesos.
>
> Looking forward to hearing any comments from end users and other developers,
>
> Thanks!
>
> [1]:
> https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing
>

-- 
Stephen Gran
Senior Technical Architect

picture the possibilities | piksel.com


Rack awareness support for Mesos

2016-06-06 Thread Du, Fan
Hi, Mesos folks

I've been thinking about Mesos rack awareness support for a while;
it's a common interest for lots of data center applications to provide data 
locality, fault tolerance and better task placement. I created MESOS-5545 to 
track the story, and here is the initial design doc [1] to support rack 
awareness in Mesos.

Looking forward to hearing any comments from end users and other developers,
Thanks!

[1]: 
https://docs.google.com/document/d/1rql_LZSwtQzBPALnk0qCLsmxcT3-zB7X7aJp-H3xxyE/edit?usp=sharing


How to stop the mesos executor process in the JVM?

2016-06-06 Thread Yao Wang
Hi, all!

I wrote my own executor to run code.

I override the launchTask method like this:



@Override
public void launchTask(ExecutorDriver driver, Protos.TaskInfo task) {
    LOGGER.info("Executor is launching task#{}\n...", task);

    // Before doing the work: report the task as running.
    driver.sendStatusUpdate(
        Protos.TaskStatus.newBuilder()
            .setTaskId(task.getTaskId())
            .setState(Protos.TaskState.TASK_RUNNING)
            .build());

    LOGGER.info("Add your business code here ...");
    // business code here

    // After the work is done: report the task as finished.
    driver.sendStatusUpdate(
        Protos.TaskStatus.newBuilder()
            .setTaskId(task.getTaskId())
            .setState(Protos.TaskState.TASK_FINISHED)
            .setData(ByteString.copyFromUtf8("${taksData}"))
            .build());
} // end method launchTask


And I build the CommandInfo like this:


String executorCommand = String.format("java -jar %s", extractPath(executorJarPath));

// executorJarPath is a local URI or a Hadoop (HDFS) URI.
Protos.CommandInfo.URI.Builder executorJarURI =
    Protos.CommandInfo.URI.newBuilder().setValue(executorJarPath);

Protos.CommandInfo.Builder commandInfoBuilder =
    Protos.CommandInfo.newBuilder()
        .setEnvironment(envBuilder)
        .setValue(executorCommand)
        .addUris(executorJarURI);

long ctms = System.nanoTime();

// Give each executor a unique ID derived from a timestamp and the task request.
Protos.ExecutorID.Builder executorIDBuilder =
    Protos.ExecutorID.newBuilder()
        .setValue(ctms + "-" + task.getTaskRequestId());

Protos.ExecutorInfo.Builder executorInfoBuilder =
    Protos.ExecutorInfo.newBuilder()
        .setExecutorId(executorIDBuilder)
        .setCommand(commandInfoBuilder)
        .setName("flexcloud-executor-2.0.1-" + ctms)
        .setSource("java");

// TaskInfo
Protos.TaskInfo.Builder taskInfoBuilder =
    Protos.TaskInfo.newBuilder()
        .setName(task.getTaskName())
        .setTaskId(taskIDBuilder)
        .setSlaveId(offer.getSlaveId())
        .setExecutor(executorInfoBuilder);

return taskInfoBuilder.build();


After running the executor with mesos several times, I found that the executor 
processes did not exit.

I executed $ ps -ef | grep “java -jar” on the slave machine, which shows me:

wangyao$ ps -ef | grep "java -jar"
  501 20078 19302   0  3:54 PM ?? 0:15.77 /usr/bin/java -jar 
flexcloud-executor.jar
  501 20154 19302   0  3:54 PM ?? 0:17.92 /usr/bin/java -jar 
flexcloud-executor.jar
  501 20230 19302   0  3:54 PM ?? 0:16.13 /usr/bin/java -jar 
flexcloud-executor.jar

In order to stop these processes after running an executor, I first tried 
adding “driver.stop()” or “driver.abort()” to the Executor’s launchTask 
method, but it had no effect.
So I added “System.exit(0)” to stop the JVM directly… and it works…

I have doubts about this way of stopping the executor. Is it the only way to do that?
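
For comparison, here is a minimal sketch of the shutdown pattern used by the 
example executors that ship with Mesos (MyExecutor below is a placeholder for 
my own Executor implementation): the business code runs off the driver’s event 
loop, and the JVM exits from main once the driver stops.

import org.apache.mesos.ExecutorDriver;
import org.apache.mesos.MesosExecutorDriver;
import org.apache.mesos.Protos;

public class MyExecutorMain {
    public static void main(String[] args) {
        // MyExecutor is a placeholder for the Executor implementation above.
        ExecutorDriver driver = new MesosExecutorDriver(new MyExecutor());

        // run() blocks until the driver is stopped or aborted. If launchTask
        // hands the business code to a worker thread and that thread calls
        // driver.stop() after sending TASK_FINISHED, run() returns
        // DRIVER_STOPPED here and the JVM exits cleanly, without needing
        // System.exit(0) inside the callback itself.
        System.exit(driver.run() == Protos.Status.DRIVER_STOPPED ? 0 : 1);
    }
}

One caveat: launchTask is invoked from the driver’s event loop, so calling 
stop() from inside a still-running launchTask may behave differently than 
calling it from a worker thread after launchTask has returned.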