Are the resource options documented?

2015-08-25 Thread craig w
When configuring a mesos-slave with --resources, I know cpu, mem and
ports are available. Are there others? Are these documented somewhere?

I've found some examples here
https://open.mesosphere.com/reference/mesos-slave/ and the configuration
page (http://mesos.apache.org/documentation/latest/configuration/) is
generic in its description of --resources.

Thanks
craig


Mesos/Marathon/HAProxy Logging

2015-08-25 Thread John Omernik
I have been playing with a very simple application: a
web service running in Python. I've created a Docker container for it, set up
Marathon to run it, and I use mesos-dns and HAProxy, so
I can access the service just fine from anywhere in the cluster.

First let me say this is VERY cool. The capabilities here are awesome.

Now the challenge: the security guy in me wants to take good logs from my
app.  It was set up to do its own logging through a custom module, and I am
very happy with it.  I set up the app in the container to mount a volume
that's in my MapRFS via NFS so I can log directly to a clustered
filesystem. This is awesome: I can read my logs in Apache Drill as they are
written!

However, HAProxy threw me for a loop. Once I started running the app
in Marathon with a service port and routed through HAProxy, I realized
something: I had lost the source IPs in my logs.

Why?

Because once HAProxy takes over, it opens its own connection to the app,
so the next hop only sees the IP of the previous connection (the proxy).  From a
service discovery perspective it works great, but with this setup I lose
the original client IP. Perhaps I could manually add something in HAProxy to insert an
X-Forwarded-For header. That would be nice, but it only works for
HTTP apps. What about other TCP apps that are not HTTP?

This is an interesting problem, because apps should have good logging,
security, performance, troubleshooting, and if I can't get the source IP it
could be a problem.

So, my question is this: has anyone run into this? How are you handling it?
Any ideas here we may be able to work off of?

One thing I thought was: why are we using HAProxy? Couldn't the same HAProxy
script actually put forwarding rules into iptables?  This sounds messy,
but could it work? Has anyone explored that? If the data were forwarded,
then it wouldn't lose the IP information, and timeouts wouldn't be a
concern either (I think I posted before on how long-running TCP connections
can be closed down by HAProxy if they don't implement TCP keepalives).

Other ideas?  This is interesting to me, and likely others.


Re: Not getting resource offers for 20 min

2015-08-25 Thread CCAAT

THANKS, as I have not kept up on the spark lists

James


On 08/25/2015 04:28 AM, Iulian Dragoș wrote:



On Mon, Aug 24, 2015 at 7:16 PM, CCAAT cc...@tampabay.rr.com wrote:

On 08/24/2015 05:33 AM, Iulian Dragoș wrote:


Hello Iulian,

Ok, so I eventually built Spark 100% from sources, after some
intermediate builds on Gentoo.   Gentoo is not the best platform for
Java development, but those issues related to Spark builds are
slowly being fixed on Gentoo. Where (and how) do you download the
complete spark-1.5.x source tree, as it does not seem to be available on
this page:

http://spark.apache.org/downloads.html


It's not yet a final release, but there's a preview:

http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-ANNOUNCE-Spark-1-5-0-preview-package-td13683.html

Building Spark from sources isn't too hard; there's a
`make-distribution.sh` script in the root directory. There are a few
parameters (like the Hadoop version to build against), but it should be fairly
straightforward. More info here:

http://spark.apache.org/docs/latest/building-spark.html

iulian



Any other related information or tips on building Spark from sources
would be gratefully received.

James

Unfortunately I don't have access to the cluster anymore, but I think
Chronos wasn't the culprit. After updating Spark to 1.5 and setting a
framework role, offers started to come (while still using Chronos).

iulian


Iulian Dragos

--
Reactive Apps on the JVM
http://www.typesafe.com










Re: Mesos/Marathon/HAProxy Logging

2015-08-25 Thread Jeff Schroeder
This is the header that should be passed:

https://en.m.wikipedia.org/wiki/X-Forwarded-For

Most of the modern internet routes through reverse proxies and this is how
we log the actual source clients to solve similar auditing and compliance
needs.
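
For a Python web service like the one described, a minimal sketch of reading that header could look like the following. The names `client_ip` and `trusted_proxies` are made up for illustration, and it assumes HAProxy is configured to append the peer address to X-Forwarded-For (for example with `option forwardfor`). The header is only honored when the TCP peer is a known proxy, since otherwise any client could forge it.

```python
from typing import Iterable, Mapping


def client_ip(headers: Mapping[str, str], peer_ip: str,
              trusted_proxies: Iterable[str] = ()) -> str:
    """Return the best guess at the original client IP for logging."""
    trusted = set(trusted_proxies)
    # Ignore the header entirely unless the connection came from our proxy.
    if peer_ip not in trusted:
        return peer_ip
    # Header names are case-insensitive; normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    forwarded = lowered.get("x-forwarded-for", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    # Walk from the right: the rightmost hop that is not one of our own
    # proxies is the client as seen by the proxy chain.
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return peer_ip
```

Logging both the returned value and the raw peer address keeps the audit trail useful even if the proxy configuration changes later.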




-- 
Text by Jeff, typos by iPhone


Re: Are the resource options documented?

2015-08-25 Thread Alex Rukletsov
From Mesos's point of view, a resource is just a string: your agents may
advertise gpu, bananas, pandas and so on. However, some resources are
known to Mesos, and for them isolation is possible. A good example is the
cgroups isolator for mem resources, which will invoke the OOM killer if
necessary. Compare with GPU resources: if your agent advertises, say, 1GB of
gpu to the master, a task may accept 100MB, but the agent has no way to
enforce that the task uses no more than 100MB, because there is no
isolator for this resource. The good news is that you can write an isolator for
your resource, wrap it into a Mesos module, and let the Mesos agent use it!

P.S. cpu is not a known resource, but cpus is.
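
To make the "just a string" point concrete, here is a rough sketch of parsing a --resources value in Python. It assumes the `name:value;name:value` layout with scalars, `[a-b]` ranges and `{a,b}` sets as I understand it from the Mesos docs; `parse_resources` is a made-up helper, not Mesos code, and it does not handle role annotations. Note that an arbitrary name like `bananas` parses just as well as `cpus`.

```python
from typing import Dict, List, Tuple, Union

Value = Union[float, List[Tuple[int, int]], List[str]]


def parse_resources(spec: str) -> Dict[str, Value]:
    """Parse e.g. 'cpus:4;mem:2048;ports:[31000-32000];bananas:3'."""
    resources: Dict[str, Value] = {}
    for item in filter(None, (part.strip() for part in spec.split(";"))):
        name, _, raw = item.partition(":")
        name, raw = name.strip(), raw.strip()
        if not name or not raw:
            raise ValueError("expected name:value, got %r" % item)
        if raw.startswith("[") and raw.endswith("]"):
            # Ranges: '[begin-end, begin-end]' -> list of (begin, end).
            ranges = []
            for rng in raw[1:-1].split(","):
                begin, _, end = rng.strip().partition("-")
                ranges.append((int(begin), int(end)))
            resources[name] = ranges
        elif raw.startswith("{") and raw.endswith("}"):
            # Sets: '{a, b}' -> list of strings.
            resources[name] = [s.strip() for s in raw[1:-1].split(",")]
        else:
            # Anything else is treated as a scalar.
            resources[name] = float(raw)
    return resources
```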




Re: Mesos/Marathon/HAProxy Logging

2015-08-25 Thread Ankur Chauhan
This may help: 

http://serverfault.com/questions/331079/haproxy-and-forwarding-client-ip-address-to-servers

We use similar options to ensure we have the remote ip.
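
For the non-HTTP TCP case raised in the original question, one option worth checking is HAProxy's PROXY protocol (`send-proxy` on the server line), where the proxy prepends one text line carrying the real client address and the backend reads it before the application data. Below is a rough Python sketch of parsing the version 1 header; `parse_proxy_v1` and `ProxyInfo` are illustrative names, and it assumes the backend has already read at least the first line from the socket.

```python
from typing import NamedTuple, Optional, Tuple


class ProxyInfo(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def parse_proxy_v1(data: bytes) -> Tuple[Optional[ProxyInfo], bytes]:
    """Split a PROXY protocol v1 line off the front of `data`.

    Returns (info, rest); info is None for 'PROXY UNKNOWN'.
    Raises ValueError if `data` does not start with a valid v1 header.
    """
    end = data.find(b"\r\n")
    # The v1 header is one CRLF-terminated ASCII line of at most 107 bytes.
    if not data.startswith(b"PROXY ") or end == -1 or end + 2 > 107:
        raise ValueError("missing or malformed PROXY v1 header")
    fields = data[:end].decode("ascii").split(" ")
    rest = data[end + 2:]
    if fields[1] == "UNKNOWN":
        return None, rest
    if fields[1] not in ("TCP4", "TCP6") or len(fields) != 6:
        raise ValueError("unsupported PROXY v1 header: %r" % fields)
    _, _, src_ip, dst_ip, src_port, dst_port = fields
    return ProxyInfo(src_ip, int(src_port), dst_ip, int(dst_port)), rest
```

The backend must only accept this header from the proxy itself; a service that is also reachable directly should not trust it.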



Re: Custom Scheduler: Diagnosing cause of container task failures

2015-08-25 Thread Alex Rukletsov
It looks like we can have a better error message here.

@Jay, mind filing a JIRA ticket for this, with a description, the status update, and
your fix attached? Thanks!

On Fri, Aug 21, 2015 at 7:36 PM, Jay Taylor j...@jaytaylor.com wrote:

 Eventually I was able to isolate what was going on; in this case the
 FrameworkInfo.User was set to an invalid value and setting it to root did
 the trick.
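
 A small pre-flight check along these lines might catch this class of problem earlier. This is only a sketch: `user_exists` is a made-up helper, and it checks the account database of the host it runs on, so it is only meaningful when run on the agents where the task will launch, not on the scheduler host.

```python
import pwd


def user_exists(name: str) -> bool:
    """Return True if `name` resolves to an account on this host."""
    if not name:
        return False
    try:
        pwd.getpwnam(name)
    except KeyError:
        # getpwnam raises KeyError when no such entry exists.
        return False
    return True
```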

 My scheduler is now working [in a basic form]!!!

 Cheers,
 Jay

 On Thu, Aug 20, 2015 at 4:15 PM, Jay Taylor j...@jaytaylor.com wrote:

 Hey Tim,

 Thank you for the quick response!

 Just checked the sandbox logs and they are all empty (stdout and stderr
 are both 0 bytes).

 I have discovered a little bit more information from the StatusUpdate
 event posted back to my scheduler:

 TaskStatus{
     TaskId: TaskID{
         Value: *fluxCapacitor-test-1,
         XXX_unrecognized: [],
     },
     State: *TASK_FAILED,
     Message: *Abnormal executor termination,
     Source: *SOURCE_SLAVE,
     Reason: *REASON_COMMAND_EXECUTOR_FAILED,
     Data: nil,
     SlaveId: SlaveID{
         Value: *20150804-211459-1407297728-5050-5855-S1,
         XXX_unrecognized: [],
     },
     ExecutorId: nil,
     Timestamp: *1.440112075509318e+09,
     Uuid: *[102 75 82 85 38 139 68 94 153 189 210 87 218 235 147 166],
     Healthy: nil,
     XXX_unrecognized: [],
 }

 How can I find out why the command executor is failing?


 On Thu, Aug 20, 2015 at 4:08 PM, Tim Chen t...@mesosphere.io wrote:

 It received a TASK_FAILED from the executor, so you'll need to look at
 the sandbox logs of your task stdout and stderr files to see what went
 wrong.

 These files should be reachable by the Mesos UI.

 Tim

 On Thu, Aug 20, 2015 at 4:01 PM, Jay Taylor outtat...@gmail.com wrote:

 Hey everyone,

 I am writing a scheduler for Mesos and one of my first goals is to get
 a simple Docker container to run.

 The tasks get marked as failed with the failure messages originating
 from the slave logs.  Now I'm not sure how to determine exactly what is
 causing the failure.

 The most informative log messages I've found were in the slave log:

 == /var/log/mesos/mesos-slave.INFO ==
 W0820 20:44:25.242230 29639 docker.cpp:994] Ignoring updating unknown
 container: e190037a-b011-4681-9e10-dcbacf6cb819
 I0820 20:44:25.242270 29639 status_update_manager.cpp:322] Received
 status update TASK_FAILED (UUID: 17a21cf7-17d1-42dd-92eb-b281396ebf60) for
 task jay-test-29 of framework 20150804-211741-1608624320-5050-18273-0060
 I0820 20:44:25.242377 29639 slave.cpp:2961] Forwarding the update
 TASK_FAILED (UUID: 17a21cf7-17d1-42dd-92eb-b281396ebf60) for task
 jay-test-29 of framework 20150804-211741-1608624320-5050-18273-0060 to
 master@63.198.215.105:5050
 I0820 20:44:25.247926 29636 status_update_manager.cpp:394] Received
 status update acknowledgement (UUID: 17a21cf7-17d1-42dd-92eb-b281396ebf60)
 for task jay-test-29 of framework 
 20150804-211741-1608624320-5050-18273-0060
 I0820 20:44:25.248108 29636 slave.cpp:3502] Cleaning up executor
 'jay-test-29' of framework 20150804-211741-1608624320-5050-18273-0060
 I0820 20:44:25.248342 29636 slave.cpp:3591] Cleaning up framework
 20150804-211741-1608624320-5050-18273-0060

 And this doesn't really tell me much about *why* it's failed.
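
 When digging through slave logs like these, it can help to parse the glog prefix and filter on severity. Here is a rough Python sketch; `parse_glog` and `warnings_and_errors` are made-up helper names, and the regex assumes the usual glog layout seen above (level letter, MMDD, time, thread id, file:line, then the message), with each entry on a single line as in the real log file.

```python
import re
from typing import Dict, Iterable, Iterator, Optional

GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_glog(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of one glog line, or None if it does not match."""
    match = GLOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def warnings_and_errors(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield parsed entries at WARNING level or above (W, E, F)."""
    for line in lines:
        entry = parse_glog(line)
        if entry and entry["level"] in "WEF":
            yield entry
```

 In the excerpt above, the only entry this would surface is the docker.cpp warning about the unknown container.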

 Is there somewhere else I should be looking or an option that needs to
 be turned on to show more information?

 Your assistance is greatly appreciated!

 Jay







Re: Master UI - Tasks section is empty

2015-08-25 Thread Rogier Dikkes

Thank you Craig, just ran into this myself when updating to 0.23

On 8/23/15 7:59 PM, craig w wrote:


See https://issues.apache.org/jira/browse/MESOS-3282

On Aug 23, 2015 12:28 PM, Jeremy Olexa jol...@spscommerce.com wrote:


Hi all,

On a new cluster, the tasks section of the left sidebar is
populated as jobs are staged, started, killed, etc. I've noticed
that after a rolling restart of the cluster (for example taking a node
out for maintenance, or restarting instances in an ASG), the
Tasks section of the UI stops working. There are no longer any
values in the UI.

It appears that this part of the UI is in js/controllers.js, but I
don't understand the internals quite yet. Is this issue related to
MESOS-527? Any other insight into this problem?

Thanks,
Jeremy



--
Rogier Dikkes
Systeem Programmeur Hadoop  HPC Cloud
e-mail: rogier.dik...@surfsara.nl | M: +31 6 47 48 93 28
SURFsara | Science Park 140 | 1098 XG Amsterdam



Re: SSL in Mesos 0.23

2015-08-25 Thread Joris Van Remoortere
@Dharmit

If you want to be really sure that the communication is happening over SSL,
you can use a packet sniffing tool like Wireshark or, depending on your
operating system, dump the packet streams directly to a file, for
example with tcpdump.
Another thing you can do is to try to hit the HTTP endpoints with curl
using http as opposed to https.

Remember that if you have SSL_SUPPORT_DOWNGRADE=true you should be able to
connect even without SSL. If it is false (the default) you will not be able
to connect.
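
The http-versus-https check can also be scripted. Below is a rough Python sketch that simply attempts a TLS handshake against a port; `speaks_tls` is a made-up helper name. Certificate verification is switched off on purpose, because the question here is only whether the listener speaks TLS at all. One caveat: a listener that demands a client certificate may also fail the handshake and show up as False.

```python
import socket
import ssl


def speaks_tls(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TLS handshake with host:port succeeds.

    Connection failures (refused, unreachable) propagate as OSError so
    they are not mistaken for 'plain text listener'.
    """
    context = ssl.create_default_context()
    # Only probing for TLS, not validating identity.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        try:
            with context.wrap_socket(raw, server_hostname=host):
                return True
        except OSError:
            # Covers ssl.SSLError, timeouts and resets during the handshake.
            return False
```

For example, `speaks_tls("master-host", 5050)` (with a hypothetical host name) returning False while plain http works would suggest the master is still serving unencrypted traffic.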

On Mon, Aug 10, 2015 at 4:43 AM, Dharmit Shah shahdhar...@gmail.com wrote:

 Hi Jeff,

 Thanks for the suggestion.

 I modified the systemd service file to use
 `/etc/sysconfig/mesos-master` and `/etc/sysconfig/mesos-slave` as
 environment files for master and slave services respectively. In these
 files, I specified the environment variables that I used to specify on
 the command line.

 Now if I check `strings /proc/pid/environ | grep SSL` for pids of
 master and slave services, I see the environment variables that I set
 in the /etc/sysconfig/environment-file.

 Now that it looks like I have started the master and slave services
 with SSL enabled, how do I really confirm that communication between
 master and slaves is really happening over SSL?

 Also, how do I enable SSL communication for a framework like Marathon?

 Regards,
 Dharmit.

 On Fri, Aug 7, 2015 at 10:56 PM, Jeff Schroeder
 jeffschroe...@computer.org wrote:
  The sudo command defaults to env_reset (look for that in the man page), which
  strips all env variables sans a select few. I'd almost bet that your SSL_*
  variables are not present and were not passed to the slave. Just sudo -i and
  start the slaves *as root* without sudo. There is no benefit to starting
  them with sudo. You can verify what I'm saying with something along the
  lines of:
 
  strings /proc/$(pidof mesos-slave)/environ | grep ^SSL_
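
  The same check can be done from Python, which is handy when verifying many nodes. This is a minimal sketch with made-up helper names (`ssl_vars`, `ssl_vars_of_pid`); it relies on /proc/<pid>/environ holding NUL-separated KEY=VALUE entries, and reading another user's process environment usually requires root.

```python
from typing import Dict


def ssl_vars(environ: bytes) -> Dict[str, str]:
    """Extract SSL_* variables from raw /proc/<pid>/environ contents."""
    found: Dict[str, str] = {}
    for entry in environ.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        # Skip empty entries and anything without a KEY=VALUE shape.
        if sep and key.startswith(b"SSL_"):
            found[key.decode(errors="replace")] = value.decode(errors="replace")
    return found


def ssl_vars_of_pid(pid: int) -> Dict[str, str]:
    """Read and filter the environment of a running process."""
    with open("/proc/%d/environ" % pid, "rb") as f:
        return ssl_vars(f.read())
```

  An empty result for the mesos-slave pid would mean the SSL_* settings never reached the process.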
 
 
  On Friday, August 7, 2015, Dharmit Shah shahdhar...@gmail.com wrote:
 
  Hello again,
 
  Thanks for your responses. I will share what I tried after your
  suggestions.
 
  1. `ldd /usr/sbin/mesos-master` and `ldd /usr/sbin/mesos-slave`
  returned output similar to what Craig suggested. So, I guess, the
  Mesosphere repo binaries have SSL enabled. Right?
 
  2. I created an SSL private key and cert on one system in my cluster by
  following this guide on DigitalOcean [1]. Admittedly, my knowledge of SSL is
  limited.
 
  3. Next, I copied the key and cert to all three mesos-master nodes and
  four mesos-slave nodes. Shouldn't slave nodes be provided only with
  the cert and not the private key? Whereas all master nodes may have
  the private key and cert both. Or am I understanding SSL incorrectly
  here?
 
  4. After copying the cert and key, I started the mesos-master service
  on master nodes with below command:
 
  $ sudo SSL_ENABLED=true SSL_KEY_FILE=~/ssl/mesos.key \
      SSL_CERT_FILE=~/ssl/mesos.crt /usr/sbin/mesos-master \
      --zk=zk://172.19.10.111:2181,172.19.10.112:2181,172.19.10.193:2181/mesos \
      --port=5050 --log_dir=/var/log/mesos --acls=file:///root/acls.json \
      --credentials=/home/isys/mesos --quorum=2 --work_dir=/var/lib/mesos
 
  I checked the web UI and things looked good. I am not completely sure whether
  https should have worked for the Mesos web UI, but it didn't.
 
  5. Next, I start slave nodes with below command:
 
  $ sudo SSL_ENABLED=true SSL_CERT_FILE=~/mesos.crt \
      SSL_KEY_FILE=~/mesos.key /usr/sbin/mesos-slave \
      --master=zk://172.19.10.111:2181,172.19.10.112:2181,172.19.10.193:2181/mesos \
      --log_dir=/var/log/mesos --containerizers=docker,mesos \
      --executor_registration_timeout=15mins
 
  Mesos web UI reported four mesos-slave nodes in Activated mode. So
  far so good. I am still wondering how I should verify if communication
  is happening over SSL.
 
  6. To check if SSL is indeed working, I stopped one slave node and
  started it without SSL using `systemctl start mesos-slave`. I was
  expecting it to not get into Activated state on Mesos web UI but it
  did. So, I think SSL is not configured properly by me.
 
  I am attaching logs from the master nodes. These logs were generated
  after starting masters with command specified in point 4.
 
  Let me know if I am doing something wrong or if you need more logs or
  need me to execute some specific commands.
 
  [1]
 
 https://www.digitalocean.com/community/tutorials/openssl-essentials-working-with-ssl-certificates-private-keys-and-csrs
 
  Regards,
  Dharmit.
 
  On Fri, Aug 7, 2015 at 2:52 AM, Michael Park mcyp...@gmail.com wrote:
   Hi Dharmit,
  
   I'm not certain whether the Mesosphere deb packages have SSL enabled or not,
   although based on Craig's observation it looks like they do.
  
   I think the correct way to enable SSL is to set the SSL_ENABLED environment
   variable, rather than /etc/mesos-master/ssl_enabled. Of course, along with
   the rest of the SSL_ environment variables.
  
   e.g. SSL_ENABLED=true SSL_KEY_FILE=path-to-your-private-key
  

Re: SSL in Mesos 0.23

2015-08-25 Thread Joris Van Remoortere
@carlos
Are you building 0.23.0 from source?
Just so we don't miss anything: Can you make sure to run ./bootstrap, and
build in a clean directory with your configuration similar to this:

../configure --enable-libevent --enable-ssl

Here http://mesos.apache.org/documentation/latest/mesos-ssl/ is the
document I am using as a reference

When you start up a master, if you just specify SSL_ENABLED=true it should
error out and notify you that other required flags such as SSL_KEY_FILE are
not provided. Can you verify this? If that is not happening, then the 2
options are:
1. Your environment variables are not making it to the binary: See Jeff
Schroeder's comments
2. The binary is not actually the one you expect. Double check the checksum
with the binary you built after configuring with SSL.
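
For the second option, one way to compare the running binary against the one you built is to hash both files. A minimal Python sketch follows; `sha256_of` and `same_binary` are made-up helper names, and the paths you pass in are whatever your installed and freshly built binaries actually are.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_binary(path_a: str, path_b: str) -> bool:
    """True if both files have identical contents (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

If the digests differ, the process on the node is not running the build that was configured with SSL.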



On Fri, Aug 14, 2015 at 12:55 PM, Carlos Sanchez car...@apache.org wrote:

 looking forward to it, thanks!
 running out of ideas here on what I am doing wrong

 On Fri, Aug 14, 2015 at 6:53 PM, Marco Massenzio ma...@mesosphere.io
 wrote:
  FYI - Joris is out this week, he'll probably be able to get back to you
  early next week (modulo MesosCon craziness :)
 
  Marco Massenzio
  Distributed Systems Engineer
 
  On Fri, Aug 14, 2015 at 9:14 AM, Carlos Sanchez car...@apache.org
 wrote:
 
  no suggestions?
 
  On Tue, Aug 11, 2015 at 6:47 PM, Vinod Kone vinodk...@apache.org
 wrote:
   @joris, can you help out here?
  
   On Tue, Aug 11, 2015 at 9:43 AM, Carlos Sanchez car...@apache.org
   wrote:
  
   I have tried to enable SSL with no success, even compiling from source
   with the ssl flags --enable-libevent --enable-ssl
  
   export SSL_ENABLED=true
   export SSL_SUPPORT_DOWNGRADE=false
   export SSL_REQUIRE_CERT=true
   export SSL_CERT_FILE=/etc/mesos/...
   export SSL_KEY_FILE=/etc/mesos/...
   export SSL_CA_FILE=/etc/mesos/...
  
  
   /home/ubuntu/mesos-deb-packaging/mesos-repo/build/src/mesos-master
   --work_dir=/var/lib/mesos
  
   Port 5050 is still served as plain http, no SSL
  
   Nothing about ssl shows up in the logs, any ideas?
  
   Thanks
  
  
   

Re: SSL in Mesos 0.23

2015-08-25 Thread Carlos Sanchez
Hi Joris,

I did build from sources, following instructions in
http://mesos.apache.org/gettingstarted/

Is the Mesosphere binary compiled with libevent and SSL enabled, as
mentioned previously? It would make debugging easier if I didn't have to rebuild.




Re: Are the resource options documented?

2015-08-25 Thread haosdent
There is also a disk resource. It is documented in attributes-resources.md:
https://github.com/apache/mesos/blob/master/docs/attributes-resources.md





-- 
Best Regards,
Haosdent Huang


Re: Allocation algorithm

2015-08-25 Thread Vinod Kone
The hierarchical allocator looks at one agent's resources at a time. For
each agent, it runs DRF to figure out the candidate framework.

More details here:

https://github.com/apache/mesos/blob/master/src/master/allocator/mesos/hierarchical.hpp#L935

Regarding the starvation you observed: yes, that is possible with DRF. We plan
to address this with optimistic offers (no implementation yet) and quotas
(WIP).
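
To illustrate the DRF step and why a framework that holds on to a large share can stop seeing offers, here is a simplified Python sketch. It is not the allocator's code: `dominant_share` and `next_framework` are made-up names, and roles, weights and offer filters are all ignored. The framework with the lowest dominant share is picked as the candidate for the next agent's resources.

```python
from typing import Dict


def dominant_share(allocated: Dict[str, float],
                   total: Dict[str, float]) -> float:
    """Largest fraction of any single cluster resource a framework holds."""
    return max(
        (allocated.get(name, 0.0) / total[name]
         for name in total if total[name] > 0),
        default=0.0,
    )


def next_framework(allocations: Dict[str, Dict[str, float]],
                   total: Dict[str, float]) -> str:
    """Pick the framework with the lowest dominant share (ties by name)."""
    return min(sorted(allocations),
               key=lambda fw: dominant_share(allocations[fw], total))
```

With a framework sitting at a 50% dominant share in memory and two others below that, the two others keep winning the comparison, which matches the behaviour described in the question.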


On Mon, Aug 24, 2015 at 8:42 AM, Hans van den Bogert hansbog...@gmail.com
wrote:

 Can anyone tell me how the Mesos allocation algorithm works?
 Does Mesos offer every free resource it has to one framework at a time? Or
 does the allocator divide the max offer size by the number of
 active/registered frameworks?

 And what happens in this case:
   FW1 has a high dominant resource fraction (50%), which it does not
 release. FW2 and FW3 have a lot of churn in their tasks; both have
 outstanding short-lived tasks in their queue (shorter than the Mesos
 allocation interval), and these 2 FWs accept all resources Mesos has to offer,
 if they get the offer.

 Reading the DRF paper and presentation, am I to assume the online DRF
 algorithm would always favour FW2 and FW3 over FW1? One of the two
 (FW2/3) will always (or at least be more likely to) have a lower dominant
 resource share than FW1. According to the presentation on DRF, the framework with
 the lowest dominant resource share gets the offer. But this can lead to
 starvation, e.g., if a framework has allocated memory but needs a new offer
 with CPUs to actually do something. You might wonder why the framework
 didn’t use memory AND CPU from the same offer, but Spark, for example, does
 exactly this.

 To give some context, I think I’m seeing this behaviour with Spark in
 fine-grained mode. I have 4 Spark instances which are long-lived, emulating
 interactive queries. The first Spark instance to get an offer “installs”
 executors (with high memory demand) on every slave node it sees. The next
 framework tries to do the same, but for these later instances there's not
 always enough executor memory. That’s why I end up with one instance, the
 first to get the offer, holding a lot of memory it doesn’t let go, while it
 also gets far fewer offers for CPU afterwards. In contrast, the later Spark
 instances with less long-living executors do not have high memory usage,
 and get relatively more CPU offers.
 Of course setting a max number of Spark executors per framework instance
 would mitigate this, but then I’m basically back to static allocation of
 resources.

 Thanks in advance,

 Hans







Re: Allocation algorithm

2015-08-25 Thread Iulian Dragoș
On Mon, Aug 24, 2015 at 5:42 PM, Hans van den Bogert hansbog...@gmail.com
wrote:

 Can anyone tell how the Mesos allocation algorithm works:
 Does Mesos offer every free resource it has to one framework at a time? Or
 does the allocator divide the max offer size by the amount of
 active/registered frameworks?
   and
 in case of:
   FW1 has a high dominant resource fraction (50%), which it does not
 release. FW2 and FW3 have a lot of churn for their tasks, both have
 outstanding short lived tasks in their queue (shorter than the mesos
 allocation interval), these 2 FWs accept all resources Mesos has to offer -
 if they get the offer.

 Reading the DRF paper and presentation, am I to assume the online DRF
 algorithm would favour FW2 and FW3 always before FW1? As one of the two
 (FW2/3) will always (or at least more likely to,) have a lower dominant
 resource than FW1. According to the presentation on DRF, the framework with
 the lowest dominant resource gets the offer. But this is a potential
 starvation e.g., if a framework has allocated memory, but needs a new offer
 with CPUs to actually do something. You might wonder why the framework
 didn’t use memory AND cpu from the same offer, but Spark for example does
 exactly this.


I'd love to learn more from Mesos devs about the allocation algorithm. In
my limited understanding, you are correct.



 To give some context, I think I’m seeing this behaviour with Spark in
 fine-grained mode. I have 4 Spark instances which are long-lived, emulating
 interactive queries. The first Spark instance to get an offer “installs”
 executors (with high memory demand) on every slave node it sees. The next
 framework tries to do the same, but for these later instances there's not
 always enough executor memory. That’s why I end up with one instance, the
 first to get the offer, holding a lot of memory it doesn’t let go, while it
 also gets far fewer offers for CPU afterwards. In contrast, the later Spark
 instances with less long-living executors do not have high memory usage,
 and get relatively more CPU offers.
 Of course setting a max number of Spark executors per framework instance
 would mitigate this, but then I’m basically back to static allocation of
 resources.


I've seen similar behavior with Spark's fine-grained mode. See my thread
from a couple of days ago.

I would recommend using coarse-grained mode with dynamic allocation
(available in the future 1.5 version). We worked around this by using Mesos
roles, and assigning Spark to a specific role. It seems Mesos will allocate
resources based on roles, if configured. Unfortunately, `spark.mesos.role`
is a new configuration parameter to be added in 1.5 as well, so we needed
to use Spark 1.5 preview.

iulian



 Thanks in advance,

 Hans







-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Not getting resource offers for 20 min

2015-08-25 Thread Iulian Dragoș
On Mon, Aug 24, 2015 at 7:16 PM, CCAAT cc...@tampabay.rr.com wrote:

 On 08/24/2015 05:33 AM, Iulian Dragoș wrote:


 Hello Iulian,

 Ok, so I eventually built Spark 100% from sources, after some intermediate
 builds on Gentoo.   Gentoo is not the best platform for Java development,
 but those issues related to Spark builds are slowly being fixed on Gentoo.
 Where (how) do you download the complete spark-1.5.x source tree, as it
 does not seem to be available on this page:

 http://spark.apache.org/downloads.html


It's not yet a final release, but there's a preview:

http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-ANNOUNCE-Spark-1-5-0-preview-package-td13683.html

Building Spark from sources isn't too hard; there's a
`make-distribution.sh` script in the root directory. There are a few
parameters (like the dependency Hadoop version), but it should be fairly
straightforward. More info here:

http://spark.apache.org/docs/latest/building-spark.html

iulian




 Any other related information or tips on building out spark from sources
 are keenly received.

 James

 Unfortunately I don't have access to the cluster anymore, but I think
 Chronos wasn't the culprit. After updating Spark to 1.5 and setting a
 framework role, offers started to come (while still using Chronos).

 iulian


 Iulian Dragos

 --
 Reactive Apps on the JVM
 www.typesafe.com





-- 

--
Iulian Dragos

--
Reactive Apps on the JVM
www.typesafe.com


Re: Mesos/Marathon/HAProxy Logging

2015-08-25 Thread John Omernik
So I agree that is how it should be done; however, the current
implementation on Mesos requires me to manually code something like that. In
addition, this only works for HTTP traffic, not TCP... what happens when the
service running on Mesos isn't HTTP? I was hoping for some discussion
beyond just manually editing the HAProxy script to make it HTTP and add
the headers...
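
For the HTTP case only, the application side of this could look roughly like the Python sketch below. It assumes the proxy in front of the app sets an X-Forwarded-For header and that request headers arrive as a plain dict keyed exactly by "X-Forwarded-For"; the function name `client_ip` is made up for illustration. It does nothing for the non-HTTP TCP services asked about above.

```python
import ipaddress


def client_ip(headers, peer_ip):
    """Return the left-most valid address in X-Forwarded-For, else the peer address.

    The header is client-controllable, so only trust it when the connection
    really comes from your own proxy.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    for part in forwarded.split(","):
        candidate = part.strip()
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip empty or malformed entries
        return candidate
    return peer_ip
```

The value returned here is what the custom logging module would record in place of the proxy's address.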


On Tue, Aug 25, 2015 at 12:46 PM, Jeff Schroeder jeffschroe...@computer.org
 wrote:

 This is the header that should be passed:

 https://en.m.wikipedia.org/wiki/X-Forwarded-For

 Most of the modern internet routes through reverse proxies and this is how
 we log the actual source clients to solve similar auditing and compliance
 needs.


 On Tuesday, August 25, 2015, John Omernik j...@omernik.com wrote:

 I have been playing with an application that is a very simple app: A
 webservice running in Python. I've created a docker container, it runs in
 the container, I setup marathon to run it, I use mesos-dns and ha proxy and
 I can access the service just fine anywhere in the cluster.

 First let me say this is VERY cool. The capabilities here are awesome.

 Now the challenge: the security guy in me wants to take good logs from my
 app.  It was set up to do its own logging through a custom module. I am
 very happy with it.  I set up the app in the container to mount a volume
 that's in my MapRFS via NFS so I can log directly to a clustered
 filesystem. This is awesome, I can read my logs in Apache Drill as they are
 written!!!

 However, HAProxy threw me for a loop. Once I started running the
 app in Marathon with a service port and routed around via HAProxy, I
 realized something: I lost my source IPs in my logs.

 Why?

 Because once HAProxy takes over, it no longer needs to keep the source
 IP, and instead the next hop only sees the previous connection IP.  From a
 service discovery perspective it works great, but with this setup, I'd lose
 the previous hop. Perhaps I manually add something in haproxy to add an
 X-forwarded-for header, that would be nice, however, that only works for
 http apps, what about other TCP apps that are not HTTP?

 This is an interesting problem, because apps should have good logging,
 security, performance, troubleshooting, and if I can't get the source IP it
 could be a problem.

 So, my question is this: has anyone run into this? How are you handling it?
 Any brainstorms here we may be able to work off of?

 One thing I thought was: why are we using HAProxy? Couldn't the same
 HAProxy script actually put forwarding rules in iptables?  This sounds
 messy, but could it work? Has anyone explored that? If the data were
 forwarded, then it wouldn't lose the IP information (and timeouts wouldn't
 be a concern either; I think I posted before on how long-running TCP
 connections can be closed down by HAProxy if they don't implement TCP
 keepalives).

 Other ideas?  This is interesting to me, and likely others.



 --
 Text by Jeff, typos by iPhone



Re: Not getting resource offers for 20 min

2015-08-25 Thread Hans van den Bogert
I wanted to add that, even if there weren’t a preview package, you can clone from
Git and check out a tag; in this case v1.5.0-rc1 is tagged. Then
proceed normally, as you would with a source distribution, as described in the
already mentioned http://spark.apache.org/docs/latest/building-spark.html

 On 25 Aug 2015, at 19:26, CCAAT cc...@tampabay.rr.com wrote:
 
 THANKS, as I have not kept up on the spark lists
 
 James
 
 
 On 08/25/2015 04:28 AM, Iulian Dragoș wrote:
 
 
 On Mon, Aug 24, 2015 at 7:16 PM, CCAAT cc...@tampabay.rr.com wrote:
 
On 08/24/2015 05:33 AM, Iulian Dragoș wrote:
 
 
Hello Iulian,
 
Ok, so I eventually built Spark 100% from sources, after some
intermediate builds on Gentoo.   Gentoo is not the best platform for
Java development, but those issues related to Spark builds are
slowly being fixed on Gentoo. Where (how) do you download the
complete spark-1.5.x source tree, as it does not seem to be available on
this page:
 
http://spark.apache.org/downloads.html
 
 
 It's not yet a final release, but there's a preview:
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/Fwd-ANNOUNCE-Spark-1-5-0-preview-package-td13683.html
 
 Building Spark from sources isn't too hard; there's a
 `make-distribution.sh` script in the root directory. There are a few
 parameters (like the dependency Hadoop version), but it should be fairly
 straightforward. More info here:
 
 http://spark.apache.org/docs/latest/building-spark.html
 
 iulian
 
 
 
Any other related information or tips on building out spark from sources
are keenly received.
 
James
 
Unfortunately I don't have access to the cluster anymore, but I think
Chronos wasn't the culprit. After updating Spark to 1.5 and setting a
framework role, offers started to come (while still using Chronos).
 
iulian
 
 
Iulian Dragos
 
--
Reactive Apps on the JVM
www.typesafe.com
 
 
 
 
 
 --
 
 --
 Iulian Dragos
 
 --
 Reactive Apps on the JVM
 www.typesafe.com
 
 



slave_ping_timeout 1secs

2015-08-25 Thread Nastooh Avessta (navesta)
Hi
I am running Mesos 0.23.0 and noticed that I cannot start mesos-master with
slave_ping_timeout less than 1 second. I tried 0.5secs, 500ms and 50us, etc.
Is this by design or am I missing something?
Cheers,

[http://www.cisco.com/web/europe/images/email/signature/logo05.jpg]

Nastooh Avessta
ENGINEER.SOFTWARE ENGINEERING
nave...@cisco.com
Phone: +1 604 647 1527

Cisco Systems Limited
595 Burrard Street, Suite 2123 Three Bentall Centre, PO Box 49121
VANCOUVER
BRITISH COLUMBIA
V7X 1J1
CA
Cisco.com http://www.cisco.com/





[Think before you print.]Think before you print.

This email may contain confidential and privileged material for the sole use of 
the intended recipient. Any review, use, distribution or disclosure by others 
is strictly prohibited. If you are not the intended recipient (or authorized to 
receive for the recipient), please contact the sender by reply email and delete 
all copies of this message.
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

Cisco Systems Canada Co, 181 Bay St., Suite 3400, Toronto, ON, Canada, M5J 2T3. 
Phone: 416-306-7000; Fax: 416-306-7099. 
Preferences http://www.cisco.com/offer/subscribe/?sid=000478326 -
Unsubscribe http://www.cisco.com/offer/unsubscribe/?sid=000478327 -
Privacy http://www.cisco.com/web/siteassets/legal/privacy.html



Re: slave_ping_timeout 1secs

2015-08-25 Thread Yan Xu
Yes:
https://github.com/apache/mesos/blob/5de7ea455ec577e19c67a75b1cf98493b40c53fb/src/master/flags.cpp#L383

Was the error message not shown in stderr?
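
To make the rejected values concrete, here is a small Python sketch of a lower-bound check of this kind. It is not the Mesos implementation (that is the C++ code linked above): the unit table, the function names and the error wording are assumptions for illustration, and only the 1 second minimum reported in this thread is modeled.

```python
import re

# Seconds per unit; this table is an assumption made for the sketch.
UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "secs": 1.0, "mins": 60.0}


def parse_duration(text):
    """Turn a string such as '500ms' or '0.5secs' into seconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(us|ms|secs|mins)", text.strip())
    if match is None:
        raise ValueError("unparseable duration: " + text)
    return float(match.group(1)) * UNIT_SECONDS[match.group(2)]


def validate_ping_timeout(text, minimum_secs=1.0):
    """Reject timeouts below the minimum, as the master does at startup."""
    seconds = parse_duration(text)
    if seconds < minimum_secs:
        # Hypothetical message; the real wording lives in flags.cpp.
        raise ValueError("slave_ping_timeout must be at least %gsecs, got %s"
                         % (minimum_secs, text))
    return seconds
```

All three values from the original question (0.5secs, 500ms, 50us) parse correctly here and are then rejected by the bound, so the failure comes from the limit and not from the unit syntax.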

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Tue, Aug 25, 2015 at 5:41 PM, Nastooh Avessta (navesta) 
nave...@cisco.com wrote:

 Hi

 I am running Mesos 0.23.0 and noticed that I cannot start mesos-master with
 slave_ping_timeout less than 1 second. I tried 0.5secs, 500ms and 50us,
 etc. Is this by design or am I missing something?

 Cheers,








RE: slave_ping_timeout 1secs

2015-08-25 Thread Nastooh Avessta (navesta)
I see. Thank you for the clarification. Can I just change the boundaries in the
source code to suit my needs, or is there more to it?
Cheers,


From: Yan Xu [mailto:y...@jxu.me]
Sent: Tuesday, August 25, 2015 5:49 PM
To: user@mesos.apache.org
Subject: Re: slave_ping_timeout 1secs

Yes: 
https://github.com/apache/mesos/blob/5de7ea455ec577e19c67a75b1cf98493b40c53fb/src/master/flags.cpp#L383

Was the error message not shown in stderr?

--
Jiang Yan Xu y...@jxu.me @xujyan http://twitter.com/xujyan

On Tue, Aug 25, 2015 at 5:41 PM, Nastooh Avessta (navesta)
nave...@cisco.com wrote:
Hi
I am running Mesos 0.23.0 and noticed that I cannot start mesos-master with
slave_ping_timeout less than 1 second. I tried 0.5secs, 500ms and 50us, etc.
Is this by design or am I missing something?
Cheers,





Re: SSL in Mesos 0.23

2015-08-25 Thread Joris Van Remoortere
@Carlos
Mesosphere currently doesn't build packages with ssl enabled.

On Tue, Aug 25, 2015 at 3:12 PM, Carlos Sanchez car...@apache.org wrote:

 Hi Joris,

 I did build from sources, following instructions in
 http://mesos.apache.org/gettingstarted/

 Is the mesosphere binary compiled with libevent and ssl enabled as
 mentioned previously? would make debugging easier if I don't have to rebuild



 On Tue, Aug 25, 2015 at 8:52 PM, Joris Van Remoortere jo...@mesosphere.io
  wrote:

 @carlos
 Are you building 0.23.0 from source?
 Just so we don't miss anything: Can you make sure to run ./bootstrap,
 and build in a clean directory with your configuration similar to this:

 ../configure --enable-libevent --enable-ssl

 Here http://mesos.apache.org/documentation/latest/mesos-ssl/ is the
 document I am using as a reference

 When you start up a master, if you just specify SSL_ENABLED=true it
 should error out and notify you that other required flags such as
 SSL_KEY_FILE are not provided. Can you verify this? If that is not
 happening, then the 2 options are:
 1. Your environment variables are not making it to the binary: See Jeff
 Schroeder's comments
 2. The binary is not actually the one you expect. Double check the
 checksum with the binary you built after configuring with SSL.



 On Fri, Aug 14, 2015 at 12:55 PM, Carlos Sanchez car...@apache.org
 wrote:

 looking forward to it, thanks!
 running out of ideas here on what I am doing wrong

 On Fri, Aug 14, 2015 at 6:53 PM, Marco Massenzio ma...@mesosphere.io
 wrote:
  FYI - Joris is out this week, he'll be probably able to get back to you
  early next (modulo MesosCon craziness :)
 
  Marco Massenzio
  Distributed Systems Engineer
 
  On Fri, Aug 14, 2015 at 9:14 AM, Carlos Sanchez car...@apache.org
 wrote:
 
  no suggestions?
 
  On Tue, Aug 11, 2015 at 6:47 PM, Vinod Kone vinodk...@apache.org
 wrote:
   @joris, can you help out here?
  
   On Tue, Aug 11, 2015 at 9:43 AM, Carlos Sanchez car...@apache.org
   wrote:
  
   I have tried to enable SSL with no success, even compiling from source
   with the ssl flags --enable-libevent --enable-ssl
  
   export SSL_ENABLED=true
   export SSL_SUPPORT_DOWNGRADE=false
   export SSL_REQUIRE_CERT=true
   export SSL_CERT_FILE=/etc/mesos/...
   export SSL_KEY_FILE=/etc/mesos/...
   export SSL_CA_FILE=/etc/mesos/...
  
  
   /home/ubuntu/mesos-deb-packaging/mesos-repo/build/src/mesos-master
   --work_dir=/var/lib/mesos
  
   Port 5050 is still served as plain http, no SSL
  
   Nothing about ssl shows up in the logs, any ideas?
  
   Thanks
  
  
   
From: Dharmit Shah shahdhar...@gmail.com
To: user@mesos.apache.org
Cc:
Date: Mon, 10 Aug 2015 14:13:04 +0530
Subject: Re: SSL in Mesos 0.23
Hi Jeff,
   
Thanks for the suggestion.
   
I modified the systemd service file to use
`/etc/sysconfig/mesos-master` and `/etc/sysconfig/mesos-slave` as
environment files for master and slave services respectively. In these
files, I specified the environment variables that I used to specify on
the command line.

Now if I check `strings /proc/pid/environ | grep SSL` for pids of
master and slave services, I see the environment variables that I set
in the /etc/sysconfig/environment-file.

Now that it looks like I have started the master and slave services
with SSL enabled, how do I really confirm that communication between
master and slaves is really happening over SSL?

Also, how do I enable SSL communication for a framework like
Marathon?
   
Regards,
Dharmit.
   
On Fri, Aug 7, 2015 at 10:56 PM, Jeff Schroeder
jeffschroe...@computer.org wrote:
 The sudo command defaults to env_reset (look for that in the man page),
 which strips all env variables sans a select few. I'd almost bet that
 your SSL_* variables are not present and were not passed to the slave.
 Just sudo -i and start the slaves *as root* without sudo. There is no
 benefit to starting them with sudo. You can verify what I'm saying with
 something along the lines of:

 strings /proc/$(pidof mesos-slave)/environ | grep ^SSL_
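
 The same check can be scripted. Below is a minimal Python sketch that parses the NUL-separated contents of /proc/<pid>/environ and keeps only the SSL_* entries. The function names `ssl_env` and `read_ssl_env` are made up for illustration; reading another user's process environment needs sufficient permissions, and this only works on Linux.

```python
def ssl_env(environ_bytes):
    """Extract SSL_* variables from the raw contents of /proc/<pid>/environ."""
    result = {}
    for entry in environ_bytes.split(b"\0"):
        if not entry.startswith(b"SSL_") or b"=" not in entry:
            continue
        key, _, value = entry.partition(b"=")
        result[key.decode(errors="replace")] = value.decode(errors="replace")
    return result


def read_ssl_env(pid):
    """Read the environment of a live process by pid (Linux only)."""
    with open("/proc/%d/environ" % pid, "rb") as handle:
        return ssl_env(handle.read())
```

 An empty result for the mesos-slave pid would suggest the variables never reached the process.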


 On Friday, August 7, 2015, Dharmit Shah shahdhar...@gmail.com wrote:

 Hello again,

 Thanks for your responses. I will share what I tried after your
 suggestions.

 1. `ldd /usr/sbin/mesos-master` and `ldd /usr/sbin/mesos-slave`
 returned output similar to the one suggested by Craig. So, I guess, the
 Mesosphere repo binaries have SSL enabled. Right?

 2. I created an SSL private key and cert on one system in my cluster by
 referring to this guide on DO [1]. Admittedly, my knowledge of SSL is
 limited.

 3. Next, I copied the key and cert to all three mesos-master
 nodes
 and
 four mesos-slave nodes. Shouldn't slave nodes be provided only
 with
 the cert and not the