Hi,

I am trying to bring the current failover time below 10 seconds, but I can't
seem to find the right set of parameters. Currently, it takes around a minute
and a half (75 seconds) for the master to detect that a slave has gone offline,
which seems to correspond to slave_ping_timeout (15 seconds) multiplied by
max_slave_ping_timeouts (5). However, I can't find these parameters in
mesos-master:

# mesos-master --version
mesos 0.22.1
# mesos-master --help
Usage: mesos-master [...]

Supported options:
  --acls=VALUE                             The value could be a JSON formatted string of ACLs
                                           or a file path containing the JSON formatted ACLs used
                                           for authorization. Path could be of the form 'file:///path/to/file'
                                           or '/path/to/file'.

                                           See the ACLs protobuf in mesos.proto for the expected format.

                                           Example:
                                           {
                                             "register_frameworks": [
                                                                  {
                                                                     "principals": { "type": "ANY" },
                                                                     "roles": { "values": ["a"] }
                                                                  }
                                                                ],
                                             "run_tasks": [
                                                             {
                                                                "principals": { "values": ["a", "b"] },
                                                                "users": { "values": ["c"] }
                                                             }
                                                           ],
                                             "shutdown_frameworks": [
                                                           {
                                                              "principals": { "values": ["a", "b"] },
                                                              "framework_principals": { "values": ["c"] }
                                                           }
                                                         ]
                                           }
  --allocation_interval=VALUE              Amount of time to wait between performing
                                            (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs)
  --[no-]authenticate                      If authenticate is 'true' only authenticated frameworks are allowed
                                           to register. If 'false' unauthenticated frameworks are also
                                           allowed to register. (default: false)
  --[no-]authenticate_slaves               If 'true' only authenticated slaves are allowed to register.
                                           If 'false' unauthenticated slaves are also allowed to register. (default: false)
  --authenticators=VALUE                   Authenticator implementation to use when authenticating frameworks
                                           and/or slaves. Use the default 'crammd5', or
                                           load an alternate authenticator module using --modules. (default: crammd5)
  --cluster=VALUE                          Human readable name for the cluster,
                                           displayed in the webui.
  --credentials=VALUE                      Either a path to a text file with a list of credentials,
                                           each line containing 'principal' and 'secret' separated by whitespace,
                                           or, a path to a JSON-formatted file containing credentials.
                                           Path could be of the form 'file:///path/to/file' or '/path/to/file'.
                                           JSON file Example:
                                           {
                                             "credentials": [
                                                               {
                                                                  "principal": "sherman",
                                                                  "secret": "kitesurf",
                                                               }
                                                              ]
                                           }
                                           Text file Example:
                                           username secret

  --external_log_file=VALUE                Specified the externally managed log file. This file will be
                                           exposed in the webui and HTTP api. This is useful when using
                                           stderr logging as the log file is otherwise unknown to Mesos.
  --framework_sorter=VALUE                 Policy to use for allocating resources
                                           between a given user's frameworks. Options
                                           are the same as for user_allocator. (default: drf)
  --[no-]help                              Prints this help message (default: false)
  --hooks=VALUE                            A comma separated list of hook modules to be
                                           installed inside master.
  --hostname=VALUE                         The hostname the master should advertise in ZooKeeper.
                                           If left unset, the hostname is resolved from the IP address
                                           that the master binds to.
  --[no-]initialize_driver_logging         Whether to automatically initialize google logging of scheduler
                                           and/or executor drivers. (default: true)
  --ip=VALUE                               IP address to listen on
  --[no-]log_auto_initialize               Whether to automatically initialize the replicated log used for the
                                           registry. If this is set to false, the log has to be manually
                                           initialized when used for the very first time. (default: true)
  --log_dir=VALUE                          Directory path to put log files (no default, nothing
                                           is written to disk unless specified;
                                           does not affect logging to stderr).
                                           NOTE: 3rd party log messages (e.g. ZooKeeper) are
                                           only written to stderr!

  --logbufsecs=VALUE                       How many seconds to buffer log messages for (default: 0)
  --logging_level=VALUE                    Log message at or above this level; possible values:
                                           'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this
                                           will affect just the logs from log_dir (if specified) (default: INFO)
  --modules=VALUE                          List of modules to be loaded and be available to the internal
                                           subsystems.

                                           Use --modules=filepath to specify the list of modules via a
                                           file containing a JSON formatted string. 'filepath' can be
                                           of the form 'file:///path/to/file' or '/path/to/file'.

                                           Use --modules="{...}" to specify the list of modules inline.

                                           Example:
                                           {
                                             "libraries": [
                                               {
                                                 "file": "/path/to/libfoo.so",
                                                 "modules": [
                                                   {
                                                     "name": "org_apache_mesos_bar",
                                                     "parameters": [
                                                       {
                                                         "key": "X",
                                                         "value": "Y"
                                                       }
                                                     ]
                                                   },
                                                   {
                                                     "name": "org_apache_mesos_baz"
                                                   }
                                                 ]
                                               },
                                               {
                                                 "name": "qux",
                                                 "modules": [
                                                   {
                                                     "name": "org_apache_mesos_norf"
                                                   }
                                                 ]
                                               }
                                             ]
                                           }
  --offer_timeout=VALUE                    Duration of time before an offer is rescinded from a framework.
                                           This helps fairness when running frameworks that hold on to offers,
                                           or frameworks that accidentally drop offers.
  --port=VALUE                             Port to listen on (default: 5050)
  --[no-]quiet                             Disable logging to stderr (default: false)
  --quorum=VALUE                           The size of the quorum of replicas when using 'replicated_log' based
                                           registry. It is imperative to set this value to be a majority of
                                           masters i.e., quorum > (number of masters)/2.
  --rate_limits=VALUE                      The value could be a JSON formatted string of rate limits
                                           or a file path containing the JSON formatted rate limits used
                                           for framework rate limiting.
                                           Path could be of the form 'file:///path/to/file'
                                           or '/path/to/file'.

                                           See the RateLimits protobuf in mesos.proto for the expected format.

                                           Example:
                                           {
                                             "limits": [
                                               {
                                                 "principal": "foo",
                                                 "qps": 55.5
                                               },
                                               {
                                                 "principal": "bar"
                                               }
                                             ],
                                             "aggregate_default_qps": 33.3
                                           }
  --recovery_slave_removal_limit=VALUE     For failovers, limit on the percentage of slaves that can be removed
                                           from the registry *and* shutdown after the re-registration timeout
                                           elapses. If the limit is exceeded, the master will fail over rather
                                           than remove the slaves.
                                           This can be used to provide safety guarantees for production
                                           environments. Production environments may expect that across Master
                                           failovers, at most a certain percentage of slaves will fail
                                           permanently (e.g. due to rack-level failures).
                                           Setting this limit would ensure that a human needs to get
                                           involved should an unexpected widespread failure of slaves occur
                                           in the cluster.
                                           Values: [0%-100%] (default: 100%)
  --registry=VALUE                         Persistence strategy for the registry;
                                           available options are 'replicated_log', 'in_memory' (for testing). (default: replicated_log)
  --registry_fetch_timeout=VALUE           Duration of time to wait in order to fetch data from the registry
                                           after which the operation is considered a failure. (default: 1mins)
  --registry_store_timeout=VALUE           Duration of time to wait in order to store data in the registry
                                           after which the operation is considered a failure. (default: 5secs)
  --[no-]registry_strict                   Whether the Master will take actions based on the persistent
                                           information stored in the Registry. Setting this to false means
                                           that the Registrar will never reject the admission, readmission,
                                           or removal of a slave. Consequently, 'false' can be used to
                                           bootstrap the persistent state on a running cluster.
                                           NOTE: This flag is *experimental* and should not be used in
                                           production yet. (default: false)
  --roles=VALUE                            A comma separated list of the allocation
                                           roles that frameworks in this cluster may
                                           belong to.
  --[no-]root_submissions                  Can root submit frameworks? (default: true)
  --slave_removal_rate_limit=VALUE         The maximum rate (e.g., 1/10mins, 2/3hrs, etc) at which slaves will
                                           be removed from the master when they fail health checks. By default
                                           slaves will be removed as soon as they fail the health checks.
                                           The value is of the form <Number of slaves>/<Duration>.
  --slave_reregister_timeout=VALUE         The timeout within which all slaves are expected to re-register
                                           when a new master is elected as the leader. Slaves that do not
                                           re-register within the timeout will be removed from the registry
                                           and will be shutdown if they attempt to communicate with master.
                                           NOTE: This value has to be atleast 10mins. (default: 10mins)
  --user_sorter=VALUE                      Policy to use for allocating resources
                                           between users. May be one of:
                                             dominant_resource_fairness (drf) (default: drf)
  --[no-]version                           Show version and exit. (default: false)
  --webui_dir=VALUE                        Directory path of the webui files/assets (default: /usr/share/mesos/webui)
  --weights=VALUE                          A comma separated list of role/weight pairs
                                           of the form 'role=weight,role=weight'. Weights
                                           are used to indicate forms of priority.
  --whitelist=VALUE                        Path to a file with a list of slaves
                                           (one per line) to advertise offers for.
                                           Path could be of the form 'file:///path/to/file' or '/path/to/file'.
  --work_dir=VALUE                         Directory path to store the persistent information stored in the
                                           Registry. (example: /var/lib/mesos/master)
  --zk=VALUE                               ZooKeeper URL (used for leader election amongst masters)
                                           May be one of:
                                             zk://host1:port1,host2:port2,.../path
                                             zk://username:password@host1:port1,host2:port2,.../path
                                             file:///path/to/file (where file contains one of the above)
  --zk_session_timeout=VALUE               ZooKeeper session timeout. (default: 10secs)
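
For reference, the arithmetic behind the minute-and-a-half figure can be
sketched as below. This is only a rough model, assuming the master gives up on
a slave after max_slave_ping_timeouts consecutive missed pings of
slave_ping_timeout seconds each; detection_time_secs is an illustrative name,
not anything from Mesos.

```python
def detection_time_secs(ping_timeout_secs, max_ping_timeouts):
    """Approximate time before the master declares a slave lost.

    Assumed model: the slave is removed after `max_ping_timeouts`
    consecutive pings each time out after `ping_timeout_secs`.
    """
    return ping_timeout_secs * max_ping_timeouts


# Values quoted in this message: 15 s per ping, 5 missed pings allowed.
print(detection_time_secs(15, 5))  # 75

# Lowering only the ping count to 2 still leaves 30 s under this model.
print(detection_time_secs(15, 2))  # 30
```

Under this model, a 15-second ping timeout on its own already exceeds the
10-second target, so both values would presumably need to change.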

Furthermore, setting these parameters, either in /etc/mesos-master/ or inline,
generates the following error:
# /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 \
    --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 \
    --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
Failed to load unknown flag 'max_slave_ping_timeouts'
Usage: mesos-master [...]

Supported options:
  --acls=VALUE                             The valu
...
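
One way to confirm which flags a given build accepts is to parse the help text
before passing a flag. The sketch below is illustrative only: supported_flags
is a made-up helper, and sample is a short excerpt of the help output above
rather than the full listing.

```python
import re

# Matches flag lines such as "  --port=VALUE" or "  --[no-]quiet".
FLAG_PATTERN = re.compile(r"^[ \t]*--(?:\[no-\])?([A-Za-z0-9_]+)", re.MULTILINE)


def supported_flags(help_text):
    """Return the set of flag names that start a line in the help text."""
    return set(FLAG_PATTERN.findall(help_text))


# Shortened excerpt of the `mesos-master --help` output shown above.
sample = """\
Supported options:
  --port=VALUE                             Port to listen on (default: 5050)
  --[no-]quiet                             Disable logging to stderr (default: false)
  --zk_session_timeout=VALUE               ZooKeeper session timeout. (default: 10secs)
"""

flags = supported_flags(sample)
print("max_slave_ping_timeouts" in flags)  # False
```

Run against the full help output, the same check would show that neither
slave_ping_timeout nor max_slave_ping_timeouts is listed, which matches the
"unknown flag" error.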

Any thoughts?
Cheers,
Nastooh Avessta