Hi,
I'm trying to bring the current failover time below 10 seconds, but I can't
find the right set of parameters. Currently, it takes around a minute and a
half for the master to detect that a slave has gone offline, which seems to
correspond to slave_ping_timeout (15 secs) x max_slave_ping_timeouts (5),
i.e. 75 seconds. However, I can't find these parameters in mesos-master:
# mesos-master --version
mesos 0.22.1
# mesos-master --help
Usage: mesos-master [...]
Supported options:
--acls=VALUE The value could be a JSON formatted
string of ACLs
or a file path containing the JSON
formatted ACLs used
for authorization. Path could be of
the form 'file:///path/to/file'
or '/path/to/file'.
See the ACLs protobuf in mesos.proto
for the expected format.
Example:
{
"register_frameworks": [
{
"principals": { "type": "ANY" },
"roles": {
"values": ["a"] }
}
],
"run_tasks": [
{
"principals": {
"values": ["a", "b"] },
"users": {
"values": ["c"] }
}
],
"shutdown_frameworks": [
{
"principals": {
"values": ["a", "b"] },
"framework_principals": { "values": ["c"] }
}
]
}
--allocation_interval=VALUE Amount of time to wait between
performing
(batch) allocations (e.g., 500ms,
1sec, etc). (default: 1secs)
--[no-]authenticate If authenticate is 'true' only
authenticated frameworks are allowed
to register. If 'false'
unauthenticated frameworks are also
allowed to register. (default: false)
--[no-]authenticate_slaves If 'true' only authenticated slaves
are allowed to register.
If 'false' unauthenticated slaves
are also allowed to register. (default: false)
--authenticators=VALUE Authenticator implementation to use
when authenticating frameworks
and/or slaves. Use the default
'crammd5', or
load an alternate authenticator
module using --modules. (default: crammd5)
--cluster=VALUE Human readable name for the cluster,
displayed in the webui.
--credentials=VALUE Either a path to a text file with a
list of credentials,
each line containing 'principal' and
'secret' separated by whitespace,
or, a path to a JSON-formatted file
containing credentials.
Path could be of the form
'file:///path/to/file' or '/path/to/file'.
JSON file Example:
{
"credentials": [
{
"principal":
"sherman",
"secret":
"kitesurf",
}
]
}
Text file Example:
username secret
--external_log_file=VALUE Specified the externally managed log
file. This file will be
exposed in the webui and HTTP api.
This is useful when using
stderr logging as the log file is
otherwise unknown to Mesos.
--framework_sorter=VALUE Policy to use for allocating
resources
between a given user's frameworks.
Options
are the same as for user_allocator.
(default: drf)
--[no-]help Prints this help message (default:
false)
--hooks=VALUE A comma separated list of hook
modules to be
installed inside master.
--hostname=VALUE The hostname the master should
advertise in ZooKeeper.
If left unset, the hostname is
resolved from the IP address
that the master binds to.
--[no-]initialize_driver_logging Whether to automatically initialize
google logging of scheduler
and/or executor drivers. (default:
true)
--ip=VALUE IP address to listen on
--[no-]log_auto_initialize Whether to automatically initialize
the replicated log used for the
registry. If this is set to false,
the log has to be manually
initialized when used for the very
first time. (default: true)
--log_dir=VALUE Directory path to put log files (no
default, nothing
is written to disk unless specified;
does not affect logging to stderr).
NOTE: 3rd party log messages (e.g.
ZooKeeper) are
only written to stderr!
--logbufsecs=VALUE How many seconds to buffer log
messages for (default: 0)
--logging_level=VALUE Log message at or above this level;
possible values:
'INFO', 'WARNING', 'ERROR'; if quiet
flag is used, this
will affect just the logs from
log_dir (if specified) (default: INFO)
--modules=VALUE List of modules to be loaded and be
available to the internal
subsystems.
Use --modules=filepath to specify
the list of modules via a
file containing a JSON formatted
string. 'filepath' can be
of the form 'file:///path/to/file'
or '/path/to/file'.
Use --modules="{...}" to specify the
list of modules inline.
Example:
{
"libraries": [
{
"file": "/path/to/libfoo.so",
"modules": [
{
"name":
"org_apache_mesos_bar",
"parameters": [
{
"key": "X",
"value": "Y"
}
]
},
{
"name":
"org_apache_mesos_baz"
}
]
},
{
"name": "qux",
"modules": [
{
"name":
"org_apache_mesos_norf"
}
]
}
]
}
--offer_timeout=VALUE Duration of time before an offer is
rescinded from a framework.
This helps fairness when running
frameworks that hold on to offers,
or frameworks that accidentally drop
offers.
--port=VALUE Port to listen on (default: 5050)
--[no-]quiet Disable logging to stderr (default:
false)
--quorum=VALUE The size of the quorum of replicas
when using 'replicated_log' based
registry. It is imperative to set
this value to be a majority of
masters i.e., quorum > (number of
masters)/2.
--rate_limits=VALUE The value could be a JSON formatted
string of rate limits
or a file path containing the JSON
formatted rate limits used
for framework rate limiting.
Path could be of the form
'file:///path/to/file'
or '/path/to/file'.
See the RateLimits protobuf in
mesos.proto for the expected format.
Example:
{
"limits": [
{
"principal": "foo",
"qps": 55.5
},
{
"principal": "bar"
}
],
"aggregate_default_qps": 33.3
}
--recovery_slave_removal_limit=VALUE For failovers, limit on the
percentage of slaves that can be removed
from the registry *and* shutdown
after the re-registration timeout
elapses. If the limit is exceeded,
the master will fail over rather
than remove the slaves.
This can be used to provide safety
guarantees for production
environments. Production
environments may expect that across Master
failovers, at most a certain
percentage of slaves will fail
permanently (e.g. due to rack-level
failures).
Setting this limit would ensure that
a human needs to get
involved should an unexpected
widespread failure of slaves occur
in the cluster.
Values: [0%-100%] (default: 100%)
--registry=VALUE Persistence strategy for the
registry;
available options are
'replicated_log', 'in_memory' (for testing). (default: replicated_log)
--registry_fetch_timeout=VALUE Duration of time to wait in order to
fetch data from the registry
after which the operation is
considered a failure. (default: 1mins)
--registry_store_timeout=VALUE Duration of time to wait in order to
store data in the registry
after which the operation is
considered a failure. (default: 5secs)
--[no-]registry_strict Whether the Master will take actions
based on the persistent
information stored in the Registry.
Setting this to false means
that the Registrar will never reject
the admission, readmission,
or removal of a slave. Consequently,
'false' can be used to
bootstrap the persistent state on a
running cluster.
NOTE: This flag is *experimental*
and should not be used in
production yet. (default: false)
--roles=VALUE A comma separated list of the
allocation
roles that frameworks in this
cluster may
belong to.
--[no-]root_submissions Can root submit frameworks?
(default: true)
--slave_removal_rate_limit=VALUE The maximum rate (e.g., 1/10mins,
2/3hrs, etc) at which slaves will
be removed from the master when they
fail health checks. By default
slaves will be removed as soon as
they fail the health checks.
The value is of the form <Number of
slaves>/<Duration>.
--slave_reregister_timeout=VALUE The timeout within which all slaves
are expected to re-register
when a new master is elected as the
leader. Slaves that do not
re-register within the timeout will
be removed from the registry
and will be shutdown if they attempt
to communicate with master.
NOTE: This value has to be atleast
10mins. (default: 10mins)
--user_sorter=VALUE Policy to use for allocating
resources
between users. May be one of:
dominant_resource_fairness (drf)
(default: drf)
--[no-]version Show version and exit. (default:
false)
--webui_dir=VALUE Directory path of the webui
files/assets (default: /usr/share/mesos/webui)
--weights=VALUE A comma separated list of
role/weight pairs
of the form
'role=weight,role=weight'. Weights
are used to indicate forms of
priority.
--whitelist=VALUE Path to a file with a list of slaves
(one per line) to advertise offers
for.
Path could be of the form
'file:///path/to/file' or '/path/to/file'.
--work_dir=VALUE Directory path to store the
persistent information stored in the
Registry. (example:
/var/lib/mesos/master)
--zk=VALUE ZooKeeper URL (used for leader
election amongst masters)
May be one of:
zk://host1:port1,host2:port2,.../path
zk://username:password@host1:port1,host2:port2,.../path
file:///path/to/file (where file
contains one of the above)
--zk_session_timeout=VALUE ZooKeeper session timeout. (default:
10secs)
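To double-check that the two parameters really are absent from the help output above, rather than just missed while skimming, the listing can be searched programmatically. This is a minimal Python sketch, not part of Mesos: the helper name supported_flags and the SAMPLE_HELP excerpt are my own, and the excerpt holds only a few lines copied from the output above.

```python
import re


def supported_flags(help_text: str) -> set[str]:
    """Collect flag names from lines that start with '--name' or '--[no-]name'."""
    pattern = re.compile(r"^\s*--(?:\[no-\])?([A-Za-z0-9_]+)", re.MULTILINE)
    return set(pattern.findall(help_text))


# A few lines copied from the help output above, used as sample input.
SAMPLE_HELP = """\
--port=VALUE Port to listen on (default: 5050)
--[no-]quiet Disable logging to stderr (default:
false)
--slave_removal_rate_limit=VALUE The maximum rate (e.g., 1/10mins,
--slave_reregister_timeout=VALUE The timeout within which all slaves
--zk_session_timeout=VALUE ZooKeeper session timeout. (default:
10secs)
"""

flags = supported_flags(SAMPLE_HELP)
for name in ("slave_ping_timeout", "max_slave_ping_timeouts"):
    # Report whether each parameter shows up as a flag in the help text.
    print(name, "supported" if name in flags else "not listed")
```

To run this against a live binary, the full text of `mesos-master --help` would replace SAMPLE_HELP. Whether that text arrives on stdout or stderr is something I have not checked, so capture both.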
Furthermore, setting these parameters, either in /etc/mesos-master/ or inline,
generates the following error:
# /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 \
    --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 \
    --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
Failed to load unknown flag 'max_slave_ping_timeouts'
Usage: mesos-master [...]
Supported options:
--acls=VALUE The valu
...
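For reference, here is a rough Python sketch of the arithmetic behind the detection-time estimate. It assumes the detection time is simply the ping timeout multiplied by the number of missed pings tolerated, which I have not confirmed against the 0.22.1 source. The function name and the (2, 4) pair are hypothetical, chosen only to show what would land under the 10-second target.

```python
def detection_time_secs(ping_timeout_secs: float, max_ping_timeouts: int) -> float:
    """Estimate how long the master waits before declaring a slave lost.

    Assumption: the master gives up after max_ping_timeouts missed pings,
    each of which it waits on for ping_timeout_secs.
    """
    return ping_timeout_secs * max_ping_timeouts


TARGET_SECS = 10

# (ping timeout in seconds, max missed pings)
candidates = [
    (15, 5),  # values inferred in this post
    (15, 2),  # the attempted max_slave_ping_timeouts=2, timeout unchanged
    (2, 4),   # hypothetical pair, for illustration only
]

for timeout, count in candidates:
    total = detection_time_secs(timeout, count)
    verdict = "meets" if total < TARGET_SECS else "misses"
    print(f"{timeout}s x {count} = {total}s ({verdict} the {TARGET_SECS}s target)")
```

Under that assumption, lowering only the missed-ping count to 2 would still leave detection at 30 seconds, so the ping timeout itself would also have to come down.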
Any thoughts?
Cheers,
Nastooh Avessta