Nastooh, the only other option right now is to recompile Mesos with those hardcoded constants changed to your desired values. Painful, but that's why we wanted to turn them into flags. https://github.com/apache/mesos/blob/0.22.1/src/master/constants.cpp#L34
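For a rough sense of the numbers before you rebuild, here is a minimal Python sketch of the arithmetic from your original message. It is not Mesos code. It assumes the detection time is simply the per-ping timeout multiplied by the number of missed pings the master tolerates, and the function names `detection_time` and `ping_timeout_for` are made up for illustration.

```python
from datetime import timedelta


def detection_time(ping_timeout: timedelta, max_ping_timeouts: int) -> timedelta:
    """Approximate time before the master gives up on a silent slave.

    Assumption: the master waits `ping_timeout` for each ping and removes
    the slave after `max_ping_timeouts` consecutive misses.
    """
    if max_ping_timeouts < 1:
        raise ValueError("max_ping_timeouts must be at least 1")
    return ping_timeout * max_ping_timeouts


def ping_timeout_for(target: timedelta, max_ping_timeouts: int) -> timedelta:
    """Largest per-ping timeout that keeps detection within `target`."""
    if max_ping_timeouts < 1:
        raise ValueError("max_ping_timeouts must be at least 1")
    return target / max_ping_timeouts


# Values quoted in the original message for 0.22.1: 15 secs x 5 timeouts.
current = detection_time(timedelta(seconds=15), 5)
print(current.total_seconds())  # 75.0

# Goal from the thread: notice a lost slave within 10 seconds.
# Keeping 5 allowed misses would need a per-ping timeout of about 2 seconds.
print(ping_timeout_for(timedelta(seconds=10), 5).total_seconds())  # 2.0
```

Whatever pair of values you pick, very short timeouts may cause slaves to be removed during brief network hiccups, so treat the result as a starting point rather than a recommendation.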

On Fri, Jul 17, 2015 at 4:15 PM, Nastooh Avessta (navesta) <[email protected]> wrote:

> Thank you for your prompt reply. Any other method that could decrease
> failover time, in the meanwhile?
>
> Cheers,
>
> Nastooh Avessta
>
> *From:* Vinod Kone [mailto:[email protected]]
> *Sent:* Friday, July 17, 2015 4:07 PM
> *To:* [email protected]
> *Subject:* Re: Mesos Slave Failover time
>
> It's not configurable yet, but will be in the upcoming 0.23.0 release.
>
> On Fri, Jul 17, 2015 at 3:46 PM, Nastooh Avessta (navesta) <[email protected]> wrote:
>
> > Hi
> >
> > Trying to adjust the current failover time to below 10 seconds and don’t
> > seem to be able to find the right set of parameters.
> > Currently, it takes around a minute and a half for the master to detect that
> > a slave has gone offline, which seems to correspond to
> > slave_ping_timeout=15 * max_slave_ping_timeouts=5. However, I can’t find
> > these parameters in mesos-master:
> >
> > # mesos-master --version
> > mesos 0.22.1
> >
> > # mesos-master --help
> > Usage: mesos-master [...]
> >
> > Supported options:
> >   --acls=VALUE                          The value could be a JSON formatted string of ACLs
> >                                         or a file path containing the JSON formatted ACLs used
> >                                         for authorization. Path could be of the form 'file:///path/to/file'
> >                                         or '/path/to/file'.
> >
> >                                         See the ACLs protobuf in mesos.proto for the expected format.
> >
> >                                         Example:
> >                                         {
> >                                           "register_frameworks": [
> >                                             {
> >                                               "principals": { "type": "ANY" },
> >                                               "roles": { "values": ["a"] }
> >                                             }
> >                                           ],
> >                                           "run_tasks": [
> >                                             {
> >                                               "principals": { "values": ["a", "b"] },
> >                                               "users": { "values": ["c"] }
> >                                             }
> >                                           ],
> >                                           "shutdown_frameworks": [
> >                                             {
> >                                               "principals": { "values": ["a", "b"] },
> >                                               "framework_principals": { "values": ["c"] }
> >                                             }
> >                                           ]
> >                                         }
> >   --allocation_interval=VALUE           Amount of time to wait between performing
> >                                         (batch) allocations (e.g., 500ms, 1sec, etc). (default: 1secs)
> >   --[no-]authenticate                   If authenticate is 'true' only authenticated frameworks are allowed
> >                                         to register. If 'false' unauthenticated frameworks are also
> >                                         allowed to register. (default: false)
> >   --[no-]authenticate_slaves            If 'true' only authenticated slaves are allowed to register.
> >                                         If 'false' unauthenticated slaves are also allowed to register. (default: false)
> >   --authenticators=VALUE                Authenticator implementation to use when authenticating frameworks
> >                                         and/or slaves. Use the default 'crammd5', or
> >                                         load an alternate authenticator module using --modules. (default: crammd5)
> >   --cluster=VALUE                       Human readable name for the cluster,
> >                                         displayed in the webui.
> >   --credentials=VALUE                   Either a path to a text file with a list of credentials,
> >                                         each line containing 'principal' and 'secret' separated by whitespace,
> >                                         or, a path to a JSON-formatted file containing credentials.
> >                                         Path could be of the form 'file:///path/to/file' or '/path/to/file'.
> >                                         JSON file Example:
> >                                         {
> >                                           "credentials": [
> >                                             {
> >                                               "principal": "sherman",
> >                                               "secret": "kitesurf",
> >                                             }
> >                                           ]
> >                                         }
> >                                         Text file Example:
> >                                         username secret
> >
> >   --external_log_file=VALUE             Specified the externally managed log file. This file will be
> >                                         exposed in the webui and HTTP api. This is useful when using
> >                                         stderr logging as the log file is otherwise unknown to Mesos.
> >   --framework_sorter=VALUE              Policy to use for allocating resources
> >                                         between a given user's frameworks. Options
> >                                         are the same as for user_allocator. (default: drf)
> >   --[no-]help                           Prints this help message (default: false)
> >   --hooks=VALUE                         A comma separated list of hook modules to be
> >                                         installed inside master.
> >   --hostname=VALUE                      The hostname the master should advertise in ZooKeeper.
> >                                         If left unset, the hostname is resolved from the IP address
> >                                         that the master binds to.
> >   --[no-]initialize_driver_logging      Whether to automatically initialize google logging of scheduler
> >                                         and/or executor drivers. (default: true)
> >   --ip=VALUE                            IP address to listen on
> >   --[no-]log_auto_initialize            Whether to automatically initialize the replicated log used for the
> >                                         registry. If this is set to false, the log has to be manually
> >                                         initialized when used for the very first time. (default: true)
> >   --log_dir=VALUE                       Directory path to put log files (no default, nothing
> >                                         is written to disk unless specified;
> >                                         does not affect logging to stderr).
> >                                         NOTE: 3rd party log messages (e.g. ZooKeeper) are
> >                                         only written to stderr!
> >
> >   --logbufsecs=VALUE                    How many seconds to buffer log messages for (default: 0)
> >   --logging_level=VALUE                 Log message at or above this level; possible values:
> >                                         'INFO', 'WARNING', 'ERROR'; if quiet flag is used, this
> >                                         will affect just the logs from log_dir (if specified) (default: INFO)
> >   --modules=VALUE                       List of modules to be loaded and be available to the internal
> >                                         subsystems.
> >
> >                                         Use --modules=filepath to specify the list of modules via a
> >                                         file containing a JSON formatted string. 'filepath' can be
> >                                         of the form 'file:///path/to/file' or '/path/to/file'.
> >
> >                                         Use --modules="{...}" to specify the list of modules inline.
> >
> >                                         Example:
> >                                         {
> >                                           "libraries": [
> >                                             {
> >                                               "file": "/path/to/libfoo.so",
> >                                               "modules": [
> >                                                 {
> >                                                   "name": "org_apache_mesos_bar",
> >                                                   "parameters": [
> >                                                     {
> >                                                       "key": "X",
> >                                                       "value": "Y"
> >                                                     }
> >                                                   ]
> >                                                 },
> >                                                 {
> >                                                   "name": "org_apache_mesos_baz"
> >                                                 }
> >                                               ]
> >                                             },
> >                                             {
> >                                               "name": "qux",
> >                                               "modules": [
> >                                                 {
> >                                                   "name": "org_apache_mesos_norf"
> >                                                 }
> >                                               ]
> >                                             }
> >                                           ]
> >                                         }
> >   --offer_timeout=VALUE                 Duration of time before an offer is rescinded from a framework.
> >                                         This helps fairness when running frameworks that hold on to offers,
> >                                         or frameworks that accidentally drop offers.
> >   --port=VALUE                          Port to listen on (default: 5050)
> >   --[no-]quiet                          Disable logging to stderr (default: false)
> >   --quorum=VALUE                        The size of the quorum of replicas when using 'replicated_log' based
> >                                         registry. It is imperative to set this value to be a majority of
> >                                         masters i.e., quorum > (number of masters)/2.
> >   --rate_limits=VALUE                   The value could be a JSON formatted string of rate limits
> >                                         or a file path containing the JSON formatted rate limits used
> >                                         for framework rate limiting.
> >                                         Path could be of the form 'file:///path/to/file'
> >                                         or '/path/to/file'.
> >
> >                                         See the RateLimits protobuf in mesos.proto for the expected format.
> >
> >                                         Example:
> >                                         {
> >                                           "limits": [
> >                                             {
> >                                               "principal": "foo",
> >                                               "qps": 55.5
> >                                             },
> >                                             {
> >                                               "principal": "bar"
> >                                             }
> >                                           ],
> >                                           "aggregate_default_qps": 33.3
> >                                         }
> >   --recovery_slave_removal_limit=VALUE  For failovers, limit on the percentage of slaves that can be removed
> >                                         from the registry *and* shutdown after the re-registration timeout
> >                                         elapses. If the limit is exceeded, the master will fail over rather
> >                                         than remove the slaves.
> >                                         This can be used to provide safety guarantees for production
> >                                         environments. Production environments may expect that across Master
> >                                         failovers, at most a certain percentage of slaves will fail
> >                                         permanently (e.g. due to rack-level failures).
> >                                         Setting this limit would ensure that a human needs to get
> >                                         involved should an unexpected widespread failure of slaves occur
> >                                         in the cluster.
> >                                         Values: [0%-100%] (default: 100%)
> >   --registry=VALUE                      Persistence strategy for the registry;
> >                                         available options are 'replicated_log', 'in_memory' (for testing). (default: replicated_log)
> >   --registry_fetch_timeout=VALUE        Duration of time to wait in order to fetch data from the registry
> >                                         after which the operation is considered a failure. (default: 1mins)
> >   --registry_store_timeout=VALUE        Duration of time to wait in order to store data in the registry
> >                                         after which the operation is considered a failure. (default: 5secs)
> >   --[no-]registry_strict                Whether the Master will take actions based on the persistent
> >                                         information stored in the Registry. Setting this to false means
> >                                         that the Registrar will never reject the admission, readmission,
> >                                         or removal of a slave. Consequently, 'false' can be used to
> >                                         bootstrap the persistent state on a running cluster.
> >                                         NOTE: This flag is *experimental* and should not be used in
> >                                         production yet.
> >                                         (default: false)
> >   --roles=VALUE                         A comma separated list of the allocation
> >                                         roles that frameworks in this cluster may
> >                                         belong to.
> >   --[no-]root_submissions               Can root submit frameworks? (default: true)
> >   --slave_removal_rate_limit=VALUE      The maximum rate (e.g., 1/10mins, 2/3hrs, etc) at which slaves will
> >                                         be removed from the master when they fail health checks. By default
> >                                         slaves will be removed as soon as they fail the health checks.
> >                                         The value is of the form <Number of slaves>/<Duration>.
> >   --slave_reregister_timeout=VALUE      The timeout within which all slaves are expected to re-register
> >                                         when a new master is elected as the leader. Slaves that do not
> >                                         re-register within the timeout will be removed from the registry
> >                                         and will be shutdown if they attempt to communicate with master.
> >                                         NOTE: This value has to be atleast 10mins. (default: 10mins)
> >   --user_sorter=VALUE                   Policy to use for allocating resources
> >                                         between users. May be one of:
> >                                         dominant_resource_fairness (drf) (default: drf)
> >   --[no-]version                        Show version and exit. (default: false)
> >   --webui_dir=VALUE                     Directory path of the webui files/assets (default: /usr/share/mesos/webui)
> >   --weights=VALUE                       A comma separated list of role/weight pairs
> >                                         of the form 'role=weight,role=weight'. Weights
> >                                         are used to indicate forms of priority.
> >   --whitelist=VALUE                     Path to a file with a list of slaves
> >                                         (one per line) to advertise offers for.
> >                                         Path could be of the form 'file:///path/to/file' or '/path/to/file'.
> >   --work_dir=VALUE                      Directory path to store the persistent information stored in the
> >                                         Registry.
> >                                         (example: /var/lib/mesos/master)
> >   --zk=VALUE                            ZooKeeper URL (used for leader election amongst masters)
> >                                         May be one of:
> >                                         zk://host1:port1,host2:port2,.../path
> >                                         zk://username:password@host1:port1,host2:port2,.../path
> >                                         file:///path/to/file (where file contains one of the above)
> >   --zk_session_timeout=VALUE            ZooKeeper session timeout. (default: 10secs)
> >
> > Furthermore, setting these parameters either in /etc/mesos-master/ or
> > inline generates the following error:
> >
> > # /usr/sbin/mesos-master --zk=zk://10.40.50.228:2181/mesos --port=5050 --log_dir=/var/log/mesos --hostname=10.40.50.228 --ip=10.40.50.228 --quorum=1 --work_dir=/var/lib/mesos --max_slave_ping_timeouts=2
> > Failed to load unknown flag 'max_slave_ping_timeouts'
> > Usage: mesos-master [...]
> >
> > Supported options:
> >   --acls=VALUE                          The valu
> > …
> >
> > Any thoughts?
> >
> > Cheers,
> >
> > Nastooh Avessta

