Re: Aurora Operations

Erb, Stephan Wed, 04 Jan 2017 07:20:00 -0800

Thanks a lot for the excellent points!

One question regarding `-offer_reservation_duration`: Do you decrease this 
because in a multi-framework setup there is a high chance Aurora won’t get that 
resource back anyway?

From: meghdoot bhattacharya <[email protected]>
Reply-To: "[email protected]" <[email protected]>, meghdoot 
bhattacharya <[email protected]>
Date: Friday, 30 December 2016 at 23:50
To: "[email protected]" <[email protected]>
Subject: Re: Aurora Operations

For our multi-framework setup with > 1000 nodes, we do the following:

1.       Tune Aurora to play nicely with multiple frameworks and not cache
resources for more than 2 mins

-min_offer_hold_time (default (5, mins))
Change this to 1 min

With this, resources can be cached for up to min_offer_hold_time plus a random
value between 0 and 60 secs, as controlled by -offer_hold_jitter_window
(default (1, mins))

-offer_reservation_duration
Change this to 1 min

-offer_filter_duration (default (5, secs))
Change this to a larger value. Basically, an offer rejected by the framework
will not be reoffered until this duration has elapsed. 30 seconds may be a
reasonable value.

http://aurora.apache.org/documentation/latest/reference/scheduler-configuration/
Use this page as a reference for all other configs. You can tune the handling
of flapping tasks, etc.
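
The 2-minute bound in item 1 follows from simple arithmetic on the two hold
flags above. A minimal sketch, assuming the jitter is drawn uniformly from 0 up
to the jitter window (the thread only says "random value"); the function names
are made up for illustration and are not part of Aurora:

```python
import random


def worst_case_hold_secs(min_offer_hold_secs, jitter_window_secs):
    """Longest time an offer can be held: the minimum hold plus the full jitter."""
    return min_offer_hold_secs + jitter_window_secs


def sampled_hold_secs(min_offer_hold_secs, jitter_window_secs, rng=random):
    """One possible hold time, assuming uniformly distributed jitter."""
    return min_offer_hold_secs + rng.uniform(0, jitter_window_secs)


# Defaults quoted in this thread: 5 min hold, 1 min jitter window.
default_worst_case = worst_case_hold_secs(5 * 60, 60)  # 360 secs
# Tuned as suggested above: 1 min hold, jitter window left at 1 min.
tuned_worst_case = worst_case_hold_secs(1 * 60, 60)  # 120 secs

print(default_worst_case, tuned_worst_case)
```

So with min_offer_hold_time at 1 min and the jitter window left at its default,
an offer is held for at most 2 mins, compared to 6 mins with both defaults.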

2.       Mesos master configs to counter unfair frameworks

-offer_timeout=VALUE

Duration of time before an offer is rescinded from a framework. This helps 
fairness when running frameworks that hold on to offers, or frameworks that 
accidentally drop offers. If not set, offers do not timeout.

Again, set this as appropriate if you need to.

Also, a general note:
I would highly recommend that each framework has a specific role, while all the
slave/agent resources can stay in the unreserved role * (unless you want
specific reservations). When a new resource comes up, DRF ensures a level of
fairness by deciding which framework should be offered it first, based on the
current DRF values for each role. Remember that DRF does not by itself
guarantee balanced, fair resourcing overall, because resources rejected by one
framework can be gobbled up by other frameworks. Currently DRF operates at 2
levels: first among roles, and then among multiple frameworks that share a
role.
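
To illustrate the DRF note above: a role's dominant share is its largest
fractional share of any single resource, and the role with the lowest dominant
share is offered new resources first. A rough sketch of the first level only
(among roles), ignoring weights, quotas and reservations. The role names and
numbers are hypothetical, and this is not the Mesos allocator code:

```python
def dominant_share(allocated, total):
    """Largest fraction of any one resource that this role currently holds."""
    return max(allocated.get(name, 0) / total[name] for name in total)


def next_role_to_offer(allocations, total):
    """Role with the lowest dominant share; ties are broken by role name."""
    return min(
        sorted(allocations),
        key=lambda role: dominant_share(allocations[role], total),
    )


# Hypothetical cluster totals and per-role allocations.
cluster_total = {"cpus": 100, "mem_gb": 1000}
allocations = {
    "aurora": {"cpus": 20, "mem_gb": 100},  # dominant share 0.20 (cpus)
    "other": {"cpus": 5, "mem_gb": 300},    # dominant share 0.30 (mem)
}

print(next_role_to_offer(allocations, cluster_total))  # prints "aurora"
```

If the role that is offered first rejects the resources, they go to the next
role, which is why DRF alone does not guarantee a balanced outcome.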

Thx

________________________________
From: Mauricio Garavaglia <[email protected]>
To: [email protected]
Sent: Tuesday, December 27, 2016 9:28 AM
Subject: Re: Aurora Operations

Hello!
Off the top of my head: we increased the history_prune_threshold value from 2
days to a week in case some debugging is required. We increased
transient_task_state_timeout because some tasks need more time in the
"killing" state, and we count on that. Also, depending on your needs, the
max_schedule_penalty default of 1 min could be too much. We reduced that to 20
seconds.

On Tue, Dec 27, 2016 at 2:06 PM, Erb, Stephan <[email protected]> wrote:

Does anyone else have input here? Any Aurora configuration option with a
non-default value in your setup is worth sharing here.

Questions are welcome as well, so that we can hopefully try to answer those in 
the upcoming operations guide.

Best regards,
Stephan

From: Zameer Manji <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Saturday, 17 December 2016 at 00:39
To: "[email protected]" <[email protected]>
Subject: Re: Aurora Operations

For larger clusters it might be necessary to increase
`-db_max_active_connection_count` to increase API throughput. You will also
need to ensure that -Xmx equals -Xms when starting the JVM. Setting those flags
to the same value prevents heap resizing.

We should consider changing the defaults to match your research. Increasing the
session timeout and lowering the thermos/executor resources to 0 should be done.

To make Aurora play nicely with other frameworks, I suggest lowering 
`min_offer_hold_time` to 1min or 30s. We can also lower the default here. A 
lower value means Aurora will have more latency when scheduling tasks (since it 
will have to wait for offers) but it will enable other frameworks to have 
resources. I also suggest using the Mesos operator tooling to dynamically 
reserve some minimum amount of resources to Aurora and other frameworks to 
ensure that they are not starved entirely.

I'm surprised that setting offer_filter_duration to 0s improves performance,
but that is something worth noting.

On Fri, Dec 16, 2016 at 8:33 AM, Erb, Stephan <[email protected]> wrote:
Hi Aurorans,

I would like to start a discussion about Aurora operations and gather feedback
on how Aurora is configured and operated at your site. The main goal is to come
up with a set of guidelines that help new users get up to speed, and to find
ways to improve our default configuration and documentation.

Of course, this is kind of a difficult endeavor. Still, I believe this can 
significantly help Aurora's positioning as one of the most scalable and 
battle-tested Mesos frameworks.

I will start with a small collection to get the discussion going:

# General Advice

* Aurora requires a ZK ensemble for leader election. This ensemble should not
also be used for service discovery. Otherwise a service discovery error/outage
can take down the entire cluster. The same applies to the Mesos ZK.
* For fast and consistent performance, transaction logs should be on distinct
disks not used by anything else (e.g. not even logging). SSDs help as well. This
applies to the ZK transaction log and the native/replicated log used by Aurora.
* If you have made an operator error in your cluster, stopping the Mesos 
masters is a safe step to limit the error propagation (e.g. agents do not come 
up anymore after a configuration change).

(Disclaimer: these are from this excellent talk:
https://www.youtube.com/watch?v=nNrh-gdu9m4)

# Aurora Configuration
Just a small collection from what we are using internally or what I have seen 
elsewhere

* Thermos resources: The current defaults of CPU and RAM usage are invasive.
`-thermos_executor_cpu=0` and `-thermos_executor_ram=128MB` seem to work just
as well, in particular since the Mesos egg got slimmer in recent releases.
* Session timeouts: The default timeout is pretty small (4sec) and can lead to 
unexpected failovers during long GC pauses. A default of 10-15sec seems to be 
more appropriate.
* JVM settings: Either `-XX:+UseG1GC -XX:+UseStringDeduplication` or
`-XX:+UseConcMarkSweepGC` seem to be sane defaults. The option
`-Djava.net.preferIPv4Stack=true` seems to make sense in most cases as well.

# Open Questions:

* What is the best way to configure and use Aurora in a multi-framework setup?
* Are there options we recommend for smaller clusters (<100 nodes or <5000 
tasks)? For example, `-offer_filter_duration=0secs` improves scheduling 
performance on small clusters.
* Are there options we recommend for larger clusters (>1000)?

I am looking forward to your contributions.

Thanks,
Stephan

--
Zameer Manji
