In lieu of today's meeting, this is an email update:

The 1.8 release process is underway, and it includes a few
performance-related changes:

- Parallel reads for the v0 API have been extended to all other v0
read-only endpoints (e.g. /state-summary, /roles). In 1.7.0, only /state
supported parallel reads. Requests are also de-duplicated by principal, so
that we don't redundantly construct responses that we know will be
identical.

- Allocator performance has improved significantly when quota is in use:
benchmarking shows allocation cycle time reduced by ~40% for a small
cluster and by up to ~70% for larger clusters.

- A per-framework (and per-role) override of the global
--min_allocatable_resources filter has been introduced. This lets
frameworks specify the minimum size of offers they want to see for their
roles, and improves scheduling efficiency by reducing the number of offers
declined for having insufficient resource quantities.
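
To illustrate the per-role override described in the last item, here is a
minimal Python sketch. It is not the allocator's actual code: the function
names, the dictionary layout, and the example roles are hypothetical, and it
assumes the filter is a list of alternative quantity sets where an offer
passes if it satisfies at least one set in full.

```python
from typing import Dict, List

# Resource name -> scalar quantity, e.g. {"cpus": 0.5, "mem": 512}.
Quantities = Dict[str, float]


def meets_any(offer: Quantities, minimums: List[Quantities]) -> bool:
    """Return True if the offer satisfies every quantity in at least one set.

    Assumption: an empty list of minimums means "no filter".
    """
    if not minimums:
        return True
    return any(
        all(offer.get(name, 0.0) >= quantity for name, quantity in required.items())
        for required in minimums
    )


def is_allocatable(
    offer: Quantities,
    role: str,
    global_minimums: List[Quantities],
    role_overrides: Dict[str, List[Quantities]],
) -> bool:
    """Apply the framework's override for this role, else the global filter."""
    minimums = role_overrides.get(role, global_minimums)
    return meets_any(offer, minimums)


# Hypothetical values: a permissive global filter and a stricter override
# that one framework has set for its "analytics" role.
GLOBAL_MINIMUMS = [{"cpus": 0.01}, {"mem": 32}]
OVERRIDES = {"analytics": [{"cpus": 1, "mem": 1024}]}
```

With these values, an offer of {"cpus": 0.5, "mem": 512} would still be
sent for a role without an override, but would be filtered out for
"analytics" before the framework ever has to decline it.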

In the resource management area, we're currently working on the following
near-term items:

- Investigating whether we can make some additional performance
improvements to the sorters (e.g. incremental sorting).
- Finishing the quota limits work, which will allow setting of limits
separate from guarantees.
- Adding an UPDATE_FRAMEWORK call to allow multi-role frameworks to change
their roles without re-subscribing.
- Exposing quota consumption via the API and UI. (Note that we currently
expose "allocation", but reservations are also counted against quota
consumption!)
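
To make the difference between "allocation" and quota consumption in the
last item concrete, here is a small Python sketch. The class and field names
are hypothetical and are not the API being designed. It assumes that
reserved resources count against quota whether or not they are currently
allocated, and that each resource is counted only once.

```python
from dataclasses import dataclass


@dataclass
class RoleUsage:
    """Hypothetical per-role accounting for one scalar resource (e.g. cpus)."""

    allocated_unreserved: float  # allocated from the unreserved pool
    allocated_reserved: float    # reserved for the role and currently allocated
    unallocated_reserved: float  # reserved for the role but sitting idle

    def allocation(self) -> float:
        """What is exposed today: everything currently allocated to the role."""
        return self.allocated_unreserved + self.allocated_reserved

    def quota_consumption(self) -> float:
        """Allocation plus idle reservations (assumed accounting, see above)."""
        return self.allocation() + self.unallocated_reserved


# Example: 4 cpus of unreserved allocation, 2 reserved cpus in use,
# and 3 reserved cpus that are currently idle.
usage = RoleUsage(
    allocated_unreserved=4.0,
    allocated_reserved=2.0,
    unallocated_reserved=3.0,
)
```

Under these assumptions, the role's allocation is 6 cpus while its quota
consumption is 9 cpus, which is why exposing only allocation can understate
how much quota a role is using.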

There's a lot more planned for the medium term, but I'll omit it here
unless folks are curious.

In the performance area, the following seem like the most pressing
short-term items to me:

- Bring the v0 parallel read functionality to the v1 read-only calls.
- Bring v1 endpoint performance closer to v0.

Please chime in if there are any questions or comments,
Ben
