Excellent explanation, Nahum!
On 2016-02-15 15:39, Nahum Shalman wrote:
On 02/15/2016 01:30 AM, Benjamin Bergia wrote:
Hi,
I recently noticed that all the packages used by SDC zones are using
some strange settings. All of them, even database ones, are using 1
vCPU with a cap of 400. I can picture in my head the "meaning" of the
CPU cap when the cap is lower than the sum of the CPU percentages. So
something like 4 vCPUs and a cap of 200 makes sense to me.
Can somebody explain to me what happens with a setting of, let's say,
1 vCPU and a cap of 200?
In the SDC case, what was the idea behind having this kind of setting?
Does it give any performance/portability/other improvement?
I am rethinking my packages, and where I previously used settings like
you would use on vSphere, I am now wondering if I am doing it totally
wrong.
There are three important concepts: vcpus, cpu_shares, and cpu_cap.
From the vmadm man page (my further comments below):
vcpus:

    For KVM VMs this parameter defines the number of virtual CPUs the
    guest will see. Generally recommended to be a multiple of 2.

    type: integer (number of CPUs)
    vmtype: KVM
    listable: yes
    create: KVM only
    update: KVM only (requires VM reboot to take effect)
    default: 1
cpu_shares:

    Sets a limit on the number of fair share scheduler (FSS) CPU shares
    for a VM. This value is relative to all other VMs on the system, so
    a value only has meaning in relation to other VMs. If you have one
    VM with a value of 10 and another with a value of 50, the VM with
    50 will get 5x as much time from the scheduler as the one with 10
    when there is contention.

    type: integer (number of shares)
    vmtype: OS,KVM
    listable: yes
    create: yes
    update: yes (live update)
    default: 100
cpu_cap:

    Sets a limit on the amount of CPU time that can be used by a VM.
    The unit used is the percentage of a single CPU that can be used
    by the VM. Eg. a value of 300 means up to 3 full CPUs.

    type: integer (percentage of single CPUs)
    vmtype: OS,KVM
    listable: yes
    create: yes
    update: yes (live update)
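In practice (a sketch; $UUID stands in for a real VM's UUID) you set
these in the payload at create time or adjust them later with vmadm:

    # cpu_shares and cpu_cap are live updates on both OS zones and KVM VMs
    vmadm update $UUID cpu_shares=100
    vmadm update $UUID cpu_cap=400
    # vcpus applies to KVM VMs only and needs a VM reboot to take effect
    vmadm update $UUID vcpus=2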
First, note that "vcpus" from a SmartOS perspective only applies to
KVM VMs. That setting determines how many processors the VM can see
and thus can use to schedule its processes. We'll come back to that.
Zones (both the kind containing the QEMU process for KVM VMs and
regular LX and joyent branded ones) can see all of the physical
processors and the OS can schedule processes on any of them.
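You can see that for yourself from inside a zone (a sketch, assuming
no processor sets are in play; the 32-CPU compute node is hypothetical):

    # inside a joyent-branded zone: psrinfo lists every CPU on the box,
    # one line per processor, so on our hypothetical node this prints 32
    psrinfo | wc -l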
If you imagine yourself as a multi-tenant cloud provider you'll
quickly realize that you need two things:
1. Fairness (preventing noisy neighbors) when the system is fully
loaded. This is what cpu_shares does. If you give every zone an
appropriate number of shares, each gets CPU time in proportion to
its shares under contention (numbers sketched after this list).
2. Paying for what you get. On a system that is *not* fully loaded, in
theory a zone could use lots and lots of CPUs. Customers would be
incentivized to create and destroy zones until they found one that
could use lots of free CPU. This is where CPU caps come in. They
ensure that on a system that is *not* fully loaded the zone can only
burst up to a reasonable amount relative to what the customer is
paying. This also helps manage expectations. Setting a CPU cap
reasonably close to the amount of CPU that the customer gets when the
system *is* fully loaded means that people are less likely to *think*
that they are suffering from noisy neighbors (when the delta of how
much CPU you get on a fully loaded vs fully unloaded system is small,
you see more consistent performance.)
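To put rough (hypothetical) numbers on both points: say three busy
zones hold 100, 100, and 200 shares on one box. Then:

    shares:        100   100   200    (ratio 1:1:2)
    fully loaded:  25%   25%   50%    of the machine, respectively
    mostly idle:   each zone can burst only up to its own cpu_cap,
                   no matter how many cycles are free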
I haven't looked at the details of the SDC packages, but I can
confidently say that "vcpus" in the context of a joyent-branded zone
is an approximation of what to expect based on the shares and the cap
(as opposed to a KVM VM, where it's literally the number of processors
the VM will see).
So if you have an SDC service zone with "1 vCPU and a cap of 200",
it's getting shares such that when the system is fully loaded it
should get approximately 1 CPU's worth of CPU time from the scheduler,
but when the system is not fully loaded it should be able to get up to
2 CPUs' worth of CPU time from the scheduler, but no more. The
difference between those two is what the Joyent cloud advertises as
"bursting".
Coming back for one last moment to KVM VMs, remember that the QEMU
process is running in a zone that can have shares and caps.
Additionally, when the VM does I/O, QEMU threads need to be scheduled
to do some (overhead) work to make that happen.
So in theory you might need your shares and caps to be slightly more
than just what the number of vCPUs might otherwise suggest (e.g. for
something performance critical that *has* to live in a VM you could
imagine having vCPUs be 8, but wanting cpu_cap to be 900 and having
shares that give you some extra CPU time when the system is fully
loaded.)
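As a concrete sketch of that shape (the UUID and all numbers are
hypothetical):

    # give the guest 8 vCPUs, but let the QEMU zone burn up to 9 CPUs'
    # worth of time, leaving ~1 CPU of headroom for QEMU's I/O threads
    vmadm update $UUID vcpus=8          # KVM only; needs a VM reboot
    vmadm update $UUID cpu_cap=900
    vmadm update $UUID cpu_shares=800   # extra weight when fully loaded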
Finally, let's see if I can answer your questions:

> All of them, even database ones, are using 1 vCPU with a cap of 400.
They are configured so that when the system is fully loaded they
should still get about 1 CPU's worth of CPU time, but if the system
isn't fully loaded they can "burst" up to using 4 CPUs worth but no
more.
> I can picture in my head the "meaning" of the CPU cap when the cap is
> lower than the sum of the CPU percentages. So something like 4 vCPUs
> and a cap of 200 makes sense to me.
I find that confusing for both KVM VMs and regular zones. The cap
would ensure that you never get more than 2 CPUs' worth of compute
time, and a KVM VM that thinks it has 4 processors but can never get
more than 2 processors' worth of work done seems like a bad idea.
> Can somebody explain to me what happens with a setting of, let's say,
> 1 vCPU and a cap of 200?
For a KVM VM the guest would see 1 processor, but would still have
headroom for the I/O overhead from QEMU. For a regular zone it's like
before: you get 1 CPU's worth when the system is fully loaded, but can
burst up to 2 when there are spare cycles (and no more than that).
> In the SDC case, what was the idea behind having this kind of setting?
> Does it give any performance/portability/other improvement?
The punchline comes down to "bursting". If you think your workloads
are bursty then you want to leave some extra space in the caps so that
the zones can take advantage of otherwise wasted cycles when they need
them, but you also want to ensure fairness under load.
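If you want to check whether a zone is actually bumping into its cap
before you resize it, the kernel exposes per-zone cap statistics
through kstat (a sketch; the zone ID 5 is hypothetical, and I'm
assuming the stock illumos caps kstats are present):

    # 'usage' and 'value' are in percent of one CPU; 'above_sec' counts
    # seconds the zone has spent pinned at its cap
    kstat -m caps -n cpucaps_zone_5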
Hopefully this was helpful.
-Nahum