At this point you may want to CC the sdc-discuss mailing list; I don't
know the innards of SDC like I know SmartOS...
I have a vague recollection that in general Joyent (and thus SDC by
default?) assigns cpu_shares relative to max_physical_memory.
I can't find anything definitive from a little bit of spelunking in the
code though, so don't quote me on that one...
I did notice that PAPI has no notion of cpu_shares, but VMAPI and
vm-agent do. Somewhere a translation happens but I haven't found it yet
and need to stop looking for now.
Again, maybe someone on the sdc-discuss mailing list would know where to
look...
I'd love to know what the answer is though, so please follow up on this
thread if/when you find out.
-Nahum
On 02/16/2016 07:43 AM, Benjamin Bergia wrote:
Thank you, it does make more sense now.
It also looks like I have been doing it all wrong since the beginning.
Three points are still bothering me, and I would like to get
confirmation if anybody has any idea.
First, in the papi documentation
<https://github.com/joyent/sdc-papi/blob/master/docs/index.md#formulas>,
there are some obscure calculations regarding the cpu cap. Are these
formulas up to date? If so, what is the "burst ratio" used to
calculate cpu_burst_ratio?
Then, the same documentation talks about cpu_burst_ratio, fss, and
ram_ratio. After creating a new package, these values are not returned
by the papi. This makes me think that the documentation might not be up
to date, but I would still be interested in getting confirmation.
And finally, the triton documentation
<https://docs.joyent.com/private-cloud/packages/configuring#packagedefinitions>
specifies that vCPU is only used by KVM and is not required for SmartOS
zones. I run exclusively SmartOS zones, which is why I am wondering
whether, in my case, there really is some calculation running behind the
scenes to generate a cpu_shares value from a given number of vCPUs. Or
is the vCPU field wrongly marked as "required" in the AdminUI, with
SDC's packages containing 1 vCPU as a simple placeholder?
On Mon, Feb 15, 2016 at 10:51 PM, Jorge Schrauwen
<[email protected]> wrote:
Excellent explanation, Nahum!
On 2016-02-15 15:39, Nahum Shalman wrote:
On 02/15/2016 01:30 AM, Benjamin Bergia wrote:
Hi,
I recently noticed that all the packages used by SDC zones are using
some strange settings. All of them, even database ones, are using 1
vCPU with a cap of 400. I can picture in my head the "meaning" of the
CPU cap when the cap is lower than the sum of the CPU percentages, so
something like 4 vCPUs and a cap of 200 makes sense to me. Can somebody
explain to me what happens with a setting of, let's say, 1 vCPU and a
cap of 200?
In the SDC case, what was the idea behind having this kind of setting?
Does it give any performance/portability/other improvement?
I am rethinking my packages, and where I previously used settings like
you would use on vSphere, I am now wondering whether I have been doing
it totally wrong.
There are 3 important concepts: vcpus, cpu_shares, and cpu_cap. From
the vmadm man page (my further comments below that):

    vcpus:

        For KVM VMs this parameter defines the number of virtual CPUs
        the guest will see. Generally recommended to be a multiple of 2.

        type: integer (number of CPUs)
        vmtype: KVM
        listable: yes
        create: KVM only
        update: KVM only (requires VM reboot to take effect)
        default: 1

    cpu_shares:

        Sets a limit on the number of fair share scheduler (FSS) CPU
        shares for a VM. This value is relative to all other VMs on the
        system, so a value only has meaning in relation to other VMs.
        If you have one VM with a value of 10 and another with a value
        of 50, the VM with 50 will get 5x as much time from the
        scheduler as the one with 10 when there is contention.

        type: integer (number of shares)
        vmtype: OS,KVM
        listable: yes
        create: yes
        update: yes (live update)
        default: 100

    cpu_cap:

        Sets a limit on the amount of CPU time that can be used by a
        VM. The unit used is the percentage of a single CPU that can be
        used by the VM. Eg. a value of 300 means up to 3 full CPUs.

        type: integer (percentage of single CPUs)
        vmtype: OS,KVM
        listable: yes
        create: yes
        update: yes (live update)
First, note that "vcpus" from a SmartOS perspective only applies to KVM
VMs. That setting determines how many processors the VM can see and
thus can use to schedule its processes. We'll come back to that.

Zones (both the kind containing the QEMU process for KVM VMs and
regular LX and joyent branded ones) can see all of the physical
processors and the OS can schedule processes on any of them.
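
(As an aside: all three of those properties are listable, so if you want
to check what your existing zones actually ended up with, something like
the following should work on the compute node. The UUID is just a
placeholder.)

    # Show the CPU-related properties for every VM on this node
    vmadm list -o uuid,alias,type,vcpus,cpu_shares,cpu_cap

    # Or inspect a single VM (placeholder UUID)
    vmadm get 01234567-89ab-cdef-0123-456789abcdef | json vcpus cpu_shares cpu_cap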
If you imagine yourself as a multi-tenant cloud provider you'll
quickly realize that you need two things:
1. Fairness (preventing noisy neighbors) when the system is fully
loaded. This is what cpu_shares does. If you give every zone the
appropriate number of shares they will all get the proportional amount
of system CPU when the system is fully loaded.

2. Paying for what you get. On a system that is *not* fully loaded, in
theory a zone could use lots and lots of CPUs. Customers would be
incentivized to create and destroy zones until they found one that
could use lots of free CPU. This is where CPU caps come in. They ensure
that on a system that is *not* fully loaded the zone can only burst up
to a reasonable amount relative to what the customer is paying. This
also helps manage expectations. Setting a CPU cap reasonably close to
the amount of CPU that the customer gets when the system *is* fully
loaded means that people are less likely to *think* that they are
suffering from noisy neighbors (when the delta of how much CPU you get
on a fully loaded vs. fully unloaded system is small, you see more
consistent performance).
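
To make those two knobs concrete, here's a rough sketch (the UUIDs and
the numbers are just placeholders for illustration; both properties are
live updates per the man page excerpt above):

    # Fairness: with shares of 100 and 300, FSS gives these two zones
    # roughly 1/4 and 3/4 of the CPU time they are contending for when
    # the system is fully loaded.
    vmadm update 11111111-1111-1111-1111-111111111111 cpu_shares=100
    vmadm update 22222222-2222-2222-2222-222222222222 cpu_shares=300

    # Bursting: independently of the shares, cap a zone so that even on
    # an otherwise idle system it can never use more than 4 CPUs' worth
    # of time (400%).
    vmadm update 11111111-1111-1111-1111-111111111111 cpu_cap=400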
I haven't looked at the details of the SDC packages, but I can
confidently say that "vcpu" in the context of a joyent branded zone is
an approximation of what to expect based on the shares and the cap (as
opposed to a KVM VM where that's literally the number that the VM will
see).

So if you have an SDC service zone with "1 vCPU and a cap of 200" then
it's getting shares such that when the system is fully loaded it should
get approximately 1 CPU's worth of CPU time from the scheduler, but
when the system is not fully loaded it should be able to get up to 2
CPUs' worth of CPU time from the scheduler but no more. The difference
between those two is what the Joyent cloud advertises as "bursting".
Coming back for one last moment to KVM VMs, remember that the QEMU
process is running in a zone that can have shares and caps.
Additionally, when the VM does I/O, QEMU threads need to be scheduled
to do some (overhead) work to make that happen.

So in theory you might need your shares and caps to be slightly more
than just what the number of vCPUs might otherwise suggest (e.g. for
something performance critical that *has* to live in a VM you could
imagine having vCPUs be 8, but wanting cpu_cap to be 900 and having
shares that give you some extra CPU time when the system is fully
loaded).
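
As a sketch of that (placeholder UUID again, and the share value is
just an illustration):

    # 8 virtual CPUs visible to the guest, but up to 9 CPUs' worth of
    # host time so the QEMU I/O threads have headroom; cpu_cap and
    # cpu_shares update live, while a vcpus change needs a VM reboot.
    vmadm update 01234567-89ab-cdef-0123-456789abcdef vcpus=8 cpu_cap=900 cpu_shares=800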
Finally, let's see if I can answer your questions:

    All of them, even database ones, are using 1 vCPU with a cap of 400.

They are configured so that when the system is fully loaded they should
still get about 1 CPU's worth of CPU time, but if the system isn't
fully loaded they can "burst" up to using 4 CPUs' worth but no more.
    I can picture in my head the "meaning" of the CPU cap when the cap
    is lower than the sum of the CPU percentages, so something like 4
    vCPUs and a cap of 200 makes sense to me.

I find that confusing both for KVM VMs and for regular zones. The cap
would ensure that you never get more than 2 CPUs' worth of compute
time, and a KVM VM that thinks it has 4 processors but can never get
more than 2 processors' worth of work done seems like a bad idea.
    Can somebody explain to me what happens with a setting of, let's
    say, 1 vCPU and a cap of 200?

For a KVM VM the guest would see 1 processor, but would still have
headroom for the I/O overhead from QEMU. For a regular zone it's like
before: you get 1 CPU when the system is fully loaded, but can burst up
to 2 when there are spare cycles (but no more than that).
    In the SDC case, what was the idea behind having this kind of
    setting? Does it give any performance/portability/other improvement?

The punchline comes down to "bursting". If you think your workloads are
bursty then you want to leave some extra space in the caps so that the
zones can take advantage of otherwise wasted cycles when they need
them, but you also want to ensure fairness under load.
Hopefully this was helpful.
-Nahum