Hey everyone,

This week we are running performance tests on our environment and we noticed something weird.
Setup:
- CloudStack 4.17.2 + XCP-ng, advanced networking with security groups.
- One zone with 30 XCP-ng hosts (in 30 clusters), each with 100 GB RAM and 100 cores.
- A single compute offering using the UserDispersing planner, bound to local storage (there is no shared storage on the servers).

Using Terraform we tried to deploy 60 instances, each with 49 GB of RAM and 50 cores. Some of them (about 5) were not deployed. Re-running the same Terraform task a few times eventually gets the failed instances deployed.

Wondering why this happens, I looked at the logs and found that the VMs fail because there is not enough memory on the XCP-ng hosts. The error comes from XAPI, not from CloudStack, which makes me conclude that CloudStack allows the deployment but for some reason the scheduler/planner does not account for the memory properly. I suspect a race condition: two instances are assigned to the same host, and as both get created there is only enough memory for one of them.

I tried to reproduce the issue by simultaneously creating instances from the GUI on a group of 2 servers, but GUI-created instances, even when launched together, seem to be executed in order, so the scheduler detects when there is no more RAM and the remaining deployments are stopped.

Has anyone experienced such a problem?

Regards,
Jordan
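
P.S. In case it helps anyone reproduce this outside the GUI, this is roughly the test I have in mind: fire two deployVirtualMachine calls at the same host at the same moment and see whether both get past CloudStack's capacity check even though only one fits. This is only an untested sketch, assuming the Python "cs" client (pip install cs) and root-admin credentials; all endpoints and IDs below are placeholders.

    # Sketch: race two deployments onto the same host via the CloudStack API.
    # Assumes the 'cs' Python client and admin credentials; IDs are placeholders.
    from concurrent.futures import ThreadPoolExecutor
    from cs import CloudStack

    api = CloudStack(
        endpoint="https://cloudstack.example.com/client/api",  # placeholder
        key="API_KEY",
        secret="SECRET_KEY",
    )

    ZONE_ID = "zone-uuid"          # placeholder
    TEMPLATE_ID = "template-uuid"  # placeholder
    OFFERING_ID = "offering-uuid"  # the 49 GB / 50 core offering
    HOST_ID = "host-uuid"          # pin both deployments to one XCP-ng host

    def deploy(name):
        # hostid is a root-admin-only parameter that forces placement on a
        # given host, bypassing the planner's host choice but not the
        # capacity check we want to exercise.
        return api.deployVirtualMachine(
            name=name,
            zoneid=ZONE_ID,
            templateid=TEMPLATE_ID,
            serviceofferingid=OFFERING_ID,
            hostid=HOST_ID,
        )

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(deploy, f"race-test-{i}") for i in range(2)]
        for f in futures:
            try:
                print(f.result())
            except Exception as exc:
                # If capacity accounting is correct, one of the two calls
                # should be rejected by CloudStack instead of failing in XAPI.
                print("deploy failed:", exc)

If the race condition theory is right, both calls should be accepted by CloudStack and one of them should then fail on the host with the XAPI out-of-memory error, just like the Terraform runs do.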