Hey everyone,

This week we are running performance tests on our environment and we noticed something weird.
Setup:
- CloudStack 4.17.2 + XCP-ng, advanced networking with security groups.
- One zone with 30 XCP-ng hosts (in 30 clusters), each with 100 GB RAM and 100 cores.
- A single compute offering using the UserDispersing planner, bound to local storage (there is no shared storage on the servers).

Using Terraform we tried to deploy 60 instances, each with 49 GB of RAM and 50 cores. Some of them (about 5) were not deployed. Re-running the same Terraform task a few times eventually gets the failed instances deployed.

Wondering why this happens, I looked at the logs and found that the VMs fail because there is not enough memory on the XCP-ng hosts. The error comes from XAPI, not from CloudStack, which makes me conclude that CloudStack allows the deployment but for some reason the scheduler/planner does not account for the memory properly. I suspect a race condition: two instances are assigned to the same host, and as both get created there is only enough memory for one of them.

I tried to reproduce the issue by simultaneously creating instances from the GUI on a group of 2 servers, but GUI-created instances, even when launched together, seem to be executed in order, so the scheduler detects when there is no more RAM and the remaining deployments are stopped.

Has anyone experienced such a problem?

Regards,
Jordan
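
P.S. In case it helps anyone reproduce this outside the GUI, this is roughly the test I have in mind: fire two deployVirtualMachine calls at the same host at the same moment and see whether both get past CloudStack's capacity check even though only one fits. This is only an untested sketch, assuming the Python "cs" client (pip install cs) and root-admin credentials; all endpoints and IDs below are placeholders.

    # Sketch: race two deployments onto the same host via the CloudStack API.
    # Assumes the 'cs' Python client and admin credentials; IDs are placeholders.
    from concurrent.futures import ThreadPoolExecutor
    from cs import CloudStack

    api = CloudStack(
        endpoint="https://cloudstack.example.com/client/api",  # placeholder
        key="API_KEY",
        secret="SECRET_KEY",
    )

    ZONE_ID = "zone-uuid"          # placeholder
    TEMPLATE_ID = "template-uuid"  # placeholder
    OFFERING_ID = "offering-uuid"  # the 49 GB / 50 core offering
    HOST_ID = "host-uuid"          # pin both deployments to one XCP-ng host

    def deploy(name):
        # hostid is a root-admin-only parameter that forces placement on a
        # given host, bypassing the planner's host choice but not the
        # capacity check we want to exercise.
        return api.deployVirtualMachine(
            name=name,
            zoneid=ZONE_ID,
            templateid=TEMPLATE_ID,
            serviceofferingid=OFFERING_ID,
            hostid=HOST_ID,
        )

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(deploy, f"race-test-{i}") for i in range(2)]
        for f in futures:
            try:
                print(f.result())
            except Exception as exc:
                # If capacity accounting is correct, one of the two calls
                # should be rejected by CloudStack instead of failing in XAPI.
                print("deploy failed:", exc)

If the race condition theory is right, both calls should be accepted by CloudStack and one of them should then fail on the host with the XAPI out-of-memory error, just like the Terraform runs do.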