Hi SviridoffA,

We have the same setup as you, and we had a similar issue where the very first VM created from a template takes a long time to start up. It used to take 7 minutes for us, but yours sounds extreme. What are your specs?

We reported this to LINBIT and they made quite a number of improvements; we are now down to 3-5 minutes. Are you using the latest LINSTOR version? In our case it takes at most 5 minutes for the VM to start the first time (it must be a new template, never used to create a VM before). We found this issue mainly when creating autoscale VMs: we create a template from a VM with running Docker containers and use it for the autoscale group. The first autoscaled VM takes 3-5 minutes for us; subsequent VMs take 30 seconds.

I think it comes down to the template having to copy itself from secondary storage to primary storage, so slow secondary storage can contribute to this issue. You could also try the latest packages from LINBIT, and if that does not help, try reaching out to them. I didn't notice the CPU utilisation, as we didn't measure it.
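One more thing you could check while the first VM is deploying: run the exact lvs query from your error report by hand and see how long it takes, and count how many LVM processes are piling up in D state. The lvs command below is copied from your report; the ps/awk line is just a rough check I would run, adjust it to taste:

    # Time the exact query the LINSTOR satellite runs (from the error report
    # below). If it takes longer than the satellite's external-command timeout
    # while the clone is writing, you get exactly the StorageException you pasted.
    time lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size \
        --separator ';' --noheadings --units k --nosuffix

    # Count lvs/vgs processes stuck in uninterruptible sleep (state D),
    # which is what the growing list in your ps output points at.
    ps -eo pid,stat,etime,comm | awk '$2 ~ /^D/ && ($4 == "lvs" || $4 == "vgs")'

If that lvs call itself takes minutes while the first deploy is running, the problem is the storage being saturated by the copy rather than LINSTOR misbehaving.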
Regards,
Bryan

On 22 Aug 2024 at 6:41 PM +0800, SviridoffA (via GitHub) <[email protected]> wrote:
>
> GitHub user SviridoffA created a discussion: Abnormal load average on first
> VM start from template
>
> Hi everyone! Maybe someone has faced a similar problem and can help me.
>
> First of all, I'm using:
> CloudStack 4.19.1.1
> Hypervisor KVM on Ubuntu 24.04
> Primary storage LINSTOR
>
> When I create a template or register one from a URL and then try to start a
> VM from it for the first time, after the CloudStack UI shows the template as
> downloaded to primary storage, the load average starts to grow and instance
> creation takes far too long. It can take about 40 minutes, or even totally
> overload the host, so that the only way to bring it back to a healthy state
> is a reboot. This load average is present on all disk nodes where LINSTOR
> creates the related devices. After the first instance is created, all
> following attempts complete without any problems and the VMs start
> immediately.
>
> The ps utility shows a growing number of vgs and lvs processes, like this:
>
> 393650 ? D 0:00 vgs -o vg_name,vg_extent_size,vg_size,vg_free --separator ; --units k --noheadings --nosuffix
> 394149 ? D 0:00 lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size --separator ; --noheadings --units k --nosuffix
>
> Nothing suspicious shows up in the CloudStack agent logs, but in the LINSTOR
> satellite logs I have found many reports like this:
>
> ERROR REPORT 66C5C167-EF4D9-000006
>
> ============================================================
>
> Application: LINBIT® LINSTOR
> Module: Satellite
> Version: 1.29.0
> Build ID: b2be7208a777f0743d4c7187062678cd5416fccf
> Build time: 2024-07-31T11:02:51+00:00
> Error time: 2024-08-22 09:59:58
> Node: node2
> Thread: MainWorkerPool-4
>
> ============================================================
>
> Reported error:
> ===============
>
> Category: LinStorException
> Class name: StorageException
> Class canonical name: com.linbit.linstor.storage.StorageException
> Generated at: Method 'genericExecutor', Source file 'Commands.java', Line #120
>
> Error message: Failed to query 'lvs' info
>
> ErrorContext:
> Cause: External command timed out
> Details: External command: lvs -o lv_name,lv_path,lv_size,vg_name,pool_lv,data_percent,lv_attr,metadata_percent,chunk_size --separator ';' --noheadings --units k --nosuffix
>
> Call backtrace:
>
> Method Native Class:Line number
> genericExecutor N com.linbit.linstor.storage.utils.Commands:120
> genericExecutor N com.linbit.linstor.storage.utils.Commands:63
> lvs N com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:99
> lambda$getLvsInfoImpl$1 N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:320
> execWithRetry N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:669
> getLvsInfoImpl N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:317
> getLvsInfo N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:282
> fetchAllocatedSizes N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:710
> getVlmAllocatedCapacities N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:185
> executeInScope N com.linbit.linstor.api.protobuf.ReqVlmAllocated:84
> lambda$executeReactive$0 N com.linbit.linstor.api.protobuf.ReqVlmAllocated:69
> doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:175
> lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:100
> call N reactor.core.publisher.MonoCallable:72
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:127
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
> request N reactor.core.publisher.Operators$ScalarSubscription:2545
> onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
> subscribe N reactor.core.publisher.MonoJust:55
> subscribe N reactor.core.publisher.MonoDeferContextual:55
> subscribe N reactor.core.publisher.Flux:8773
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:200
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> slowPath N reactor.core.publisher.FluxArray$ArraySubscription:127
> request N reactor.core.publisher.FluxArray$ArraySubscription:100
> onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:371
> subscribe N reactor.core.publisher.FluxMerge:70
> subscribe N reactor.core.publisher.Flux:8773
> onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
> subscribe N reactor.core.publisher.FluxConcatArray:78
> subscribe N reactor.core.publisher.InternalFluxOperator:62
> subscribe N reactor.core.publisher.FluxDefer:54
> subscribe N reactor.core.publisher.Flux:8773
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
> drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
> onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
> drainFused N reactor.core.publisher.SinkManyUnicast:319
> drain N reactor.core.publisher.SinkManyUnicast:362
> tryEmitNext N reactor.core.publisher.SinkManyUnicast:237
> tryEmitNext N reactor.core.publisher.SinkManySerialized:100
> processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:420
> doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:228
> lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:165
> onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:185
> runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
> run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
> call N reactor.core.scheduler.WorkerTask:84
> call N reactor.core.scheduler.WorkerTask:37
> run N java.util.concurrent.FutureTask:317
> run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
> runWorker N java.util.concurrent.ThreadPoolExecutor:1144
> run N java.util.concurrent.ThreadPoolExecutor$Worker:642
> run N java.lang.Thread:1583
>
> Caused by:
> ==========
>
> Category: Exception
> Class name: ChildProcessTimeoutException
> Class canonical name: com.linbit.ChildProcessTimeoutException
> Generated at: Method 'waitFor', Source file 'ChildProcessHandler.java', Line #184
>
> Call backtrace:
>
> Method Native Class:Line number
> waitFor N com.linbit.extproc.ChildProcessHandler:184
> waitFor N com.linbit.extproc.ChildProcessHandler:124
> syncProcess N com.linbit.extproc.ExtCmd:160
> exec N com.linbit.extproc.ExtCmd:92
> genericExecutor N com.linbit.linstor.storage.utils.Commands:79
> genericExecutor N com.linbit.linstor.storage.utils.Commands:63
> lvs N com.linbit.linstor.layer.storage.lvm.utils.LvmCommands:99
> lambda$getLvsInfoImpl$1 N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:320
> execWithRetry N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:669
> getLvsInfoImpl N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:317
> getLvsInfo N com.linbit.linstor.layer.storage.lvm.utils.LvmUtils:282
> fetchAllocatedSizes N com.linbit.linstor.layer.storage.lvm.LvmThinProvider:710
> getVlmAllocatedCapacities N com.linbit.linstor.core.apicallhandler.StltApiCallHandlerUtils:185
> executeInScope N com.linbit.linstor.api.protobuf.ReqVlmAllocated:84
> lambda$executeReactive$0 N com.linbit.linstor.api.protobuf.ReqVlmAllocated:69
> doInScope N com.linbit.linstor.core.apicallhandler.ScopeRunner:175
> lambda$fluxInScope$0 N com.linbit.linstor.core.apicallhandler.ScopeRunner:100
> call N reactor.core.publisher.MonoCallable:72
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:127
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:195
> request N reactor.core.publisher.Operators$ScalarSubscription:2545
> onSubscribe N reactor.core.publisher.MonoFlatMapMany$FlatMapManyMain:141
> subscribe N reactor.core.publisher.MonoJust:55
> subscribe N reactor.core.publisher.MonoDeferContextual:55
> subscribe N reactor.core.publisher.Flux:8773
> trySubscribeScalarMap N reactor.core.publisher.FluxFlatMap:200
> subscribeOrReturn N reactor.core.publisher.MonoFlatMapMany:49
> subscribe N reactor.core.publisher.Flux:8759
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> slowPath N reactor.core.publisher.FluxArray$ArraySubscription:127
> request N reactor.core.publisher.FluxArray$ArraySubscription:100
> onSubscribe N reactor.core.publisher.FluxFlatMap$FlatMapMain:371
> subscribe N reactor.core.publisher.FluxMerge:70
> subscribe N reactor.core.publisher.Flux:8773
> onComplete N reactor.core.publisher.FluxConcatArray$ConcatArraySubscriber:258
> subscribe N reactor.core.publisher.FluxConcatArray:78
> subscribe N reactor.core.publisher.InternalFluxOperator:62
> subscribe N reactor.core.publisher.FluxDefer:54
> subscribe N reactor.core.publisher.Flux:8773
> onNext N reactor.core.publisher.FluxFlatMap$FlatMapMain:427
> drainAsync N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:453
> drain N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:724
> onNext N reactor.core.publisher.FluxFlattenIterable$FlattenIterableSubscriber:256
> drainFused N reactor.core.publisher.SinkManyUnicast:319
> drain N reactor.core.publisher.SinkManyUnicast:362
> tryEmitNext N reactor.core.publisher.SinkManyUnicast:237
> tryEmitNext N reactor.core.publisher.SinkManySerialized:100
> processInOrder N com.linbit.linstor.netcom.TcpConnectorPeer:420
> doProcessMessage N com.linbit.linstor.proto.CommonMessageProcessor:228
> lambda$processMessage$2 N com.linbit.linstor.proto.CommonMessageProcessor:165
> onNext N reactor.core.publisher.FluxPeek$PeekSubscriber:185
> runAsync N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:440
> run N reactor.core.publisher.FluxPublishOn$PublishOnSubscriber:527
> call N reactor.core.scheduler.WorkerTask:84
> call N reactor.core.scheduler.WorkerTask:37
> run N java.util.concurrent.FutureTask:317
> run N java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask:304
> runWorker N java.util.concurrent.ThreadPoolExecutor:1144
> run N java.util.concurrent.ThreadPoolExecutor$Worker:642
> run N java.lang.Thread:1583
>
> END OF ERROR REPORT.
>
> I have tried disabling AppArmor and the firewall, but it did not help.
> I would appreciate any help with this.
>
> GitHub link: https://github.com/apache/cloudstack/discussions/9570
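P.S. If you want to sanity-check the copy theory: with thin LVM you can watch the new volume fill up while that first VM deploys. Something along these lines, where your_vg is just a placeholder for whatever volume group your LINSTOR storage pool sits on:

    # Refresh every 10 seconds; data_percent shows how full the thin volume is.
    watch -n 10 lvs -o lv_name,data_percent,lv_size --units g your_vg

If data_percent keeps crawling upwards for the whole 40 minutes, the time is mostly going into writing the template onto primary storage rather than into LINSTOR itself.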
