Hi Makrand,

Yes, this rings a bell. First off, I would advise you to tread very
carefully: this is most likely an issue with the underlying XAPI database on
your pool master, so there is a risk of further problems.

We have seen this in the past with a couple of clients. I think we found
XenServer hosts still in MM in XenCenter (unbeknownst to CloudStack), but we
then had some problems getting the hosts out of MM again from the Xen side. We
have also seen situations where taking one host out of MM in XenCenter puts
another host into MM, which is odd. I know that on one occasion we ended up
removing, rebuilding and re-adding the stubborn MM host. Unfortunately we
never found the actual root cause.

Hopefully your issue is something simpler: have you checked that all SRs are
plugged on all hosts?
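
To make that check concrete, here is a minimal sketch, not a definitive tool. It assumes you have saved the text output of `xe pbd-list params=uuid,host-uuid,sr-uuid,currently-attached` and `xe host-list params=uuid,name-label,enabled` from the pool master, and that the output uses the usual `key ( RO): value` layout with blank lines between records. The function names are my own, purely for illustration.

```python
import re

# Matches one field line of `xe ... -list` output, e.g. "   sr-uuid ( RO): abc".
# Assumption: each field is "name (ACCESS): value" on a single line.
FIELD = re.compile(r"^\s*([\w-]+)\s*\(\s*\w+\s*\)\s*:\s?(.*)$")


def parse_xe_records(text):
    """Split blank-line-separated xe output into a list of dicts."""
    records, current = [], {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            current[match.group(1)] = match.group(2).strip()
        elif not line.strip() and current:
            # A blank line closes the record collected so far.
            records.append(current)
            current = {}
    if current:
        records.append(current)
    return records


def unplugged_pbds(pbd_list_output):
    """Return PBD records whose SR is not attached on that host."""
    return [r for r in parse_xe_records(pbd_list_output)
            if r.get("currently-attached") == "false"]


def disabled_hosts(host_list_output):
    """Return host records that XAPI reports as disabled (e.g. still in MM)."""
    return [r for r in parse_xe_records(host_list_output)
            if r.get("enabled") == "false"]
```

Anything returned by `unplugged_pbds` is an SR that is not plugged on the host given by `host-uuid`. Anything returned by `disabled_hosts` is a host XAPI still considers disabled, whatever CloudStack believes. Given the XAPI concerns above, I would look at the results before running any `xe pbd-plug` or `xe host-enable`.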

Regards,
Dag Sonstebo
Cloud Architect
ShapeBlue

On 22/02/2018, 10:32, "Makrand" <[email protected]> wrote:

    Hi All,
    
    A couple of days back we had an iSCSI issue and all the LUNs were
    disconnected from the XenServer hosts. After the issue was fixed and all
    LUNs were back online, we put one of the compute nodes into Maintenance
    Mode from CloudStack for some BIOS checks. It took longer than usual to go
    into MM (it was stuck in PrepareForMaintenance), but it got there
    eventually. Now whenever we try to cancel its MM, it just fails: Command
    failed due to Internal Server Error.
    
    The logs show the following:
    
    2018-02-16 09:44:24,291 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
    (API-Job-Executor-27:ctx-1e865550 job-72477) Add job-72477 into job
    monitoring
    2018-02-16 09:44:24,292 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
    (API-Job-Executor-27:ctx-1e865550 job-72477) Executing AsyncJobVO
    {id:72477, userId: 2, accountId: 2, instanceType: Host, instanceId: 26, cmd:
    org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd, cmdInfo:
    {"id":"4bca233d-0e61-495c-a522-43800fe311fc","response":"json","sessionkey":"ZxtGyco2RuYHil/VnglSOgguw5c\u003d","ctxDetails":"{\"com.cloud.host.Host\":\"4bca233d-0e61-495c-a522-43800fe311fc\"}","cmdEventType":"MAINT.CANCEL","ctxUserId":"2","httpmethod":"GET","_":"1518774059073","uuid":"4bca233d-0e61-495c-a522-43800fe311fc","ctxAccountId":"2","ctxStartEventId":"51924"},
    cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result:
    null, initMsid: 16143068278473, completeMsid: null, lastUpdated: null,
    lastPolled: null, created: null}
    2018-02-16 09:44:24,301 ERROR [c.c.a.ApiAsyncJobDispatcher]
    (API-Job-Executor-27:ctx-1e865550 job-72477) Unexpected exception while
    executing org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd
    java.lang.NullPointerException
            at com.cloud.resource.ResourceManagerImpl.doCancelMaintenance(ResourceManagerImpl.java:2083)
            at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:2140)
            at com.cloud.resource.ResourceManagerImpl.cancelMaintenance(ResourceManagerImpl.java:1127)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
            at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:91)
            at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
            at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
            at com.sun.proxy.$Proxy147.cancelMaintenance(Unknown Source)
            at org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd.execute(CancelMaintenanceCmd.java:102)
            at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:141)
            at com.cloud.api.ApiAsyncJobDispatcher.runJob(ApiAsyncJobDispatcher.java:108)
            at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:503)
            at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
            at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
            at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
            at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
            at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
            at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:460)
            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
            at java.util.concurrent.FutureTask.run(FutureTask.java:262)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
            at java.lang.Thread.run(Thread.java:745)
    2018-02-16 09:44:24,305 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
    (API-Job-Executor-27:ctx-1e865550 job-72477) Complete async job-72477,
    jobStatus: FAILED, resultCode: 530, result:
    org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":530}
    2018-02-16 09:44:24,320 DEBUG [c.c.v.VirtualMachinePowerStateSyncImpl]
    (DirectAgent-303:ctx-d1ac93ce) Done with process of VM state report. host: 1
    2018-02-16 09:44:24,322 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
    (CapacityChecker:ctx-038e67bd) MessageBus message: host reserved capacity
    released for VM: 1, checking if host reservation can be released for host:1
    2018-02-16 09:44:24,329 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
    (CapacityChecker:ctx-038e67bd) Cannot release reservation, Found 7 VMs
    Running on host 1
    2018-02-16 09:44:24,340 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
    (CapacityChecker:ctx-038e67bd) MessageBus message: host reserved capacity
    released for VM: 1, checking if host reservation can be released for host:1
    2018-02-16 09:44:24,347 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
    (CapacityChecker:ctx-038e67bd) Cannot release reservation, Found 7 VMs
    Running on host 1
    2018-02-16 09:44:24,352 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
    (API-Job-Executor-27:ctx-1e865550 job-72477) Done executing
    org.apache.cloudstack.api.command.admin.host.CancelMaintenanceCmd for
    job-72477
    2018-02-16 09:44:24,352 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
    (CapacityChecker:ctx-038e67bd) MessageBus message: host reserved capacity
    released for VM: 1, checking if host reservation can be released for host:1
    2018-02-16 09:44:24,356 DEBUG [c.c.d.DeploymentPlanningManagerImpl]
    (CapacityChecker:ctx-038e67bd) Cannot release reservation, Found 7 VMs
    Running on host 1
    2018-02-16 09:44:24,357 DEBUG [c.c.c.CapacityManagerImpl]
    (CapacityChecker:ctx-038e67bd) No need to calibrate cpu capacity, host:1
    usedCpu: 13500 reservedCpu: 0
    2018-02-16 09:44:24,357 DEBUG [c.c.c.CapacityManagerImpl]
    (CapacityChecker:ctx-038e67bd) No need to calibrate memory capacity, host:1
    usedMem: 21881683968 reservedMem: 0
    2018-02-16 09:44:24,363 INFO  [o.a.c.f.j.i.AsyncJobMonitor]
    (API-Job-Executor-27:ctx-1e865550 job-72477) Remove job-72477 from job
    monitoring
    
    So far we have tried:
    1) Rebooted the compute node multiple times, but it did not help.
    2) Edited the DB and marked the resource state as Enabled, then forcefully
    reconnected the host. All looked OK. Put it back in MM and tried taking it
    out again; it fails with the same error.
    
    Not sure what the issue is here. Anyone?
    
    
    
    --
    Makrand
    

