Re: Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes
Unfortunately, I didn't run it before I made the manual change. I ran it just now, and I can see error messages in the output. Is it still worth sending to you?

The errors seem to come from "azure_controller_standard.go", which appears to be the code responsible for attaching and detaching Azure disks. I'm guessing, though, that the code that decides when to detach a disk lives somewhere else?

On Mon, 25 Nov 2019 at 15:26, Clayton Coleman wrote:

> Did you run must-gather while it couldn’t detach?
>
> Without deeper debug info from the interval it’s hard to say. If you can
> recreate it and run must-gather we might be able to find it.

___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
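To pull the "azure_controller_standard.go" errors mentioned in this message out of the gathered logs, something like the following might help. It is only a sketch under assumptions: it assumes the log lines carry the usual klog header (severity letter, MMDD date, time, thread id, then `file:line]`), possibly preceded by a container-runtime timestamp, and the name `errors_from` is made up for illustration.

```python
import re
from typing import Iterable, Iterator

# klog header: <severity><MMDD> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
# The leading (?:^|\s) tolerates an extra timestamp prefix added by the container runtime.
KLOG_RE = re.compile(
    r'(?:^|\s)(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+'
    r'(?P<tid>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$'
)


def errors_from(lines: Iterable[str],
                source_file: str = "azure_controller_standard.go") -> Iterator[str]:
    """Yield the message part of error/fatal log lines emitted from source_file."""
    for raw in lines:
        match = KLOG_RE.search(raw.rstrip("\n"))
        if match is None:
            continue  # not a klog-formatted line
        if match["sev"] in ("E", "F") and match["file"] == source_file:
            yield match["msg"]
```

Any iterable of lines works as input, for example an open log file from the must-gather output directory. It only narrows the output down to the attach/detach code path; it says nothing about which component decided not to detach.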
Re: Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes
Did you run must-gather while it couldn’t detach?

Without deeper debug info from the interval it’s hard to say. If you can recreate it and run must-gather, we might be able to find it.

On Nov 24, 2019, at 10:25 PM, Joel Pearson wrote:

> One of the containers still couldn't start for 15 minutes, as the disk was
> still attached to master-2 whereas the pod had been scheduled on master-1.
>
> In the end I manually detached the disk in the azure console.
Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes
Hi,

I updated some machine config to configure chrony for masters and workers, and I found that one of my containers got stuck after the masters had restarted.

One of the containers still couldn't start for 15 minutes, as the disk was still attached to master-2 whereas the pod had been scheduled on master-1.

In the end I manually detached the disk in the Azure console.

Is this a known issue? Or should I have waited for more than 15 minutes?

Maybe this happened because the masters restarted and maybe whatever is responsible for detaching the disk got restarted, and there wasn't a cleanup process to detach from the original node? I'm not sure if this is further complicated by the fact that my masters are also workers?

Here is the event information from the pod:

Warning FailedMount 57s (x8 over 16m) kubelet, resource-group-prefix-master-1 Unable to mount volumes for pod "odoo-3-m9kxs_odoo(c0a31c68-0f2c-11ea-b695-000d3a970043)": timeout expired waiting for volumes to attach or mount for pod "odoo"/"odoo-3-m9kxs". list of unmounted volumes=[odoo-data]. list of unattached volumes=[odoo-1 odoo-data default-token-5d6x7]

Warning FailedAttachVolume 55s (x15 over 15m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" : Attach volume "resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" to instance "resource-group-prefix-master-1" failed with compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: autorest/azure: Service returned an error. Status= Code="ConflictingUserInput" Message="A disk with name resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2 already exists in Resource Group RESOURCE-GROUP-PREFIX-RG and is attached to VM /subscriptions/-xxx---x/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/virtualMachines/resource-group-prefix-master-2. 'Name' is an optional property for a disk and a unique name will be generated if not provided." Target="/subscriptions/-xxx---x/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/disks/resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2"

Thanks,

Joel
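For anyone hitting the same event: the FailedAttachVolume message already names both nodes involved, the instance the controller is trying to attach to and the VM that Azure says still holds the disk. A rough sketch for pulling the two names out of the message text follows. The function and type names are hypothetical, the patterns are derived only from the wording of the event in this thread, and the message is assumed to be a single unwrapped string.

```python
import re
from typing import NamedTuple, Optional


class AttachConflict(NamedTuple):
    target_node: str  # instance the attach was attempted on
    holding_vm: str   # VM that Azure reports the disk is still attached to


# Patterns based solely on the event text quoted in this thread (an assumption,
# not a documented message format).
_TARGET_RE = re.compile(r'to instance "([^"]+)"')
_HOLDER_RE = re.compile(r'is attached to VM \S*/virtualMachines/([^\s/]+)')


def parse_attach_conflict(message: str) -> Optional[AttachConflict]:
    """Return the two node names from a ConflictingUserInput attach failure."""
    if 'Code="ConflictingUserInput"' not in message:
        return None
    target = _TARGET_RE.search(message)
    holder = _HOLDER_RE.search(message)
    if target is None or holder is None:
        return None
    # The VM resource ID is followed by a sentence-ending period in the message.
    return AttachConflict(target.group(1), holder.group(1).rstrip("."))


# Event text from this thread, joined into one string (trailing Target= omitted).
EVENT_MESSAGE = (
    'AttachVolume.Attach failed for volume '
    '"pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" : Attach volume '
    '"resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" '
    'to instance "resource-group-prefix-master-1" failed with '
    'compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: '
    'StatusCode=0 -- Original Error: autorest/azure: Service returned an error. '
    'Status= Code="ConflictingUserInput" Message="A disk with name '
    'resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2 '
    'already exists in Resource Group RESOURCE-GROUP-PREFIX-RG and is attached '
    'to VM /subscriptions/-xxx---x/resourceGroups/resource-group-prefix-rg/'
    'providers/Microsoft.Compute/virtualMachines/resource-group-prefix-master-2. '
    "'Name' is an optional property for a disk and a unique name will be "
    'generated if not provided."'
)

if __name__ == "__main__":
    conflict = parse_attach_conflict(EVENT_MESSAGE)
    if conflict and conflict.target_node != conflict.holding_vm:
        print(f"disk still held by {conflict.holding_vm}, "
              f"wanted on {conflict.target_node}")
```

This only confirms the symptom described above (disk held by master-2, pod scheduled on master-1); it does not explain why the detach never happened.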