Unfortunately, I didn't run it before I made the manual change.

I've run it just now, and I can see error messages in the output. Is it
still worth sending it to you?

The errors seemed to be coming from "azure_controller_standard.go", which
looks like the code responsible for attaching/detaching Azure disks,
although I'm guessing the code that decides *when* to detach a disk is
hiding somewhere else?
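
For what it's worth, here is my rough mental model of that split, written up
as a small runnable Go sketch. The names in it are simplified ones I made up
for illustration (they are not the real Kubernetes or Azure cloud-provider
signatures): the attach/detach controller in kube-controller-manager is the
part that decides a disk has to move, and the Azure-specific code like
azure_controller_standard.go only carries out the actual Azure API calls.

    package main

    import "fmt"

    // DiskController is a stand-in for the cloud-provider layer (what the
    // real azure_controller_standard.go implements for Azure); the names
    // here are made up for illustration.
    type DiskController interface {
        AttachDisk(diskName, nodeName string) error
        DetachDisk(diskName, nodeName string) error
    }

    // fakeAzure simulates Azure's view of which VM a disk is attached to,
    // so the sketch runs without talking to Azure.
    type fakeAzure struct {
        attachedTo map[string]string // diskName -> nodeName
    }

    func (f *fakeAzure) AttachDisk(diskName, nodeName string) error {
        if owner, ok := f.attachedTo[diskName]; ok && owner != nodeName {
            // Mirrors the ConflictingUserInput error in the pod events: the
            // disk is still attached to the old node, so the attach fails.
            return fmt.Errorf("disk %q is already attached to %q", diskName, owner)
        }
        f.attachedTo[diskName] = nodeName
        return nil
    }

    func (f *fakeAzure) DetachDisk(diskName, nodeName string) error {
        if f.attachedTo[diskName] == nodeName {
            delete(f.attachedTo, diskName)
        }
        return nil
    }

    // reconcile is a stand-in for the attach/detach controller's job: the
    // piece that decides the disk must leave the old node before it can be
    // attached to the node the pod was rescheduled onto.
    func reconcile(c DiskController, diskName, oldNode, newNode string) error {
        if err := c.DetachDisk(diskName, oldNode); err != nil {
            return fmt.Errorf("detach from %s failed: %w", oldNode, err)
        }
        return c.AttachDisk(diskName, newNode)
    }

    func main() {
        azure := &fakeAzure{attachedTo: map[string]string{"pvc-61f1ad81": "master-2"}}

        // If the detach step never happens (say, the controller restarted
        // at the wrong moment), a plain attach to master-1 fails the same
        // way my pod did.
        if err := azure.AttachDisk("pvc-61f1ad81", "master-1"); err != nil {
            fmt.Println("attach without detach:", err)
        }

        // When the reconcile step runs the detach first, the attach succeeds.
        if err := reconcile(azure, "pvc-61f1ad81", "master-2", "master-1"); err != nil {
            fmt.Println("reconcile failed:", err)
        } else {
            fmt.Println("disk now attached to master-1")
        }
    }

If that model is roughly right, then the 15-minute hang would be down to the
"decide to detach" step never running (or running against stale state) after
the masters restarted, rather than a bug in the Azure attach call itself.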

On Mon, 25 Nov 2019 at 15:26, Clayton Coleman <ccole...@redhat.com> wrote:

> Did you run must-gather while it couldn’t detach?
>
> Without deeper debug info from the interval it’s hard to say.  If you can
> recreate it and run must gather we might be able to find it.
>
> On Nov 24, 2019, at 10:25 PM, Joel Pearson <japear...@agiledigital.com.au>
> wrote:
>
> Hi,
>
> I updated some machine config to configure chrony for masters and workers,
> and I found that one of my containers got stuck after the masters had
> restarted.
>
> One of the containers still couldn't start 15 minutes later, because the
> disk was still attached to master-2 whereas the pod had been scheduled on
> master-1.
>
> In the end I manually detached the disk in the azure console.
>
> Is this a known issue? Or should I have waited for more than 15 minutes?
>
> Maybe this happened because the masters restarted, so whatever is
> responsible for detaching the disk got restarted as well, and there was no
> cleanup process to detach it from the original node? I'm not sure whether
> this is further complicated by the fact that my masters are also workers?
>
> Here is the event information from the pod:
>
>   Warning  FailedMount         57s (x8 over 16m)   kubelet,
> resource-group-prefix-master-1  Unable to mount volumes for pod
> "odoo-3-m9kxs_odoo(c0a31c68-0f2c-11ea-b695-000d3a970043)": timeout expired
> waiting for volumes to attach or mount for pod "odoo"/"odoo-3-m9kxs". list
> of unmounted volumes=[odoo-data]. list of unattached volumes=[odoo-1
> odoo-data default-token-5d6x7]
>
>   Warning  FailedAttachVolume  55s (x15 over 15m)  attachdetach-controller
>                       AttachVolume.Attach failed for volume
> "pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" : Attach volume
> "resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" to
> instance "resource-group-prefix-master-1" failed with
> compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request:
> StatusCode=0 -- Original Error: autorest/azure: Service returned an error.
> Status=<nil> Code="ConflictingUserInput" Message="A disk with name
> resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2
> already exists in Resource Group RESOURCE-GROUP-PREFIX-RG and is attached
> to VM
> /subscriptions/xxxx-xxx-xxxx-xxxx-xxxxx/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/virtualMachines/resource-group-prefix-master-2.
> 'Name' is an optional property for a disk and a unique name will be
> generated if not provided."
> Target="/subscriptions/xxxx-xxx-xxxx-xxxx-xxxxx/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/disks/resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2"
>
> Thanks,
>
> Joel
>
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
