Re: Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes

2019-11-24 Thread Joel Pearson
Unfortunately, I didn't run it before I made the manual change.

I ran it just now, and I can see error messages in the output. Is it
still worth sending to you?

The errors seemed to be coming from "azure_controller_standard.go", which
appears to be the code responsible for attaching/detaching Azure disks,
although I'm guessing the code that decides when to detach a disk is
hiding somewhere else?
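
For reference, this is roughly how I dug those errors out of the
must-gather dump; I just grepped recursively rather than relying on any
particular directory layout, so treat the paths as an assumption:

  # Search the must-gather dump for the Azure attach/detach errors; the
  # directory name is whatever "oc adm must-gather" created locally.
  grep -ri "azure_controller_standard" must-gather.local.*/

  # The detach decision itself is made by the attach-detach controller in
  # kube-controller-manager (the "attachdetach-controller" shown in the pod
  # events), so its logs are probably the interesting ones.
  grep -ri "FailedAttachVolume" must-gather.local.*/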

On Mon, 25 Nov 2019 at 15:26, Clayton Coleman  wrote:

> Did you run must-gather while it couldn’t detach?
>
> Without deeper debug info from the interval it’s hard to say.  If you can
> recreate it and run must-gather we might be able to find it.


Re: Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes

2019-11-24 Thread Clayton Coleman
Did you run must-gather while it couldn’t detach?

Without deeper debug info from the interval it’s hard to say.  If you can
recreate it and run must-gather we might be able to find it.
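
For anyone following along, the invocation is just something like this (the
destination directory is optional, and the name here is only an example):

  # Collect cluster debug data while the volume is still stuck, so the
  # controller logs cover the failure interval.
  oc adm must-gather --dest-dir=./must-gather-azure-detach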



Failure to detach Azure Disk in OpenShift 4.2.7 after 15 minutes

2019-11-24 Thread Joel Pearson
Hi,

I updated some machine config to configure chrony for masters and workers,
and I found that one of my containers got stuck after the masters had
restarted.

One of the containers couldn't start for 15 minutes because the disk was
still attached to master-2, whereas the pod had been scheduled on master-1.

In the end I manually detached the disk in the Azure console.
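
For the record, the CLI equivalent of what I did in the console would be
roughly the following; the names are the redacted ones from the events
below, so treat them as placeholders for your own resource group, VM and
disk:

  # Check which VM Azure currently thinks the disk is attached to.
  az disk show \
    --resource-group resource-group-prefix-rg \
    --name resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2 \
    --query managedBy

  # Detach the disk from the old master so the attach to master-1 can proceed.
  az vm disk detach \
    --resource-group resource-group-prefix-rg \
    --vm-name resource-group-prefix-master-2 \
    --name resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2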

Is this a known issue? Or should I have waited for more than 15 minutes?

Maybe this happened because the masters restarted, so whatever is
responsible for detaching the disk got restarted too, and there was no
cleanup process to detach it from the original node? I'm not sure whether
this is further complicated by the fact that my masters are also workers.
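
I haven't dug into it yet, but presumably one way to see what the cluster
side thinks is attached (as opposed to what Azure thinks) is something
like:

  # Volumes the node status reports as attached; this field is maintained
  # by the attach-detach controller in kube-controller-manager.
  oc get node resource-group-prefix-master-2 -o jsonpath='{.status.volumesAttached}'

  # The PV backing the stuck claim, to confirm which Azure disk it maps to.
  oc get pv pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2 -o yaml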

Here is the event information from the pod:

  Warning  FailedMount 57s (x8 over 16m)   kubelet,
resource-group-prefix-master-1  Unable to mount volumes for pod
"odoo-3-m9kxs_odoo(c0a31c68-0f2c-11ea-b695-000d3a970043)": timeout expired
waiting for volumes to attach or mount for pod "odoo"/"odoo-3-m9kxs". list
of unmounted volumes=[odoo-data]. list of unattached volumes=[odoo-1
odoo-data default-token-5d6x7]

  Warning  FailedAttachVolume  55s (x15 over 15m)  attachdetach-controller
  AttachVolume.Attach failed for volume
"pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" : Attach volume
"resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2" to
instance "resource-group-prefix-master-1" failed with
compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request:
StatusCode=0 -- Original Error: autorest/azure: Service returned an error.
Status= Code="ConflictingUserInput" Message="A disk with name
resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2
already exists in Resource Group RESOURCE-GROUP-PREFIX-RG and is attached
to VM
/subscriptions/-xxx---x/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/virtualMachines/resource-group-prefix-master-2.
'Name' is an optional property for a disk and a unique name will be
generated if not provided."
Target="/subscriptions/-xxx---x/resourceGroups/resource-group-prefix-rg/providers/Microsoft.Compute/disks/resource-group-prefix-dynamic-pvc-61f1ad81-0f24-11ea-8f8f-000d3a970df2"

Thanks,

Joel
___
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users