Is the volume at least attached to the node where you were expecting it?

Can you post the following:

1. oc get pvc <pvc_name> -o json
2. oc get pv <pv> -o json
3. oc get pod <pod> -o json
4. oc describe pod <pod>
5. output of lsblk and /proc/self/mountinfo on the node where the volume was
supposed to get attached and mounted.
6. Both kubelet and controller-manager logs. The controller-manager logs are
important for debugging why the volume did not attach in time. You can find
the controller-manager's logs via journalctl -u
atomic-openshift-master-controller-manager (or whatever the name of the
controller-manager systemd unit is).


You can send them to me personally if you would rather not post sensitive
information to a public mailing list.
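If it helps, the collection steps above can be scripted roughly like this (a sketch only: the namespace, PVC, and pod names are the ones from this thread, and the systemd unit names may differ on your install; substitute your own):

```shell
#!/bin/bash
# Placeholder object names taken from this thread; substitute your own.
NS=test2
PVC=mariadb-data
POD=mariadb-3-5425j
UNIT=atomic-openshift-master-controller-manager   # your unit name may differ

if command -v oc >/dev/null 2>&1; then
  oc get pvc "$PVC" -n "$NS" -o json > pvc.json
  # The bound PV name is recorded in the PVC's spec.volumeName.
  PV=$(oc get pvc "$PVC" -n "$NS" -o jsonpath='{.spec.volumeName}')
  oc get pv "$PV" -o json > pv.json
  oc get pod "$POD" -n "$NS" -o json > pod.json
  oc describe pod "$POD" -n "$NS" > pod-describe.txt
fi

# Run these on the node where the volume should have attached:
lsblk > lsblk.txt 2>/dev/null || true
cat /proc/self/mountinfo > mountinfo.txt

# Kubelet (node) and controller-manager logs, run where each service lives:
# journalctl -u atomic-openshift-node > node.log
# journalctl -u "$UNIT" > controller-manager.log
```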





On Sun, Jan 7, 2018 at 2:58 PM, Marc Boorshtein <[email protected]>
wrote:

> Sounds like the SELinux error is a red herring.  I found a Red Hat bug
> report showing this isn't an issue.  This is all I'm seeing in the node's
> system log:
>
> Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.381938    1750
> kubelet.go:1854] SyncLoop (ADD, "api"): "mariadb-3-5425j_test2(
> f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)"
> Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.495545    1750
> reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume
> started for volume "default-token-b8c6l" (UniqueName: "
> kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-
> b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-
> 0abad0f909f2")
> Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.595841    1750
> reconciler.go:257] operationExecutor.MountVolume started for volume
> "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/
> f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod
> "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
> Jan  7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.608039    1750
> operation_generator.go:481] MountVolume.SetUp succeeded for volume
> "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/
> f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod
> "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
> Jan  7 19:52:11 ip-10-0-4-69 origin-node: E0107 19:52:11.395023    1750
> kubelet.go:1594] Unable to mount volumes for pod "mariadb-3-5425j_test2(
> f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)": timeout expired waiting for
> volumes to attach/mount for pod "test2"/"mariadb-3-5425j". list of
> unattached/unmounted volumes=[mariadb-data]; skipping pod
> Jan  7 19:52:11 ip-10-0-4-69 origin-node: E0107 19:52:11.395068    1750
> pod_workers.go:186] Error syncing pod f6e9aa44-f3e3-11e7-96b9-0abad0f909f2
> ("mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)"),
> skipping: timeout expired waiting for volumes to attach/mount for pod
> "test2"/"mariadb-3-5425j". list of unattached/unmounted
> volumes=[mariadb-data]
>
> I'm kind of at a loss where else to look.  There are other EBS volumes on
> the server handling local disks and the docker storage volume.  No SELinux
> errors.  Any ideas where to look?
>
> Thanks
>
> On Sun, Jan 7, 2018 at 2:28 PM Marc Boorshtein <[email protected]>
> wrote:
>
>> The only errors I can find are in dmesg on the node that's running the pod:
>>
>> [ 1208.768340] XFS (dm-6): Mounting V5 Filesystem
>> [ 1208.907628] XFS (dm-6): Ending clean mount
>> [ 1208.937388] XFS (dm-6): Unmounting Filesystem
>> [ 1209.016985] XFS (dm-6): Mounting V5 Filesystem
>> [ 1209.148183] XFS (dm-6): Ending clean mount
>> [ 1209.167997] XFS (dm-6): Unmounting Filesystem
>> [ 1209.218989] XFS (dm-6): Mounting V5 Filesystem
>> [ 1209.342131] XFS (dm-6): Ending clean mount
>> [ 1209.386249] SELinux: mount invalid.  Same superblock, different
>> security settings for (dev mqueue, type mqueue)
>> [ 1217.550065] pci 0000:00:1d.0: [1d0f:8061] type 00 class 0x010802
>> [ 1217.550128] pci 0000:00:1d.0: reg 0x10: [mem 0x00000000-0x00003fff]
>> [ 1217.551181] pci 0000:00:1d.0: BAR 0: assigned [mem
>> 0xc0000000-0xc0003fff]
>> [ 1217.559756] nvme nvme3: pci function 0000:00:1d.0
>> [ 1217.568601] nvme 0000:00:1d.0: enabling device (0000 -> 0002)
>> [ 1217.575951] nvme 0000:00:1d.0: irq 33 for MSI/MSI-X
>> [ 1218.500526] nvme 0000:00:1d.0: irq 33 for MSI/MSI-X
>> [ 1218.500547] nvme 0000:00:1d.0: irq 34 for MSI/MSI-X
>>
>> Google turned up some issues with CoreOS, but nothing for OpenShift and
>> EBS.  I'm running CentOS 7.4, Docker version 1.12.6, build
>> ec8512b/1.12.6, on m5.large instances.
>>
>> On Sat, Jan 6, 2018 at 10:19 PM Hemant Kumar <[email protected]> wrote:
>>
>>> The message you posted is a generic message that is logged (or surfaced
>>> via events) when the openshift-node process couldn't find attached volumes
>>> within the specified time. That message in itself does not mean that the
>>> node process will not retry (in fact it will retry more than once), and if
>>> the volume is attached and mounted, the pod will start correctly.
>>>
>>> There may be something else going on here - I can't say for sure without
>>> looking at the openshift-node and controller-manager logs.
>>>
>>>
>>>
>>>
>>>
>>> On Sat, Jan 6, 2018 at 9:38 PM, Marc Boorshtein <[email protected]>
>>> wrote:
>>>
>>>> Thank you for the explanation.  That now makes sense.  I redeployed
>>>> with 3.7 and the correct tags on the EC2 instances.  My new issue is
>>>> that I'm continuously getting the error "Unable to mount volumes for
>>>> pod "jenkins-2-lrgjb_test(ca61f578-f352-11e7-9237-0abad0f909f2)":
>>>> timeout expired waiting for volumes to attach/mount for pod
>>>> "test"/"jenkins-2-lrgjb". list of unattached/unmounted
>>>> volumes=[jenkins-data]" when trying to deploy Jenkins.  The EBS volume
>>>> is created and attached to the node; when I run lsblk I see the
>>>> device, but the mount just times out.
>>>>
>>>> Thanks
>>>> Marc
>>>>
>>>> On Sat, Jan 6, 2018 at 6:43 AM Hemant Kumar <[email protected]> wrote:
>>>>
>>>>> Correction in the last sentence:
>>>>>
>>>>> "hence it will NOT pick a zone in which the Openshift cluster does
>>>>> not exist."
>>>>>
>>>>> On Sat, Jan 6, 2018 at 6:36 AM, Hemant Kumar <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Let me clarify - I did not say that you have to "label" nodes and
>>>>>> masters.
>>>>>>
>>>>>> I was suggesting tagging nodes and masters, the way you tag a cloud
>>>>>> resource via the AWS console or AWS CLI. I meant an AWS tag, not
>>>>>> Openshift labels.
>>>>>>
>>>>>> The reason you have volumes created in another zone is that your
>>>>>> AWS account has nodes in more than one zone, possibly not part of the
>>>>>> Openshift cluster. When you request a dynamically provisioned volume,
>>>>>> Openshift considers all nodes it can find and "randomly" selects a
>>>>>> zone among the zones it discovered.
>>>>>>
>>>>>> But if you were to use AWS Console or CLI to tag all nodes(including
>>>>>> master) in your cluster with "KubernetesCluster" : "cluster_id"
>>>>>> then it will only select tagged nodes and hence it will pick zone in 
>>>>>> which
>>>>>> Openshift cluster did not exist.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 5, 2018 at 11:48 PM, Marc Boorshtein <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> How do I tag a master?  When I create PVCs it switches between 1c
>>>>>>> and 1a.  Looking at the master I see:
>>>>>>>
>>>>>>> Creating volume for PVC "wtf3"; chose zone="us-east-1c" from
>>>>>>> zones=["us-east-1a" "us-east-1c"]
>>>>>>>
>>>>>>> Where did us-east-1c come from???
>>>>>>>
>>>>>>> On Fri, Jan 5, 2018 at 11:07 PM Hemant Kumar <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Both nodes and masters. The tag information is picked up from the
>>>>>>>> master itself (where the controller-manager is running), and then
>>>>>>>> Openshift uses the same value to find all nodes in the cluster.
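One way to check which instances the discovery described above would select is to query by the cluster tag (a sketch with the AWS CLI; the tag value here is hypothetical):

```shell
# Hypothetical tag value; use whatever you tagged your instances with.
TAG_KEY=KubernetesCluster
TAG_VALUE=my-openshift-cluster

if command -v aws >/dev/null 2>&1; then
  # List instance IDs and availability zones sharing the cluster tag.
  aws ec2 describe-instances \
    --filters "Name=tag:$TAG_KEY,Values=$TAG_VALUE" \
    --query 'Reservations[].Instances[].[InstanceId,Placement.AvailabilityZone]' \
    --output text
fi
```

Any instance (or zone) in that output that is not part of your cluster is a candidate for the stray-zone provisioning described above.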
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 5, 2018 at 10:26 PM, Marc Boorshtein <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Nodes and masters?  Or just nodes?  (It sounded like just nodes
>>>>>>>>> from the docs.)
>>>>>>>>>
>>>>>>>>> On Fri, Jan 5, 2018 at 9:16 PM Hemant Kumar <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Make sure that you configure ALL instances in the cluster with
>>>>>>>>>> the tag "KubernetesCluster": "value". The value of the tag for the
>>>>>>>>>> key "KubernetesCluster" should be the same for all instances in
>>>>>>>>>> the cluster. You can choose any string you want for the value.
>>>>>>>>>>
>>>>>>>>>> You will probably have to restart the openshift
>>>>>>>>>> controller-manager after the change, at the very minimum.
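A sketch of that tagging step with the AWS CLI (the instance IDs and tag value below are hypothetical placeholders; the key must be KubernetesCluster and the value must be identical on every instance):

```shell
# Hypothetical instance IDs and tag value; substitute your own.
CLUSTER_ID=my-openshift-cluster
INSTANCES="i-0123456789abcdef0 i-0fedcba9876543210"

if command -v aws >/dev/null 2>&1; then
  # Apply the same KubernetesCluster tag to every instance in the cluster.
  aws ec2 create-tags \
    --resources $INSTANCES \
    --tags "Key=KubernetesCluster,Value=$CLUSTER_ID"
fi

# Then restart the controller-manager on the master; the systemd unit
# name varies by install, e.g.:
# systemctl restart atomic-openshift-master-controllers
```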
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 5, 2018 at 8:21 PM, Marc Boorshtein <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have a brand new Origin 3.6 cluster running on AWS. The
>>>>>>>>>>> master and all nodes are in us-east-1a, but whenever I have AWS
>>>>>>>>>>> create a new volume, it puts it in us-east-1c, so no node can
>>>>>>>>>>> access it and all my pods go into a permanent Pending state
>>>>>>>>>>> because of NoVolumeZoneConflict.  Looking at aws.conf it states
>>>>>>>>>>> us-east-1a.  What am I missing?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>