Is the volume at least attached to the node where you were expecting? Can you post the following (a collection sketch follows the list):

1. oc get pvc <pvc_name> -o json
2. oc get pv <pv> -o json
3. oc get pod <pod> -o json
4. oc describe pod <pod>
5. Output of lsblk and /proc/self/mountinfo on the node where the volume was supposed to get attached and mounted.
6. Both kubelet and controller-manager logs.
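If it helps, something along these lines would capture items 1-5 in one go (the angle-bracket names and <namespace> are placeholders for your actual PVC, PV, pod and project; redirecting to files is just a convenience):

  # on a machine where oc is logged in to the cluster
  oc get pvc <pvc_name> -n <namespace> -o json > pvc.json
  oc get pv <pv_name> -o json > pv.json
  oc get pod <pod_name> -n <namespace> -o json > pod.json
  oc describe pod <pod_name> -n <namespace> > pod-describe.txt

  # on the node where the volume should have been attached and mounted
  lsblk > lsblk.txt
  cat /proc/self/mountinfo > mountinfo.txt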
Controller-manager logs are important for debugging why the volume did not attach in time. You can find the controller-manager's log via journalctl -u atomic-openshift-master-controller-manager (or whatever the controller-manager systemd unit is named on your master). You can send the logs to me personally if you would rather not post sensitive information to a public mailing list.
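For item 6, a rough sketch, assuming the node unit is origin-node (as in the log excerpt below) and the master unit is atomic-openshift-master-controller-manager - adjust the unit names, the time window and the grep pattern to your environment:

  # on the node
  journalctl -u origin-node --since "2018-01-07 19:45" > node.log

  # on the master where the controller-manager runs
  journalctl -u atomic-openshift-master-controller-manager --since "2018-01-07 19:45" > controller-manager.log
  grep -iE 'attach|mariadb-data' controller-manager.log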
On Sun, Jan 7, 2018 at 2:58 PM, Marc Boorshtein <[email protected]> wrote:

> Sounds like the SELinux error is a red herring. I found a Red Hat bug report showing this isn't an issue. This is all I'm seeing in the node's system log:
>
> Jan 7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.381938 1750 kubelet.go:1854] SyncLoop (ADD, "api"): "mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)"
> Jan 7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.495545 1750 reconciler.go:212] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
> Jan 7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.595841 1750 reconciler.go:257] operationExecutor.MountVolume started for volume "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
> Jan 7 19:50:08 ip-10-0-4-69 origin-node: I0107 19:50:08.608039 1750 operation_generator.go:481] MountVolume.SetUp succeeded for volume "default-token-b8c6l" (UniqueName: "kubernetes.io/secret/f6e9aa44-f3e3-11e7-96b9-0abad0f909f2-default-token-b8c6l") pod "mariadb-3-5425j" (UID: "f6e9aa44-f3e3-11e7-96b9-0abad0f909f2")
> Jan 7 19:52:11 ip-10-0-4-69 origin-node: E0107 19:52:11.395023 1750 kubelet.go:1594] Unable to mount volumes for pod "mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)": timeout expired waiting for volumes to attach/mount for pod "test2"/"mariadb-3-5425j". list of unattached/unmounted volumes=[mariadb-data]; skipping pod
> Jan 7 19:52:11 ip-10-0-4-69 origin-node: E0107 19:52:11.395068 1750 pod_workers.go:186] Error syncing pod f6e9aa44-f3e3-11e7-96b9-0abad0f909f2 ("mariadb-3-5425j_test2(f6e9aa44-f3e3-11e7-96b9-0abad0f909f2)"), skipping: timeout expired waiting for volumes to attach/mount for pod "test2"/"mariadb-3-5425j". list of unattached/unmounted volumes=[mariadb-data]
>
> I'm kind of at a loss where else to look. There are other EBS volumes on the server to handle local disks and the docker storage volume. No SELinux errors. Any ideas where to look?
>
> Thanks
>
> On Sun, Jan 7, 2018 at 2:28 PM Marc Boorshtein <[email protected]> wrote:
>
>> The only errors I can find are in dmesg on the node that's running the pod:
>>
>> [ 1208.768340] XFS (dm-6): Mounting V5 Filesystem
>> [ 1208.907628] XFS (dm-6): Ending clean mount
>> [ 1208.937388] XFS (dm-6): Unmounting Filesystem
>> [ 1209.016985] XFS (dm-6): Mounting V5 Filesystem
>> [ 1209.148183] XFS (dm-6): Ending clean mount
>> [ 1209.167997] XFS (dm-6): Unmounting Filesystem
>> [ 1209.218989] XFS (dm-6): Mounting V5 Filesystem
>> [ 1209.342131] XFS (dm-6): Ending clean mount
>> [ 1209.386249] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
>> [ 1217.550065] pci 0000:00:1d.0: [1d0f:8061] type 00 class 0x010802
>> [ 1217.550128] pci 0000:00:1d.0: reg 0x10: [mem 0x00000000-0x00003fff]
>> [ 1217.551181] pci 0000:00:1d.0: BAR 0: assigned [mem 0xc0000000-0xc0003fff]
>> [ 1217.559756] nvme nvme3: pci function 0000:00:1d.0
>> [ 1217.568601] nvme 0000:00:1d.0: enabling device (0000 -> 0002)
>> [ 1217.575951] nvme 0000:00:1d.0: irq 33 for MSI/MSI-X
>> [ 1218.500526] nvme 0000:00:1d.0: irq 33 for MSI/MSI-X
>> [ 1218.500547] nvme 0000:00:1d.0: irq 34 for MSI/MSI-X
>>
>> Google turned up some issues with CoreOS, but nothing for OpenShift and EBS. I'm running CentOS 7.4, Docker is at version 1.12.6, build ec8512b/1.12.6, running on m5.large instances.
>>
>> On Sat, Jan 6, 2018 at 10:19 PM Hemant Kumar <[email protected]> wrote:
>>
>>> The message you posted is a generic message that is logged (or surfaced via events) when the openshift-node process couldn't find attached volumes within the specified time. That message in itself does not mean that the node process will not retry (in fact it will retry more than once), and if the volume is attached and mounted, the pod will start correctly.
>>>
>>> There may be something else going on here - I can't say for sure without looking at openshift's node and controller-manager's logs.
>>>
>>> On Sat, Jan 6, 2018 at 9:38 PM, Marc Boorshtein <[email protected]> wrote:
>>>
>>>> Thank you for the explanation. That now makes sense. I redeployed with 3.7 and the correct tags on the ec2 instances. Now my new issue is that I'm continuously getting the error "Unable to mount volumes for pod "jenkins-2-lrgjb_test(ca61f578-f352-11e7-9237-0abad0f909f2)": timeout expired waiting for volumes to attach/mount for pod "test"/"jenkins-2-lrgjb". list of unattached/unmounted volumes=[jenkins-data]" when trying to deploy Jenkins. The EBS volume is created, the volume is attached to the node, and when I run lsblk I see the device, but it just times out.
>>>>
>>>> Thanks
>>>> Marc
>>>>
>>>> On Sat, Jan 6, 2018 at 6:43 AM Hemant Kumar <[email protected]> wrote:
>>>>
>>>>> Correction in last sentence:
>>>>>
>>>>> "hence it will NOT pick zone in which Openshift cluster did not exist."
>>>>>
>>>>> On Sat, Jan 6, 2018 at 6:36 AM, Hemant Kumar <[email protected]> wrote:
>>>>>
>>>>>> Let me clarify - I did not say that you have to "label" nodes and masters.
>>>>>>
>>>>>> I was suggesting to tag nodes and masters, the way you tag a cloud resource via the AWS console or AWS CLI. I meant an AWS tag, not OpenShift labels.
>>>>>>
>>>>>> The reason you have volumes created in another zone is that your AWS account has nodes in more than one zone, possibly not part of the OpenShift cluster.
>>>>>> But when you are requesting a dynamically provisioned volume, OpenShift considers all nodes it can find and accordingly it "randomly" selects a zone among the zones it discovered.
>>>>>>
>>>>>> But if you were to use the AWS Console or CLI to tag all nodes (including masters) in your cluster with "KubernetesCluster" : "cluster_id", then it will only select tagged nodes and hence it will pick zone in which Openshift cluster did not exist.
>>>>>>
>>>>>> On Fri, Jan 5, 2018 at 11:48 PM, Marc Boorshtein <[email protected]> wrote:
>>>>>>
>>>>>>> How do I label a master? When I create PVCs it switches between 1c and 1a. When I look on the master I see:
>>>>>>>
>>>>>>> Creating volume for PVC "wtf3"; chose zone="us-east-1c" from zones=["us-east-1a" "us-east-1c"]
>>>>>>>
>>>>>>> Where did us-east-1c come from???
>>>>>>>
>>>>>>> On Fri, Jan 5, 2018 at 11:07 PM Hemant Kumar <[email protected]> wrote:
>>>>>>>
>>>>>>>> Both nodes and masters. The tag information is picked up from the master itself (where the controller-manager is running) and then OpenShift uses the same value to find all nodes in the cluster.
>>>>>>>>
>>>>>>>> On Fri, Jan 5, 2018 at 10:26 PM, Marc Boorshtein <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Nodes and masters? Or just nodes? (It sounded like just nodes from the docs.)
>>>>>>>>>
>>>>>>>>> On Fri, Jan 5, 2018 at 9:16 PM Hemant Kumar <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Make sure that you configure ALL instances in the cluster with the tag "KubernetesCluster": "value". The value of the tag for the key "KubernetesCluster" should be the same for all instances in the cluster. You can choose any string you want for the value.
>>>>>>>>>>
>>>>>>>>>> You will probably have to restart the openshift controller-manager after the change, at the very minimum.
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 5, 2018 at 8:21 PM, Marc Boorshtein <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I have a brand new Origin 3.6 running on AWS. The master and all nodes are in us-east-1a, but whenever I try to have AWS create a new volume, it puts it in us-east-1c, so then no one can access it and all my nodes go into a permanent pending state because of NoVolumeZoneConflict. Looking at aws.conf it states us-east-1a. What am I missing?
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> users mailing list
>>>>>>>>>>> [email protected]
>>>>>>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
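For reference, the tagging step described in the quoted thread can be done with the AWS CLI along these lines (the instance IDs and the value "mycluster" below are placeholders; every master and node instance in the cluster needs the same KubernetesCluster value, and the controller-manager unit name may differ on your install):

  aws ec2 create-tags \
      --resources i-0123456789abcdef0 i-0fedcba9876543210 \
      --tags Key=KubernetesCluster,Value=mycluster

  # then restart the controller-manager on the master, for example:
  systemctl restart atomic-openshift-master-controller-manager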
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
