Samuel,

Well, that was it. I had screwed up the machine type definitions in the
install-config.yaml (I thought I was specifying the higher capacities
for the workers, but I made the changes in the master section). Duh.
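For anyone searching the archives later: in install-config.yaml the worker size lives under "compute", not "controlPlane". Roughly like this (a sketch, values illustrative, not my actual config):

```yaml
apiVersion: v1
baseDomain: example.com          # illustrative
controlPlane:                    # masters
  name: master
  replicas: 3
  platform:
    aws:
      type: m4.large             # this is the section I edited by mistake
compute:                         # workers: this is the block I should have changed
- name: worker
  replicas: 2
  platform:
    aws:
      type: c5.xlarge
```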
Retrying....

Regards,
Marvin

On Thu, Oct 10, 2019 at 8:08 AM Samuel Martín Moro <faus...@gmail.com> wrote:

> Hi,
>
> Looking at the node labels, we can see that the instance type in use is
> m4.large, not c5.xlarge, so it would be normal to see only 2 CPUs instead
> of 4.
> How did you change your instance sizes? By editing your
> install-config.yaml? Can you show us? (You can remove any sensitive data,
> like pullSecrets.)
>
> Regards.
>
> On Thu, Oct 10, 2019 at 1:53 PM Just Marvin <
> marvin.the.cynical.ro...@gmail.com> wrote:
>
>> Samuel,
>>
>> See below the output of "oc describe node <node id>" for one of my
>> worker nodes. In particular, I'm interested in the "Capacity" and
>> "Allocatable" sections. The Capacity section says this node has 2 CPUs.
>> When I first noticed this, I had defined the workers as c5.xlarge
>> machines, which have 4 vCPUs. I thought that maybe OpenShift itself was
>> reserving 2 CPUs, but I then rebuilt the cluster with c5.2xlarge
>> machines, and the output below is from that rebuild. It still shows 2
>> CPUs. So it seems OpenShift isn't recognizing the additional hardware.
>> How do I fix this?
>>
>> Name:               ip-10-0-156-206.us-west-1.compute.internal
>> Roles:              worker
>> Labels:             beta.kubernetes.io/arch=amd64
>>                     beta.kubernetes.io/instance-type=m4.large
>>                     beta.kubernetes.io/os=linux
>>                     failure-domain.beta.kubernetes.io/region=us-west-1
>>                     failure-domain.beta.kubernetes.io/zone=us-west-1b
>>                     kubernetes.io/arch=amd64
>>                     kubernetes.io/hostname=ip-10-0-156-206
>>                     kubernetes.io/os=linux
>>                     node-role.kubernetes.io/worker=
>>                     node.openshift.io/os_id=rhcos
>> Annotations:        machine.openshift.io/machine: openshift-machine-api/two-4z45k-worker-us-west-1b-t4dln
>>                     machineconfiguration.openshift.io/currentConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>>                     machineconfiguration.openshift.io/desiredConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>>                     machineconfiguration.openshift.io/state: Done
>>                     volumes.kubernetes.io/controller-managed-attach-detach: true
>> CreationTimestamp:  Thu, 10 Oct 2019 07:30:18 -0400
>> Taints:             <none>
>> Unschedulable:      false
>> Conditions:
>>   Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
>>   ----            ------  -----------------                 ------------------                ------                       -------
>>   MemoryPressure  False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
>>   DiskPressure    False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
>>   PIDPressure     False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
>>   Ready           True    Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:31:19 -0400   KubeletReady                 kubelet is posting ready status
>> Addresses:
>>   InternalIP:   10.0.156.206
>>   Hostname:     ip-10-0-156-206.us-west-1.compute.internal
>>   InternalDNS:  ip-10-0-156-206.us-west-1.compute.internal
>> Capacity:
>>   attachable-volumes-aws-ebs:  39
>>   cpu:                         2
>>   hugepages-1Gi:               0
>>   hugepages-2Mi:               0
>>   memory:                      8162888Ki
>>   pods:                        250
>> Allocatable:
>>   attachable-volumes-aws-ebs:  39
>>   cpu:                         1500m
>>   hugepages-1Gi:               0
>>   hugepages-2Mi:               0
>>   memory:                      7548488Ki
>>   pods:                        250
>> System Info:
>>   Machine ID:                23efe37e2b244bd788bb8575cd340bfd
>>   System UUID:               ec25c230-d02c-99cf-0540-bad276c8cc73
>>   Boot ID:                   1f3a064f-a24e-4bdf-b6b0-c7fd3019757e
>>   Kernel Version:            4.18.0-80.11.2.el8_0.x86_64
>>   OS Image:                  Red Hat Enterprise Linux CoreOS 42.80.20191001.0 (Ootpa)
>>   Operating System:          linux
>>   Architecture:              amd64
>>   Container Runtime Version: cri-o://1.14.10-0.21.dev.rhaos4.2.git0d4a906.el8
>>   Kubelet Version:           v1.14.6+d3a139f63
>>   Kube-Proxy Version:        v1.14.6+d3a139f63
>> ProviderID:                  aws:///us-west-1b/i-07e495331f3f25ac0
>> Non-terminated Pods:         (18 in total)
>>   Namespace                               Name                                   CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
>>   ---------                               ----                                   ------------  ----------  ---------------  -------------  ---
>>   openshift-cluster-node-tuning-operator  tuned-9qnjd                            10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         16m
>>   openshift-dns                           dns-default-skgfs                      110m (7%)     0 (0%)      70Mi (0%)        512Mi (6%)     16m
>>   openshift-image-registry                image-registry-584f455476-9q7b8        100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>>   openshift-image-registry                node-ca-74h7h                          10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         15m
>>   openshift-ingress                       router-default-85b6848bdf-679xf        100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>>   openshift-machine-config-operator       machine-config-daemon-6ht2m            20m (1%)      0 (0%)      50Mi (0%)        0 (0%)         15m
>>   openshift-marketplace                   certified-operators-5cb88dd798-lcfvc   10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>>   openshift-marketplace                   community-operators-7f8987f496-8rgzq   10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>>   openshift-marketplace                   redhat-operators-cd495bc4f-fcm5t       10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>>   openshift-monitoring                    alertmanager-main-1                    100m (6%)     100m (6%)   225Mi (3%)       25Mi (0%)      12m
>>   openshift-monitoring                    kube-state-metrics-6b66989cb7-nlqbm    30m (2%)      0 (0%)      120Mi (1%)       0 (0%)         17m
>>   openshift-monitoring                    node-exporter-bhrhz                        10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         16m
>>   openshift-monitoring                    openshift-state-metrics-6bf647b484-sfgjs   120m (8%)     0 (0%)      190Mi (2%)       0 (0%)         17m
>>   openshift-monitoring                    prometheus-adapter-66d6b69459-bcq5p        10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         13m
>>   openshift-monitoring                    prometheus-k8s-1                           430m (28%)    200m (13%)  1134Mi (15%)     50Mi (0%)      14m
>>   openshift-multus                        multus-jlmwx                               10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         16m
>>   openshift-sdn                           ovs-hdml2                                  200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         16m
>>   openshift-sdn                           sdn-vwh22                                  100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         16m
>> Allocated resources:
>>   (Total limits may be over 100 percent, i.e., overcommitted.)
>>   Resource                    Requests      Limits
>>   --------                    --------      ------
>>   cpu                         1390m (92%)   300m (20%)
>>   memory                      3451Mi (46%)  587Mi (7%)
>>   ephemeral-storage           0 (0%)        0 (0%)
>>   attachable-volumes-aws-ebs  0             0
>> Events:                       <none>
>>
>>
>> On Wed, Oct 9, 2019 at 8:30 AM Samuel Martín Moro <faus...@gmail.com>
>> wrote:
>>
>>> You have two master nodes? You'd rather go with 1 or 3. With 2 masters,
>>> your etcd quorum is 2, so if you lose one master, the API becomes
>>> unavailable.
>>>
>>> Now ... c5.xlarge should be fine.
>>> Not sure why your console doesn't show everything (I'm not familiar
>>> with that dashboard yet). As a wild guess, it's probably some delay in
>>> collecting metrics; you should see something eventually.
>>>
>>> If you run "oc describe node <node-name>", you should see the
>>> reservations (requests & limits) for that node, which might help you
>>> figure out what's eating up your resources.
>>>
>>> Depending on which OpenShift components you're deploying, you may
>>> already be using quite a lot, especially if you don't have infra nodes
>>> and deployed EFK and/or Hawkular/Cassandra. Prometheus can use some
>>> resources as well. Meanwhile, Istio itself can ship with more or fewer
>>> components, ...
>>>
>>> If you're using EFK, you may be able to lower the resource
>>> requests/limits for Elasticsearch; if you're using Hawkular, the same
>>> goes for Cassandra.
>>> Hard to say without seeing it, but you can probably free up some
>>> resources here and there.
>>>
>>> Good luck,
>>>
>>> Regards.
>>>
>>> On Wed, Oct 9, 2019 at 1:47 PM Just Marvin <
>>> marvin.the.cynical.ro...@gmail.com> wrote:
>>>
>>>> Samuel,
>>>>
>>>> So it is CPU. But I destroyed the cluster, gave it machines with
>>>> twice as much memory, retried, and got the same problem:
>>>>
>>>> Events:
>>>>   Type     Reason            Age                  From               Message
>>>>   ----     ------            ----                 ----               -------
>>>>   Warning  FailedScheduling  16s (x4 over 3m12s)  default-scheduler  0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taints that the pod didn't tolerate.
>>>>
>>>> I'm guessing that the two nodes with taints are the two master
>>>> nodes, but the other two are c5.xlarge machines. Here is a possibly
>>>> relevant observation, though: as soon as I log into the cluster, I see
>>>> this on my main dashboard.
>>>>
>>>> [image: image.png]
>>>>
>>>> Is there perhaps a problem with the CPU resource monitoring that is
>>>> causing my problems?
>>>>
>>>> Regards,
>>>> Marvin
>>>>
>>>> On Sun, Oct 6, 2019 at 3:52 PM Samuel Martín Moro <faus...@gmail.com>
>>>> wrote:
>>>>
>>>>> You can use "oc describe pod <pod-name>" to figure out what's going
>>>>> on with your pod.
>>>>> It could be that you're out of CPU/memory.
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Sun, Oct 6, 2019 at 9:27 PM Just Marvin <
>>>>> marvin.the.cynical.ro...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> [zaphod@oc3027208274 ocp4.2-aws]$ oc get pods
>>>>>> NAME                              READY   STATUS    RESTARTS   AGE
>>>>>> istio-citadel-7cb44f4bb-tccql     1/1     Running   0          9m35s
>>>>>> istio-galley-75599dbc67-b4mgx     1/1     Running   0          8m41s
>>>>>> istio-policy-56476c984b-c7t8j     0/2     Pending   0          8m23s
>>>>>> istio-telemetry-d5bbd7d7b-v8kjq   0/2     Pending   0          8m24s
>>>>>> jaeger-5d9dfdfb67-mv8mp           2/2     Running   0          8m45s
>>>>>> prometheus-685bdbdc45-hmb9f       2/2     Running   0          9m17s
>>>>>> [zaphod@oc3027208274 ocp4.2-aws]$
>>>>>>
>>>>>> The pods in Pending state don't seem to be moving forward, and the
>>>>>> operator logs aren't showing anything informative about why. Is this
>>>>>> normal? If there is a problem, how would I figure out the cause?
>>>>>>
>>>>>> Regards,
>>>>>> Marvin
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users@lists.openshift.redhat.com
>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Samuel Martín Moro
>>>>> {EPITECH.} 2011
>>>>>
>>>>> "Nobody wants to say how this works.
>>>>>  Maybe nobody knows ..."
>>>>>                       Xorg.conf(5)
>>>>
>>>
>>> --
>>> Samuel Martín Moro
>>> {EPITECH.} 2011
>>>
>>> "Nobody wants to say how this works.
>>>  Maybe nobody knows ..."
>>>                       Xorg.conf(5)
>>
>
> --
> Samuel Martín Moro
> {EPITECH.} 2011
>
> "Nobody wants to say how this works.
>  Maybe nobody knows ..."
>                       Xorg.conf(5)
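P.S. For anyone else landing here from a search: the handful of checks that mattered in this thread, in one place. (A sketch, not gospel; assumes a logged-in "oc" session, and reuses the pod/node names quoted above.)

```
# Which instance type did each node actually get?
# (-L adds the value of that label as a column.)
oc get nodes -L beta.kubernetes.io/instance-type

# Why is a pod Pending? The Events section at the bottom usually
# says (e.g. "Insufficient cpu", untolerated taints).
oc describe pod istio-policy-56476c984b-c7t8j

# How much CPU/memory is already requested on a given node?
oc describe node ip-10-0-156-206.us-west-1.compute.internal | grep -A 8 "Allocated resources"
```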
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users