Hi,

Looking at the node labels, we can see the instance type in use is m4.large, not c5.xlarge. So it would be normal to only see 2 CPUs instead of 4.

How did you change your instance sizes? By editing your install-config.yaml? Can you show us? (you can remove any sensitive data, like the pullSecret)
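For reference, on AWS the worker instance type is normally set in the compute section of install-config.yaml. A minimal sketch of what that section usually looks like (the replica count and instance type below are placeholders, not values from your cluster):

  compute:
  - name: worker
    replicas: 2          # number of workers created at install time
    platform:
      aws:
        type: c5.xlarge  # EC2 instance type for the worker machines

Also note that install-config.yaml is only read at install time. Once the cluster is up, the worker instance type lives in the MachineSets, so you can check what the cluster actually thinks it should be running with something like:

  oc get machinesets -n openshift-machine-api
  oc get machineset <machineset-name> -n openshift-machine-api -o yaml | grep instanceType

If that still reports m4.large, then the change to install-config.yaml never made it into the worker MachineSets.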
Regards.

On Thu, Oct 10, 2019 at 1:53 PM Just Marvin <marvin.the.cynical.ro...@gmail.com> wrote:

> Samuel,
>
> See below, the output from "oc describe node <node id>" for one of my
> worker nodes. In particular, I'm interested in the "Capacity" and
> "Allocatable" sections. In the Capacity section, it says that this is 2
> CPUs. When I first noticed this, I had defined the workers using c5-xlarge
> machines - which have 4 vcpus. I thought that maybe OpenShift itself is
> reserving 2 CPUs for itself. But I then rebuilt the cluster with c5-2xlarge
> machines, and the output you see below is from that. It shows 2 CPUs as
> well. So this seems like OpenShift isn't recognizing the additional
> hardware. How do I fix this?
>
> Name:               ip-10-0-156-206.us-west-1.compute.internal
> Roles:              worker
> Labels:             beta.kubernetes.io/arch=amd64
>                     beta.kubernetes.io/instance-type=m4.large
>                     beta.kubernetes.io/os=linux
>                     failure-domain.beta.kubernetes.io/region=us-west-1
>                     failure-domain.beta.kubernetes.io/zone=us-west-1b
>                     kubernetes.io/arch=amd64
>                     kubernetes.io/hostname=ip-10-0-156-206
>                     kubernetes.io/os=linux
>                     node-role.kubernetes.io/worker=
>                     node.openshift.io/os_id=rhcos
> Annotations:        machine.openshift.io/machine: openshift-machine-api/two-4z45k-worker-us-west-1b-t4dln
>                     machineconfiguration.openshift.io/currentConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>                     machineconfiguration.openshift.io/desiredConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>                     machineconfiguration.openshift.io/state: Done
>                     volumes.kubernetes.io/controller-managed-attach-detach: true
> CreationTimestamp:  Thu, 10 Oct 2019 07:30:18 -0400
> Taints:             <none>
> Unschedulable:      false
> Conditions:
>   Type            Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
>   ----            ------  -----------------                ------------------               ------                      -------
>   MemoryPressure  False   Thu, 10 Oct 2019 07:46:01 -0400  Thu, 10 Oct 2019 07:30:18 -0400  KubeletHasSufficientMemory  kubelet has sufficient memory available
>   DiskPressure    False   Thu, 10 Oct 2019 07:46:01 -0400  Thu, 10 Oct 2019 07:30:18 -0400  KubeletHasNoDiskPressure    kubelet has no disk pressure
>   PIDPressure     False   Thu, 10 Oct 2019 07:46:01 -0400  Thu, 10 Oct 2019 07:30:18 -0400  KubeletHasSufficientPID     kubelet has sufficient PID available
>   Ready           True    Thu, 10 Oct 2019 07:46:01 -0400  Thu, 10 Oct 2019 07:31:19 -0400  KubeletReady                kubelet is posting ready status
> Addresses:
>   InternalIP:   10.0.156.206
>   Hostname:     ip-10-0-156-206.us-west-1.compute.internal
>   InternalDNS:  ip-10-0-156-206.us-west-1.compute.internal
> Capacity:
>   attachable-volumes-aws-ebs:  39
>   cpu:                         2
>   hugepages-1Gi:               0
>   hugepages-2Mi:               0
>   memory:                      8162888Ki
>   pods:                        250
> Allocatable:
>   attachable-volumes-aws-ebs:  39
>   cpu:                         1500m
>   hugepages-1Gi:               0
>   hugepages-2Mi:               0
>   memory:                      7548488Ki
>   pods:                        250
> System Info:
>   Machine ID:                 23efe37e2b244bd788bb8575cd340bfd
>   System UUID:                ec25c230-d02c-99cf-0540-bad276c8cc73
>   Boot ID:                    1f3a064f-a24e-4bdf-b6b0-c7fd3019757e
>   Kernel Version:             4.18.0-80.11.2.el8_0.x86_64
>   OS Image:                   Red Hat Enterprise Linux CoreOS 42.80.20191001.0 (Ootpa)
>   Operating System:           linux
>   Architecture:               amd64
>   Container Runtime Version:  cri-o://1.14.10-0.21.dev.rhaos4.2.git0d4a906.el8
>   Kubelet Version:            v1.14.6+d3a139f63
>   Kube-Proxy Version:         v1.14.6+d3a139f63
> ProviderID:                   aws:///us-west-1b/i-07e495331f3f25ac0
> Non-terminated Pods:          (18 in total)
>   Namespace                                Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
>   ---------                                ----                                      ------------  ----------  ---------------  -------------  ---
>   openshift-cluster-node-tuning-operator   tuned-9qnjd                               10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         16m
>   openshift-dns                            dns-default-skgfs                         110m (7%)     0 (0%)      70Mi (0%)        512Mi (6%)     16m
>   openshift-image-registry                 image-registry-584f455476-9q7b8           100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>   openshift-image-registry                 node-ca-74h7h                             10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         15m
>   openshift-ingress                        router-default-85b6848bdf-679xf           100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>   openshift-machine-config-operator        machine-config-daemon-6ht2m               20m (1%)      0 (0%)      50Mi (0%)        0 (0%)         15m
>   openshift-marketplace                    certified-operators-5cb88dd798-lcfvc      10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>   openshift-marketplace                    community-operators-7f8987f496-8rgzq      10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>   openshift-marketplace                    redhat-operators-cd495bc4f-fcm5t          10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>   openshift-monitoring                     alertmanager-main-1                       100m (6%)     100m (6%)   225Mi (3%)       25Mi (0%)      12m
>   openshift-monitoring                     kube-state-metrics-6b66989cb7-nlqbm       30m (2%)      0 (0%)      120Mi (1%)       0 (0%)         17m
>   openshift-monitoring                     node-exporter-bhrhz                       10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         16m
>   openshift-monitoring                     openshift-state-metrics-6bf647b484-sfgjs  120m (8%)     0 (0%)      190Mi (2%)       0 (0%)         17m
>   openshift-monitoring                     prometheus-adapter-66d6b69459-bcq5p       10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         13m
>   openshift-monitoring                     prometheus-k8s-1                          430m (28%)    200m (13%)  1134Mi (15%)     50Mi (0%)      14m
>   openshift-multus                         multus-jlmwx                              10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         16m
>   openshift-sdn                            ovs-hdml2                                 200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         16m
>   openshift-sdn                            sdn-vwh22                                 100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         16m
> Allocated resources:
>   (Total limits may be over 100 percent, i.e., overcommitted.)
>   Resource                    Requests      Limits
>   --------                    --------      ------
>   cpu                         1390m (92%)   300m (20%)
>   memory                      3451Mi (46%)  587Mi (7%)
>   ephemeral-storage           0 (0%)        0 (0%)
>   attachable-volumes-aws-ebs  0             0
> Events:                       <none>
>
>
> On Wed, Oct 9, 2019 at 8:30 AM Samuel Martín Moro <faus...@gmail.com> wrote:
>
>> You have two master nodes? You'd rather go with 1 or 3. With 2 masters,
>> your etcd quorum is 2. If you lose a master, the API would be unavailable.
>>
>> Now ... c5-xlarge should be fine.
>> Not sure why your console doesn't show everything (I'm not familiar with
>> that dashboard yet). As a wild guess, probably some delay collecting
>> metrics. You should see something eventually, though.
>>
>> If you use "oc describe node <node-name>", you should see the
>> reservations (requests & limits) for that node.
>> You might be able to figure out what's eating up your resources.
>>
>> Depending on which OpenShift components you're deploying, you may already
>> be using quite a lot.
>> Especially if you don't have infra nodes, and did deploy EFK and/or
>> Hawkular/Cassandra. Prometheus could use some resources as well.
>> Meanwhile, Istio itself can ship with more or fewer components, ...
>>
>> If using EFK: you may be able to lower the resource requests/limits for
>> Elasticsearch.
>> If using Hawkular: same remark regarding Cassandra.
>> Hard to say without seeing it. But you can probably free up some
>> resources here and there.
>>
>>
>> Good luck,
>>
>> Regards.
>>
>> On Wed, Oct 9, 2019 at 1:47 PM Just Marvin <marvin.the.cynical.ro...@gmail.com> wrote:
>>
>>> Samuel,
>>>
>>> So it is CPU.
>>> But I destroyed the cluster, gave it machines with twice as much
>>> memory, retried, and got the same problem:
>>>
>>> Events:
>>>   Type     Reason            Age                  From               Message
>>>   ----     ------            ----                 ----               -------
>>>   Warning  FailedScheduling  16s (x4 over 3m12s)  default-scheduler  0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taints that the pod didn't tolerate.
>>>
>>> I'm guessing that the two nodes with taints are the two master nodes,
>>> but the other two are c5-xlarge machines. But here is a maybe-relevant
>>> observation: as soon as I log into the cluster, I see this on my main
>>> dashboard.
>>>
>>> [image: image.png]
>>>
>>> Is there perhaps a problem with the CPU resource monitoring that is
>>> causing my problems?
>>>
>>> Regards,
>>> Marvin
>>>
>>> On Sun, Oct 6, 2019 at 3:52 PM Samuel Martín Moro <faus...@gmail.com> wrote:
>>>
>>>> You can use "oc describe pod <pod-name>" to figure out what's going on
>>>> with your pod.
>>>> It could be that you're out of cpu/memory.
>>>>
>>>> Regards.
>>>>
>>>> On Sun, Oct 6, 2019 at 9:27 PM Just Marvin <marvin.the.cynical.ro...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> [zaphod@oc3027208274 ocp4.2-aws]$ oc get pods
>>>>> NAME                              READY   STATUS    RESTARTS   AGE
>>>>> istio-citadel-7cb44f4bb-tccql     1/1     Running   0          9m35s
>>>>> istio-galley-75599dbc67-b4mgx     1/1     Running   0          8m41s
>>>>> istio-policy-56476c984b-c7t8j     0/2     Pending   0          8m23s
>>>>> istio-telemetry-d5bbd7d7b-v8kjq   0/2     Pending   0          8m24s
>>>>> jaeger-5d9dfdfb67-mv8mp           2/2     Running   0          8m45s
>>>>> prometheus-685bdbdc45-hmb9f       2/2     Running   0          9m17s
>>>>> [zaphod@oc3027208274 ocp4.2-aws]$
>>>>>
>>>>> The pods in Pending state don't seem to be moving forward, and the
>>>>> operator logs aren't showing anything informative about why that might be.
>>>>> Is this normal? If there is a problem, how would I figure out the cause?
>>>>>
>>>>> Regards,
>>>>> Marvin
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users@lists.openshift.redhat.com
>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>>
>>>>
>>>>
>>>> --
>>>> Samuel Martín Moro
>>>> {EPITECH.} 2011
>>>>
>>>> "Nobody wants to say how this works.
>>>> Maybe nobody knows ..."
>>>>                               Xorg.conf(5)
>>>
>>
>> --
>> Samuel Martín Moro
>> {EPITECH.} 2011
>>
>> "Nobody wants to say how this works.
>> Maybe nobody knows ..."
>>                               Xorg.conf(5)
>

--
Samuel Martín Moro
{EPITECH.} 2011

"Nobody wants to say how this works.
Maybe nobody knows ..."
                              Xorg.conf(5)
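P.S. If you want to compare what those Pending istio pods are asking for against what a worker can actually allocate, a couple of read-only commands may help (run them from whatever project the istio pods live in; the pod and node names below are simply the ones taken from your earlier output):

  # show the CPU/memory requests of one of the stuck pods
  oc describe pod istio-policy-56476c984b-c7t8j | grep -B2 -A6 Requests

  # show how much of the node's allocatable CPU is already claimed
  oc describe node ip-10-0-156-206.us-west-1.compute.internal | grep -A8 "Allocated resources"

With only 1500m of allocatable CPU on an m4.large worker and roughly 1390m already requested by the platform pods, there isn't much left for istio-policy and istio-telemetry, which would be consistent with the "Insufficient cpu" scheduling events you're seeing.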