Hi,

Looking at the node labels, we can see the instance type actually in use is
m4.large, not c5.xlarge.
So it's expected to only see 2 CPUs instead of 4.
How did you change your instance sizes? By editing your install-config.yaml?
Can you show us? (You can remove any sensitive data, like pull secrets.)
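
In case it helps, here is a rough sketch of where the worker instance type
normally lives in install-config.yaml for AWS (the pool name, replica count
and type below are placeholders, not your actual values):

    compute:
    - name: worker
      replicas: 2
      platform:
        aws:
          type: c5.xlarge

If that field was left out, the installer would fall back to its own default
instance type, which could explain the m4.large workers. You can also confirm
what each node is really running on with
"oc get nodes -L beta.kubernetes.io/instance-type".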

Regards.

On Thu, Oct 10, 2019 at 1:53 PM Just Marvin <
marvin.the.cynical.ro...@gmail.com> wrote:

> Samuel,
>
>     See below, the output from "oc describe node <node id>" for one of my
> worker nodes. In particular, I'm interested in the "Capacity" and
> "Allocatable" sections. In the Capacity section, it says this node has 2
> CPUs. When I first noticed this, I had defined the workers using c5.xlarge
> machines - which have 4 vCPUs. I thought that maybe OpenShift itself was
> reserving 2 CPUs for itself. But I then rebuilt the cluster with c5.2xlarge
> machines, and the output you see below is from that. It shows 2 CPUs as
> well. So it seems like OpenShift isn't recognizing the additional
> hardware. How do I fix this?
>
> Name:               ip-10-0-156-206.us-west-1.compute.internal
> Roles:              worker
> Labels:             beta.kubernetes.io/arch=amd64
>                     beta.kubernetes.io/instance-type=m4.large
>                     beta.kubernetes.io/os=linux
>                     failure-domain.beta.kubernetes.io/region=us-west-1
>                     failure-domain.beta.kubernetes.io/zone=us-west-1b
>                     kubernetes.io/arch=amd64
>                     kubernetes.io/hostname=ip-10-0-156-206
>                     kubernetes.io/os=linux
>                     node-role.kubernetes.io/worker=
>                     node.openshift.io/os_id=rhcos
> Annotations:        machine.openshift.io/machine: openshift-machine-api/two-4z45k-worker-us-west-1b-t4dln
>                     machineconfiguration.openshift.io/currentConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>                     machineconfiguration.openshift.io/desiredConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>                     machineconfiguration.openshift.io/state: Done
>                     volumes.kubernetes.io/controller-managed-attach-detach: true
> CreationTimestamp:  Thu, 10 Oct 2019 07:30:18 -0400
> Taints:             <none>
> Unschedulable:      false
> Conditions:
>   Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
>   ----             ------  -----------------                 ------------------                ------                       -------
>   MemoryPressure   False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
>   DiskPressure     False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
>   PIDPressure      False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
>   Ready            True    Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:31:19 -0400   KubeletReady                 kubelet is posting ready status
> Addresses:
>   InternalIP:   10.0.156.206
>   Hostname:     ip-10-0-156-206.us-west-1.compute.internal
>   InternalDNS:  ip-10-0-156-206.us-west-1.compute.internal
> Capacity:
>  attachable-volumes-aws-ebs:  39
>  cpu:                         2
>  hugepages-1Gi:               0
>  hugepages-2Mi:               0
>  memory:                      8162888Ki
>  pods:                        250
> Allocatable:
>  attachable-volumes-aws-ebs:  39
>  cpu:                         1500m
>  hugepages-1Gi:               0
>  hugepages-2Mi:               0
>  memory:                      7548488Ki
>  pods:                        250
> System Info:
>  Machine ID:                 23efe37e2b244bd788bb8575cd340bfd
>  System UUID:                ec25c230-d02c-99cf-0540-bad276c8cc73
>  Boot ID:                    1f3a064f-a24e-4bdf-b6b0-c7fd3019757e
>  Kernel Version:             4.18.0-80.11.2.el8_0.x86_64
>  OS Image:                   Red Hat Enterprise Linux CoreOS 42.80.20191001.0 (Ootpa)
>  Operating System:           linux
>  Architecture:               amd64
>  Container Runtime Version:  cri-o://1.14.10-0.21.dev.rhaos4.2.git0d4a906.el8
>  Kubelet Version:            v1.14.6+d3a139f63
>  Kube-Proxy Version:         v1.14.6+d3a139f63
> ProviderID:                  aws:///us-west-1b/i-07e495331f3f25ac0
> Non-terminated Pods:                      (18 in total)
>   Namespace                               Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
>   ---------                               ----                                        ------------  ----------  ---------------  -------------  ---
>   openshift-cluster-node-tuning-operator  tuned-9qnjd                                 10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         16m
>   openshift-dns                           dns-default-skgfs                           110m (7%)     0 (0%)      70Mi (0%)        512Mi (6%)     16m
>   openshift-image-registry                image-registry-584f455476-9q7b8             100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>   openshift-image-registry                node-ca-74h7h                               10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         15m
>   openshift-ingress                       router-default-85b6848bdf-679xf             100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>   openshift-machine-config-operator       machine-config-daemon-6ht2m                 20m (1%)      0 (0%)      50Mi (0%)        0 (0%)         15m
>   openshift-marketplace                   certified-operators-5cb88dd798-lcfvc        10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>   openshift-marketplace                   community-operators-7f8987f496-8rgzq        10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>   openshift-marketplace                   redhat-operators-cd495bc4f-fcm5t            10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>   openshift-monitoring                    alertmanager-main-1                         100m (6%)     100m (6%)   225Mi (3%)       25Mi (0%)      12m
>   openshift-monitoring                    kube-state-metrics-6b66989cb7-nlqbm         30m (2%)      0 (0%)      120Mi (1%)       0 (0%)         17m
>   openshift-monitoring                    node-exporter-bhrhz                         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         16m
>   openshift-monitoring                    openshift-state-metrics-6bf647b484-sfgjs    120m (8%)     0 (0%)      190Mi (2%)       0 (0%)         17m
>   openshift-monitoring                    prometheus-adapter-66d6b69459-bcq5p         10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         13m
>   openshift-monitoring                    prometheus-k8s-1                            430m (28%)    200m (13%)  1134Mi (15%)     50Mi (0%)      14m
>   openshift-multus                        multus-jlmwx                                10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         16m
>   openshift-sdn                           ovs-hdml2                                   200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         16m
>   openshift-sdn                           sdn-vwh22                                   100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         16m
> Allocated resources:
>   (Total limits may be over 100 percent, i.e., overcommitted.)
>   Resource                    Requests      Limits
>   --------                    --------      ------
>   cpu                         1390m (92%)   300m (20%)
>   memory                      3451Mi (46%)  587Mi (7%)
>   ephemeral-storage           0 (0%)        0 (0%)
>   attachable-volumes-aws-ebs  0             0
> Events:                       <none>
>
>
>
> On Wed, Oct 9, 2019 at 8:30 AM Samuel Martín Moro <faus...@gmail.com>
> wrote:
>
>> You have two master nodes? You'd rather go with 1 or 3. With 2 masters,
>> your etcd quorum is still 2, so if you lose a master, the API becomes unavailable.
>>
>> Now ... c5.xlarge should be fine.
>> Not sure why your console doesn't show everything (and not familiar with
>> that dashboard yet). As a wild guess, probably some delay collecting
>> metrics. Though you should see something, eventually.
>>
>> If you use "oc describe node <node-name>", you should see the
>> reservations (requests & limits) for that node.
>> Might be able to figure out what's eating up your resources.
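>> For example, something like this (a rough one-liner, with <node-name> as a
>> placeholder) will jump straight to that summary:
>>
>>     oc describe node <node-name> | grep -A 8 "Allocated resources"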
>>
>> Depending on which OpenShift components you're deploying, you may already
>> be using quite a lot.
>> Especially if you don't have infra nodes, and did deploy EFK and/or
>> Hawkular/Cassandra. Prometheus can use some resources as well.
>> Meanwhile, Istio itself can ship with more or fewer components, ...
>>
>> If using EFK: you may be able to lower the resource requests/limits for
>> Elasticsearch.
>> If using Hawkular: same remark regarding Cassandra.
>> Hard to say without seeing it, but you can probably free up some
>> resources here and there.
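>> As a hedged example only: if Elasticsearch runs as a DeploymentConfig
>> (3.x-style EFK), something along these lines would lower its requests and
>> limits, where <es-dc-name> and the values are placeholders:
>>
>>     oc set resources dc/<es-dc-name> \
>>       --requests=cpu=500m,memory=4Gi --limits=memory=4Gi
>>
>> If logging is operator-managed instead, the same change would go through
>> the logging custom resource rather than the DC directly.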
>>
>>
>> Good luck,
>>
>> Regards.
>>
>> On Wed, Oct 9, 2019 at 1:47 PM Just Marvin <
>> marvin.the.cynical.ro...@gmail.com> wrote:
>>
>>> Samuel,
>>>
>>>     So it is CPU. But I destroyed the cluster, gave it machines with
>>> twice as much memory, retried, and got the same problem:
>>>
>>> Events:
>>>   Type     Reason            Age                  From               Message
>>>   ----     ------            ----                 ----               -------
>>>   Warning  FailedScheduling  16s (x4 over 3m12s)  default-scheduler  0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taints that the pod didn't tolerate.
>>>
>>>     I'm guessing that the two nodes with taints are the two master nodes,
>>> but the other two are c5.xlarge machines. But here is perhaps a relevant
>>> observation. As soon as I log into the cluster, I see this on my main
>>> dashboard.
>>>
>>> [image: image.png]
>>>
>>>     Is there perhaps a problem with the CPU resource monitoring that is
>>> causing my problems?
>>>
>>> Regards,
>>> Marvin
>>>
>>> On Sun, Oct 6, 2019 at 3:52 PM Samuel Martín Moro <faus...@gmail.com>
>>> wrote:
>>>
>>>> You can use "oc describe pod <pod-name>" to figure out what's going on
>>>> with your pod.
>>>> It could be that you're out of CPU/memory.
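>>>> For a pending pod, the scheduling reason usually shows up in the events
>>>> at the bottom, e.g. (pod name taken from your list below):
>>>>
>>>>     oc describe pod istio-policy-56476c984b-c7t8j | grep -A 10 Events: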
>>>>
>>>> Regards.
>>>>
>>>> On Sun, Oct 6, 2019 at 9:27 PM Just Marvin <
>>>> marvin.the.cynical.ro...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> [zaphod@oc3027208274 ocp4.2-aws]$ oc get pods
>>>>> NAME                              READY   STATUS    RESTARTS   AGE
>>>>> istio-citadel-7cb44f4bb-tccql     1/1     Running   0          9m35s
>>>>> istio-galley-75599dbc67-b4mgx     1/1     Running   0          8m41s
>>>>> istio-policy-56476c984b-c7t8j     0/2     *Pending*   0          8m23s
>>>>> istio-telemetry-d5bbd7d7b-v8kjq   0/2     *Pending*   0          8m24s
>>>>> jaeger-5d9dfdfb67-mv8mp           2/2     Running   0          8m45s
>>>>> prometheus-685bdbdc45-hmb9f       2/2     Running   0          9m17s
>>>>> [zaphod@oc3027208274 ocp4.2-aws]$
>>>>>
>>>>>     The pods in Pending state don't seem to be moving forward, and the
>>>>> operator logs aren't showing anything informative about why this might be.
>>>>> Is this normal? If there is a problem, how would I figure out the cause?
>>>>>
>>>>> Regards,
>>>>> Marvin
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users@lists.openshift.redhat.com
>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>>
>>>>
>>>>
>>>> --
>>>> Samuel Martín Moro
>>>> {EPITECH.} 2011
>>>>
>>>> "Nobody wants to say how this works.
>>>>  Maybe nobody knows ..."
>>>>                       Xorg.conf(5)
>>>>
>>>
>>
>> --
>> Samuel Martín Moro
>> {EPITECH.} 2011
>>
>> "Nobody wants to say how this works.
>>  Maybe nobody knows ..."
>>                       Xorg.conf(5)
>>
>

-- 
Samuel Martín Moro
{EPITECH.} 2011

"Nobody wants to say how this works.
 Maybe nobody knows ..."
                      Xorg.conf(5)
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
