Samuel,

    Well, that was it. I had screwed up the machine type definitions in the
install-config.yaml: I thought I was specifying the higher capacities for
the workers, but I made the changes in the master section. Duh.

    Retrying....
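    In case anyone else trips over this: the instance type is set per
machine pool under platform.aws.type, and the compute (worker) pool is a
separate block from controlPlane. Roughly what I should have edited (a
sketch, values illustrative):

```yaml
# install-config.yaml (machine-pool sections only; values illustrative)
controlPlane:
  name: master
  replicas: 3            # 1 or 3, not 2, for a healthy etcd quorum
  platform:
    aws:
      type: m4.large
compute:
- name: worker
  replicas: 2
  platform:
    aws:
      type: c5.2xlarge   # the worker pool is where this belongs
```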

Regards,
Marvin

On Thu, Oct 10, 2019 at 8:08 AM Samuel Martín Moro <faus...@gmail.com>
wrote:

> Hi,
>
> Looking at those node labels, we can see the instance type in use is
> m4.large, not c5.xlarge.
> So it would be normal to see only 2 CPUs instead of 4.
> How did you change your instance sizes? By editing your install-config.yaml?
> Can you show us? (You can remove any sensitive data, like pull secrets.)
>
> Regards.
>
> On Thu, Oct 10, 2019 at 1:53 PM Just Marvin <
> marvin.the.cynical.ro...@gmail.com> wrote:
>
>> Samuel,
>>
>>     See below the output from "oc describe node <node id>" for one of my
>> worker nodes. In particular, I'm interested in the "Capacity" and
>> "Allocatable" sections. The Capacity section says this node has 2 CPUs.
>> When I first noticed this, I had defined the workers using c5.xlarge
>> machines, which have 4 vCPUs. I thought that maybe OpenShift itself was
>> reserving 2 CPUs for itself, but I then rebuilt the cluster with
>> c5.2xlarge machines, and the output below is from that. It still shows 2
>> CPUs. So it seems like OpenShift isn't recognizing the additional
>> hardware. How do I fix this?
>>
>> Name:               ip-10-0-156-206.us-west-1.compute.internal
>> Roles:              worker
>> Labels:             beta.kubernetes.io/arch=amd64
>>                     beta.kubernetes.io/instance-type=m4.large
>>                     beta.kubernetes.io/os=linux
>>                     failure-domain.beta.kubernetes.io/region=us-west-1
>>                     failure-domain.beta.kubernetes.io/zone=us-west-1b
>>                     kubernetes.io/arch=amd64
>>                     kubernetes.io/hostname=ip-10-0-156-206
>>                     kubernetes.io/os=linux
>>                     node-role.kubernetes.io/worker=
>>                     node.openshift.io/os_id=rhcos
>> Annotations:        machine.openshift.io/machine: openshift-machine-api/two-4z45k-worker-us-west-1b-t4dln
>>                     machineconfiguration.openshift.io/currentConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>>                     machineconfiguration.openshift.io/desiredConfig: rendered-worker-f4169460716c78be83ccb2609dd91fc3
>>                     machineconfiguration.openshift.io/state: Done
>>                     volumes.kubernetes.io/controller-managed-attach-detach: true
>> CreationTimestamp:  Thu, 10 Oct 2019 07:30:18 -0400
>> Taints:             <none>
>> Unschedulable:      false
>> Conditions:
>>   Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
>>   ----             ------  -----------------                 ------------------                ------                       -------
>>   MemoryPressure   False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientMemory   kubelet has sufficient memory available
>>   DiskPressure     False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasNoDiskPressure     kubelet has no disk pressure
>>   PIDPressure      False   Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:30:18 -0400   KubeletHasSufficientPID      kubelet has sufficient PID available
>>   Ready            True    Thu, 10 Oct 2019 07:46:01 -0400   Thu, 10 Oct 2019 07:31:19 -0400   KubeletReady                 kubelet is posting ready status
>> Addresses:
>>   InternalIP:   10.0.156.206
>>   Hostname:     ip-10-0-156-206.us-west-1.compute.internal
>>   InternalDNS:  ip-10-0-156-206.us-west-1.compute.internal
>> Capacity:
>>  attachable-volumes-aws-ebs:  39
>>  cpu:                         2
>>  hugepages-1Gi:               0
>>  hugepages-2Mi:               0
>>  memory:                      8162888Ki
>>  pods:                        250
>> Allocatable:
>>  attachable-volumes-aws-ebs:  39
>>  cpu:                         1500m
>>  hugepages-1Gi:               0
>>  hugepages-2Mi:               0
>>  memory:                      7548488Ki
>>  pods:                        250
>> System Info:
>>  Machine ID:                              23efe37e2b244bd788bb8575cd340bfd
>>  System UUID:                             ec25c230-d02c-99cf-0540-bad276c8cc73
>>  Boot ID:                                 1f3a064f-a24e-4bdf-b6b0-c7fd3019757e
>>  Kernel Version:                          4.18.0-80.11.2.el8_0.x86_64
>>  OS Image:                                Red Hat Enterprise Linux CoreOS 42.80.20191001.0 (Ootpa)
>>  Operating System:                        linux
>>  Architecture:                            amd64
>>  Container Runtime Version:               cri-o://1.14.10-0.21.dev.rhaos4.2.git0d4a906.el8
>>  Kubelet Version:                         v1.14.6+d3a139f63
>>  Kube-Proxy Version:                      v1.14.6+d3a139f63
>> ProviderID:                               aws:///us-west-1b/i-07e495331f3f25ac0
>> Non-terminated Pods:                      (18 in total)
>>   Namespace                               Name                                      CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
>>   ---------                               ----                                      ------------  ----------  ---------------  -------------  ---
>>   openshift-cluster-node-tuning-operator  tuned-9qnjd                               10m (0%)      0 (0%)      50Mi (0%)        0 (0%)         16m
>>   openshift-dns                           dns-default-skgfs                         110m (7%)     0 (0%)      70Mi (0%)        512Mi (6%)     16m
>>   openshift-image-registry                image-registry-584f455476-9q7b8           100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>>   openshift-image-registry                node-ca-74h7h                             10m (0%)      0 (0%)      10Mi (0%)        0 (0%)         15m
>>   openshift-ingress                       router-default-85b6848bdf-679xf           100m (6%)     0 (0%)      256Mi (3%)       0 (0%)         16m
>>   openshift-machine-config-operator       machine-config-daemon-6ht2m               20m (1%)      0 (0%)      50Mi (0%)        0 (0%)         15m
>>   openshift-marketplace                   certified-operators-5cb88dd798-lcfvc      10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>>   openshift-marketplace                   community-operators-7f8987f496-8rgzq      10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>>   openshift-marketplace                   redhat-operators-cd495bc4f-fcm5t          10m (0%)      0 (0%)      100Mi (1%)       0 (0%)         17m
>>   openshift-monitoring                    alertmanager-main-1                       100m (6%)     100m (6%)   225Mi (3%)       25Mi (0%)      12m
>>   openshift-monitoring                    kube-state-metrics-6b66989cb7-nlqbm       30m (2%)      0 (0%)      120Mi (1%)       0 (0%)         17m
>>   openshift-monitoring                    node-exporter-bhrhz                       10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         16m
>>   openshift-monitoring                    openshift-state-metrics-6bf647b484-sfgjs  120m (8%)     0 (0%)      190Mi (2%)       0 (0%)         17m
>>   openshift-monitoring                    prometheus-adapter-66d6b69459-bcq5p       10m (0%)      0 (0%)      20Mi (0%)        0 (0%)         13m
>>   openshift-monitoring                    prometheus-k8s-1                          430m (28%)    200m (13%)  1134Mi (15%)     50Mi (0%)      14m
>>   openshift-multus                        multus-jlmwx                              10m (0%)      0 (0%)      150Mi (2%)       0 (0%)         16m
>>   openshift-sdn                           ovs-hdml2                                 200m (13%)    0 (0%)      400Mi (5%)       0 (0%)         16m
>>   openshift-sdn                           sdn-vwh22                                 100m (6%)     0 (0%)      200Mi (2%)       0 (0%)         16m
>> Allocated resources:
>>   (Total limits may be over 100 percent, i.e., overcommitted.)
>>   Resource                    Requests      Limits
>>   --------                    --------      ------
>>   cpu                         1390m (92%)   300m (20%)
>>   memory                      3451Mi (46%)  587Mi (7%)
>>   ephemeral-storage           0 (0%)        0 (0%)
>>   attachable-volumes-aws-ebs  0             0
>> Events:                       <none>
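[Aside: the scheduler compares pod *requests* against the node's
*Allocatable*, not Capacity, and not actual usage. Plugging in the numbers
from the output above, a quick sketch (parse_cpu is just an illustrative
helper):]

```python
# Back-of-the-envelope check using the figures from the "oc describe node"
# output above: 2-CPU node, 1500m allocatable, 1390m already requested.

def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("2", "1500m") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000

capacity = parse_cpu("2")         # Capacity: cpu: 2
allocatable = parse_cpu("1500m")  # Allocatable: cpu: 1500m
requested = parse_cpu("1390m")    # Allocated resources: cpu requests

reserved = capacity - allocatable   # held back for kubelet/system daemons
headroom = allocatable - requested  # what new pods can still request

print(reserved, headroom)  # 500 110
```

So only 110m of requestable CPU is left on this node, regardless of how
idle it looks.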
>>
>>
>>
>> On Wed, Oct 9, 2019 at 8:30 AM Samuel Martín Moro <faus...@gmail.com>
>> wrote:
>>
>>> You have two master nodes? You'd rather go with 1 or 3. With 2 masters,
>>> your etcd quorum is still 2, so if you lose a master, the API becomes
>>> unavailable.
>>>
>>> Now ... c5.xlarge should be fine.
>>> Not sure why your console doesn't show everything (I'm not familiar with
>>> that dashboard yet). As a wild guess, it's probably some delay collecting
>>> metrics; you should see something eventually.
>>>
>>> If you use "oc describe node <node-name>", you should see the
>>> reservations (requests & limits) for that node.
>>> You might be able to figure out what's eating up your resources.
>>>
>>> Depending on which OpenShift components you're deploying, you may already
>>> be using quite a lot, especially if you don't have infra nodes and have
>>> deployed EFK and/or Hawkular/Cassandra. Prometheus can use some resources
>>> as well, and Istio itself can ship with more or fewer components.
>>>
>>> If you're using EFK, you may be able to lower the resource
>>> requests/limits for Elasticsearch; if you're using Hawkular, the same
>>> goes for Cassandra. It's hard to say without seeing it, but you can
>>> probably free up some resources here and there.
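[One way to see what's eating a node is to total the per-pod CPU requests,
the same arithmetic the scheduler uses. A minimal sketch; the pod list
below is a stub standing in for the JSON you'd get from
`oc get pods -A -o json`, and sum_cpu_requests is an illustrative name:]

```python
# Sum CPU requests across all containers in a pod list.

def to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("1", "430m") to millicores."""
    return int(quantity[:-1]) if quantity.endswith("m") else int(quantity) * 1000

def sum_cpu_requests(pod_list: dict) -> int:
    """Total CPU requests (millicores) over every container in pod_list."""
    total = 0
    for pod in pod_list["items"]:
        for container in pod["spec"]["containers"]:
            requests = container.get("resources", {}).get("requests", {})
            total += to_millicores(requests.get("cpu", "0"))
    return total

# Stub data shaped like `oc get pods -o json` output.
sample = {"items": [
    {"spec": {"containers": [{"resources": {"requests": {"cpu": "430m"}}},
                             {"resources": {"requests": {"cpu": "200m"}}}]}},
    {"spec": {"containers": [{"resources": {"requests": {"cpu": "1"}}}]}},
]}

print(sum_cpu_requests(sample))  # 1630
```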
>>>
>>>
>>> Good luck,
>>>
>>> Regards.
>>>
>>> On Wed, Oct 9, 2019 at 1:47 PM Just Marvin <
>>> marvin.the.cynical.ro...@gmail.com> wrote:
>>>
>>>> Samuel,
>>>>
>>>>     So it is CPU. But I destroyed the cluster, gave it machines with
>>>> twice as much memory, retried, and got the same problem:
>>>>
>>>> Events:
>>>>   Type     Reason            Age                  From               Message
>>>>   ----     ------            ----                 ----               -------
>>>>   Warning  FailedScheduling  16s (x4 over 3m12s)  default-scheduler  0/4 nodes are available: 2 Insufficient cpu, 2 node(s) had taints that the pod didn't tolerate.
>>>>
>>>>     I'm guessing that the two nodes with taints are the two master
>>>> nodes, but the other two are c5.xlarge machines. Here is a possibly
>>>> relevant observation: as soon as I log into the cluster, I see this on
>>>> my main dashboard.
>>>>
>>>> [image: image.png]
>>>>
>>>>     Is there perhaps a problem with the CPU resource monitoring that is
>>>> causing my problems?
>>>>
>>>> Regards,
>>>> Marvin
>>>>
>>>> On Sun, Oct 6, 2019 at 3:52 PM Samuel Martín Moro <faus...@gmail.com>
>>>> wrote:
>>>>
>>>>> You can use "oc describe pod <pod-name>" to figure out what's going
>>>>> on with your pod.
>>>>> It could be that you're out of CPU or memory.
>>>>>
>>>>> Regards.
>>>>>
>>>>> On Sun, Oct 6, 2019 at 9:27 PM Just Marvin <
>>>>> marvin.the.cynical.ro...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> [zaphod@oc3027208274 ocp4.2-aws]$ oc get pods
>>>>>> NAME                              READY   STATUS    RESTARTS   AGE
>>>>>> istio-citadel-7cb44f4bb-tccql     1/1     Running   0          9m35s
>>>>>> istio-galley-75599dbc67-b4mgx     1/1     Running   0          8m41s
>>>>>> istio-policy-56476c984b-c7t8j     0/2     Pending   0          8m23s
>>>>>> istio-telemetry-d5bbd7d7b-v8kjq   0/2     Pending   0          8m24s
>>>>>> jaeger-5d9dfdfb67-mv8mp           2/2     Running   0          8m45s
>>>>>> prometheus-685bdbdc45-hmb9f       2/2     Running   0          9m17s
>>>>>> [zaphod@oc3027208274 ocp4.2-aws]$
>>>>>>
>>>>>>     The pods in Pending state don't seem to be moving forward, and the
>>>>>> operator logs aren't showing anything informative about why. Is this
>>>>>> normal? And if there is a problem, how would I figure out the cause?
>>>>>>
>>>>>> Regards,
>>>>>> Marvin
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users@lists.openshift.redhat.com
>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Samuel Martín Moro
>>>>> {EPITECH.} 2011
>>>>>
>>>>> "Nobody wants to say how this works.
>>>>>  Maybe nobody knows ..."
>>>>>                       Xorg.conf(5)
>>>>>
>>>>
>>>
>>> --
>>> Samuel Martín Moro
>>> {EPITECH.} 2011
>>>
>>> "Nobody wants to say how this works.
>>>  Maybe nobody knows ..."
>>>                       Xorg.conf(5)
>>>
>>
>
> --
> Samuel Martín Moro
> {EPITECH.} 2011
>
> "Nobody wants to say how this works.
>  Maybe nobody knows ..."
>                       Xorg.conf(5)
>
_______________________________________________
users mailing list
users@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
