I maintain a Kubernetes cluster whose nodes have very different hardware configurations. Until now it contained only machines with 96GB of RAM, which worked well. Today I added 10 more nodes, each with 32GB of RAM. When a large-scale experiment was deployed to this new cluster configuration, around 20% of the requested pods never came up: they hang in the `ContainerCreating` state indefinitely. When I describe a node on which one of these pods is scheduled, I get the following:
```
# kubectl describe no/<kubernetes_node>
Name:               <kubernetes_node>
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/hostname=<kubernetes_node>
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Fri, 02 Mar 2018 12:58:47 -0800
Conditions:
  Type            Status  LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----            ------  -----------------                 ------------------                ------                      -------
  OutOfDisk       False   Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:58:47 -0800   KubeletHasSufficientDisk    kubelet has sufficient disk space available
  MemoryPressure  False   Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:58:47 -0800   KubeletHasSufficientMemory  kubelet has sufficient memory available
  DiskPressure    False   Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:58:47 -0800   KubeletHasNoDiskPressure    kubelet has no disk pressure
  Ready           True    Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:59:17 -0800   KubeletReady                kubelet is posting ready status
Addresses:
  InternalIP:  192.168.52.105
  Hostname:    <kubernetes_node>
Capacity:
  cpu:     8
  memory:  32919476Ki
  pods:    110
Allocatable:
  cpu:     8
  memory:  32817076Ki
  pods:    110
System Info:
  Machine ID:                 cb97393b0de14b6ebd4f2eabae6d7690
  System UUID:                00000000-BEEF-0706-0000-0000EFBE0E0F
  Boot ID:                    8aca44a0-9ae9-454e-8f9e-50ae6be3665a
  Kernel Version:             4.4.0-21-generic
  OS Image:                   Ubuntu 16.04 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://1.13.1
  Kubelet Version:            v1.9.3
  Kube-Proxy Version:         v1.9.3
PodCIDR:     192.168.67.0/24
ExternalID:  <kubernetes_node>
Non-terminated Pods:  (5 in total)
  Namespace          Name                CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------          ----                ------------  ----------  ---------------  -------------
  kube-system        calico-node-qpr9n   250m (3%)     0 (0%)      0 (0%)           0 (0%)
  kube-system        kube-proxy-hqbbd    0 (0%)        0 (0%)      0 (0%)           0 (0%)
  <kubernetes_user>  minecrawlers-gqjqv  2 (25%)       2 (25%)     8Gi (25%)        8Gi (25%)
  <kubernetes_user>  minecrawlers-rb984  2 (25%)       2 (25%)     8Gi (25%)        8Gi (25%)
  <kubernetes_user>  minecrawlers-sjwzd  2 (25%)       2 (25%)     8Gi (25%)        8Gi (25%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  6250m (78%)   6 (75%)     24Gi (76%)       24Gi (76%)
Events:  <none>
```

None of the three `minecrawlers` pods on this node ever starts running. Describing the first one shows this:

```
# kubectl describe pods/minecrawlers-gqjqv
Name:           minecrawlers-gqjqv
Namespace:      <kubernetes_user>
Node:           <kubernetes_node>/192.168.52.105
Start Time:     Fri, 02 Mar 2018 17:58:31 -0800
Labels:         app=minecrawl
Annotations:    <none>
Status:         Pending
IP:
Controlled By:  ReplicationController/minecrawlers
Containers:
  minecrawl:
    Container ID:
    Image:          git.seclab.cs.ucsb.edu:4567/<kubernetes_user>/minecrawl:v16
    Image ID:
    Port:           <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  8Gi
    Requests:
      cpu:     2
      memory:  8Gi
    Environment:  <none>
    Mounts:
      /dev/shm from dshm (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-5q7vh (ro)
Conditions:
  Type          Status
  Initialized   True
  Ready         False
  PodScheduled  True
Volumes:
  dshm:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:  Memory
  default-token-5q7vh:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-5q7vh
    Optional:    false
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                 From                        Message
  ----     ------                  ---                 ----                        -------
  Normal   Scheduled               36m                 default-scheduler           Successfully assigned minecrawlers-gqjqv to <kubernetes_node>
  Normal   SuccessfulMountVolume   36m                 kubelet, <kubernetes_node>  MountVolume.SetUp succeeded for volume "dshm"
  Normal   SuccessfulMountVolume   36m                 kubelet, <kubernetes_node>  MountVolume.SetUp succeeded for volume "default-token-5q7vh"
  Normal   SandboxChanged          35m (x11 over 36m)  kubelet, <kubernetes_node>  Pod sandbox changed, it will be killed and re-created.
  Warning  FailedCreatePodSandBox  1m (x541 over 36m)  kubelet, <kubernetes_node>  Failed create pod sandbox.
```

What could the problem be?

-- 
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group. Visit this group at https://groups.google.com/group/kubernetes-users.
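For what it's worth, the node does not look overcommitted on paper. Redoing kubectl's "Allocated resources" arithmetic by hand (a quick sketch using only the numbers from the describe output above) reproduces the reported percentages, so plain resource exhaustion does not seem to be the issue:

```shell
# Re-derive the node's "Allocated resources" percentages.
# All inputs are copied from the `kubectl describe node` output above.
allocatable_cpu_m=$((8 * 1000))           # 8 cores, in millicores
allocatable_mem_ki=32817076               # Allocatable memory, in Ki

cpu_requests_m=$((250 + 3 * 2000))        # calico-node (250m) + 3 x minecrawlers (2 CPU each)
mem_requests_ki=$((3 * 8 * 1024 * 1024))  # 3 pods x 8Gi, in Ki

echo "CPU requests:    ${cpu_requests_m}m ($((100 * cpu_requests_m / allocatable_cpu_m))%)"
echo "Memory requests: ${mem_requests_ki}Ki ($((100 * mem_requests_ki / allocatable_mem_ki))%)"
# prints: CPU requests:    6250m (78%)
# prints: Memory requests: 25165824Ki (76%)
```

So three 8Gi pods plus the system pods fit within the 32GB node's allocatable memory, which matches the scheduler happily binding them there.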
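One more data point: the sandbox failure is continuous, not a one-off. The repeat counters can be pulled out of the saved events; this is only a sketch, with the two recurring event lines from the describe output pasted inline into a hypothetical `/tmp/describe-events.txt` so the commands are self-contained:

```shell
# Paste the two recurring events (copied from the describe output above) into a file;
# in practice this would be: kubectl describe pods/minecrawlers-gqjqv > /tmp/describe-events.txt
cat <<'EOF' >/tmp/describe-events.txt
Normal   SandboxChanged          35m (x11 over 36m)   kubelet  Pod sandbox changed, it will be killed and re-created.
Warning  FailedCreatePodSandBox  1m (x541 over 36m)   kubelet  Failed create pod sandbox.
EOF

# Extract the per-event repeat counters (the "xN" tokens).
grep -oE 'x[0-9]+' /tmp/describe-events.txt
# prints: x11
# prints: x541
```

541 FailedCreatePodSandBox events over 36 minutes is a failure roughly every four seconds, i.e. the kubelet is stuck in a tight create/fail loop rather than waiting on anything.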