I maintain a Kubernetes cluster whose nodes have very different hardware
configurations. Until now it only had machines with 96GB of RAM, and
everything worked well. Today I added 10 more nodes, each with 32GB of
RAM. When I deployed a large-scale experiment to the expanded cluster,
around 20% of the requested pods never started running: the scheduler
assigned them to nodes, but they hang in the `ContainerCreating` state
indefinitely. When I describe one of the nodes hosting these stuck pods,
I get the following:

    # kubectl describe no/<kubernetes_node>
    Name:               <kubernetes_node>
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/hostname=<kubernetes_node>
    Annotations:        node.alpha.kubernetes.io/ttl=0
                        volumes.kubernetes.io/controller-managed-attach-detach=true
    Taints:             <none>
    CreationTimestamp:  Fri, 02 Mar 2018 12:58:47 -0800
    Conditions:
      Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----             ------  -----------------                 ------------------                ------                       -------
      OutOfDisk        False   Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:58:47 -0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
      MemoryPressure   False   Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:58:47 -0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
      DiskPressure     False   Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:58:47 -0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
      Ready            True    Fri, 02 Mar 2018 18:24:33 -0800   Fri, 02 Mar 2018 12:59:17 -0800   KubeletReady                 kubelet is posting ready status
    Addresses:
      InternalIP:  192.168.52.105
      Hostname:    <kubernetes_node>
    Capacity:
     cpu:     8
     memory:  32919476Ki
     pods:    110
    Allocatable:
     cpu:     8
     memory:  32817076Ki
     pods:    110
    System Info:
     Machine ID:                 cb97393b0de14b6ebd4f2eabae6d7690
     System UUID:                00000000-BEEF-0706-0000-0000EFBE0E0F
     Boot ID:                    8aca44a0-9ae9-454e-8f9e-50ae6be3665a
     Kernel Version:             4.4.0-21-generic
     OS Image:                   Ubuntu 16.04 LTS
     Operating System:           linux
     Architecture:               amd64
     Container Runtime Version:  docker://1.13.1
     Kubelet Version:            v1.9.3
     Kube-Proxy Version:         v1.9.3
    PodCIDR:                     192.168.67.0/24
    ExternalID:                  <kubernetes_node>
    Non-terminated Pods:         (5 in total)
      Namespace                  Name                  CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ---------                  ----                  ------------  ----------  ---------------  -------------
      kube-system                calico-node-qpr9n     250m (3%)     0 (0%)      0 (0%)           0 (0%)
      kube-system                kube-proxy-hqbbd      0 (0%)        0 (0%)      0 (0%)           0 (0%)
      <kubernetes_user>          minecrawlers-gqjqv    2 (25%)       2 (25%)     8Gi (25%)        8Gi (25%)
      <kubernetes_user>          minecrawlers-rb984    2 (25%)       2 (25%)     8Gi (25%)        8Gi (25%)
      <kubernetes_user>          minecrawlers-sjwzd    2 (25%)       2 (25%)     8Gi (25%)        8Gi (25%)
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ------------  ----------  ---------------  -------------
      6250m (78%)   6 (75%)     24Gi (76%)       24Gi (76%)
    Events:         <none>
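
For reference, this is how I count the stuck pods across the cluster (a
rough sketch; I simply grep the STATUS column of `kubectl get pods`, and
`app=minecrawl` is the label my ReplicationController puts on these pods):

    # kubectl get pods -l app=minecrawl -o wide | grep ContainerCreating | wc -l
    # kubectl get pods -l app=minecrawl -o wide | grep ContainerCreating

The node described above is one of the new machines: its 32919476Ki
capacity is roughly 31.4GiB, i.e. a 32GB box.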

None of the three `minecrawlers` pods ever starts running, even though the
scheduler placed them without complaint (3 x 8Gi = 24Gi requested out of
~31.3Gi allocatable, hence the 76% above). Describing the first one shows
this:

    # kubectl describe pods/minecrawlers-gqjqv
    Name:           minecrawlers-gqjqv
    Namespace:      <kubernetes_user>
    Node:           <kubernetes_node>/192.168.52.105
    Start Time:     Fri, 02 Mar 2018 17:58:31 -0800
    Labels:         app=minecrawl
    Annotations:    <none>
    Status:         Pending
    IP:             
    Controlled By:  ReplicationController/minecrawlers
    Containers:
      minecrawl:
        Container ID:   
        Image:          git.seclab.cs.ucsb.edu:4567/<kubernetes_user>/minecrawl:v16
        Image ID:       
        Port:           <none>
        State:          Waiting
          Reason:       ContainerCreating
        Ready:          False
        Restart Count:  0
        Limits:
          cpu:     2
          memory:  8Gi
        Requests:
          cpu:        2
          memory:     8Gi
        Environment:  <none>
        Mounts:
          /dev/shm from dshm (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-5q7vh (ro)
    Conditions:
      Type           Status
      Initialized    True 
      Ready          False 
      PodScheduled   True 
    Volumes:
      dshm:
        Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:  Memory
      default-token-5q7vh:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  default-token-5q7vh
        Optional:    false
    QoS Class:       Guaranteed
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:
      Type     Reason                  Age                 From                        Message
      ----     ------                  ----                ----                        -------
      Normal   Scheduled               36m                 default-scheduler           Successfully assigned minecrawlers-gqjqv to <kubernetes_node>
      Normal   SuccessfulMountVolume   36m                 kubelet, <kubernetes_node>  MountVolume.SetUp succeeded for volume "dshm"
      Normal   SuccessfulMountVolume   36m                 kubelet, <kubernetes_node>  MountVolume.SetUp succeeded for volume "default-token-5q7vh"
      Normal   SandboxChanged          35m (x11 over 36m)  kubelet, <kubernetes_node>  Pod sandbox changed, it will be killed and re-created.
      Warning  FailedCreatePodSandBox  1m (x541 over 36m)  kubelet, <kubernetes_node>  Failed create pod sandbox.

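The events never say why sandbox creation fails. My next step is to dig
through the kubelet and Docker logs on the affected node; this is what I
plan to run (a sketch, assuming the kubelet and Docker run as systemd
units on these Ubuntu 16.04 nodes):

    # journalctl -u kubelet --since "1 hour ago" | grep -i sandbox | tail -n 20
    # journalctl -u docker --since "1 hour ago" | tail -n 50
    # docker ps -a | grep pause

The last command should list the sandbox ("pause") containers, if any were
actually created.
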
What could possibly be the problem?
