Hi

I'm in a bit of a panic :-(

Recently we started trying to configure a radosgw for our Ceph cluster, which until now was only doing CephFS (and RBD was working as well). We were messing about with ceph-ansible, as that is how we originally installed the cluster. Anyway, it installed Nautilus 14.2.18 on the radosgw host, and I thought it would be good to pull the rest of the cluster up to that level as well, using our tried and tested Ceph upgrade script (it basically updates all Ceph nodes one by one and checks whether Ceph is OK again before doing the next one).
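
For context, the script boils down to roughly this (a simplified sketch; the hostnames and the update command here are placeholders, and the real script does a bit more checking):

#!/usr/bin/env python3
# Simplified sketch of our rolling-upgrade script (placeholders, not the real thing):
# update one node, wait until the cluster is healthy again, then move on to the next.
import subprocess
import time

NODES = ["cephmon1", "cephmon2", "cephmon3"]   # mons/mgrs first, then mds/osd nodes

def cluster_healthy():
    # 'ceph health' prints e.g. "HEALTH_OK" or "HEALTH_WARN ..."
    out = subprocess.check_output(["ceph", "health"]).decode()
    return out.startswith("HEALTH_OK")

for node in NODES:
    # placeholder update command; the real script updates the ceph packages
    # and restarts the daemons on that node
    subprocess.check_call(["ssh", node, "yum -y update 'ceph*' && systemctl restart ceph.target"])
    while not cluster_healthy():
        time.sleep(30)   # don't touch the next node until Ceph reports OK again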

After the 3rd mon/mgr was done, all PGs became unavailable :-(
Obviously the script stopped at that point, but Ceph is also broken now...

The deceptively mild message is: HEALTH_WARN Reduced data availability: 5568 pgs inactive

That's all PGs!

As a desperate measure I tried upgrading one Ceph OSD node, but that broke as well: the OSD service on that node gets an interrupt from the kernel...

The versions are now:
20:29 [root@cephmon1 ~]# ceph versions
{
    "mon": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 156
    },
    "mds": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 2
    },
    "overall": {
        "ceph version 14.2.15 (afdd217ae5fb1ed3f60e16bd62357ca58cc650e5) nautilus (stable)": 158,
        "ceph version 14.2.18 (befbc92f3c11eedd8626487211d200c0b44786d9) nautilus (stable)": 6
    }
}


12 OSDs are down

# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            Reduced data availability: 5568 pgs inactive

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 50m)
    mgr: cephmon1(active, since 53m), standbys: cephmon3, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 156 up (since 28m), 156 in (since 18m); 1722 remapped pgs

  data:
    pools:   12 pools, 5568 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             5568 unknown

  progress:
    Rebalancing after osd.103 marked in
      [..............................]
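
(The 12 down OSDs are just the difference between the 168 total and the 156 up; if it helps I can post which ones they are, I believe "ceph osd tree down" lists only the down ones.)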

