Hi Alwin,

yes, all OSDs that did not start were on the same physical cluster node, and all running VMs on the cluster were dead because of missing objects.
The problem was that those OSDs no longer had an entry in "ceph auth list", so manually re-adding the OSDs (ceph auth add osd.X osd 'allow *' mon 'allow profile osd' mgr 'allow profile osd' -i /var/lib/ceph/osd/ceph-X/keyring) solved the problem. After that, starting the systemd service for each OSD was successful. So far I have not found anything related in the logfiles on the cluster node. Any hint to demystify the cluster behaviour is welcome...

--
Thomas Naumann
Abteilung Netze und Kommunikation

Otto-von-Guericke Universität Magdeburg
Universitätsrechenzentrum
Universitätsplatz 2
39106 Magdeburg

fon: +49 391 67-58563
email: thomas.naum...@ovgu.de

On Mon, 2020-06-29 at 10:36 +0200, Alwin Antreich wrote:
> Hello Thomas,
> 
> On Fri, Jun 26, 2020 at 07:51:57AM +0000, Naumann, Thomas wrote:
> > Hi,
> > 
> > in our production cluster (proxmox 5.4, ceph 12.2) there is an issue
> > since yesterday. After an increase of a pool, 5 OSDs do not start,
> > status is "down/in", ceph health: HEALTH_WARN nodown,noout flag(s) set,
> > 5 osds down, 128 osds: 123 up, 128 in.
> > 
> > last lines of the OSD logfile:
> > 2020-06-26 08:40:26.240005 7f6d245fff80 1 freelist init
> > 2020-06-26 08:40:26.243779 7f6d245fff80 1 bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc opening allocation metadata
> > 2020-06-26 08:40:26.251501 7f6d245fff80 1 bluestore(/var/lib/ceph/osd/ceph-45) _open_alloc loaded 3.47TiB in 1 extents
> > 2020-06-26 08:40:26.253058 7f6d245fff80 0 <cls> /mnt/big/pve/ceph/ceph-12.2.13/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
> > 2020-06-26 08:40:26.253309 7f6d245fff80 0 _get_class not permitted to load sdk
> > 2020-06-26 08:40:26.256486 7f6d245fff80 0 _get_class not permitted to load kvs
> > 2020-06-26 08:40:26.256611 7f6d245fff80 0 <cls> /mnt/big/pve/ceph/ceph-12.2.13/src/cls/hello/cls_hello.cc:296: loading cls_hello
> > 2020-06-26 08:40:26.258362 7f6d245fff80 0 _get_class not permitted to load lua
> > 2020-06-26 08:40:26.259850 7f6d245fff80 0 osd.45 46770 crush map has features 288514051259236352, adjusting msgr requires for clients
> > 2020-06-26 08:40:26.259859 7f6d245fff80 0 osd.45 46770 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
> > 2020-06-26 08:40:26.259863 7f6d245fff80 0 osd.45 46770 crush map has features 1009089991638532096, adjusting msgr requires for osds
> > 2020-06-26 08:40:26.305880 7f6d245fff80 0 osd.45 46770 load_pgs
> > 2020-06-26 08:40:28.024638 7f6d245fff80 0 osd.45 46770 load_pgs opened 129 pgs
> > 2020-06-26 08:40:28.024803 7f6d245fff80 0 osd.45 46770 using weightedpriority op queue with priority op cut off at 64.
> > 2020-06-26 08:40:28.025741 7f6d245fff80 -1 osd.45 46770 log_to_monitors {default=true}
> > 2020-06-26 08:40:28.028397 7f6d245fff80 -1 osd.45 46770 init authentication failed: (1) Operation not permitted
> > 
> > Does anyone know how to fix this?
> Are those OSDs on the same host? What is the current status of the
> cluster?
> 
> --
> Cheers,
> Alwin
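PS, for anyone hitting the same issue: below is a minimal sketch of the recovery steps described above. The OSD ID list is a placeholder (only osd.45 is confirmed from the log excerpt), and the ceph-osd@<id> unit name assumes the standard packaged systemd service; adjust both for your own setup.

    # substitute the IDs of the OSDs that failed to start with "init authentication failed"
    OSD_IDS="45"
    for id in $OSD_IDS; do
        # re-import the OSD's local keyring into the cluster auth database
        # (caps taken from the command used above)
        ceph auth add osd.$id osd 'allow *' mon 'allow profile osd' mgr 'allow profile osd' \
            -i /var/lib/ceph/osd/ceph-$id/keyring
        # then start the OSD service again
        systemctl start ceph-osd@$id
    done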