Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around [EXT]
On 11/09/2019 12:18, Alfredo Deza wrote:
> On Wed, Sep 11, 2019 at 6:18 AM Matthew Vernon wrote:
>> or
>> ii) allow the bootstrap-osd credential to purge OSDs
>
> I wasn't aware that the bootstrap-osd credentials allowed to
> purge/destroy OSDs, are you sure this is possible? If it is I think
> that would be reasonable to try.

Sorry, that was my point - currently, the bootstrap-osd credential isn't
allowed to purge/destroy OSDs, but we could decide that the correct fix is
to change that so it can. I'm not convinced that's a good idea, though!

Regards,

Matthew

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
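For the record, a sketch of what option (ii) might look like on the admin side. This only prints the commands rather than running them; the helper name is invented, and the narrow command-based mon cap shown is an assumption about how tightly the grant could be scoped - check the cap syntax for your Ceph release before trying it.

```shell
#!/bin/sh
# Sketch of option (ii): print (not run) the 'ceph auth caps' invocation
# that would extend client.bootstrap-osd so it may purge OSDs.
# The command-based mon cap is an assumption about how narrow the grant
# could be made; verify against your release's cap syntax first.

print_bootstrap_osd_cap_change() {
    # Show the current caps first, then the proposed change.
    echo "ceph auth get client.bootstrap-osd"
    echo "ceph auth caps client.bootstrap-osd" \
         "mon 'allow profile bootstrap-osd, allow command \"osd purge\"'"
}

print_bootstrap_osd_cap_change
```

Whether widening the bootstrap-osd key like this is wise is exactly the question raised above - a key that can purge OSDs is a much bigger liability than one that can only create them.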
Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around
On Wed, 11 Sep 2019 at 12:18, Matthew Vernon wrote:
> We keep finding part-made OSDs (they appear not attached to any host,
> and down and out; but still counting towards the number of OSDs); we
> never saw this with ceph-disk. On investigation, this is because
> ceph-volume lvm create makes the OSD (ID and auth at least) too early in
> the process and is then unable to roll-back cleanly (because the
> bootstrap-osd credential isn't allowed to remove OSDs).
>
> ---8<
>
> This is annoying to have to clear up, and it seems to me could be
> avoided by either:
>
> i) ceph-volume should (attempt to) set up the LVM volumes &c before
> making the new OSD id
> or
> ii) allow the bootstrap-osd credential to purge OSDs
>
> i) seems like clearly the better answer...?

This happens to me too at times. Even a simple iii) "Run 'ceph osd purge
XYZ'" printout for my cut-n-paste convenience would be an improvement over
the current situation, though it might be better to have some kind of
state that tells the cluster if an OSD has run for even the slightest of
time, and if not - allow the bootstrap-osd key to delete a
never-really-seen OSD id from all relevant places it might appear in when
the disk setup fails you.

--
May the most significant bit of your life be positive.
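Suggestion (iii) could be as small as the helper sketched below, printing the exact cleanup command for an admin to paste. The function name is invented for illustration; the purge command itself is the one ceph-volume already attempts (and fails) to run with the bootstrap-osd key, per the log quoted in the original report.

```shell
#!/bin/sh
# Sketch of suggestion (iii): on rollback failure, print the exact cleanup
# command for cut-n-paste by an admin holding a full keyring, instead of
# leaving a half-built OSD behind silently. Helper name is hypothetical.

print_purge_hint() {
    osd_id="$1"
    echo "Rollback failed; as an admin, run:"
    echo "  ceph osd purge osd.${osd_id} --yes-i-really-mean-it"
}

print_purge_hint 828
```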
Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around
On Wed, Sep 11, 2019 at 6:18 AM Matthew Vernon wrote:
>
> Hi,
>
> We keep finding part-made OSDs (they appear not attached to any host,
> and down and out; but still counting towards the number of OSDs); we
> never saw this with ceph-disk. On investigation, this is because
> ceph-volume lvm create makes the OSD (ID and auth at least) too early in
> the process and is then unable to roll-back cleanly (because the
> bootstrap-osd credential isn't allowed to remove OSDs).
>
> As an example (very truncated):
>
> Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
> -i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
> Running command: vgcreate --force --yes
> ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
>  stderr: Device /dev/sdbh not found (or ignored by filtering).
>   Unable to add physical volume '/dev/sdbh' to volume group
> 'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
> --> Was unable to complete a new OSD, will rollback changes
> --> OSD will be fully purged from the cluster, because the ID was generated
> Running command: ceph osd purge osd.828 --yes-i-really-mean-it
>  stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
> a keyring on
> /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
> (2) No such file or directory
>  stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
> authenticate NOTE: no keyring found; disabled cephx authentication
> 2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin
> authentication error (95) Operation not supported

Ah this is tricky to solve for every case... ceph-volume is doing a
best-effort here.

> This is annoying to have to clear up, and it seems to me could be
> avoided by either:
>
> i) ceph-volume should (attempt to) set up the LVM volumes &c before
> making the new OSD id

That would've helped in your particular case where the failure is observed
when trying to create the LV. When the failure is on the Ceph side... the
problem is similar.

> or
> ii) allow the bootstrap-osd credential to purge OSDs

I wasn't aware that the bootstrap-osd credentials allowed to purge/destroy
OSDs, are you sure this is possible? If it is I think that would be
reasonable to try.

> i) seems like clearly the better answer...?
>
> Regards,
>
> Matthew
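To make the shape of option (i) concrete: it amounts to reordering the steps so the fallible local work (vgcreate/lvcreate) happens before `osd new` allocates a cluster-side id. The dry-run sketch below only echoes commands; the device, VG name, and function are placeholders, and real ceph-volume does considerably more than this.

```shell
#!/bin/sh
# Dry-run sketch of option (i): do the LVM work *before* asking the
# monitors for a new OSD id, so an LVM failure leaves nothing to roll
# back cluster-side. Commands are echoed, not executed; names are
# placeholders, not ceph-volume's real naming scheme.

prepare_osd_dry_run() {
    dev="$1"
    vg="ceph-$(date +%s)-example"   # placeholder VG name

    # 1) Local, fallible steps first: if these fail, no OSD id exists yet.
    echo "vgcreate --force --yes ${vg} ${dev}" &&
    echo "lvcreate -l 100%FREE -n osd-block ${vg}" &&
    # 2) Only then touch the cluster: allocate the id and auth.
    echo "ceph osd new \$(uuidgen)"
}

prepare_osd_dry_run /dev/sdbh
```

As noted above, this only helps when the failure is on the LVM side; a failure after `osd new` still needs a credential able to clean up.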
[ceph-users] ceph-volume lvm create leaves half-built OSDs lying around
Hi,

We keep finding part-made OSDs (they appear not attached to any host, and
down and out; but still counting towards the number of OSDs); we never saw
this with ceph-disk. On investigation, this is because ceph-volume lvm
create makes the OSD (ID and auth at least) too early in the process and
is then unable to roll-back cleanly (because the bootstrap-osd credential
isn't allowed to remove OSDs).

As an example (very truncated):

Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
20cea174-4c1b-4330-ad33-505a03156c33
Running command: vgcreate --force --yes
ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
 stderr: Device /dev/sdbh not found (or ignored by filtering).
  Unable to add physical volume '/dev/sdbh' to volume group
'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.828 --yes-i-really-mean-it
 stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find a
keyring on
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
(2) No such file or directory
 stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
authenticate NOTE: no keyring found; disabled cephx authentication
2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin
authentication error (95) Operation not supported

This is annoying to have to clear up, and it seems to me could be avoided
by either:

i) ceph-volume should (attempt to) set up the LVM volumes &c before making
the new OSD id
or
ii) allow the bootstrap-osd credential to purge OSDs

i) seems like clearly the better answer...?

Regards,

Matthew