Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around [EXT]

2019-09-11 Thread Matthew Vernon
On 11/09/2019 12:18, Alfredo Deza wrote:
> On Wed, Sep 11, 2019 at 6:18 AM Matthew Vernon  wrote:

>> or
>> ii) allow the bootstrap-osd credential to purge OSDs
> 
> I wasn't aware that the bootstrap-osd credentials allowed
> purging/destroying OSDs - are you sure this is possible? If it is, I
> think that would be reasonable to try.

Sorry, that was my point - currently the bootstrap-osd credential
isn't allowed to purge/destroy OSDs, but we could decide that the
correct fix is to change that so it can. I'm not convinced that's a good
idea, though!
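
For concreteness, widening those caps would be a one-off `ceph auth caps`
change. A hedged sketch that only *builds* the command line, without running
it - the `allow command` mon-cap grammar and the profile name should be
checked against your Ceph release before applying anything:

```python
def bootstrap_osd_purge_caps_cmd(entity: str = "client.bootstrap-osd") -> list:
    # Builds (but does not run) a `ceph auth caps` invocation that would
    # add permission to run `osd purge` to the bootstrap-osd entity.
    # The cap string is an assumption: verify the mon-cap grammar
    # (`allow command "..."`) for your release first.
    mon_caps = 'profile bootstrap-osd, allow command "osd purge"'
    return ["ceph", "auth", "caps", entity, "mon", mon_caps]

print(" ".join(bootstrap_osd_purge_caps_cmd()))
```

Whether handing that power to every OSD-bootstrapping node is wise is exactly
the question above.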

Regards,

Matthew



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around

2019-09-11 Thread Janne Johansson
Den ons 11 sep. 2019 kl 12:18 skrev Matthew Vernon :

> We keep finding part-made OSDs (they appear not attached to any host,
> and down and out; but still counting towards the number of OSDs); we
> never saw this with ceph-disk. On investigation, this is because
> ceph-volume lvm create makes the OSD (ID and auth at least) too early in
> the process and is then unable to roll-back cleanly (because the
> bootstrap-osd credential isn't allowed to remove OSDs).
>
>
---8<


> This is annoying to have to clear up, and it seems to me could be
> avoided by either:
>
> i) ceph-volume should (attempt to) set up the LVM volumes &c before
> making the new OSD id
> or
> ii) allow the bootstrap-osd credential to purge OSDs
>
> i) seems like clearly the better answer...?
>

This happens to me too at times. Even a simple
iii) "Run 'ceph osd purge XYZ'"
printout, for cut-and-paste convenience, would be an improvement over the
current situation. Though it might be better to have some kind of state that
tells the cluster whether an OSD has run for even the slightest time, and if
not - allow the bootstrap-osd key to delete a never-really-seen OSD id from
all the relevant places it might appear in when the disk setup fails you.
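
That "never really seen" state can be approximated from `ceph osd dump`
output today: an OSD that is down, out, and has `up_from == 0` in the osdmap
has never booted. A minimal sketch - the field names match the JSON emitted
by `ceph osd dump --format json` on recent releases, but verify against
yours:

```python
def never_booted_osds(osd_dump: dict) -> list:
    """Return ids of OSDs that are down, out, and have never booted.

    An OSD the cluster has never 'seen' has up_from == 0 in the osdmap;
    a half-built leftover from a failed ceph-volume run matches this
    plus being down (up == 0) and out (in == 0).
    """
    return [
        o["osd"]
        for o in osd_dump.get("osds", [])
        if o.get("up", 1) == 0 and o.get("in", 1) == 0 and o.get("up_from", 1) == 0
    ]

# Hand-written osdmap fragment for illustration (not real cluster output):
sample = {
    "osds": [
        {"osd": 12, "up": 1, "in": 1, "up_from": 3310},   # healthy OSD
        {"osd": 828, "up": 0, "in": 0, "up_from": 0},     # half-built leftover
    ]
}
for osd_id in never_booted_osds(sample):
    # the cut-and-paste purge line suggested above
    print("ceph osd purge %d --yes-i-really-mean-it" % osd_id)
```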

-- 
May the most significant bit of your life be positive.


Re: [ceph-users] ceph-volume lvm create leaves half-built OSDs lying around

2019-09-11 Thread Alfredo Deza
On Wed, Sep 11, 2019 at 6:18 AM Matthew Vernon  wrote:
>
> Hi,
>
> We keep finding part-made OSDs (they appear not attached to any host,
> and down and out; but still counting towards the number of OSDs); we
> never saw this with ceph-disk. On investigation, this is because
> ceph-volume lvm create makes the OSD (ID and auth at least) too early in
> the process and is then unable to roll-back cleanly (because the
> bootstrap-osd credential isn't allowed to remove OSDs).
>
> As an example (very truncated):
>
> Running command: /usr/bin/ceph --cluster ceph --name
> client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
> -i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
> Running command: vgcreate --force --yes
> ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
>  stderr: Device /dev/sdbh not found (or ignored by filtering).
>   Unable to add physical volume '/dev/sdbh' to volume group
> 'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
> --> Was unable to complete a new OSD, will rollback changes
> --> OSD will be fully purged from the cluster, because the ID was generated
> Running command: ceph osd purge osd.828 --yes-i-really-mean-it
>  stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
> a keyring on
> /etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
> (2) No such file or directory
>  stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
> authenticate NOTE: no keyring found; disabled cephx authentication
> 2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin
> authentication error (95) Operation not supported
>
Ah, this is tricky to solve for every case... ceph-volume is doing a
best effort here.

> This is annoying to have to clear up, and it seems to me could be
> avoided by either:
>
> i) ceph-volume should (attempt to) set up the LVM volumes &c before
> making the new OSD id

That would've helped in your particular case where the failure is
observed when trying to create the LV. When the failure is on the Ceph
side... the problem is
similar.

> or
> ii) allow the bootstrap-osd credential to purge OSDs

I wasn't aware that the bootstrap-osd credentials allowed
purging/destroying OSDs - are you sure this is possible? If it is, I
think that would be reasonable to try.

>
> i) seems like clearly the better answer...?
>
> Regards,
>
> Matthew
>


[ceph-users] ceph-volume lvm create leaves half-built OSDs lying around

2019-09-11 Thread Matthew Vernon
Hi,

We keep finding part-made OSDs (they appear not attached to any host,
and down and out; but still counting towards the number of OSDs); we
never saw this with ceph-disk. On investigation, this is because
ceph-volume lvm create makes the OSD (ID and auth at least) too early in
the process and is then unable to roll-back cleanly (because the
bootstrap-osd credential isn't allowed to remove OSDs).

As an example (very truncated):

Running command: /usr/bin/ceph --cluster ceph --name
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring
-i - osd new 20cea174-4c1b-4330-ad33-505a03156c33
Running command: vgcreate --force --yes
ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e /dev/sdbh
 stderr: Device /dev/sdbh not found (or ignored by filtering).
  Unable to add physical volume '/dev/sdbh' to volume group
'ceph-9d66ec60-c71b-49e0-8c1a-e74e98eafb0e'.
--> Was unable to complete a new OSD, will rollback changes
--> OSD will be fully purged from the cluster, because the ID was generated
Running command: ceph osd purge osd.828 --yes-i-really-mean-it
 stderr: 2019-09-10 15:07:53.396528 7fbca2caf700 -1 auth: unable to find
a keyring on
/etc/ceph/ceph.client.admin.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,:
(2) No such file or directory
 stderr: 2019-09-10 15:07:53.397318 7fbca2caf700 -1 monclient:
authenticate NOTE: no keyring found; disabled cephx authentication
2019-09-10 15:07:53.397334 7fbca2caf700  0 librados: client.admin
authentication error (95) Operation not supported

This is annoying to have to clear up, and it seems to me could be
avoided by either:

i) ceph-volume should (attempt to) set up the LVM volumes &c before
making the new OSD id
or
ii) allow the bootstrap-osd credential to purge OSDs

i) seems like clearly the better answer...?
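
Option i) amounts to reordering the create flow so that the fallible local
work happens before any cluster-side state exists. A hypothetical sketch of
that ordering - the step functions are stand-ins for illustration, not real
ceph-volume internals:

```python
def create_osd(device, make_vg_and_lv, osd_new, prepare_and_activate):
    # 1. Do the local LVM work first; if it fails, nothing needs rollback.
    make_vg_and_lv(device)
    # 2. Only then touch the cluster: allocate the OSD id and cephx auth.
    osd_id = osd_new()
    # 3. A failure from here on still needs rollback, but the window is
    #    much smaller than when `osd new` runs first.
    prepare_and_activate(osd_id)
    return osd_id

allocated = []

def failing_vgcreate(device):
    # stands in for vgcreate failing, as in the log above
    raise RuntimeError("Device %s not found (or ignored by filtering)" % device)

def fake_osd_new():
    allocated.append(828)
    return 828

try:
    create_osd("/dev/sdbh", failing_vgcreate, fake_osd_new, lambda osd_id: None)
except RuntimeError:
    pass

# With this ordering the LVM failure happens before `osd new`,
# so no half-built OSD id is left in the cluster.
print("OSD ids allocated after failure:", allocated)  # → []
```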

Regards,

Matthew


