I find that I cannot re-add a disk to a Ceph cluster after the OSD on the disk 
is removed. Ceph seems to know about the existence of these disks, but not 
about their "host:dev" information:

```
# ceph device ls
DEVICE                                     HOST:DEV         DAEMONS  WEAR  LIFE EXPECTANCY
SAMSUNG_MZ7L37T6_S6KHNE0T100049                                             0%   <-- should be host01:sda, was osd.0
SAMSUNG_MZ7L37T6_S6KHNE0T100050            host01:sdb       osd.1      0%
SAMSUNG_MZ7L37T6_S6KHNE0T100052                                             0%   <-- should be host02:sda
SAMSUNG_MZ7L37T6_S6KHNE0T100053            host01:sde       osd.9      0%
SAMSUNG_MZ7L37T6_S6KHNE0T100061            host01:sdf       osd.11     0%
SAMSUNG_MZ7L37T6_S6KHNE0T100062                                             0%
SAMSUNG_MZ7L37T6_S6KHNE0T100063            host01:sdc       osd.5      0%
SAMSUNG_MZ7L37T6_S6KHNE0T100064            host01:sdg       osd.13     0%
SAMSUNG_MZ7L37T6_S6KHNE0T100065                                             0%
SAMSUNG_MZ7L37T6_S6KHNE0T100066            host01:sdd       osd.7      0%
SAMSUNG_MZ7L37T6_S6KHNE0T100067                                             0%
SAMSUNG_MZ7L37T6_S6KHNE0T100068                                             0%   <-- should be host02:sdb
SAMSUNG_MZ7L37T6_S6KHNE0T100069                                             0%
SAMSUNG_MZ7L37T6_S6KHNE0T100070                                             0%
SAMSUNG_MZ7L37T6_S6KHNE0T100071                                             0%
SAMSUNG_MZ7L37T6_S6KHNE0T100072            host01:sdh       osd.15     0%
SAMSUNG_MZQL27T6HBLA-00B7C_S6CVNG0T321600  host03:nvme4n1   osd.20     0%
... omitted ...
SAMSUNG_MZQL27T6HBLA-00B7C_S6CVNG0T321608  host03:nvme8n1   osd.22     0%
```

For disk "SAMSUNG_MZ7L37T6_S6KHNE0T100049", the "HOST:DEV" field is empty, but I 
believe it should be "host01:sda"; I confirmed the serial number by running 
`smartctl -i /dev/sda` on host01.
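Roughly, the check was just matching the serial number in the smartctl output 
against the orphaned entry above, something like:
```
# on host01; the serial should match the entry with the empty HOST:DEV field
smartctl -i /dev/sda | grep -i 'serial number'
```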

I suspect this missing information is why OSDs cannot be created on these 
devices, either manually or automatically. I have tried:
1. `ceph orch daemon add osd host01:/dev/sda`, which prints "Created no osd(s) 
on host host01; already created?"
2. `ceph orch apply osd --all-available-devices`, which adds nothing.
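
One thing I have not done yet is explicitly wiping the old OSD data off these 
disks. If the orchestrator refuses them because it still sees the old 
LVM/BlueStore metadata, I assume the fix would look something like this 
(untested on my side):
```
# zap the old OSD data via the orchestrator so the device shows as available again
ceph orch device zap host01 /dev/sda --force

# then re-check the inventory and retry the OSD creation
ceph orch device ls --refresh
ceph orch daemon add osd host01:/dev/sda
```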

I got into this situation while testing whether draining a host works: I 
drained host02, removed it, and added it back via:
```
ceph orch host drain host02
ceph orch host rm host02
ceph orch host add host02 <internal_ip> --labels _admin
```
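
If it helps with debugging, I assume these are the relevant checks to confirm 
the drain finished and the host is back in the inventory:
```
# confirm no OSD removals are still pending
ceph orch osd rm status

# confirm the host and its devices are back in the orchestrator's inventory
ceph orch host ls
ceph orch device ls --refresh
```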

I am running Ceph 17.2.6 on Ubuntu, deployed via cephadm. FYI, here is the 
relevant orchestrator spec for OSDs:
```
# ceph orch ls osd --export
service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore
---
service_type: osd
service_id: all-available-devices
service_name: osd.all-available-devices
placement:
  host_pattern: '*'
spec:
  data_devices:
    all: true
  filter_logic: AND
  objectstore: bluestore
```
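
Note that the plain `osd` service is `unmanaged: true` (I believe that is what 
cephadm creates for OSDs added ad hoc with `ceph orch daemon add osd`), while 
`osd.all-available-devices` is managed. If re-applying the spec is the right 
move, I assume it would be something like this (`osd-spec.yaml` is just a 
hypothetical local copy of the export above):
```
# export the current spec, edit if needed, and re-apply it
ceph orch ls osd --export > osd-spec.yaml
ceph orch apply -i osd-spec.yaml
```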


Any thoughts on what may be wrong here? Is there a way to tell Ceph "you are 
wrong about the whereabouts of these disks, forget what you know and fetch the 
disk information afresh"?
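
For example, would a forced refresh of the orchestrator's cached inventory be 
enough, or is something more drastic needed? My guess at the refresh would be:
```
# force cephadm to rescan devices on the hosts
ceph orch device ls --refresh

# or, more bluntly, fail over the active mgr so its cached inventory is rebuilt
ceph mgr fail
```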

Any help much appreciated!