Hi all,

We're seeing a BlueFS spillover issue with Ceph 14.2.8:

We originally had 1 GiB RocksDB (block.db) partitions:

1. ceph health detail
   HEALTH_WARN BlueFS spillover detected on 3 OSD
   BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
   osd.3 spilled over 78 MiB metadata from 'db' device (1024 MiB used
   of 1024 MiB) to slow device
   osd.4 spilled over 78 MiB metadata from 'db' device (1024 MiB used
   of 1024 MiB) to slow device
   osd.5 spilled over 84 MiB metadata from 'db' device (1024 MiB used
   of 1024 MiB) to slow device

We created new 6 GiB partitions for RocksDB, copied the original partitions over, and then extended them with "ceph-bluestore-tool bluefs-bdev-expand" (the rough procedure is sketched after the output below). Now we get:

1. ceph health detail
   HEALTH_WARN BlueFS spillover detected on 3 OSD
   BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
   osd.3 spilled over 5 MiB metadata from 'db' device (555 MiB used of
   6.0 GiB) to slow device
   osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
   6.0 GiB) to slow device
   osd.5 spilled over 5 MiB metadata from 'db' device (561 MiB used of
   6.0 GiB) to slow device
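
For reference, the resize procedure per OSD was roughly along these lines (a sketch of what we did; /dev/old-db and /dev/new-db stand in for the original 1 GiB and new 6 GiB partitions, osd.3 is used as the example, adjust paths to your layout):

   systemctl stop ceph-osd@3
   # copy the old DB partition onto the larger one
   dd if=/dev/old-db of=/dev/new-db bs=1M
   # point the OSD at the new partition
   ln -sf /dev/new-db /var/lib/ceph/osd/ceph-3/block.db
   # let BlueFS grow into the extra space
   ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-3
   systemctl start ceph-osd@3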

Issuing "ceph daemon osd.X compact" doesn't help, but shows the following transitional state:

1. ceph daemon osd.5 compact
   {
       "elapsed_time": 5.4560688339999999
   }
2. ceph health detail
   HEALTH_WARN BlueFS spillover detected on 3 OSD
   BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
   osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
   6.0 GiB) to slow device
   osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
   6.0 GiB) to slow device
   osd.5 spilled over 5 MiB metadata from 'db' device (1.1 GiB used of
   6.0 GiB) to slow device
   (...and after a while...)
3. ceph health detail
   HEALTH_WARN BlueFS spillover detected on 3 OSD
   BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
   osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
   6.0 GiB) to slow device
   osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
   6.0 GiB) to slow device
   osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of
   6.0 GiB) to slow device
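
If compaction can't clear it, the next thing I'm thinking of trying is moving the leftover metadata back onto the DB device with "ceph-bluestore-tool bluefs-bdev-migrate". A rough sketch, assuming the tool behaves as documented in 14.2.x and with the OSD stopped first (osd.5 as the example):

   systemctl stop ceph-osd@5
   # move any BlueFS data living on the slow (block) device onto block.db
   ceph-bluestore-tool bluefs-bdev-migrate \
       --path /var/lib/ceph/osd/ceph-5 \
       --devs-source /var/lib/ceph/osd/ceph-5/block \
       --dev-target /var/lib/ceph/osd/ceph-5/block.db
   systemctl start ceph-osd@5

I haven't tested this yet, though.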

I may be overlooking something; any ideas? I also just found the following Ceph issue, which looks related:

https://tracker.ceph.com/issues/38745

5 MiB of metadata on the slow device isn't a big problem in itself, but the cluster sits permanently in HEALTH_WARN... :)
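
If there's no clean way to get rid of those last few MiB, I suppose we could at least mute the warning; something like this should work, if I'm reading the option name right (bluestore_warn_on_bluefs_spillover, applied to all OSDs):

   ceph config set osd bluestore_warn_on_bluefs_spillover false

...though I'd rather fix the spillover itself than hide it.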


# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-5
pve-kernel-4.15: 5.4-14
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-4.15.18-26-pve: 4.15.18-54
pve-kernel-4.15.18-25-pve: 4.15.18-53
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.18-2-pve: 4.15.18-21
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 14.2.8-pve1
ceph-fuse: 14.2.8-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-12
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-19
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1



Thanks a lot

Eneko

--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
