Hi Alwin,

On 24/3/20 at 14:54, Alwin Antreich wrote:
On Tue, Mar 24, 2020 at 01:12:03PM +0100, Eneko Lacunza wrote:
Hi Alwin,

On 24/3/20 at 12:24, Alwin Antreich wrote:
On Tue, Mar 24, 2020 at 10:34:15AM +0100, Eneko Lacunza wrote:
We're seeing a spillover issue with Ceph, using 14.2.8:
[...]
3. ceph health detail
     HEALTH_WARN BlueFS spillover detected on 3 OSD
     BLUEFS_SPILLOVER BlueFS spillover detected on 3 OSD
     osd.3 spilled over 5 MiB metadata from 'db' device (556 MiB used of
     6.0 GiB) to slow device
     osd.4 spilled over 5 MiB metadata from 'db' device (552 MiB used of
     6.0 GiB) to slow device
     osd.5 spilled over 5 MiB metadata from 'db' device (551 MiB used of
     6.0 GiB) to slow device

I may be overlooking something, any idea? I also just found the following Ceph
issue:

https://tracker.ceph.com/issues/38745

5 MiB of metadata on the slow device isn't a big problem, but the cluster is permanently
in HEALTH_WARN state... :)
The DB/WAL device is too small and all the new metadata has to be written
to the slow device. This will destroy performance.

I think the size changes, as the DB gets compacted.
Yes. But it isn't too small... it's 6 GiB and there's only ~560MiB of data.
Yes, true. I meant the used size. But the message is odd.

You should find the compaction stats in the OSD log files. It could be,
as reasoned in the bug tracker, that the compaction needs too much space
and spills over to the slow device. Additionally, if not set up separately,
the WAL will take up 512 MB on the DB device.
I don't see any indication that compaction needs too much space:
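(For reference, I pulled these from the OSD log; assuming the default log location, something like this shows the stats dumps, with the most recent block at the end:

    grep -A 30 "DUMPING STATS" /var/log/ceph/ceph-osd.3.log
)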

2020-03-24 14:24:04.883 7f03ffbee700  4 rocksdb: [db/db_impl.cc:777] ------- DUMPING STATS -------
2020-03-24 14:24:04.883 7f03ffbee700  4 rocksdb: [db/db_impl.cc:778]
** DB Stats **
Uptime(secs): 15000.1 total, 600.0 interval
Cumulative writes: 4646 writes, 18K keys, 4646 commit groups, 1.0 writes per commit group, ingest: 0.01 GB, 0.00 MB/s
Cumulative WAL: 4646 writes, 1891 syncs, 2.46 writes per sync, written: 0.01 GB, 0.00 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 163 writes, 637 keys, 163 commit groups, 1.0 writes per commit group, ingest: 0.63 MB, 0.00 MB/s
Interval WAL: 163 writes, 67 syncs, 2.40 writes per sync, written: 0.00 MB, 0.00 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     33.4      0.02              0.00         2    0.009       0      0
  L1      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.8    162.1    134.6      0.09              0.06         1    0.092    127K    10K
  L2      9/0   538.64 MB   0.2      0.5     0.0      0.5       0.5      0.0       0.0  43.6    102.7    101.2      5.32              1.31         1    5.325   1496K   110K
 Sum      9/0   538.64 MB   0.0      0.5     0.0      0.5       0.5      0.0       0.0 961.1    103.3    101.5      5.43              1.37         4    1.358   1623K   121K
 Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0

** Compaction Stats [default] **
Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Low      0/0    0.00 KB   0.0      0.5     0.0      0.5       0.5      0.0       0.0   0.0    103.7    101.7      5.42              1.36         2    2.708   1623K   121K
High      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     43.9      0.01              0.00         1    0.013       0      0
User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.4      0.00              0.00         1    0.004       0      0
Uptime(secs): 15000.1 total, 600.0 interval
Flush(GB): cumulative 0.001, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.54 GB write, 0.04 MB/s write, 0.55 GB read, 0.04 MB/s read, 5.4 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

I see the following in a perf dump:

    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 6442442752,
        "db_used_bytes": 696246272,
        "wal_total_bytes": 0,
        "wal_used_bytes": 0,
        "slow_total_bytes": 40004222976,
        "slow_used_bytes": 5242880,
        "num_files": 20,
        "log_bytes": 41631744,
        "log_compactions": 0,
        "logged_bytes": 40550400,
        "files_written_wal": 2,
        "files_written_sst": 41,
        "bytes_written_wal": 102040973,
        "bytes_written_sst": 2233090674,
        "bytes_written_slow": 0,
        "max_bytes_wal": 0,
        "max_bytes_db": 1153425408,
        "max_bytes_slow": 0,
        "read_random_count": 127832,
        "read_random_bytes": 2761102524,
        "read_random_disk_count": 19206,
        "read_random_disk_bytes": 2330400597,
        "read_random_buffer_count": 108844,
        "read_random_buffer_bytes": 430701927,
        "read_count": 21457,
        "read_bytes": 1087948189,
        "read_prefetch_count": 21438,
        "read_prefetch_bytes": 1086853927
    },
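(That is the "bluefs" section of the OSD's perf counters; assuming the usual admin socket and jq being installed, it can be pulled with something like:

    ceph daemon osd.3 perf dump | jq .bluefs
)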


If the above doesn't give any information, then you may need to export
the BlueFS (RocksDB). Then you can run the kvstore-tool on it.
I'll look into trying this, although I'd say it's some kind of bug.
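For reference, I'd expect the export and inspection to go roughly like this, with the OSD stopped first (the OSD id and output path are just an example):

    systemctl stop ceph-osd@3
    ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-3 --out-dir /root/osd3-bluefs
    ceph-kvstore-tool rocksdb /root/osd3-bluefs/db stats
    systemctl start ceph-osd@3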

The easiest way is to destroy and re-create the OSD with a bigger
DB/WAL. The guideline from Facebook for RocksDB is 3/30/300 GB.
It's well below the 3GiB limit in the guideline ;)
For now. ;)
The cluster is 2 years old now and the data amount is quite stable; I think it will hold for some time ;)
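If we ever do have to re-create an OSD with a bigger DB as you suggest, I'd expect it to look roughly like this with the PVE tooling (device names and the 30 GB size are only an example, and I'd wait for the cluster to be healthy again between OSDs):

    ceph osd out 3                        # let the data rebalance away first
    systemctl stop ceph-osd@3
    pveceph osd destroy 3 --cleanup
    pveceph osd create /dev/sdc --db_dev /dev/nvme0n1 --db_size 30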

Thanks a lot
Eneko

--
Zuzendari Teknikoa / Director Técnico
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es

_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
