Hi Alwin,
Thanks for your reply! Appreciated.
These messages are not necessarily caused by a network issue. It might
well be that the daemon osd.18 cannot respond to heartbeat messages.
The thing is: the two OSDs are on the same host. I checked
ceph-osd.18.log, and it contains just regular Ceph output, nothing out
of the ordinary.
I noticed on host pm2 there are multiple kworker pids running at 100%
CPU utilisation. Also, swap usage is 100%, while regular RAM usage (in
the Proxmox GUI) is only 54%.
No idea what to make of that...
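For reference, this is how I spotted those numbers on pm2 (plain Linux
tools, nothing Ceph-specific; the head counts are arbitrary):

```shell
# Busiest threads first; the kworker pids show up at the top here.
ps -eo pid,comm,%cpu --sort=-%cpu | head -n 10

# Per-process swap usage (VmSwap, in kB) from /proc, largest consumers first.
awk '/^Name:/{n=$2} /^VmSwap:/{print $2, n}' /proc/[0-9]*/status 2>/dev/null | sort -rn | head -n 10
```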
Check the logs on the host of osd.18.
Here they are:
2019-02-08 08:44:01.953390 7f6dc08b4700 1 leveldb: Level-0 table #1432303: started
2019-02-08 08:44:02.108622 7f6dc08b4700 1 leveldb: Level-0 table #1432303: 1299359 bytes OK
2019-02-08 08:44:02.181135 7f6dc08b4700 1 leveldb: Delete type=0 #1432295
Also ceph-mon.1.log contains nothing special, except the regular stuff.
The cluster is doing scrubbing too; this is an intensive operation and
taxes your OSDs. It intensifies the issue. But in general, you need to
find out what caused the slow requests. Ceph is able to throttle and
tries to get IOs done, even under pressure.
Yes, I turned off noscrub and nodeep-scrub again after the issues of
yesterday morning were resolved. The system has been HEALTH_OK for 24
hours, with no issues (except the worrying log lines appearing every
second).
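To find out where the slow requests come from, I assume querying the
OSD's admin socket is the way to go (a sketch, with osd.18 as the
example; these would run on pm2):

```shell
# Operations currently in flight on osd.18, with how long each has waited.
ceph daemon osd.18 dump_ops_in_flight

# Recently completed ops and their per-phase timings.
ceph daemon osd.18 dump_historic_ops

# Cluster-wide summary of slow/blocked requests.
ceph health detail
```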
If you describe your system further (eg. osd tree, crush map, system
specs) then we may be able to point you in the right direction. ;)
Here you go:
root@pm2:/var/log/ceph# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 87.35376 root default
-2 29.11688 host pm1
0 hdd 3.64000 osd.0 up 1.00000 1.00000
1 hdd 3.64000 osd.1 up 1.00000 1.00000
2 hdd 3.63689 osd.2 up 1.00000 1.00000
3 hdd 3.64000 osd.3 up 1.00000 1.00000
12 hdd 3.64000 osd.12 up 1.00000 1.00000
13 hdd 3.64000 osd.13 up 1.00000 1.00000
14 hdd 3.64000 osd.14 up 1.00000 1.00000
15 hdd 3.64000 osd.15 up 1.00000 1.00000
-3 29.12000 host pm2
4 hdd 3.64000 osd.4 up 1.00000 1.00000
5 hdd 3.64000 osd.5 up 1.00000 1.00000
6 hdd 3.64000 osd.6 up 1.00000 1.00000
7 hdd 3.64000 osd.7 up 1.00000 1.00000
16 hdd 3.64000 osd.16 up 1.00000 1.00000
17 hdd 3.64000 osd.17 up 1.00000 1.00000
18 hdd 3.64000 osd.18 up 1.00000 1.00000
19 hdd 3.64000 osd.19 up 1.00000 1.00000
-4 29.11688 host pm3
8 hdd 3.64000 osd.8 up 1.00000 1.00000
9 hdd 3.64000 osd.9 up 1.00000 1.00000
10 hdd 3.64000 osd.10 up 1.00000 1.00000
11 hdd 3.64000 osd.11 up 1.00000 1.00000
20 hdd 3.64000 osd.20 up 1.00000 1.00000
21 hdd 3.64000 osd.21 up 1.00000 1.00000
22 hdd 3.64000 osd.22 up 1.00000 1.00000
23 hdd 3.63689 osd.23 up 1.00000 1.00000
We have journals on SSD.
The crush map:
root@pm2:/var/log/ceph# cat /tmp/decomp
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host pm1 {
id -2 # do not change unnecessarily
id -5 class hdd # do not change unnecessarily
# weight 29.117
alg straw
hash 0 # rjenkins1
item osd.0 weight 3.640
item osd.1 weight 3.640
item osd.3 weight 3.640
item osd.12 weight 3.640
item osd.13 weight 3.640
item osd.14 weight 3.640
item osd.15 weight 3.640
item osd.2 weight 3.637
}
host pm2 {
id -3 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 29.120
alg straw
hash 0 # rjenkins1
item osd.4 weight 3.640
item osd.5 weight 3.640
item osd.6 weight 3.640
item osd.7 weight 3.640
item osd.16 weight 3.640
item osd.17 weight 3.640
item osd.18 weight 3.640
item osd.19 weight 3.640
}
host pm3 {
id -4 # do not change unnecessarily
id -7 class hdd # do not change unnecessarily
# weight 29.117
alg straw
hash 0 # rjenkins1
item osd.8 weight 3.640
item osd.9 weight 3.640
item osd.10 weight 3.640
item osd.11 weight 3.640
item osd.20 weight 3.640
item osd.21 weight 3.640
item osd.22 weight 3.640
item osd.23 weight 3.637
}
root default {
id -1 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 87.354
alg straw
hash 0 # rjenkins1
item pm1 weight 29.117
item pm2 weight 29.120
item pm3 weight 29.117
}
# rules
rule replicated_ruleset {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
# end crush map
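(For anyone following along: /tmp/decomp above is the decompiled CRUSH
map. The standard way to produce it, assuming the usual tools are
installed:)

```shell
# Fetch the binary CRUSH map and decompile it to readable text.
ceph osd getcrushmap -o /tmp/crushmap.comp
crushtool -d /tmp/crushmap.comp -o /tmp/decomp
```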
The three servers are identical: 128GB memory (50% used), dual Xeon(R)
CPU E5-2630 v4 @ 2.20GHz, PVE 5.3.
Any ideas where to look? I could of course try rebooting node pm2 to
see if that makes the issue go away, but I'd rather understand why
osd.18 does not respond to heartbeat messages, why swap usage is 100%,
and why there are multiple high-CPU kworker threads running on this
host only.
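In case it's relevant, this is what I plan to check next regarding the
swap situation on pm2 (standard Linux tools):

```shell
# Which swap devices are in use and how full they are.
swapon --show

# Memory pressure indicators: swap totals and pages being written back.
grep -E 'SwapTotal|SwapFree|Dirty:|Writeback:' /proc/meminfo

# How aggressively the kernel swaps (default 60).
sysctl vm.swappiness
```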
MJ
_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user