Hi, Uwe
On 29.12.2021 14:16, Uwe Sauter wrote:
Just a feeling but I'd say that the imbalance in OSDs (one host having many more disks than the
rest) is your problem.
Yes, the last node in the cluster has more disks than the rest, but
one disk is 12 TB and the other 9 HDDs are 1 TB each.
Assuming that your configuration keeps 3 copies of each VM image then the imbalance probably means
that 2 of these 3 copies reside on pve-3111 and if this host is unavailable, all VM images with 2
copies on that host become unresponsive, too.
In the Proxmox web interface I set the Ceph pool to Size: 2, Min. Size: 2.
With "ceph osd map vm.pool <object-name>" (I used the VM ID as the object name) I see that
for some VM objects one copy is on osd.12, for example:
osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> up ([12,8], p12) acting ([12,8], p12)
But in this example the copies are on osd.10 and osd.7:
osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) -> up ([10,7], p10) acting ([10,7], p10)
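The two mappings above can be read against the pool setting Size: 2, Min. Size: 2. Ceph stops serving I/O for a placement group when fewer than min_size replicas are available, so with these values a PG that loses either of its two OSDs is blocked until that OSD returns or the data is re-replicated. The following is only a simplified model of that rule, not Ceph's actual peering logic; the OSD-to-host mapping is copied from the "ceph osd tree" output quoted further down in this thread, and the function names are hypothetical.

```python
# Simplified model: a PG serves I/O only while at least min_size replicas are up.
# OSD-to-host mapping copied from the `ceph osd tree` output in this thread.
OSD_HOST = {
    7: "pve-3108",
    8: "pve-3103",
    10: "pve-3101",
    12: "pve-3111",
}

# Acting sets copied from the two `ceph osd map` outputs above.
ACTING = {
    "2.7": [12, 8],
    "2.36d": [10, 7],
}

MIN_SIZE = 2  # pool setting reported above (Size: 2, Min. Size: 2)


def surviving_replicas(acting, down_host):
    """Return the OSDs of a PG that are not on the host that went down."""
    return [osd for osd in acting if OSD_HOST[osd] != down_host]


def pg_serves_io(acting, down_host, min_size=MIN_SIZE):
    """True if enough replicas remain to satisfy min_size (hypothetical helper)."""
    return len(surviving_replicas(acting, down_host)) >= min_size


if __name__ == "__main__":
    for pgid, acting in ACTING.items():
        state = "serves I/O" if pg_serves_io(acting, "pve-3111") else "blocked"
        print(f"pg {pgid} acting {acting}: {state} while pve-3111 is down")
```

Under this model, any VM disk with at least one object in a PG that has a replica on pve-3111 would stall while that host is down, which would match the symptom described in the original message. A commonly used alternative is Size: 3 with Min. Size: 2, which tolerates the loss of one replica without blocking I/O.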
Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent that
one host holds multiple copies of a VM image.
I did not quite understand what I should check.
Could you explain it with an example?
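As a possible illustration of the check being asked about: "ceph osd pool get vm.pool crush_rule" shows which CRUSH rule the pool uses, and "ceph osd crush rule dump" prints the rules as JSON. The failure domain is the bucket type named in the rule's choose/chooseleaf step. The sketch below parses such a dump; the embedded sample is illustrative only and is NOT taken from the cluster in this thread, and the function name is hypothetical.

```python
import json

# Illustrative sample shaped like `ceph osd crush rule dump` JSON output.
# It is NOT taken from the cluster discussed in this thread.
SAMPLE_RULE_DUMP = """
[
  {
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "steps": [
      {"op": "take", "item": -1, "item_name": "default"},
      {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
      {"op": "emit"}
    ]
  }
]
"""


def failure_domains(rule_dump_json):
    """Map each rule name to the bucket types used by its choose* steps."""
    result = {}
    for rule in json.loads(rule_dump_json):
        types = [
            step["type"]
            for step in rule["steps"]
            if step["op"].startswith("choose") and "type" in step
        ]
        result[rule["rule_name"]] = types
    return result


if __name__ == "__main__":
    for name, types in failure_domains(SAMPLE_RULE_DUMP).items():
        print(f"rule {name}: failure domain(s) {types}")
```

A type of "host" means replicas of one PG are placed on different hosts; "osd" would allow two replicas on the same host. In the two "ceph osd map" examples above, both acting sets already span two different hosts ([12,8] is pve-3111 and pve-3103, [10,7] is pve-3101 and pve-3108), which is consistent with a host failure domain, although two samples do not prove it.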
Regards,
Uwe
On 29.12.21 at 09:36, Сергей Цаболов wrote:
Hello to all.
In my case I have a 7-node Proxmox cluster with a working Ceph
(ceph version 15.2.15 octopus (stable)).
Ceph HEALTH_OK
ceph -s
  cluster:
    id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
    health: HEALTH_OK

  services:
    mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
    mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111, pve-3108
    mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
    osd: 22 osds: 22 up (since 17h), 22 in (since 17h)

  task status:

  data:
    pools:   4 pools, 1089 pgs
    objects: 1.09M objects, 4.1 TiB
    usage:   7.7 TiB used, 99 TiB / 106 TiB avail
    pgs:     1089 active+clean
---------------------------------------------------------------------------------------------------------------------
ceph osd tree
ID   CLASS  WEIGHT     TYPE NAME          STATUS  REWEIGHT  PRI-AFF
 -1         106.43005  root default
-13          14.55478      host pve-3101
 10    hdd    7.27739          osd.10         up   1.00000  1.00000
 11    hdd    7.27739          osd.11         up   1.00000  1.00000
-11          14.55478      host pve-3103
  8    hdd    7.27739          osd.8          up   1.00000  1.00000
  9    hdd    7.27739          osd.9          up   1.00000  1.00000
 -3          14.55478      host pve-3105
  0    hdd    7.27739          osd.0          up   1.00000  1.00000
  1    hdd    7.27739          osd.1          up   1.00000  1.00000
 -5          14.55478      host pve-3107
  2    hdd    7.27739          osd.2          up   1.00000  1.00000
  3    hdd    7.27739          osd.3          up   1.00000  1.00000
 -9          14.55478      host pve-3108
  6    hdd    7.27739          osd.6          up   1.00000  1.00000
  7    hdd    7.27739          osd.7          up   1.00000  1.00000
 -7          14.55478      host pve-3109
  4    hdd    7.27739          osd.4          up   1.00000  1.00000
  5    hdd    7.27739          osd.5          up   1.00000  1.00000
-15          19.10138      host pve-3111
 12    hdd   10.91409          osd.12         up   1.00000  1.00000
 13    hdd    0.90970          osd.13         up   1.00000  1.00000
 14    hdd    0.90970          osd.14         up   1.00000  1.00000
 15    hdd    0.90970          osd.15         up   1.00000  1.00000
 16    hdd    0.90970          osd.16         up   1.00000  1.00000
 17    hdd    0.90970          osd.17         up   1.00000  1.00000
 18    hdd    0.90970          osd.18         up   1.00000  1.00000
 19    hdd    0.90970          osd.19         up   1.00000  1.00000
 20    hdd    0.90970          osd.20         up   1.00000  1.00000
 21    hdd    0.90970          osd.21         up   1.00000  1.00000
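To put the imbalance in the tree above into numbers: CRUSH distributes data roughly in proportion to weight, so a host's share of the total weight approximates the fraction of replicas it holds. The sketch below computes that share from the host weights listed above; it is a rough estimate under that proportionality assumption, not a measurement of the actual PG distribution, and the function name is hypothetical.

```python
# CRUSH weights per host, copied from the `ceph osd tree` output above.
HOST_WEIGHT = {
    "pve-3101": 14.55478,
    "pve-3103": 14.55478,
    "pve-3105": 14.55478,
    "pve-3107": 14.55478,
    "pve-3108": 14.55478,
    "pve-3109": 14.55478,
    "pve-3111": 19.10138,
}


def weight_shares(host_weight):
    """Return each host's fraction of the total CRUSH weight."""
    total = sum(host_weight.values())
    return {host: weight / total for host, weight in host_weight.items()}


if __name__ == "__main__":
    for host, share in sorted(weight_shares(HOST_WEIGHT).items()):
        print(f"{host}: {share:.1%} of total CRUSH weight")
```

By this estimate pve-3111 carries a somewhat larger share than each of the other hosts, and within pve-3111 more than half of the host's weight sits on the single large disk osd.12 (10.91409 of 19.10138).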
---------------------------------------------------------------------------------------------------------------
POOL     ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
vm.pool   2  1024  3.0 TiB  863.31k  6.0 TiB   6.38  44 TiB
(this pool holds all the VM disks)
---------------------------------------------------------------------------------------------------------------
ceph osd map vm.pool vm.pool.object
osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-helper: 6.4-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
And now my problem:
All VMs use a single pool for their disks.
When node pve-3111 is shut down, VMs on many of the other nodes (for example pve-3107 and
pve-3105) do not shut down, but they become unreachable over the network.
After the node is back up, Ceph returns to HEALTH_OK and all VMs become reachable over the
network again (without a reboot).
Can someone suggest what I should check in Ceph?
Thanks.
--
-------------------------
Best regards,
Сергей Цаболов,
System Administrator
ООО "Т8"
Tel.: +74992716161,
Mob.: +79850334875
[email protected]
ООО «Т8», 107076, Moscow, Krasnobogatyrskaya ul. 44, bldg. 1
www.t8.ru
_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user