Re: [PVE-User] [ceph-users] Re: Ceph Usage web and terminal.

Сергей Цаболов Wed, 29 Dec 2021 06:07:03 -0800

OK, I understand the situation.

On 29.12.2021 16:13, Uwe Sauter wrote:

On 29.12.21 at 13:51, Сергей Цаболов wrote:

Hi, Uwe


On 29.12.2021 14:16, Uwe Sauter wrote:

Just a feeling, but I'd say that the imbalance in OSDs (one host having many more disks than the rest) is your problem.

Yes, the last node in the cluster has more disks than the rest, but one disk is 12 TB and the other 9 HDDs are 1 TB each.

Assuming that your configuration keeps 3 copies of each VM image, the imbalance probably means that 2 of these 3 copies reside on pve-3111, and if this host is unavailable, all VM images with 2 copies on that host become unresponsive, too.

In the Proxmox web UI, I set the Ceph pool to Size: 2, Min. Size: 2.

So this means that you want to have 2 copies in the regular case (size) and also require 2 copies in the failure case (min size) for the VMs to stay available.

Yes, I thought the same as your answer before, but it does not work that way.


So you might solve your problem by decreasing min size to 1 (dangerous!!) or by increasing size to 3, which means that in the regular case you will have 3 copies, but if only 2 are available, it will still work and re-sync the 3rd copy once it comes online again.


I understand that decreasing min.size to 1 is very dangerous!!!

If I increase size to 3, min.size stays at 2, which is the default.

But I'm afraid that if I set 3/2 (a good choice), MAX AVAIL of the pool will decrease by a factor of two or more. Or am I wrong?
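Here is a rough way to estimate the effect of going from 2/2 to 3/2. It assumes MAX AVAIL scales with 1/size on the same set of OSDs. Ceph's real MAX AVAIL calculation also accounts for the fullest OSD and the full ratio, so treat the result as an approximation. The helper estimate_max_avail is hypothetical.

```python
def estimate_max_avail(current_max_avail_tib: float,
                       current_size: int,
                       new_size: int) -> float:
    """Rough estimate of a replicated pool's MAX AVAIL after a size change.

    Assumes every stored byte consumes `size` bytes of raw space, so
    MAX AVAIL scales with 1/size, and that the OSD layout is unchanged.
    """
    return current_max_avail_tib * current_size / new_size


# vm.pool currently reports MAX AVAIL = 44 TiB with size 2.
print(f"{estimate_max_avail(44, 2, 3):.1f} TiB")
```

Under this assumption, MAX AVAIL of vm.pool would drop by about a third (from 44 TiB to roughly 29 TiB), not by half. The 3.0 TiB currently stored would use about 9.0 TiB of raw space instead of 6.0 TiB.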


For now, with all disks, I have:

CLASS  SIZE     AVAIL   USED     RAW USED  %RAW USED
hdd    106 TiB  99 TiB  7.7 TiB  7.7 TiB        7.26
TOTAL  106 TiB  99 TiB  7.7 TiB  7.7 TiB        7.26

--- POOLS ---

POOL                   ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1     1  8.3 MiB       22   17 MiB      0     44 TiB
vm.pool                 2  1024  3.0 TiB  864.55k  6.0 TiB   6.39     44 TiB
cephfs_data             3    32  874 GiB  223.76k  1.7 TiB   1.91     44 TiB
cephfs_metadata         4    32   25 MiB       27   51 MiB      0     44 TiB

For vm.pool, the terminal shows MAX AVAIL 44 TiB (= 48.37 TB), but in the web UI I see 51.50 TB.
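`ceph df` reports binary units (1 TiB = 2^40 bytes), while the web UI value is labelled TB. The small sketch below does the unit conversion, assuming the web UI uses decimal TB (10^12 bytes). It accounts for 44 TiB being about 48.38 TB. It does not explain the remaining difference to 51.50 TB. Note that "44 TiB" is itself already rounded in the `ceph df` output.

```python
TIB = 2**40  # bytes in one tebibyte (TiB), the unit printed by `ceph df`
TB = 10**12  # bytes in one decimal terabyte (TB)


def tib_to_tb(tib: float) -> float:
    """Convert binary tebibytes to decimal terabytes."""
    return tib * TIB / TB


print(f"{tib_to_tb(44):.2f} TB")
```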



Am I right in my reasoning?

Thank you!

With "ceph osd map vm.pool object-name" (using the VM ID as the object name), I see that for some VM objects one copy is on osd.12, for example:

osdmap e14321 pool 'vm.pool' (2) object '114' -> pg 2.10486407 (2.7) -> up ([12,8], p12) acting ([12,8], p12)

But in this example:

osdmap e14321 pool 'vm.pool' (2) object '113' -> pg 2.8bd09f6d (2.36d) -> up ([10,7], p10) acting ([10,7], p10)

the copies are on osd.10 and osd.7.
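One caveat first: `ceph osd map` hashes whatever object name it is given and does not check that the object exists. An RBD-backed VM disk is stored as many objects (named like rbd_data.<image id>.<object number>) spread over many PGs. So `ceph osd map vm.pool 114` shows where a hypothetical object called "114" would be placed, not where VM 114's disk lives.

With that in mind, the sketch below maps an acting set to hosts. OSD_HOST is copied from the `ceph osd tree` output in the first message of this thread. The helpers replica_hosts and spans_distinct_hosts are hypothetical.

```python
# OSD id -> host, copied from the `ceph osd tree` output in this thread.
OSD_HOST = {
    10: "pve-3101", 11: "pve-3101",
    8: "pve-3103", 9: "pve-3103",
    0: "pve-3105", 1: "pve-3105",
    2: "pve-3107", 3: "pve-3107",
    6: "pve-3108", 7: "pve-3108",
    4: "pve-3109", 5: "pve-3109",
    **{osd: "pve-3111" for osd in range(12, 22)},  # osd.12 .. osd.21
}


def replica_hosts(acting: list) -> list:
    """Return the host of each OSD in a PG's acting set."""
    return [OSD_HOST[osd] for osd in acting]


def spans_distinct_hosts(acting: list) -> bool:
    """True if no two copies of the PG live on the same host."""
    hosts = replica_hosts(acting)
    return len(set(hosts)) == len(hosts)


for pg, acting in [("2.7", [12, 8]), ("2.36d", [10, 7])]:
    print(pg, replica_hosts(acting), spans_distinct_hosts(acting))
```

Both example PGs already have their two copies on different hosts. With size 2 and min size 2, however, a PG such as 2.7 still stops serving I/O while pve-3111 is down, because only one of its two copies remains.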

Check your failure domain for Ceph and possibly change it from OSD to host. This should prevent one host from holding multiple copies of a VM image.

I don't quite understand what I should check.

Can you explain it with an example?

I don't have an example but you can read about the concept at:

https://docs.ceph.com/en/latest/rados/operations/crush-map/#crush-maps
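To see which failure domain vm.pool uses, you can look up the pool's rule with `ceph osd pool get vm.pool crush_rule` and print that rule with `ceph osd crush rule dump <rule name>`. The sketch below reads the bucket type from the choose/chooseleaf step of such a JSON dump. SAMPLE_RULE is an abbreviated, assumed example of a default replicated rule, not output from this cluster. failure_domain is a hypothetical helper.

```python
import json
from typing import Optional

# Assumed, abbreviated example of `ceph osd crush rule dump` output for a
# default replicated rule. This is NOT taken from the cluster in this thread.
SAMPLE_RULE = """
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "type": 1,
    "steps": [
        {"op": "take", "item": -1, "item_name": "default"},
        {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
        {"op": "emit"}
    ]
}
"""


def failure_domain(rule: dict) -> Optional[str]:
    """Return the bucket type the rule spreads copies across, if any."""
    for step in rule.get("steps", []):
        # choose_firstn / choose_indep / chooseleaf_firstn / chooseleaf_indep
        if step.get("op", "").startswith(("choose_", "chooseleaf_")):
            return step.get("type")
    return None


print(failure_domain(json.loads(SAMPLE_RULE)))
```

`host` means each copy of a PG goes to a different host. `osd` would allow two copies of the same PG on one host, for example on two disks in pve-3111.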


Regards,

        Uwe


Regards,

     Uwe

On 29.12.21 at 09:36, Сергей Цаболов wrote:

Hello everyone.

I have a 7-node Proxmox cluster with a working Ceph cluster ("ceph version 15.2.15 octopus (stable)": 7).

Ceph HEALTH_OK

ceph -s
    cluster:
      id:     9662e3fa-4ce6-41df-8d74-5deaa41a8dde
      health: HEALTH_OK

    services:
      mon: 7 daemons, quorum pve-3105,pve-3107,pve-3108,pve-3103,pve-3101,pve-3111,pve-3109 (age 17h)
      mgr: pve-3107(active, since 41h), standbys: pve-3109, pve-3103, pve-3105, pve-3101, pve-3111, pve-3108
      mds: cephfs:1 {0=pve-3105=up:active} 6 up:standby
      osd: 22 osds: 22 up (since 17h), 22 in (since 17h)

    task status:

    data:
      pools:   4 pools, 1089 pgs
      objects: 1.09M objects, 4.1 TiB
      usage:   7.7 TiB used, 99 TiB / 106 TiB avail
      pgs:     1089 active+clean

---------------------------------------------------------------------------------------------------------------------



ceph osd tree

ID   CLASS  WEIGHT     TYPE NAME            STATUS  REWEIGHT PRI-AFF
   -1         106.43005  root default
-13          14.55478      host pve-3101
   10    hdd    7.27739          osd.10           up   1.00000 1.00000
   11    hdd    7.27739          osd.11           up   1.00000 1.00000
-11          14.55478      host pve-3103
    8    hdd    7.27739          osd.8            up   1.00000 1.00000
    9    hdd    7.27739          osd.9            up   1.00000 1.00000
   -3          14.55478      host pve-3105
    0    hdd    7.27739          osd.0            up   1.00000 1.00000
    1    hdd    7.27739          osd.1            up   1.00000 1.00000
   -5          14.55478      host pve-3107
    2    hdd    7.27739          osd.2            up   1.00000 1.00000
    3    hdd    7.27739          osd.3            up   1.00000 1.00000
   -9          14.55478      host pve-3108
    6    hdd    7.27739          osd.6            up   1.00000 1.00000
    7    hdd    7.27739          osd.7            up   1.00000 1.00000
   -7          14.55478      host pve-3109
    4    hdd    7.27739          osd.4            up   1.00000 1.00000
    5    hdd    7.27739          osd.5            up   1.00000 1.00000
-15          19.10138      host pve-3111
   12    hdd   10.91409          osd.12           up   1.00000 1.00000
   13    hdd    0.90970          osd.13           up   1.00000 1.00000
   14    hdd    0.90970          osd.14           up   1.00000 1.00000
   15    hdd    0.90970          osd.15           up   1.00000 1.00000
   16    hdd    0.90970          osd.16           up   1.00000 1.00000
   17    hdd    0.90970          osd.17           up   1.00000 1.00000
   18    hdd    0.90970          osd.18           up   1.00000 1.00000
   19    hdd    0.90970          osd.19           up   1.00000 1.00000
   20    hdd    0.90970          osd.20           up   1.00000 1.00000
   21    hdd    0.90970          osd.21           up   1.00000 1.00000

---------------------------------------------------------------------------------------------------------------



POOL     ID  PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
vm.pool   2  1024  3.0 TiB  863.31k  6.0 TiB   6.38     44 TiB

(This pool holds all VM disks.)

---------------------------------------------------------------------------------------------------------------



ceph osd map vm.pool vm.pool.object
osdmap e14319 pool 'vm.pool' (2) object 'vm.pool.object' -> pg 2.196f68d5 (2.d5) -> up ([2,4], p2) acting ([2,4], p2)
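In the `ceph osd map` output, the hex value after "pg" (e.g. 2.196f68d5) is the object's raw hash within pool 2. The PG id in parentheses comes from folding that hash into the pool's pg_num. CRUSH then maps the PG, not the individual object, to OSDs. Below is a Python port of that folding step, modelled on Ceph's ceph_stable_mod() and pg_num_mask. The object-name hash itself (rjenkins) is not reproduced; the raw hashes are taken from the outputs in this thread.

```python
def pg_num_mask(pg_num: int) -> int:
    """Smallest all-ones bit mask covering pg_num - 1."""
    return (1 << (pg_num - 1).bit_length()) - 1


def ceph_stable_mod(x: int, b: int, bmask: int) -> int:
    """Port of Ceph's ceph_stable_mod(): fold a hash into [0, b)."""
    if (x & bmask) < b:
        return x & bmask
    return x & (bmask >> 1)


def pg_seed(raw_hash: int, pg_num: int) -> int:
    """Map a raw object hash to the PG number within its pool."""
    return ceph_stable_mod(raw_hash, pg_num, pg_num_mask(pg_num))


# Raw hashes from the `ceph osd map` outputs; vm.pool has pg_num = 1024.
for raw in (0x196F68D5, 0x10486407, 0x8BD09F6D):
    print(f"2.{raw:08x} -> 2.{pg_seed(raw, 1024):x}")
```

This reproduces 2.d5, 2.7 and 2.36d from the outputs above. All objects that land in the same PG share its acting set, for example [2,4] for 2.d5.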

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------


pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-helper: 6.4-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------



And now my problem:

I use a single pool for the disks of all VMs.

When node/host pve-3111 is shut down, many VMs on the other nodes/hosts (e.g. pve-3107, pve-3105) do not shut down but become unreachable on the network.

After the node/host is back up, Ceph returns to HEALTH_OK and all VMs become reachable on the network again (without a reboot).

Can someone suggest what I should check in Ceph?

Thanks.

--
-------------------------
Best regards,
Сергей Цаболов,
System Administrator
LLC "Т8"
Tel.: +74992716161,
Mobile: +79850334875
LLC «Т8», 107076, Moscow, Krasnobogatyrskaya St., 44, bldg. 1
www.t8.ru


_______________________________________________
pve-user mailing list
[email protected]
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-user
