Hi all,

We're in the process of upgrading our office Proxmox v4.4 cluster to v5.1.

For that, we first followed the instructions at
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
to upgrade Ceph from Jewel to Luminous.

The upgrade was apparently a success:
# ceph -s
  cluster:
    id:     8ee074d4-005c-4bd6-a077-85eddde543b5
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum 0,2,3
    mgr: butroe(active), standbys: guadalupe, sanmarko
    osd: 12 osds: 12 up, 12 in

  data:
    pools:   2 pools, 640 pgs
    objects: 518k objects, 1966 GB
    usage:   4120 GB used, 7052 GB / 11172 GB avail
    pgs:     640 active+clean

  io:
    client:   644 kB/s rd, 3299 kB/s wr, 61 op/s rd, 166 op/s wr

And the versions seem good too:
# ceph mon versions
{
    "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 3
}
# ceph osd versions
{
    "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)": 12
}
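
For completeness, Luminous also has an aggregate "ceph versions" command, and the Jewel-to-Luminous upgrade ends with setting the minimum OSD release, so checking both should confirm the whole cluster is really on 12.2.1:

# ceph versions
# ceph osd dump | grep require_osd_release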

But this weekend there were problems backing up some VMs, all failing with the same error:
no such volume 'ceph-proxmox:vm-120-disk-1'

The "missing" volumes don't show in storage content, but they DO if we do a "rbd -p proxmox ls".

When we try an info command on one of them, though, we get an error:
# rbd -p proxmox info vm-120-disk-1
2017-11-13 16:04:02.979006 7f99d8ff9700 -1 librbd::image::OpenRequest: failed to retreive immutable metadata: (2) No such file or directory
rbd: error opening image vm-120-disk-1: (2) No such file or directory
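
Since "info" dies right at the immutable metadata step, I guess the raw header objects can be inspected with rados directly. Treat this as a sketch: the rbd_id/rbd_header names are just how format 2 images are laid out in the pool, and vm-119-disk-1 is only there as a known-good comparison:

# rados -p proxmox stat rbd_id.vm-120-disk-1    (name -> internal id mapping)
# rados -p proxmox stat rbd_id.vm-119-disk-1    (same check on a healthy image)
# rados -p proxmox get rbd_id.vm-120-disk-1 /tmp/id ; strings /tmp/id
# rados -p proxmox stat rbd_header.<id-from-above>   (the object librbd reads
                                                      the metadata from)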

Other VM disk images behave normally:
# rbd -p proxmox info vm-119-disk-1
rbd image 'vm-119-disk-1':
    size 3072 MB in 768 objects
    order 22 (4096 kB objects)
    block_name_prefix: rbd_data.575762ae8944a
    format: 2
    features: layering
    flags:

Beyond poking at those objects, I don't really know what to look at to diagnose this further. I recall that there was a version 1 format for rbd images, but I doubt the "missing" disk images are in that old format (and I'm not sure how to check that when "info" doesn't work; the best I can think of is another rados check, sketched below).
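
If I understand the on-disk layout right, old format 1 images keep their header in a "<name>.rbd" object (format 2 uses the rbd_id/rbd_header objects above), so a plain rados stat should answer the format question even with "info" broken:

# rados -p proxmox stat vm-120-disk-1.rbd    (this object exists only for a format 1 image)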

Some of the VMs with "missing" disks are still being served by "old" running qemu processes and work correctly; but if we stop such a VM, it won't start again, failing with the error reported above. VMs with non-"missing" disk images start and stop normally.

Any hints about what to try next?

OSDs are filestore on XFS (created from the GUI).

# pveversion -v
proxmox-ve: 4.4-96 (running kernel: 4.4.83-1-pve)
pve-manager: 4.4-18 (running version: 4.4-18/ef2610e8)
pve-kernel-4.4.67-1-pve: 4.4.67-92
pve-kernel-4.4.76-1-pve: 4.4.76-94
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-53
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 1.0-101
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
ceph: 12.2.1-1~bpo80+1

Thanks a lot
Eneko

--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarraga bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
