Public bug reported:
---Problem Description---
We set up two Ubuntu KVM hosts that share the same mount point and tried to
migrate a guest between them. The migration succeeds and the guest appears on
the other host, but the guest then reports I/O errors.
On the first host, run:
root@micro:~# virsh migrate --live --domain microg5 \
    qemu+ssh://10.33.10.115/system --verbose --undefinesource --persistent \
    --timeout 60
Migration: [100 %]
The guest appears on the other host:
root@tiny:~# virsh list --all
 Id   Name      State
 2    tinyg1    running
 3    tinyg2    running
 5    tinyg4    running
 6    tinyg5    running
 7    tinyg6    running
 9    tinyg3    running
 12   microg5   running   <<< this guest is from HOST "Micro"
Checking the status of the guest, I see these errors:
root@microg5:~# dmesg |tail -20
[ 60.818955] blk_update_request: I/O error, dev vdc, sector 96749232
[ 60.819113] Aborting journal on device vdc2-8.
[ 60.820121] blk_update_request: I/O error, dev vdc, sector 9084320
[ 60.820643] EXT4-fs warning (device vdc2): ext4_end_bio:329: I/O error -5 writing to inode 393279 (offset 0 size 0 starting block 1135541)
[ 60.820652] Buffer I/O error on device vdc2, logical block 1133492
[ 60.820655] EXT4-fs (vdc2): previous I/O error to superblock detected
[ 60.821394] blk_update_request: I/O error, dev vdc, sector 96747520
[ 60.821397] blk_update_request: I/O error, dev vdc, sector 96747520
[ 60.821402] Buffer I/O error on dev vdc2, logical block 12091392, lost sync page write
[ 60.821466] JBD2: Error -5 detected when updating journal superblock for vdc2-8.
[ 60.822214] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.822216] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.822218] Buffer I/O error on dev vdc2, logical block 0, lost sync page write
[ 60.822227] EXT4-fs error (device vdc2): ext4_journal_check_start:56: Detected aborted journal
[ 60.822228] EXT4-fs (vdc2): Remounting filesystem read-only
[ 60.822229] EXT4-fs (vdc2): previous I/O error to superblock detected
[ 60.823201] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.823203] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.823204] Buffer I/O error on dev vdc2, logical block 0, lost sync page write
[ 96.736959] nfsd4: failed to purge old clients from recovery directory v4recovery
root@microg5:~#
haochanh commented 18 days ago:
Moving the guest back to the original host succeeds, but we still see the I/O
errors.
root@tiny:~# virsh migrate --live --domain microg5 \
    qemu+ssh://10.33.9.187/system --verbose --undefinesource --persistent \
    --timeout 60
Migration: [100 %]
root@tiny:~# virsh list --all
 Id   Name      State
 2    tinyg1    running
 3    tinyg2    running
 5    tinyg4    running
 6    tinyg5    running
 7    tinyg6    running
 9    tinyg3    running
On the original host:
root@micro:~# virsh list --all
 Id   Name      State
 2    microg1   running
 3    microg2   running
 4    microg3   running
 5    microg4   running
 9    microg6   running
 16   microg5   running
Here is our configuration: both hosts (micro and tiny) share the same NFS
mount, /kvm_nfs/.
Micro KVM:
root@micro:~# ls -l /kvm_nfs/microg6.raw.img
-rw-r--r-- 1 nobody 4294967294 107374182400 Aug 3 18:23 /kvm_nfs/microg6.raw.img
Tiny KVM:
root@tiny:~# ls -l /kvm_nfs/microg6.raw.img
-rw-r--r-- 1 nobody 4294967294 107374182400 Aug 3 2016 /kvm_nfs/microg6.raw.img
We try to migrate the guest microg6 from "micro" to "tiny":
root@micro:~# virsh domblklist microg6
 Target   Source
 vda      /kvm_nfs/microg6.raw.img
root@micro:~# virsh migrate --live --domain microg6 \
    qemu+ssh://10.33.10.115/system --verbose --undefinesource --persistent \
    --timeout 60
Migration: [100 %] <<<< it successfully goes to tiny KVM.
We can see guest "microg6" on tiny KVM now.
root@tiny:~# virsh domblklist microg6
 Target   Source
 vda      /kvm_nfs/microg6.raw.img
Checking the guest "microg6", we see these errors:
root@microg6:~# dmesg |tail
[24371.936814] blk_update_request: I/O error, dev vda, sector 16384
[24371.936900] Buffer I/O error on dev vda2, logical block 0, lost sync page write
[24373.661328] blk_update_request: I/O error, dev vda, sector 16416
[24373.661552] Buffer I/O error on dev vda2, logical block 4, lost async page write
[24373.661778] Buffer I/O error on dev vda2, logical block 6, lost async page write
[24373.662023] Buffer I/O error on dev vda2, logical block 13107201, lost async page write
[24373.662253] Buffer I/O error on dev vda2, logical block 21004406, lost async page write
[24373.662427] Buffer I/O error on dev vda2, logical block 21010963, lost async page write
[24373.662713] Buffer I/O error on dev vda2, logical block 21495820, lost async page write
[24373.662957] Buffer I/O error on dev vda2, logical block 16777222, lost async page write
Both hosts share the same NFS mount:
root@tiny:~# df -T |grep "kvm_nfs"
tmp-lte:/kvm_lpm nfs4 1238417408 786754560 388756480 67% /kvm_nfs
root@tiny:~# rsh micro "df -T |grep kvm_nfs"
tmp-lte:/kvm_lpm nfs4 1238417408 786754560 388756480 67% /kvm_nfs
root@micro:~# df -T |grep "kvm_nfs"
tmp-lte:/kvm_lpm nfs4 1238417408 786754560 388756480 67% /kvm_nfs
The cause is a configuration problem in the NFSv4 server and/or clients,
probably related to the NFSv4 ID <-> name mapping (idmap).
Thus, this is not an I/O problem as such.
The solution/requirement is to make sure that the libvirt-qemu user has the
same UID/GID on all systems that a guest might be migrated to.
(A workaround is to use NFSv3, for example, which has no ID mapping.)
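The requirement above can be checked before attempting a migration. The
following is only a minimal sketch, not a supported tool: the helper name
check_uid_consistency is made up for illustration, and the commented usage
assumes root ssh access to the two hosts named in this report. It reads
"host uid" pairs on stdin and fails if the UIDs are not all identical.

```shell
#!/bin/sh
# check_uid_consistency (hypothetical helper): read "host uid" pairs on stdin,
# echo each pair, then print OK if every UID matches the first one seen,
# or MISMATCH (and return non-zero) otherwise.
check_uid_consistency() {
    awk '
        NF >= 2 {
            if (first == "") first = $2
            else if ($2 != first) mismatch = 1
            printf "%s uid=%s\n", $1, $2
        }
        END {
            if (mismatch) { print "MISMATCH"; exit 1 }
            print "OK"
        }
    '
}

# Assumed usage (requires ssh access; not run here):
# for h in micro tiny; do
#     printf '%s %s\n' "$h" "$(ssh "root@$h" id -u libvirt-qemu)"
# done | check_uid_consistency
```

The same pattern could be repeated with "id -g" for the group if the image
files rely on group permissions.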
"NFSv4 mount incorrectly shows all files with ownership as
nobody:nobody"
https://access.redhat.com/solutions/33455
- Ensure the client and server have matching UID's and GID's.
It is a common misconception that the UID's and GID's can differ
when using NFSv4.
The sole purpose of id mapping is to map an id to a name and
vice-versa.
ID mapping is not intended as some sort of replacement for managing
id's.
- On a non-Ubuntu Linux, if the above settings have been applied and
UID/GID's are matched on server and client
and users are still being mapped to nobody:nobody, then a clearing of
the idmapd cache may be required:
# nfsidmap -c
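To spot the symptom described in that article quickly, the "ls -l" output for
the shared mount can be filtered for unmapped owners. This is a rough sketch
under stated assumptions: the function name unmapped_owners is hypothetical,
it treats "nobody", "nogroup", and the numeric ID 4294967294 (as seen in the
listings earlier in this report) as unmapped, and it assumes file names
without spaces.

```shell
#!/bin/sh
# unmapped_owners (hypothetical helper): read "ls -l" lines on stdin and print
# the file name (last field) of every entry whose owner or group looks unmapped.
unmapped_owners() {
    awk '
        NF >= 9 && ($3 == "nobody" || $3 == "4294967294" ||
                    $4 == "nogroup" || $4 == "4294967294") {
            print $NF
        }
    '
}

# Assumed usage on a host with the NFS mount:
# ls -l /kvm_nfs/ | unmapped_owners
```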
Searching Google for 'linux how to change uid' turns up several articles.
Details:
---
The user 'developer' on NFS server has UID 529.
[developer@tmp-lte ~]$ grep developer /etc/passwd
developer:x:529:531:developer login id:/home/developer:/bin/bash
Create a user 'developer' on tiny with a UID other than 529.
Changing the file's owner to 'developer' does not work (the owner remains
'nobody').
root@tiny:~# useradd --uid 1234 developer
root@tiny:~# chown developer /kvm_nfs/test.mauricfo
root@tiny:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 nobody root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
Remove the user 'developer' for the next test.
root@tiny:~# userdel developer
Create a user 'developer' on tiny with UID 529 (same as on the NFS server).
Changing the file's owner to 'developer' DOES work (the owner is no longer
'nobody').
root@tiny:~# useradd --uid=529 developer
root@tiny:~# chown developer /kvm_nfs/test.mauricfo
root@tiny:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 developer root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
And that works on tiny, but does NOT work on micro UNTIL you clear the nfsidmap
cache.
root@micro:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 nobody root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
root@micro:~# useradd --uid 529 developer
root@micro:~# chown developer /kvm_nfs/test.mauricfo
root@micro:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 nobody root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
root@micro:~# nfsidmap -c
nfsidmap: clearing '2a251810 I------ 1 perm 1f030000 0 0 keyring .id_resolver: 8'
root@micro:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 developer root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
So, this is clearly NFSv4-specific.
Details 2:
---
The idmapd uses the domain from 'hostname --domain', which is the same
across all hosts.
[developer@tmp-lte ~]$ head -n5 /etc/idmapd.conf
[General]
#Verbosity = 0
# The following should be set to the local NFSv4 domain name
# The default is the host's DNS domain name.
#Domain = local.domain.edu
[developer@tmp-lte ~]$ hostname --domain
isst.aus.stglabs.ibm.com
root@tiny:~# hostname --domain
isst.aus.stglabs.ibm.com
root@micro:~# hostname --domain
isst.aus.stglabs.ibm.com
Started nfs-idmapd.service on tiny and micro (apt-get install nfs-server;
systemctl start nfs-idmapd.service).
Increased Verbosity in /etc/idmapd.conf to 3.
Verified that the domain values used by the nfs-idmapd service are the same on
all hosts (journalctl -u nfs-idmapd).
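The domain comparison above can also be done without reading the journal. The
sketch below is an illustration only: idmapd_domain is a made-up helper name.
It prints the first uncommented "Domain =" value from an idmapd.conf file and,
if none is set, falls back to "hostname --domain", which is the default
described in the config comments quoted above.

```shell
#!/bin/sh
# idmapd_domain (hypothetical helper): print the effective NFSv4 idmap domain.
# $1 is the config file to read (defaults to /etc/idmapd.conf).
idmapd_domain() {
    conf=${1:-/etc/idmapd.conf}
    # Take the value of the first Domain line that is not commented out.
    domain=$(sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' \
        "$conf" 2>/dev/null | head -n 1)
    if [ -z "$domain" ]; then
        # No explicit setting: fall back to the host DNS domain name.
        domain=$(hostname --domain 2>/dev/null || true)
    fi
    printf '%s\n' "$domain"
}

# Assumed usage, run on each host and compared by eye or with diff:
# idmapd_domain /etc/idmapd.conf
```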
References:
---
These links discuss a bit about this problem (user/group
nobody/nogroup); in case they're useful for the next person.
https://access.redhat.com/solutions/33455
https://help.ubuntu.com/community/NFSv4Howto
If all directory listings show just "nobody" and "nogroup" instead of
real user and group names,
then you might want to check the Domain parameter set in
/etc/idmapd.conf.
NFSv4 client and server should be in the same domain.
Other operating systems might derive the NFSv4 domain name from the
domain name mentioned in /etc/resolv.conf (e.g. Solaris 10).
http://www.enterprisenetworkingplanet.com/netos/article.php/3644471/Implement-NFSv4-Domains-and-Authentication.htm
http://unix.stackexchange.com/questions/138479/mounting-nfs-owners-are-nobodynogroup
https://www.novell.com/support/kb/doc.php?id=7005060
https://community.netapp.com/t5/Network-Storage-Protocols-Discussions/NFSv4-Linux-client-Netapp-Server-gt-Problem-with-id-mapping/td-p/17895
Reserving a UID/GID in Debian/Ubuntu follows an allocation process governed by
Debian.
I have submitted an allocation request and will prepare the patches for
libvirt-qemu in Debian and Ubuntu.
Hi Chanh,
Can you please test the libvirt-bin & libvirt0 packages?
http://ausgsa.ibm.com/~mauricfo/public/bugs/bz145069/v2/
Please confirm if they resolve the issue.
Thanks!
Details
---
The test packages assume that the UID & GID 64055 will be allocated by
Debian, and use this value for the libvirt-qemu user/group.
You can remove any trace of the current libvirt-bin package (which creates the
user/group) and of the user/group itself with:
# apt-get purge libvirt-bin
# userdel libvirt-qemu
# groupdel libvirt-qemu
Existing files owned by the old libvirt-qemu user/group will not have their
ownership changed, so I am not sure this is a clean transition for an
existing system and its files, but the ownership should be correct for all
files installed or created from then on.
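For the transition concern above, one possible approach is to list what is
still owned by the old numeric UID before deciding whether to re-own it. This
is a sketch only: list_owned_by_uid is a hypothetical helper, the UID and
paths in the usage comment are placeholders, and "-uid" assumes GNU find as
shipped on Ubuntu. It only lists files and changes nothing.

```shell
#!/bin/sh
# list_owned_by_uid (hypothetical helper): print every file under the given
# paths that is owned by the numeric UID in $1, staying on one filesystem.
list_owned_by_uid() {
    old_uid=$1
    shift
    find "$@" -xdev -uid "$old_uid" -print
}

# Assumed usage, with OLD_UID being whatever "id -u libvirt-qemu" printed
# before the package was purged (placeholder, not a value from this report):
# list_owned_by_uid "$OLD_UID" /var/lib/libvirt /etc/libvirt
# After review, the listed files could be re-owned with chown.
```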
Hi Mauricio,
Yes, after applying this patch, the migration works fine without any I/O
errors.
I am able to migrate between the two hosts (Tiny & Micro) over the NFS mount
without any I/O issue.
root@micro:~# id libvirt-qemu
uid=64055(libvirt-qemu) gid=117(kvm) groups=117(kvm),64055(libvirt-qemu)
root@tiny:~# id libvirt-qemu
uid=64055(libvirt-qemu) gid=116(kvm) groups=116(kvm),64055(libvirt-qemu)
Hi Canonical,
@taco-screen-team
The attached patches are for Zesty, Xenial, and Debian sid (which I plan
to submit if the UID/GID allocation request is granted).
afaik @cjwatson is the maintainer of base-passwd on Debian, and could
review/grant/deny the allocation request.
Per Ubuntu policy, we'd need it acked in Debian first, and then the libvirt
patches. All packages were verified.
For Zesty, Xenial, and sid, the test case follows the pattern below and shows
the correct behavior/results.
Thanks!
Test-case
===
Package is not installed -- no libvirt-qemu user/group:
---
# getent passwd libvirt-qemu
#
# getent group libvirt-qemu
#
Package is installed -- libvirt-qemu user/group is created w/ allocated uid/gid:
---
# dpkg -i libvirt{-bin,0}_1.3.1-1ubuntu10.5uidgid1_*.deb
# getent passwd libvirt-qemu
libvirt-qemu:x:64055:117:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false
# getent group libvirt-qemu
libvirt-qemu:x:64055:libvirt-qemu
Package is uninstalled -- libvirt-qemu user/group is removed:
---
# apt-get --yes purge libvirt-bin
# getent passwd libvirt-qemu
#
# getent group libvirt-qemu
#
Package is installed with uid/gid taken -- libvirt-qemu user/group is created
with other uid/gid:
---
# useradd --uid 64055 testuser
# groupadd --gid 64055 testgroup
# dpkg -i libvirt{-bin,0}_1.3.1-1ubuntu10.5uidgid1_*.deb
# echo $?
0
# getent passwd libvirt-qemu
libvirt-qemu:x:113:117:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false
# getent group libvirt-qemu
libvirt-qemu:x:118:libvirt-qemu
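The getent checks in the test case could be scripted along these lines. This
is a minimal sketch with made-up helper names (passwd_uid, has_allocated_uid);
it only inspects a passwd-format line and assumes the allocated value 64055
used by the test packages.

```shell
#!/bin/sh
# passwd_uid (hypothetical helper): print the UID, i.e. the third
# colon-separated field, of a passwd-format line given as $1.
passwd_uid() {
    printf '%s\n' "$1" | cut -d: -f3
}

# has_allocated_uid EXPECTED LINE: succeed only if LINE carries UID EXPECTED.
has_allocated_uid() {
    [ "$(passwd_uid "$2")" = "$1" ]
}

# Assumed usage on a host with the package installed:
# has_allocated_uid 64055 "$(getent passwd libvirt-qemu)" && echo "allocated UID in use"
```

In the last scenario of the test case (UID already taken), the check is
expected to fail, since the package falls back to another UID.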
** Affects: libvirt (Ubuntu)
Importance: Undecided
Assignee: Taco Screen team (taco-screen-team)
Status: New
** Tags: architecture-ppc64le bugnameltc-145069 severity-high
targetmilestone-inin16041
https://bugs.launchpad.net/bugs/1637601
Title:
UbuntuKVM: migration using NFS mount fails #190