Public bug reported:

---Problem Description---
We set up two Ubuntu KVM hosts that share the same mount point and tried to migrate a guest between them. The migration succeeds and the guest appears on the other host afterwards, but the guest then reports I/O errors.

On the first host, run:

root@micro:~# virsh migrate --live --domain microg5 qemu+ssh://10.33.10.115/system --verbose --undefinesource --persistent --timeout 60
Migration: [100 %]

The guest appears on the other host:

root@tiny:~# virsh list --all
 Id    Name       State
 2     tinyg1     running
 3     tinyg2     running
 5     tinyg4     running
 6     tinyg5     running
 7     tinyg6     running
 9     tinyg3     running
 12    microg5    running   <<< this guest is from HOST "Micro"

Checking the status of the guest, I can see these errors:

root@microg5:~# dmesg |tail -20
[ 60.818955] blk_update_request: I/O error, dev vdc, sector 96749232
[ 60.819113] Aborting journal on device vdc2-8.
[ 60.820121] blk_update_request: I/O error, dev vdc, sector 9084320
[ 60.820643] EXT4-fs warning (device vdc2): ext4_end_bio:329: I/O error -5 writing to inode 393279 (offset 0 size 0 starting block 1135541)
[ 60.820652] Buffer I/O error on device vdc2, logical block 1133492
[ 60.820655] EXT4-fs (vdc2): previous I/O error to superblock detected
[ 60.821394] blk_update_request: I/O error, dev vdc, sector 96747520
[ 60.821397] blk_update_request: I/O error, dev vdc, sector 96747520
[ 60.821402] Buffer I/O error on dev vdc2, logical block 12091392, lost sync page write
[ 60.821466] JBD2: Error -5 detected when updating journal superblock for vdc2-8.
[ 60.822214] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.822216] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.822218] Buffer I/O error on dev vdc2, logical block 0, lost sync page write
[ 60.822227] EXT4-fs error (device vdc2): ext4_journal_check_start:56: Detected aborted journal
[ 60.822228] EXT4-fs (vdc2): Remounting filesystem read-only
[ 60.822229] EXT4-fs (vdc2): previous I/O error to superblock detected
[ 60.823201] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.823203] blk_update_request: I/O error, dev vdc, sector 16384
[ 60.823204] Buffer I/O error on dev vdc2, logical block 0, lost sync page write
[ 96.736959] nfsd4: failed to purge old clients from recovery directory v4recovery
root@microg5:~#

@haochanh haochanh commented 18 days ago

Moving the guest back to the original host succeeds, but we still see the I/O errors.

root@tiny:~# virsh migrate --live --domain microg5 qemu+ssh://10.33.9.187/system --verbose --undefinesource --persistent --timeout 60
Migration: [100 %]

root@tiny:~# virsh list --all
 Id    Name       State
 2     tinyg1     running
 3     tinyg2     running
 5     tinyg4     running
 6     tinyg5     running
 7     tinyg6     running
 9     tinyg3     running

On the original host:

root@micro:~# virsh list --all
 Id    Name       State
 2     microg1    running
 3     microg2    running
 4     microg3    running
 5     microg4    running
 9     microg6    running
 16    microg5    running

Here is our configuration: both hosts (micro and tiny) share the same NFS mount, /kvm_nfs/.

Micro KVM:

root@micro:~# ls -l /kvm_nfs/microg6.raw.img
-rw-r--r-- 1 nobody 4294967294 107374182400 Aug 3 18:23 /kvm_nfs/microg6.raw.img

Tiny KVM:

root@tiny:~# ls -l /kvm_nfs/microg6.raw.img
-rw-r--r-- 1 nobody 4294967294 107374182400 Aug 3 2016 /kvm_nfs/microg6.raw.img

We tried to migrate the guest microg6 from "micro" to "tiny":

root@micro:~# virsh domblklist microg6
Target    Source
vda       /kvm_nfs/microg6.raw.img

root@micro:~# virsh migrate --live --domain microg6 qemu+ssh://10.33.10.115/system --verbose --undefinesource --persistent --timeout 60
Migration: [100 %]   <<<< it successfully goes to tiny KVM.

We can see guest "microg6" on tiny KVM now.

root@tiny:~# virsh domblklist microg6
Target    Source
vda       /kvm_nfs/microg6.raw.img

Checking on the guest "microg6", we see these errors:

root@microg6:~# dmesg |tail
[24371.936814] blk_update_request: I/O error, dev vda, sector 16384
[24371.936900] Buffer I/O error on dev vda2, logical block 0, lost sync page write
[24373.661328] blk_update_request: I/O error, dev vda, sector 16416
[24373.661552] Buffer I/O error on dev vda2, logical block 4, lost async page write
[24373.661778] Buffer I/O error on dev vda2, logical block 6, lost async page write
[24373.662023] Buffer I/O error on dev vda2, logical block 13107201, lost async page write
[24373.662253] Buffer I/O error on dev vda2, logical block 21004406, lost async page write
[24373.662427] Buffer I/O error on dev vda2, logical block 21010963, lost async page write
[24373.662713] Buffer I/O error on dev vda2, logical block 21495820, lost async page write
[24373.662957] Buffer I/O error on dev vda2, logical block 16777222, lost async page write

Both hosts share the same NFS mount:

root@tiny:~# df -T |grep "kvm_nfs"
tmp-lte:/kvm_lpm nfs4 1238417408 786754560 388756480 67% /kvm_nfs

root@tiny:~# rsh micro "df -T |grep kvm_nfs"
tmp-lte:/kvm_lpm nfs4 1238417408 786754560 388756480 67% /kvm_nfs

root@micro:~# df -T |grep "kvm_nfs"
tmp-lte:/kvm_lpm nfs4 1238417408 786754560 388756480 67% /kvm_nfs

The problem is caused by a configuration issue in the NFSv4 server and/or clients, probably related to the NFSv4 ID <-> name mapping (idmap). It is therefore not an I/O problem as such.

The solution/requirement is to make sure that the libvirt-qemu user has the same UID/GID on all hosts that a guest might be migrated to. (A workaround is to use NFSv3, for example, which has no ID mapping.)

"NFSv4 mount incorrectly shows all files with ownership as nobody:nobody"
https://access.redhat.com/solutions/33455

- Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's.
- On a non-Ubuntu linux, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required:
  # nfsidmap -c

Searching Google for 'linux how to change uid' turns up several articles on the subject.

Details:
---

The user 'developer' on the NFS server has UID 529.

[developer@tmp-lte ~]$ grep developer /etc/passwd
developer:x:529:531:developer login id:/home/developer:/bin/bash

Create a user 'developer' on tiny with a UID other than 529. Changing the file owner to 'developer' does not work (the owner remains 'nobody').

root@tiny:~# useradd --uid 1234 developer
root@tiny:~# chown developer /kvm_nfs/test.mauricfo
root@tiny:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 nobody root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo

Remove the user 'developer' for the next test.

root@tiny:~# userdel developer

Create a user 'developer' on tiny with UID 529 (the same as on the NFS server). Changing the file owner to 'developer' DOES work (the owner is no longer 'nobody').

root@tiny:~# useradd --uid=529 developer
root@tiny:~# chown developer /kvm_nfs/test.mauricfo
root@tiny:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 developer root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo

That works on tiny, but does NOT work on micro UNTIL you clear the nfsidmap cache.

root@micro:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 nobody root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
root@micro:~# useradd --uid 529 developer
root@micro:~# chown developer /kvm_nfs/test.mauricfo
root@micro:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 nobody root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo
root@micro:~# nfsidmap -c
nfsidmap: clearing '2a251810 I------ 1 perm 1f030000 0 0 keyring .id_resolver: 8'
root@micro:~# ls -l /kvm_nfs/test.mauricfo
-rw-r--r-- 1 developer root 18 Oct 11 17:17 /kvm_nfs/test.mauricfo

So, this is clearly NFSv4-specific.

Details 2:
---

The idmapd uses the domain from 'hostname --domain', which is the same across all hosts.

[developer@tmp-lte ~]$ head -n5 /etc/idmapd.conf
[General]
#Verbosity = 0
# The following should be set to the local NFSv4 domain name
# The default is the host's DNS domain name.
#Domain = local.domain.edu

[developer@tmp-lte ~]$ hostname --domain
isst.aus.stglabs.ibm.com

root@tiny:~# hostname --domain
isst.aus.stglabs.ibm.com

root@micro:~# hostname --domain
isst.aus.stglabs.ibm.com

Started the nfs-idmapd.service in tiny and micro (apt-get install nfs-server; systemctl start nfs-idmapd.service).
Increased verbosity in /etc/idmapd.conf to 3.
Verified that the domain values used by nfs-idmapd service are the same on all hosts (journalctl -u nfs-idmapd)

References:
---

These links discuss a bit about this problem (user/group nobody/nogroup); in case they're useful for the next person.

https://access.redhat.com/solutions/33455

- Ensure the client and server have matching UID's and GID's. It is a common misconception that the UID's and GID's can differ when using NFSv4. The sole purpose of id mapping is to map an id to a name and vice-versa. ID mapping is not intended as some sort of replacement for managing id's.
- On non-Ubuntu linux, if the above settings have been applied and UID/GID's are matched on server and client and users are still being mapped to nobody:nobody than a clearing of the idmapd cache may be required:
  # nfsidmap -c

https://help.ubuntu.com/community/NFSv4Howto

If all directory listings show just "nobody" and "nogroup" instead of real user and group names, then you might want to check the Domain parameter set in /etc/idmapd.conf. NFSv4 client and server should be in the same domain. Other operating systems might derive the NFSv4 domain name from the domain name mentioned in /etc/resolv.conf (e.g. Solaris 10).

http://www.enterprisenetworkingplanet.com/netos/article.php/3644471/Implement-NFSv4-Domains-and-Authentication.htm
http://unix.stackexchange.com/questions/138479/mounting-nfs-owners-are-nobodynogroup
https://www.novell.com/support/kb/doc.php?id=7005060
https://community.netapp.com/t5/Network-Storage-Protocols-Discussions/NFSv4-Linux-client-Netapp-Server-gt-Problem-with-id-mapping/td-p/17895

The reservation of a UID/GID in Debian/Ubuntu follows an allocation process governed by Debian. I have submitted an allocation request, and will prepare the patches for libvirt-qemu in Debian and Ubuntu.

Hi Chanh,

Can you please test the libvirt-bin & libvirt0 packages?

http://ausgsa.ibm.com/~mauricfo/public/bugs/bz145069/v2/

Please confirm if they resolve the issue. Thanks!

Details
---

The test packages assume that the UID & GID 64055 will be allocated by Debian, and use this value for the libvirt-qemu user/group.

You can remove any trace of the current libvirt-bin package (which creates the user/group), along with the user/group itself, with:

# apt-get purge libvirt-bin
# userdel libvirt-qemu
# groupdel libvirt-qemu

Existing files assigned to the libvirt-qemu user/group will not have their permissions changed, so I am not sure this is a clean transition from an existing system/files, but the permissions should be correct for all the installed files and for new files created from there on.
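
The two concerns in the transition described above (do the libvirt-qemu IDs agree on every host, and which existing files still belong to the previous UID) could be checked along these lines. This is only a minimal sketch: the helper names ids_match and list_owned_by are hypothetical, the ssh loop and the /var/lib/libvirt directory in the usage comment are assumptions about the setup, and the previous UID has to be supplied by whoever runs it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of libvirt or nfs-utils.

# Read passwd- or group-format lines on stdin (one per host) and succeed only
# if field 3, the numeric ID in both formats, is identical on every line.
ids_match() {
    awk -F: '{ seen[$3] = 1 }
             END { n = 0; for (id in seen) n++; exit (n == 1 ? 0 : 1) }'
}

# List every path under DIR ($1) that is still owned by the numeric UID ($2).
# find's -user accepts a numeric UID when no account is named that number.
list_owned_by() {
    find "$1" -xdev -user "$2" -print
}

# Example use (not run here; the ssh access and directory are assumptions):
#   for h in micro tiny; do ssh "$h" getent passwd libvirt-qemu; done | ids_match \
#       || echo "libvirt-qemu UID differs between hosts"
#   list_owned_by /var/lib/libvirt "$OLD_UID"
```

Feeding ids_match the output of `getent group libvirt-qemu` from each host checks the GID the same way. The list produced by list_owned_by can be reviewed before any chown is run, which seems safer than re-owning files blindly on a host with running guests.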

Hi Mauricio,

Yes, after applying this patch the migration works fine without any I/O errors. I am able to migrate between the two hosts (Tiny & Micro) using the NFS mount method without any I/O issue.

root@micro:~# id libvirt-qemu
uid=64055(libvirt-qemu) gid=117(kvm) groups=117(kvm),64055(libvirt-qemu)

root@tiny:~# id libvirt-qemu
uid=64055(libvirt-qemu) gid=116(kvm) groups=116(kvm),64055(libvirt-qemu)

Hi Canonical, @taco-screen-team

The attached patches are for Zesty, Xenial, and Debian sid (which I plan to submit if the UID/GID allocation request is granted).

afaik @cjwatson is the maintainer of base-passwd on Debian, and could review/grant/deny the allocation request. Per Ubuntu Policy, we'd need it ack'ed on Debian first, then the libvirt patches.

All packages were verified. For Z, X, and sid, the test-case follows this pattern, and has correct behavior/results.

Thanks!

Test-case
===

Package is not installed -- no libvirt-qemu user/group:
---

# getent passwd libvirt-qemu
#

# getent group libvirt-qemu
#

Package is installed -- libvirt-qemu user/group is created w/ allocated uid/gid:
---

# dpkg -i libvirt{-bin,0}_1.3.1-1ubuntu10.5uidgid1_*.deb

# getent passwd libvirt-qemu
libvirt-qemu:x:64055:117:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false

# getent group libvirt-qemu
libvirt-qemu:x:64055:libvirt-qemu

Package is uninstalled -- libvirt-qemu user/group is removed:
---

# apt-get --yes purge libvirt-bin

# getent passwd libvirt-qemu
#

# getent group libvirt-qemu
#

Package is installed with uid/gid taken -- libvirt-qemu user/group is created with other uid/gid:
---

# useradd --uid 64055 testuser
# groupadd --gid 64055 testgroup

# dpkg -i libvirt{-bin,0}_1.3.1-1ubuntu10.5uidgid1_*.deb
# echo $?
0

# getent passwd libvirt-qemu
libvirt-qemu:x:113:117:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false

# getent group libvirt-qemu
libvirt-qemu:x:118:libvirt-qemu

** Affects: libvirt (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New

** Tags: architecture-ppc64le bugnameltc-145069 severity-high targetmilestone-inin16041

** Tags added: architecture-ppc64le bugnameltc-145069 severity-high targetmilestone-inin16041

--
https://bugs.launchpad.net/bugs/1637601

Title:
  UbuntuKVM: migration using NFS mount fails #190