Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver
I too find ceph-fuse more stable. However, you really should run your tests with a much more recent kernel! 3.10 is old, and there have been CephFS improvements in nearly every kernel release for a long time.

--
Thomas Lemarchand
Cloud Solutions SAS - Head of Information Systems

On Thu, 2014-12-18 at 14:52 +1000, Lindsay Mathieson wrote:
> I've been experimenting with CephFS for running KVM images (Proxmox).
>
> cephfs fuse version - 0.87
> cephfs kernel module - kernel version 3.10
>
> Part of my testing involves booting a Windows 7 VM and running
> CrystalDiskMark to check the I/O in the VM. It's surprisingly good with
> both the fuse and the kernel driver; sequential reads and writes are
> actually faster than the underlying disk, so I presume the FS is
> aggressively caching.
>
> With the fuse driver I have no problems.
>
> With the kernel driver the benchmark runs fine, but when I reboot the VM
> the drive is corrupted and unreadable, every time. Rolling back to a
> snapshot fixes the disk. This does not happen unless I run the benchmark,
> which I presume writes a lot of data.
>
> No problems with the same test on Ceph RBD or NFS.
>
> --
> Lindsay

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
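Thomas's advice amounts to: check which client and which kernel you are actually running before debugging further. A minimal sketch of both mount styles, where the monitor address, mount points, and credentials are illustrative assumptions (so those commands are commented out):

```shell
# The CephFS kernel client picks up fixes in nearly every kernel
# release, so the running kernel version matters:
uname -r

# Kernel client mount (monitor address and secret are placeholders):
# mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secret=<key>

# ceph-fuse runs in userspace, so it can be upgraded along with the
# Ceph packages, independently of the host kernel:
# ceph-fuse -m 192.168.0.1:6789 /mnt/cephfs
```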
Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver
On Wed, Dec 17, 2014 at 8:52 PM, Lindsay Mathieson lindsay.mathie...@gmail.com wrote:
> I've been experimenting with CephFS for running KVM images (Proxmox).
>
> cephfs fuse version - 0.87
> cephfs kernel module - kernel version 3.10
>
> Part of my testing involves booting a Windows 7 VM and running
> CrystalDiskMark to check the I/O in the VM. It's surprisingly good with
> both the fuse and the kernel driver; sequential reads and writes are
> actually faster than the underlying disk, so I presume the FS is
> aggressively caching.
>
> With the fuse driver I have no problems.
>
> With the kernel driver the benchmark runs fine, but when I reboot the VM
> the drive is corrupted and unreadable, every time. Rolling back to a
> snapshot fixes the disk. This does not happen unless I run the benchmark,
> which I presume writes a lot of data.
>
> No problems with the same test on Ceph RBD or NFS.

Do you have any information about *how* the drive is corrupted; what part Win7 is unhappy with?

I don't know how Proxmox configures it, but I assume you're storing the disk images as single files on the FS? I'm really not sure what the kernel client could even do here, since if you're not rebooting the host as well as the VM, it can't be losing any of the data it's given. :/
-Greg
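Greg's question ("*how* is the drive corrupted?") can be narrowed down from the host side by hashing the image file as seen through each client after a benchmark run. The CephFS mount paths below are illustrative assumptions and are commented out; a self-contained demo of the comparison itself follows.

```shell
# Idea: hash the image through a kernel-client mount and a fuse mount
# after the benchmark. Differing hashes point at the write path rather
# than the guest. (Paths are placeholders.)
# md5sum /mnt/cephfs-kernel/images/101/vm-101-disk-0.qcow2
# md5sum /mnt/cephfs-fuse/images/101/vm-101-disk-0.qcow2

# Self-contained demo of the comparison step:
printf 'fake image bytes' > /tmp/img-a
cp /tmp/img-a /tmp/img-b
a=$(md5sum /tmp/img-a | cut -d' ' -f1)
b=$(md5sum /tmp/img-b | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "copies match" || echo "copies differ"
```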
Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver
Hi Lindsay,
have you tried the different cache options (none, writethrough, ...) that Proxmox offers for the drive?

Udo

On 18.12.2014 05:52, Lindsay Mathieson wrote:
> I've been experimenting with CephFS for running KVM images (Proxmox).
>
> cephfs fuse version - 0.87
> cephfs kernel module - kernel version 3.10
>
> Part of my testing involves booting a Windows 7 VM and running
> CrystalDiskMark to check the I/O in the VM. It's surprisingly good with
> both the fuse and the kernel driver; sequential reads and writes are
> actually faster than the underlying disk, so I presume the FS is
> aggressively caching.
>
> With the fuse driver I have no problems.
>
> With the kernel driver the benchmark runs fine, but when I reboot the VM
> the drive is corrupted and unreadable, every time. Rolling back to a
> snapshot fixes the disk. This does not happen unless I run the benchmark,
> which I presume writes a lot of data.
>
> No problems with the same test on Ceph RBD or NFS.
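Udo's suggestion maps to the `cache=` option on the disk line of the VM's config file. A sketch assuming VMID 101, a storage named `cephfs-store`, and a 32G disk (all illustrative); only the `cache=` value changes between test runs:

```
# /etc/pve/qemu-server/101.conf (VMID, storage, and size are placeholders)
virtio0: cephfs-store:101/vm-101-disk-0.qcow2,cache=none,size=32G
# virtio0: cephfs-store:101/vm-101-disk-0.qcow2,cache=writethrough,size=32G
# virtio0: cephfs-store:101/vm-101-disk-0.qcow2,cache=writeback,size=32G
```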
Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver
On Thu, 18 Dec 2014 08:41:21 PM, Udo Lembke wrote:
> have you tried the different cache options (none, writethrough, ...)
> that Proxmox offers for the drive?

I tried with writeback and it didn't corrupt.

--
Lindsay
Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver
On Thu, 18 Dec 2014 11:23:42 AM, Gregory Farnum wrote:
> Do you have any information about *how* the drive is corrupted; what
> part Win7 is unhappy with?

Failure to find the boot sector, I think. I'll run it again and take a screenshot.

> I don't know how Proxmox configures it, but I assume you're storing the
> disk images as single files on the FS?

It's a single KVM qcow2 file.

--
Lindsay
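A boot-sector failure can also be inspected from the host. `qemu-img check` validates the qcow2 metadata; if that comes back clean, the damage is inside the guest-visible data rather than the container. The image path is an illustrative assumption, so that command is commented out; the `dd` extraction below is a self-contained demo.

```shell
# Validate the qcow2 container itself (path is a placeholder):
# qemu-img check /mnt/cephfs/images/101/vm-101-disk-0.qcow2

# Extracting the first 512 bytes (the boot sector) lets you diff it
# against the same sector from a known-good snapshot. Demo on a fake
# 1 KiB "disk":
head -c 1024 /dev/urandom > /tmp/fake-disk.img
dd if=/tmp/fake-disk.img of=/tmp/fake-sector bs=512 count=1 2>/dev/null
wc -c < /tmp/fake-sector
```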
Re: [ceph-users] Reproducable Data Corruption with cephfs kernel driver
On Thu, Dec 18, 2014 at 8:40 PM, Lindsay Mathieson lindsay.mathie...@gmail.com wrote:
>> I don't know how Proxmox configures it, but I assume you're storing the
>> disk images as single files on the FS?
>
> It's a single KVM qcow2 file.

Like the cache mode, the image format might be an interesting thing to experiment with. There are bugs in all layers of the I/O stack; it's entirely possible that you're seeing a bug elsewhere in the stack that is only triggered when using Ceph.

This probably goes without saying, but make sure you're using the latest/greatest versions of everything, including kvm/qemu/proxmox/kernel/guest drivers.

John
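John's two suggestions, sketched concretely. The filenames and VMID are illustrative assumptions, and the commands that need Proxmox or Ceph installed are commented out:

```shell
# Take qcow2 out of the stack by converting the image to raw
# (filenames are placeholders):
# qemu-img convert -f qcow2 -O raw vm-101-disk-0.qcow2 vm-101-disk-0.raw

# Record every layer's version when comparing runs:
uname -r          # host kernel (the CephFS kernel client lives here)
# ceph --version  # ceph release
# pveversion -v   # proxmox plus its bundled qemu/kvm
```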