Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
Hi,

On 02/18/2016 07:53 PM, Jason Dillaman wrote:
> That's a pretty strange and seemingly non-random corruption of your first
> block. Is that object in the cache pool right now? If so, is the backing
> pool object just as corrupt as the cache pool's object?

How do I see all that? Sorry, I'm new to this kind of ceph debugging. If
there is no quick answer, I will start digging into this topic.

> I see that your cache pool is currently configured in forward mode. Did you
> switch to that mode in an attempt to stop any further issues or was it
> configured in forward mode before any corruption?

No, I switched to forward mode in order to stop the corruption. It was in
writeback mode initially.

Cheers,
udo.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
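[Editor's note: one way to check what Jason is asking could look like the sketch below. This is not from the thread: the pool names are taken from the poster's pool dump, the object-name suffix is a placeholder, and it assumes a `rados get` addressed directly at the cache pool is served from that pool rather than redirected through the tiering logic.]

```shell
# Is the object currently in the cache pool?
# (the object-name suffix below is a placeholder)
rados -p libvirt-pool-cache ls | grep rbd_data.18394b3d1b58ba

# Fetch the object from the cache pool and from the backing pool,
# then compare the two copies byte for byte:
rados -p libvirt-pool-cache get rbd_data.18394b3d1b58ba.0000000000000000 /tmp/obj.cache
rados -p libvirt-pool get rbd_data.18394b3d1b58ba.0000000000000000 /tmp/obj.base
md5sum /tmp/obj.cache /tmp/obj.base
```

If the checksums differ, the corruption was introduced somewhere between the tier and the backing pool; if they match, both copies are equally corrupt.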
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
That's a pretty strange and seemingly non-random corruption of your first
block. Is that object in the cache pool right now? If so, is the backing
pool object just as corrupt as the cache pool's object?

I see that your cache pool is currently configured in forward mode. Did you
switch to that mode in an attempt to stop any further issues or was it
configured in forward mode before any corruption?

--
Jason Dillaman

----- Original Message -----
> From: "Udo Waechter"
> To: "Jason Dillaman"
> Cc: "ceph-users"
> Sent: Wednesday, February 17, 2016 3:15:01 AM
> Subject: Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
>
> Hello, sorry for the delay. I was pretty busy otherwise.
>
> On 02/11/2016 03:13 PM, Jason Dillaman wrote:
> > Assuming the partition table is still zeroed on that image, can you run:
> >
> > # rados -p get rbd_data.18394b3d1b58ba. - | cut -b 512 | hexdump
>
> Here's the hexdump:
>
> 000 0a0a 0a0a 0a00 0a00 0a0a 0a0a 0a0a 0a0a
> 010 0a0a 0a0a 0a0a 0a0a 0a0a 0a0a 0a0a 0a0a
> 020 0a0a 0a0a 0a0a 000a 0a0a 0a0a 0a0a 0a0a
> 030 0a0a 0a00 0a0a 0a0a 0a0a 0a0a 000a 0a0a
> 040 0a0a 0a0a 0a0a 0a0a 0a00
> 04a
>
> > Can you also provide your pool setup:
> >
> > # ceph report --format xml 2>/dev/null | xmlstarlet sel -t -c "//osdmap/pools"
>
> Attached you'll find the pools information.
>
> Thanks very much for looking into this.
>
> udo.
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
Hello, sorry for the delay. I was pretty busy otherwise.

On 02/11/2016 03:13 PM, Jason Dillaman wrote:
> Assuming the partition table is still zeroed on that image, can you run:
>
> # rados -p get rbd_data.18394b3d1b58ba. - | cut -b 512 | hexdump

Here's the hexdump:

000 0a0a 0a0a 0a00 0a00 0a0a 0a0a 0a0a 0a0a
010 0a0a 0a0a 0a0a 0a0a 0a0a 0a0a 0a0a 0a0a
020 0a0a 0a0a 0a0a 000a 0a0a 0a0a 0a0a 0a0a
030 0a0a 0a00 0a0a 0a0a 0a0a 0a0a 000a 0a0a
040 0a0a 0a0a 0a0a 0a0a 0a00
04a

> Can you also provide your pool setup:
>
> # ceph report --format xml 2>/dev/null | xmlstarlet sel -t -c "//osdmap/pools"

Attached you'll find the pools information.

Thanks very much for looking into this.

udo.

[Attachment: osdmap pool dump, garbled in the archive. Recoverable details: pools 0 'data', 1 'metadata', 3 'libvirt-pool', 19 'libvirt-pool-cache' and 20 'data-cache'; the two cache pools carry the 'incomplete_clones' flag and are in 'forward' cache mode with a bloom hit set.]
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
Assuming the partition table is still zeroed on that image, can you run:

# rados -p get rbd_data.18394b3d1b58ba. - | cut -b 512 | hexdump

Can you also provide your pool setup:

# ceph report --format xml 2>/dev/null | xmlstarlet sel -t -c "//osdmap/pools"

--
Jason Dillaman

----- Original Message -----
> From: "Udo Waechter"
> To: "Jason Dillaman"
> Cc: "ceph-users"
> Sent: Thursday, February 11, 2016 4:04:43 AM
> Subject: Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
>
> On 02/10/2016 06:07 PM, Jason Dillaman wrote:
> > Can you provide the 'rbd info' dump from one of these corrupt images?
>
> Sure,
>
> rbd image 'ldap01.root.borked':
> size 2 MB in 5000 objects
> order 22 (4096 kB objects)
> block_name_prefix: rbd_data.18394b3d1b58ba
> format: 2
> features: layering
> flags:
> parent: libvirt-pool/debian7-instal...@installed.mini
> overlap: 2 MB
>
> Thanks,
> udo.
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
On 02/10/2016 06:07 PM, Jason Dillaman wrote:
> Can you provide the 'rbd info' dump from one of these corrupt images?

Sure,

rbd image 'ldap01.root.borked':
size 2 MB in 5000 objects
order 22 (4096 kB objects)
block_name_prefix: rbd_data.18394b3d1b58ba
format: 2
features: layering
flags:
parent: libvirt-pool/debian7-instal...@installed.mini
overlap: 2 MB

Thanks,
udo.
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
Can you provide the 'rbd info' dump from one of these corrupt images?

--
Jason Dillaman

----- Original Message -----
> From: "Udo Waechter"
> To: "Jason Dillaman"
> Cc: "ceph-users"
> Sent: Wednesday, February 10, 2016 12:04:41 PM
> Subject: Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
>
> Hi,
>
> On 02/09/2016 03:46 PM, Jason Dillaman wrote:
> > What release of Infernalis are you running? When you encounter this
> > error, is the partition table zeroed out or does it appear to be random
> > corruption?
>
> It's
> ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)
>
> and dpkg -l ceph:
> ceph 9.2.0-1~bpo80+1
>
> from eu.ceph.com
>
> The partition table is zeroed out. I have also found that all files
> which are actively being written (DB files in the ldap cluster, postgres
> transaction logs, ...) are corrupted.
>
> Nevertheless, restoring the partition table and then running e2fsck
> corrupts the filesystem beyond repair.
> Some images are even empty afterwards :(
>
> Thanks,
> udo.
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
Hi,

On 02/09/2016 03:46 PM, Jason Dillaman wrote:
> What release of Infernalis are you running? When you encounter this error,
> is the partition table zeroed out or does it appear to be random corruption?

It's
ceph version 9.2.0 (bb2ecea240f3a1d525bcb35670cb07bd1f0ca299)

and dpkg -l ceph:
ceph 9.2.0-1~bpo80+1

from eu.ceph.com

The partition table is zeroed out. I have also found that all files which
are actively being written (DB files in the ldap cluster, postgres
transaction logs, ...) are corrupted.

Nevertheless, restoring the partition table and then running e2fsck
corrupts the filesystem beyond repair.
Some images are even empty afterwards :(

Thanks,
udo.
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
What release of Infernalis are you running? When you encounter this error,
is the partition table zeroed out or does it appear to be random corruption?

--
Jason Dillaman

----- Original Message -----
> From: "Udo Waechter"
> To: "ceph-users"
> Sent: Saturday, February 6, 2016 5:31:51 AM
> Subject: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
>
> Hello,
>
> I am experiencing totally weird filesystem corruptions with the
> following setup:
>
> * Ceph infernalis on Debian 8
> * 10 OSDs (5 hosts) with spinning disks
> * 4 OSDs (1 host, with SSDs)
>
> The SSDs are new in my setup and I am trying to set up a cache tier.
>
> With the spinning disks alone, Ceph has been running for about a year
> without any major issues. Replacing disks and all that went fine.
>
> Ceph is used by rbd+libvirt+kvm with
>
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_cache_size = 128M
> rbd_cache_max_dirty = 96M
>
> Also, in libvirt, I have cachemode=writeback enabled.
>
> So far so good.
>
> Now I've added the SSD cache tier to the picture with "cache-mode
> writeback".
>
> The SSD machine also has the "deadline" scheduler enabled.
>
> Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
> failed". Trying to reboot the machines ends in "No bootable drive".
> Using parted and testdisk on the image mapped via rbd reveals that the
> partition table is gone.
>
> testdisk finds the proper partitions; e2fsck then repairs the filesystem
> beyond use.
>
> This does not happen to all machines. It happens to those that actually
> do some or most of the IO:
>
> elasticsearch, MariaDB+Galera, postgres, backup, GIT
>
> So I thought, until yesterday one of my ldap servers died, and that one
> is not doing IO.
>
> Could it be that rbd caching + qemu writeback cache + ceph cache tier
> writeback are not playing well together?
>
> I've read through some older mails on the list where people had similar
> problems and suspected something like that.
>
> What are the proper/right settings for rbd/qemu/libvirt?
>
> libvirt: cachemode=none (writeback?)
> rbd: cache_mode = none
> SSD tier: cachemode: writeback
>
> ?
>
> Thanks for any help,
> udo.
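[Editor's note: one conservative combination while debugging could look like the fragment below. This is a sketch, not a recommendation made in the thread; it disables client-side caching entirely so that only the cache tier's own behaviour remains in play.]

```
# ceph.conf on the hypervisors: turn off the librbd writeback cache
[client]
rbd cache = false

# libvirt domain XML: match it on the qemu side
<disk type='network' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  ...
</disk>
```

With both disabled, any remaining corruption would point squarely at the writeback cache tier rather than at the client-side caches.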
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
Hello,

I'm quite concerned by this (and the silence from the devs), however there
are a number of people doing similar things (at least with Hammer) and
you'd think they would have been bitten by this if it were a systemic bug.

More below.

On Sat, 6 Feb 2016 11:31:51 +0100 Udo Waechter wrote:
> Hello,
>
> I am experiencing totally weird filesystem corruptions with the
> following setup:
>
> * Ceph infernalis on Debian8

Hammer here, might be a regression.

> * 10 OSDs (5 hosts) with spinning disks
> * 4 OSDs (1 host, with SSDs)

So you're running your cache tier host with replication of 1, I presume?
What kind of SSDs/FS/other relevant configuration options?

Could there simply be some corruption on the SSDs that is of course then
presented to the RBD clients eventually?

> The SSDs are new in my setup and I am trying to set up a cache tier.
>
> With the spinning disks alone, Ceph has been running for about a year
> without any major issues. Replacing disks and all that went fine.
>
> Ceph is used by rbd+libvirt+kvm with
>
> rbd_cache = true
> rbd_cache_writethrough_until_flush = true
> rbd_cache_size = 128M
> rbd_cache_max_dirty = 96M
>
> Also, in libvirt, I have cachemode=writeback enabled.
>
> So far so good.
>
> Now I've added the SSD cache tier to the picture with "cache-mode
> writeback".
>
> The SSD machine also has the "deadline" scheduler enabled.
>
> Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
> failed". Trying to reboot the machines ends in "No bootable drive".
> Using parted and testdisk on the image mapped via rbd reveals that the
> partition table is gone.

Did turning the cache explicitly off (both Ceph and qemu) fix this?

> testdisk finds the proper partitions; e2fsck then repairs the filesystem
> beyond use.
>
> This does not happen to all machines. It happens to those that actually
> do some or most of the IO:
>
> elasticsearch, MariaDB+Galera, postgres, backup, GIT
>
> So I thought, until yesterday one of my ldap servers died, and that one
> is not doing IO.
>
> Could it be that rbd caching + qemu writeback cache + ceph cache tier
> writeback are not playing well together?
>
> I've read through some older mails on the list where people had similar
> problems and suspected something like that.

Any particular references (URLs, Message-IDs)?

Regards,

Christian

> What are the proper/right settings for rbd/qemu/libvirt?
>
> libvirt: cachemode=none (writeback?)
> rbd: cache_mode = none
> SSD tier: cachemode: writeback
>
> ?
>
> Thanks for any help,
> udo.

--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
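[Editor's note: "turning the cache explicitly off" for a test could look like the sketch below. The cache-pool name comes from the poster's pool dump; the commands are standard Hammer/Infernalis-era cache-tier CLI, but treat this as an illustration, not a procedure endorsed in the thread.]

```shell
# Stop the tier from absorbing new writes, then drain it to the backing pool:
ceph osd tier cache-mode data-cache forward
rados -p data-cache cache-flush-evict-all

# Disable the librbd writeback cache on the clients (ceph.conf, [client]):
#   rbd cache = false
# then restart or migrate the guests with qemu cache=none so both
# client-side caches are out of the picture.
```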
Re: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?
>> Could it be that rbd caching + qemu writeback cache + ceph cache tier
>> writeback are not playing well together?

rbd_cache=true is the same as qemu writeback: setting cache=writeback in
qemu configures librbd with rbd cache=true.

If you have fs corruption, it seems that flushes from the guest are not
reaching the final storage correctly.

I have never had a problem with rbd_cache=true. Maybe it's a bug with the
ssd cache tier...

----- Original Message -----
From: "Udo Waechter"
To: "ceph-users"
Sent: Saturday, February 6, 2016 11:31:51 AM
Subject: [ceph-users] SSD-Cache Tier + RBD-Cache = Filesystem corruption?

Hello,

I am experiencing totally weird filesystem corruptions with the
following setup:

* Ceph infernalis on Debian 8
* 10 OSDs (5 hosts) with spinning disks
* 4 OSDs (1 host, with SSDs)

The SSDs are new in my setup and I am trying to set up a cache tier.

With the spinning disks alone, Ceph has been running for about a year
without any major issues. Replacing disks and all that went fine.

Ceph is used by rbd+libvirt+kvm with

rbd_cache = true
rbd_cache_writethrough_until_flush = true
rbd_cache_size = 128M
rbd_cache_max_dirty = 96M

Also, in libvirt, I have cachemode=writeback enabled.

So far so good.

Now I've added the SSD cache tier to the picture with "cache-mode
writeback".

The SSD machine also has the "deadline" scheduler enabled.

Suddenly VMs start to corrupt their filesystems (all ext4) with "Journal
failed". Trying to reboot the machines ends in "No bootable drive".
Using parted and testdisk on the image mapped via rbd reveals that the
partition table is gone.

testdisk finds the proper partitions; e2fsck then repairs the filesystem
beyond use.

This does not happen to all machines. It happens to those that actually
do some or most of the IO:

elasticsearch, MariaDB+Galera, postgres, backup, GIT

So I thought, until yesterday one of my ldap servers died, and that one
is not doing IO.

Could it be that rbd caching + qemu writeback cache + ceph cache tier
writeback are not playing well together?

I've read through some older mails on the list where people had similar
problems and suspected something like that.

What are the proper/right settings for rbd/qemu/libvirt?

libvirt: cachemode=none (writeback?)
rbd: cache_mode = none
SSD tier: cachemode: writeback

?

Thanks for any help,
udo.
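[Editor's note: the qemu-to-librbd mapping described above can be made concrete with the sketch below. The drive strings are illustrative only; the pool and image names are placeholders, and the point is just that librbd options can be pinned explicitly in the rbd: drive spec rather than inherited from the qemu cache mode.]

```shell
# qemu cache=writeback -> librbd runs with rbd_cache=true (writeback semantics)
qemu-system-x86_64 \
  -drive format=raw,file=rbd:libvirt-pool/ldap01:rbd_cache=true,cache=writeback ...

# qemu cache=none -> librbd runs with rbd_cache=false
qemu-system-x86_64 \
  -drive format=raw,file=rbd:libvirt-pool/ldap01:rbd_cache=false,cache=none ...
```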