Re: [ceph-users] filesystem fragmentation on ext4 OSD
On 06.02.2014 16:24, Mark Nelson wrote:
> Hi Christian, can you tell me a little bit about how you are using Ceph
> and what kind of IO you are doing?

Just forgot to mention: we're running Ceph 0.72.2 on Linux 3.10 (both on the
storage servers and inside the VMs) and Qemu-KVM 1.5.3.

Regards

Christian

--
Dipl.-Inf. Christian Kauhaus · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
On 02/06/2014 01:41 PM, Christian Kauhaus wrote:
> On 06.02.2014 16:24, Mark Nelson wrote:
>> Hi Christian, can you tell me a little bit about how you are using Ceph
>> and what kind of IO you are doing?
>
> Sure. We're using it almost exclusively for serving VM images that are
> accessed from Qemu's built-in RBD client. The VMs themselves perform a very
> wide range of I/O types, from servers that write mainly log files to ZEO
> database servers with nearly completely random I/O. Many VMs have slowly
> increasing storage utilization. A reason could be that the OSDs issue
> syncfs() calls and ext4 cuts FS extents to cover just what has been written
> so far. But I'm not sure about the exact pattern of OSD/filesystem
> interaction.
>
> HTH
> Christian

Ok, so the reason I was wondering about the use case is whether you were
doing RBD specifically. Fragmentation has been something we've periodically
battled with but still see in some cases. BTRFS especially can get pretty
spectacularly fragmented due to COW and overwrites. There's a thread from a
couple of weeks ago called "rados io hints" that you may want to look
at/contribute to.

Thanks!
Mark
On 07.02.2014 14:42, Mark Nelson wrote:
> Ok, so the reason I was wondering about the use case is whether you were
> doing RBD specifically. Fragmentation has been something we've periodically
> battled with but still see in some cases. BTRFS especially can get pretty
> spectacularly fragmented due to COW and overwrites. There's a thread from a
> couple of weeks ago called "rados io hints" that you may want to look
> at/contribute to.

Thank you for the hint. Sage's proposal on ceph-devel sounds good, so I'll
wait for an implementation.

Regards

Christian
On Fri, 07 Feb 2014 18:46:31 +0100, Christian Kauhaus wrote:
> On 07.02.2014 14:42, Mark Nelson wrote:
>> There's a thread from a couple of weeks ago called "rados io hints" that
>> you may want to look at/contribute to.
>
> Thank you for the hint. Sage's proposal on ceph-devel sounds good, so I'll
> wait for an implementation.

Pardon me for stating the maybe painfully obvious, but wouldn't setting
allocsize to 4MB (with XFS and the default Ceph object size) do a world of
good to prevent fragmentation? There's nothing like that in ext4, though
delalloc (delayed allocation) might help.

Despite really liking btrfs a lot, it consistently comes in last by quite a
margin when it comes to speed, especially in my main use case of mail storage
(the clear winner there is ext4, followed by XFS). And the KVM people warn
against using it as a backing store for a reason. ^.^

Regards,

Christian (another one)
--
Christian Balzer        Network/Systems Engineer
ch...@gol.com           Global OnLine Japan/Fusion Communications
http://www.gol.com/
On Sat, 8 Feb 2014, Christian Balzer wrote:
> Pardon me for stating the maybe painfully obvious, but wouldn't setting
> allocsize to 4MB (with XFS and the default Ceph object size) do a world of
> good to prevent fragmentation?

This is what we plan on doing, although I was thinking an allocation size of
1MB might be more appropriate as a default. In any case, though, the
challenge is that not all objects are RBD objects, nor are all images using
4MB objects, so the OSD can't blindly do this; it needs to respond to a hint
from the client. Ilya is working on this now.

sage
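[Editor's note: a rough, non-Ceph sketch of what an OSD could do once it
receives a size hint from the client, as Sage describes above. The path and
helper name are made up for illustration; preallocating the backing file to
the expected object size lets the filesystem try to reserve the blocks as one
contiguous run instead of growing the file piecemeal.]

```python
import os

EXPECTED_OBJECT_SIZE = 4 * 1024 * 1024  # default RBD object size

def write_with_preallocation(path, expected_size, data):
    # Reserve blocks for the whole expected object up front, then write
    # the first piece of data as usual.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, expected_size)
        os.write(fd, data)
    finally:
        os.close(fd)

path = "/tmp/rbd_object_demo"  # made-up path for the demo
write_with_preallocation(path, EXPECTED_OBJECT_SIZE, b"x" * 4096)
print(os.stat(path).st_size)  # 4194304: the file spans the full object
```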
On Sat, 8 Feb 2014, Christian Balzer wrote:
> On Fri, 7 Feb 2014 19:22:54 -0800 (PST), Sage Weil wrote:
>> This is what we plan on doing, although I was thinking an allocation size
>> of 1MB might be more appropriate as a default. In any case, though, the
>> challenge is that not all objects are RBD objects, nor are all images
>> using 4MB objects, so the OSD can't blindly do this; it needs to respond
>> to a hint from the client. Ilya is working on this now.
>
> Of course, for a generic implementation that would need to be done with
> what was discussed in the "rados io hints" thread. In a use case where the
> sole usage is RBD with the default object size, mounting the XFS file
> systems with allocsize=4m might do the trick for now though, right?

If that option does what it sounds like it does, then yeah!

sage
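[Editor's note: if you go the allocsize=4m route, it is worth verifying which
options an OSD mount actually ended up with. A small sketch; the helper name
and the sample /proc/mounts line are invented for illustration, and on a live
system you would iterate over the lines of /proc/mounts instead.]

```python
def mount_options(line):
    # Split one /proc/mounts-style line into mountpoint, fstype and an
    # option dictionary (flag options map to True, key=value to value).
    _device, mountpoint, fstype, options = line.split()[:4]
    opts = {}
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        opts[key] = value if value else True
    return mountpoint, fstype, opts

# Invented sample line for the demo.
sample = "/dev/sdb1 /srv/ceph/osd/ceph-5 xfs rw,noatime,inode64,allocsize=4m 0 0"
mountpoint, fstype, opts = mount_options(sample)
print(fstype, opts.get("allocsize"))  # xfs 4m
```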
On 02/06/2014 04:17 AM, Christian Kauhaus wrote:
> Hi,
>
> after running Ceph for a while I see a lot of fragmented files on our OSD
> filesystems (all running ext4). For example:
>
> itchy ~ # fsck -f /srv/ceph/osd/ceph-5
> fsck from util-linux 2.22.2
> e2fsck 1.42 (29-Nov-2011)
> [...]
> /dev/mapper/vgosd00-ceph--osd00: 461903/418119680 files (33.7%
> non-contiguous), 478239460/836229120 blocks
>
> This is an unusually high value for ext4; the normal expectation is
> something in the 5% range. I suspect that such high fragmentation produces
> lots of unnecessary seeks on the disks. Has anyone an idea what to do to
> make Ceph fragment an OSD filesystem less?
>
> TIA
> Christian

Hi Christian, can you tell me a little bit about how you are using Ceph and
what kind of IO you are doing?
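[Editor's note: the 33.7% figure comes straight out of the e2fsck summary
line quoted above. When checking many OSDs, pulling it out programmatically
is handy; a quick sketch with a hypothetical helper name:]

```python
import re

def noncontig_percent(summary_line):
    # Extract the "(33.7% non-contiguous)" figure from an e2fsck summary.
    match = re.search(r"\(([\d.]+)% non-contiguous\)", summary_line)
    if match is None:
        raise ValueError("no non-contiguous figure in line")
    return float(match.group(1))

line = ("/dev/mapper/vgosd00-ceph--osd00: 461903/418119680 files "
        "(33.7% non-contiguous), 478239460/836229120 blocks")
print(noncontig_percent(line))  # 33.7, far above the ~5% usual for ext4
```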
On 06.02.2014 16:24, Mark Nelson wrote:
> Hi Christian, can you tell me a little bit about how you are using Ceph
> and what kind of IO you are doing?

Sure. We're using it almost exclusively for serving VM images that are
accessed from Qemu's built-in RBD client. The VMs themselves perform a very
wide range of I/O types, from servers that write mainly log files to ZEO
database servers with nearly completely random I/O. Many VMs have slowly
increasing storage utilization. A reason could be that the OSDs issue
syncfs() calls and ext4 cuts FS extents to cover just what has been written
so far. But I'm not sure about the exact pattern of OSD/filesystem
interaction.

HTH
Christian
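[Editor's note: the suspected pattern can be sketched outside Ceph. Many
small appends, each made durable immediately, mean that at every sync point
the filesystem only knows about the data written so far, so delayed
allocation cannot merge the writes into large extents. A minimal illustration
with a made-up path; fsync stands in for the OSD's syncfs() here.]

```python
import os

def append_with_fsync(path, chunks):
    # Append each chunk and sync it to disk immediately, mimicking small
    # log-style writes punctuated by syncs.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_APPEND | os.O_TRUNC,
                 0o644)
    try:
        for chunk in chunks:
            os.write(fd, chunk)
            os.fsync(fd)  # stands in for the cluster-wide syncfs()
    finally:
        os.close(fd)

path = "/tmp/osd_write_pattern_demo"  # made-up path for the demo
append_with_fsync(path, [b"log line\n"] * 100)
print(os.stat(path).st_size)  # 900 bytes, written in 100 synced appends
```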