Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-07 Thread Christian Kauhaus
On 06.02.2014 16:24, Mark Nelson wrote:
 Hi Christian, can you tell me a little bit about how you are using Ceph and
 what kind of IO you are doing?

Just forgot to mention: we're running Ceph 0.72.2 on Linux 3.10 (both on the
storage servers and inside the VMs) and Qemu-KVM 1.5.3.

Regards

Christian

-- 
Dipl.-Inf. Christian Kauhaus  · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations


Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-07 Thread Mark Nelson

On 02/06/2014 01:41 PM, Christian Kauhaus wrote:

On 06.02.2014 16:24, Mark Nelson wrote:

Hi Christian, can you tell me a little bit about how you are using Ceph and
what kind of IO you are doing?


Sure. We're using it almost exclusively for serving VM images that are
accessed from Qemu's built-in RBD client. The VMs themselves perform a very
wide range of I/O types, from servers that write mainly log files to ZEO
database servers with nearly completely random I/O. Many VMs have slowly
increasing storage utilization.

One possible reason could be that the OSDs issue syncfs() calls, so ext4
allocates extents covering only what has been written up to that point. But
I'm not sure about the exact pattern of OSD/filesystem interaction.


Ok, so the reason I was wondering about the use case is whether you were
doing RBD specifically.  Fragmentation has been something we've
periodically battled with but still see in some cases.  BTRFS
especially can get pretty spectacularly fragmented due to COW and
overwrites.  There's a thread from a couple of weeks ago called "rados
io hints" that you may want to look at/contribute to.


Thanks!
Mark

HTH

Christian





Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-07 Thread Christian Kauhaus
On 07.02.2014 14:42, Mark Nelson wrote:
 Ok, so the reason I was wondering about the use case is whether you were doing RBD
 specifically.  Fragmentation has been something we've periodically
 battled with but still see in some cases.  BTRFS especially can get pretty
 spectacularly fragmented due to COW and overwrites.  There's a thread from a
 couple of weeks ago called "rados io hints" that you may want to look
 at/contribute to.

Thank you for the hint. Sage's proposal on ceph-devel sounds good, so I'll
wait for an implementation.

Regards

Christian

-- 
Dipl.-Inf. Christian Kauhaus  · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations


Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-07 Thread Christian Balzer
On Fri, 07 Feb 2014 18:46:31 +0100 Christian Kauhaus wrote:

 On 07.02.2014 14:42, Mark Nelson wrote:
  Ok, so the reason I was wondering about the use case is whether you were
  doing RBD specifically.  Fragmentation has been something we've
  periodically battled with but still see in some cases.  BTRFS
  especially can get pretty spectacularly fragmented due to COW and
  overwrites.  There's a thread from a couple of weeks ago called "rados
  io hints" that you may want to look at/contribute to.
 
 Thank you for the hint. Sage's proposal on ceph-devel sounds good, so
 I'll wait for an implementation.
 

Pardon me for stating the maybe painfully obvious, but wouldn't setting the
allocsize to 4MB (with XFS and the default Ceph object size) do a world of
good to prevent fragmentation?

Nothing like that in ext4, though delalloc might help.
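
For concreteness, this is all it would take (a sketch; the device and mount
point below are made up, and I haven't benchmarked the effect myself):

# XFS: speculatively allocate in 4 MiB chunks, matching the default
# RBD object size (device and mount point are hypothetical):
mount -o noatime,allocsize=4m /dev/sdb1 /srv/ceph/osd/ceph-5

# or persistently via fstab:
# /dev/sdb1  /srv/ceph/osd/ceph-5  xfs  noatime,allocsize=4m  0  0

# ext4: delayed allocation (delalloc) is the default on recent kernels;
# only its opposite, nodelalloc, would show up in the mount options:
grep ceph-5 /proc/mounts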

Despite really liking btrfs, I find it consistently comes in last by quite a
margin when it comes to speed, especially in my main use case of mail
storage (the clear winner there is ext4, followed by XFS).
And the KVM people warn against using it as a backing store for a reason.
^.^

Regards,

Christian (another one)
-- 
Christian Balzer        Network/Systems Engineer
ch...@gol.com   Global OnLine Japan/Fusion Communications
http://www.gol.com/


Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-07 Thread Sage Weil
On Sat, 8 Feb 2014, Christian Balzer wrote:
 On Fri, 07 Feb 2014 18:46:31 +0100 Christian Kauhaus wrote:
 
  On 07.02.2014 14:42, Mark Nelson wrote:
   Ok, so the reason I was wondering about the use case is whether you were
   doing RBD specifically.  Fragmentation has been something we've
   periodically battled with but still see in some cases.  BTRFS
   especially can get pretty spectacularly fragmented due to COW and
   overwrites.  There's a thread from a couple of weeks ago called "rados
   io hints" that you may want to look at/contribute to.
  
  Thank you for the hint. Sage's proposal on ceph-devel sounds good, so
  I'll wait for an implementation.
  
 
 Pardon me for stating the maybe painfully obvious, but wouldn't setting the
 allocsize to 4MB (with XFS and the default Ceph object size) do a world of
 good to prevent fragmentation?

This is what we plan on doing, although I was thinking an allocation size 
of 1MB might be more appropriate as a default.  In any case, though, the 
challenge is that not all objects are RBD objects, nor are all images 
using 4MB objects, so the OSD can't blindly do this; it needs to respond 
to a hint from the client.  Ilya is working on this now.
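
For the curious, one plausible way for an OSD to act on such a hint on XFS
is a per-inode extent size hint, the per-file/per-directory analogue of
allocsize. A sketch with xfs_io (the path is hypothetical, and whether the
implementation will use exactly this mechanism is still open):

# Set a 1 MiB extent size hint on a directory; files created in it
# inherit the hint, so XFS allocates their space in 1 MiB chunks:
xfs_io -c "extsize 1m" /srv/ceph/osd/ceph-5/current

# Read the hint back to verify:
xfs_io -c "extsize" /srv/ceph/osd/ceph-5/current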

sage


Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-07 Thread Sage Weil
On Sat, 8 Feb 2014, Christian Balzer wrote:
 On Fri, 7 Feb 2014 19:22:54 -0800 (PST) Sage Weil wrote:
 
  On Sat, 8 Feb 2014, Christian Balzer wrote:
   On Fri, 07 Feb 2014 18:46:31 +0100 Christian Kauhaus wrote:
   
 On 07.02.2014 14:42, Mark Nelson wrote:
  Ok, so the reason I was wondering about the use case is whether you were
  doing RBD specifically.  Fragmentation has been something we've
  periodically battled with but still see in some cases.
  BTRFS especially can get pretty spectacularly fragmented due to
  COW and overwrites.  There's a thread from a couple of weeks ago
  called "rados io hints" that you may want to look at/contribute to.

Thank you for the hint. Sage's proposal on ceph-devel sounds good, so
I'll wait for an implementation.

   
   Pardon me for stating the maybe painfully obvious, but wouldn't
   setting the allocsize to 4MB (with XFS and the default Ceph object
   size) do a world of good to prevent fragmentation?
  
  This is what we plan on doing, although I was thinking an allocation
  size of 1MB might be more appropriate as a default.  In any case,
  though, the challenge is that not all objects are RBD objects, nor are
  all images using 4MB objects, so the OSD can't blindly do this; it needs
  to respond to a hint from the client.  Ilya is working on this now.
  
 Of course, for a generic implementation that would need to be done with
 what was discussed in the "rados io hints" thread.

 In a setup where the sole usage is RBD with the default object size,
 mounting the XFS file systems with allocsize=4m might do the trick
 for now though, right?

If that option does what it sounds like it does, then yeah!

sage



Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-06 Thread Mark Nelson

On 02/06/2014 04:17 AM, Christian Kauhaus wrote:

Hi,

after running Ceph for a while I see a lot of fragmented files on our OSD
filesystems (all running ext4). For example:

itchy ~ # fsck -f /srv/ceph/osd/ceph-5
fsck from util-linux 2.22.2
e2fsck 1.42 (29-Nov-2011)
[...]
/dev/mapper/vgosd00-ceph--osd00: 461903/418119680 files (33.7%
non-contiguous), 478239460/836229120 blocks

This is an unusually high value for ext4. The normal expectation is something
in the 5% range. I suspect that such heavy fragmentation produces lots of
unnecessary seeks on the disks.
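
For reference, a few read-only ways to measure this on other OSDs (paths as
above; e4defrag ships with e2fsprogs >= 1.42 and with -c only reports, it
doesn't move anything):

# Forced read-only check; prints the non-contiguous percentage.
# Run against an unmounted device or an LVM snapshot of it:
e2fsck -fn /dev/mapper/vgosd00-ceph--osd00

# Per-file extent counts on the mounted filesystem (sample of files):
filefrag $(find /srv/ceph/osd/ceph-5/current -type f | head -n 20)

# Fragmentation score for the whole mount:
e4defrag -c /srv/ceph/osd/ceph-5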

Does anyone have an idea what to do to make Ceph fragment an OSD filesystem less?


Hi Christian, can you tell me a little bit about how you are using Ceph 
and what kind of IO you are doing?

TIA

Christian





Re: [ceph-users] filesystem fragmentation on ext4 OSD

2014-02-06 Thread Christian Kauhaus
On 06.02.2014 16:24, Mark Nelson wrote:
 Hi Christian, can you tell me a little bit about how you are using Ceph and
 what kind of IO you are doing?

Sure. We're using it almost exclusively for serving VM images that are
accessed from Qemu's built-in RBD client. The VMs themselves perform a very
wide range of I/O types, from servers that write mainly log files to ZEO
database servers with nearly completely random I/O. Many VMs have slowly
increasing storage utilization.

One possible reason could be that the OSDs issue syncfs() calls, so ext4
allocates extents covering only what has been written up to that point. But
I'm not sure about the exact pattern of OSD/filesystem interaction.
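
That effect is easy to demonstrate outside of Ceph, by the way. A rough
sketch (the test directory is made up, and exact extent counts depend on
concurrent activity):

# Append 64 KiB at a time, forcing each chunk out with fsync -- ext4
# has to allocate every piece as it is written:
for i in $(seq 100); do
    dd if=/dev/zero of=/mnt/test/frag.img bs=64k count=1 \
       oflag=append conv=notrunc,fsync 2>/dev/null
done

# Write the same amount in one go -- delayed allocation can place it
# as a single extent:
dd if=/dev/zero of=/mnt/test/contig.img bs=64k count=100 conv=fsync 2>/dev/null

# Compare extent counts; the appended file typically shows many more:
filefrag /mnt/test/frag.img /mnt/test/contig.img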

HTH

Christian

-- 
Dipl.-Inf. Christian Kauhaus  · k...@gocept.com · systems administration
gocept gmbh & co. kg · Forsterstraße 29 · 06112 Halle (Saale) · Germany
http://gocept.com · tel +49 345 219401-11
Python, Pyramid, Plone, Zope · consulting, development, hosting, operations