Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Thorvald Natvig
Hi, High-concurrency backfilling or flushing a cache tier triggers it fairly reliably. Setting backfills to >16 and switching from hammer to jewel tunables (which moves most of the data) will trigger this, as will going in the opposite direction. The nodes where we observed this most commonly ar

Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Shinobu Kinjo
On Wed, Feb 8, 2017 at 8:07 PM, Dan van der Ster wrote: > Hi, > > This is interesting. Do you have a bit more info about how to identify > a server which is suffering from this problem? Is there some process > (xfs* or kswapd?) we'll see as busy in top or iotop? That's my question as well. If you

Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Thorvald Natvig
erted to Kernel 4.4 (from Ubuntu) the problem stopped > immediately. > > Nick > >> -Original Message- >> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan >> van der Ster >> Sent: 08 February 2017 11:08 >> To: Thorvald Nat

Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Nick Fisk
lto:ceph-users-boun...@lists.ceph.com] On Behalf Of Dan > van der Ster > Sent: 08 February 2017 11:08 > To: Thorvald Natvig > Cc: ceph-users > Subject: Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs > > Hi, > > This is interesting. Do you have a bit

Re: [ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-08 Thread Dan van der Ster
Hi, This is interesting. Do you have a bit more info about how to identify a server which is suffering from this problem? Is there some process (xfs* or kswapd?) we'll see as busy in top or iotop? Also, which kernel are you using? Cheers, Dan On Tue, Feb 7, 2017 at 6:59 PM, Thorvald Natvig wr

[ceph-users] Workaround for XFS lockup resulting in down OSDs

2017-02-07 Thread Thorvald Natvig
Hi, We've encountered a small "kernel feature" in XFS using Filestore. We have a workaround, and would like to share in case others have the same problem. Under high load, on slow storage, with lots of dirty buffers and low memory, there's a design choice with unfortunate side-effects if you have