Re: Struggling with file system slowness

2017-05-09 Thread Matt McKinnon
Those snapshots were created using Marc Merlin's script (thanks, Marc). 
They don't do anything except sit around on the file system for a week 
or so and then are removed.


I'm now doing quarter-hourly snaps instead of nightly since I have 
nightly backups of the filesystem going off-site.  So far the 
btrfs-transaction and memory spikes have not returned.
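
For anyone wanting to reproduce the "keep for a week, then remove" part, here is a minimal sketch of the expiry selection. It is not Marc Merlin's script. The snapshot naming scheme (`<subvolume>_<YYYYmmdd_HHMMSS>`), the function names, and the 7-day default are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical naming scheme: "<subvolume>_<YYYYmmdd_HHMMSS>",
# for example "home_20170509_151500".
STAMP_FORMAT = "%Y%m%d_%H%M%S"


def snapshot_time(name):
    """Parse the timestamp from the last two '_'-separated fields of the name."""
    _prefix, date_part, time_part = name.rsplit("_", 2)
    return datetime.strptime(f"{date_part}_{time_part}", STAMP_FORMAT)


def expired(names, now, keep=timedelta(days=7)):
    """Return the snapshot names older than `keep`, oldest first."""
    cutoff = now - keep
    old = [n for n in names if snapshot_time(n) < cutoff]
    return sorted(old, key=snapshot_time)


if __name__ == "__main__":
    now = datetime(2017, 5, 9, 15, 30)
    names = ["home_20170509_151500", "home_20170501_000000", "home_20170430_000000"]
    for name in expired(names, now):
        # A real script would remove it with: btrfs subvolume delete <snapshot path>
        print("would delete", name)
```

The selection is kept separate from the deletion so it can be checked on a list of names before anything is removed.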


-Matt





On 05/09/2017 03:14 PM, Liu Bo wrote:
> On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > > Too little information. Is IO happening at the same time? Is
> > > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > > kind of workload and file size/distribution profile?
> >
> > Only write IO during the load spikes.  No compression, no deduplication.  12
> > volumes (including snapshots).  Spinning disks.  Medium workload; file sizes
> > are all over the map since this holds about 30 user home directories.
> >
> > Interestingly enough, the problems which had persisted for many weeks went
> > away when all snapshots were removed.  btrfs-transaction spikes disappeared.
> > Memory usage went from 30G to under 2G.
>
> Did those snapshots serve as backups?
>
> Could you please elaborate on how you create snapshots?  We could
> probably hammer out a testcase to improve the situation.
>
> Thanks,
>
> -liubo
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: Struggling with file system slowness

2017-05-09 Thread Liu Bo
On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > Too little information. Is IO happening at the same time? Is
> > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > kind of workload and file size/distribution profile?
> 
> Only write IO during the load spikes.  No compression, no deduplication.  12
> volumes (including snapshots).  Spinning disks.  Medium workload; file sizes
> are all over the map since this holds about 30 user home directories.
> 
> Interestingly enough, the problems which had persisted for many weeks went
> away when all snapshots were removed.  btrfs-transaction spikes disappeared.
> Memory usage went from 30G to under 2G.
>

Did those snapshots serve as backups?

Could you please elaborate on how you create snapshots?  We could
probably hammer out a testcase to improve the situation.

Thanks,

-liubo


Re: Struggling with file system slowness

2017-05-04 Thread Duncan
Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:

> Hi All,
> 
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
> 
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
> 
> What's the best way to figure out what btrfs is chugging away at here?
> 
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2

Headed for work so briefer than usual...

Three questions:

Number of snapshots per subvolume?

Quotas enabled?

Do you do dedupe or otherwise have lots of reflinks?


These dramatically affect scaling.  Keeping the number of snapshots per 
subvolume under 300, under 100 if possible, should help a lot.  Quotas 
dramatically worsen the problem, so keeping them disabled unless your use-
case calls for them should help (and if your use-case calls for them, 
consider a filesystem where the quota feature is more mature).  And 
reflinks are the mechanism behind snapshots, so too many of them for 
other reasons (such as dedupe) create problems too, tho a snapshot 
basically reflinks /everything/, so it takes quite a few reflinks to 
trigger the scaling issues of a single snapshot, meaning they aren't 
normally a problem unless dedupe is done on a /massive/ scale.
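
As a rough worked example of the snapshot-count guideline above: the steady-state number of snapshots per subvolume is the retention period divided by the snapshot interval. The sketch below only does that arithmetic. The interval and retention values used in the usage note are hypothetical, and the function names are made up for illustration.

```python
from datetime import timedelta

# Guideline from this message: under 300 per subvolume, under 100 if possible.
HARD_LIMIT = 300
SOFT_LIMIT = 100


def snapshots_kept(interval, retention):
    """Steady-state snapshot count when one snapshot is taken every
    `interval` and each one is deleted after `retention`."""
    return retention // interval


def verdict(count):
    """Classify a per-subvolume snapshot count against the guideline."""
    if count >= HARD_LIMIT:
        return "at or above 300"
    if count >= SOFT_LIMIT:
        return "at or above 100"
    return "under 100"
```

For instance, `snapshots_kept(timedelta(minutes=15), timedelta(days=7))` gives 672, well past the 300 mark, while nightly snapshots kept for a week give 7, and quarter-hourly snapshots kept for one day give 96.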

Of course defrag interacts with snapshots too, tho it shouldn't affect 
/this/ problem; instead it can eat up more space than expected as it 
breaks the reflinks.


Beyond that, have you tried a (readonly) btrfs check and/or a scrub or 
balance recently?  Perhaps there's something wrong that's snagging 
things, and you simply haven't otherwise detected it yet?
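
For reference, the three checks mentioned above map onto btrfs-progs commands roughly as sketched here. The helper only builds the command lines and does not run anything. The device and mount point are hypothetical placeholders, and the `usage=50` balance filters are an arbitrary example, not a recommendation from this thread.

```python
import shlex


def diagnostic_commands(device, mountpoint):
    """Build (but do not run) the check, scrub and balance command lines."""
    return [
        # Read-only check; normally run against an unmounted filesystem.
        ["btrfs", "check", "--readonly", device],
        # Scrub in the foreground (-B) so the summary is printed at the end.
        ["btrfs", "scrub", "start", "-B", mountpoint],
        # Balance limited to data/metadata chunks that are at most 50% full.
        ["btrfs", "balance", "start", "-dusage=50", "-musage=50", mountpoint],
    ]


if __name__ == "__main__":
    # "/dev/sdb1" and "/home" are placeholders; substitute the real ones.
    for cmd in diagnostic_commands("/dev/sdb1", "/home"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands first makes it easy to review them before running each one by hand as root.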

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Struggling with file system slowness

2017-05-04 Thread Peter Grandi
> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.

Too little information. Is IO happening at the same time? Is
compression on? Deduplicated? Lots of subvolumes? SSD? What kind
of workload and file size/distribution profile?

Typical causes of high CPU are extents (your defragging did not
necessarily work), and 'qgroups', especially with many subvolumes.
It could be the free space cache in some rare cases.

  
https://www.google.ca/search?num=100=images_q=cxpu_epq=btrfs-transaction

Note that something like this often happens without being
Btrfs-related, triggered for example by near exhaustion of
memory in the kernel memory manager.
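
One possible way to tell whether the busy thread is spending its time in Btrfs code or in the memory manager is to sample its kernel stack from /proc. The sketch below assumes a Linux /proc with the per-process `comm` and `stack` files (the latter usually needs root and a kernel built with stack tracing). The function names and the `proc` parameter are made up for illustration.

```python
import os

# The thread shows up as "btrfs-transacti" because the task name field holds
# at most 15 characters. Matching on that prefix also covers a full-length name.
THREAD_PREFIX = "btrfs-transacti"


def find_threads(prefix=THREAD_PREFIX, proc="/proc"):
    """Return the PIDs (ascending) whose task name starts with `prefix`."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip().startswith(prefix):
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


def kernel_stack(pid, proc="/proc"):
    """Return the current kernel stack trace of `pid` as text."""
    with open(os.path.join(proc, str(pid), "stack")) as f:
        return f.read()
```

Calling `kernel_stack(pid)` a few times in a row for each PID returned by `find_threads()` and seeing which functions keep recurring gives a crude profile of where the thread is stuck.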


Struggling with file system slowness

2017-05-04 Thread Matt McKinnon

Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged 
at 100% CPU for most of the time.


I thought this might have to do with fragmentation as mentioned in the 
Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as 
mentioned in the wiki), but after running a full defrag of the file 
system, and also enabling the 'autodefrag' mount option, the problem 
still persists.


What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2


-Matt