Re: Struggling with file system slowness
Those snapshots were created using Marc Merlin's script (thanks, Marc). They don't do anything except sit around on the file system for a week or so before they are removed. I'm now taking quarter-hourly snapshots instead of nightly ones, since I have nightly backups of the filesystem going off-site. So far the btrfs-transaction and memory spikes have not returned.

-Matt

On 05/09/2017 03:14 PM, Liu Bo wrote:
> On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > > Too little information. Is IO happening at the same time? Is
> > > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > > kind of workload and file size/distribution profile?
> >
> > Only write IO during the load spikes. No compression, no deduplication. 12
> > volumes (including snapshots). Spinning disks. Medium workload; file sizes
> > are all over the map since this holds about 30 user home directories.
> >
> > Interestingly enough, the problems which had persisted for many weeks went
> > away when all snapshots were removed. btrfs-transaction spikes disappeared.
> > Memory usage went from 30G to under 2G.
> >
>
> Did those snapshots serve as backups? Could you please elaborate on how you
> create snapshots? We could probably hammer out a testcase to improve the
> situation.
>
> Thanks,
> -liubo

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
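For a rough sense of scale, a snapshot schedule can be turned into a count of snapshots alive at once per subvolume. This is a hypothetical sketch: Matt doesn't say how long the quarter-hourly snapshots are kept, so the one-week retention is an assumption carried over from the schedule described above. The 100/300 limits are the snapshots-per-subvolume guidance Duncan gives elsewhere in this thread, not hard btrfs limits, and the function names are illustrative only.

```python
from datetime import timedelta

# Guidance quoted in this thread: keep snapshots per subvolume under 300,
# and under 100 if possible. These are rules of thumb, not btrfs limits.
SOFT_LIMIT = 100
HARD_LIMIT = 300


def snapshots_retained(interval: timedelta, retention: timedelta) -> int:
    """Number of snapshots of one subvolume that exist at the same time.

    Hypothetical, simplified policy: a snapshot is taken every `interval`
    and each one is deleted once it is older than `retention`.
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return retention // interval  # timedelta // timedelta -> int


def assess(count: int) -> str:
    """Classify a snapshot count against the thread's rule of thumb."""
    if count < SOFT_LIMIT:
        return "ok"
    if count < HARD_LIMIT:
        return "borderline"
    return "too many"


if __name__ == "__main__":
    week = timedelta(days=7)  # assumed retention window
    for label, interval in [("nightly", timedelta(days=1)),
                            ("quarter-hourly", timedelta(minutes=15))]:
        n = snapshots_retained(interval, week)
        print(f"{label}: {n} snapshots per subvolume -> {assess(n)}")
```

Under that assumption, quarter-hourly snapshots kept for a week would mean 672 snapshots of each subvolume, more than double the upper figure, so the retention window for the frequent snapshots matters as much as the interval.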
Re: Struggling with file system slowness
On Fri, May 05, 2017 at 09:24:32AM -0400, Matt McKinnon wrote:
> > Too little information. Is IO happening at the same time? Is
> > compression on? Deduplicated? Lots of subvolumes? SSD? What
> > kind of workload and file size/distribution profile?
>
> Only write IO during the load spikes. No compression, no deduplication. 12
> volumes (including snapshots). Spinning disks. Medium workload; file sizes
> are all over the map since this holds about 30 user home directories.
>
> Interestingly enough, the problems which had persisted for many weeks went
> away when all snapshots were removed. btrfs-transaction spikes disappeared.
> Memory usage went from 30G to under 2G.
>

Did those snapshots serve as backups? Could you please elaborate on how you create snapshots? We could probably hammer out a testcase to improve the situation.

Thanks,
-liubo
Re: Struggling with file system slowness
Matt McKinnon posted on Thu, 04 May 2017 09:15:28 -0400 as excerpted:

> Hi All,
>
> Trying to peg down why I have one server that has btrfs-transacti pegged
> at 100% CPU for most of the time.
>
> I thought this might have to do with fragmentation as mentioned in the
> Gotchas page in the wiki (btrfs-endio-wri doesn't seem to be involved as
> mentioned in the wiki), but after running a full defrag of the file
> system, and also enabling the 'autodefrag' mount option, the problem
> still persists.
>
> What's the best way to figure out what btrfs is chugging away at here?
>
> Kernel: 4.10.13-custom
> btrfs-progs: v4.10.2

Headed for work, so briefer than usual...

Three questions:

Number of snapshots per subvolume?

Quotas enabled?

Do you do dedupe or otherwise have lots of reflinks?

These dramatically affect scaling. Keeping the number of snapshots per subvolume under 300, or under 100 if possible, should help a lot. Quotas dramatically worsen the problem, so keeping them disabled unless your use-case calls for them should help (and if your use-case does call for them, consider a filesystem where the quota feature is more mature). And reflinks are the mechanism behind snapshots, so too many of them for other reasons (such as dedupe) create problems too. Tho a snapshot basically reflinks /everything/, so it takes quite a few individual reflinks to cause the scaling issues of a single snapshot; they aren't normally a problem unless dedupe is done on a /massive/ scale.

Of course defrag interacts with snapshots too. It shouldn't affect /this/ problem, but it can eat up more space than expected, as it breaks the reflinks.

Beyond that, have you tried a (read-only) btrfs check and/or a scrub or balance recently? Perhaps something is wrong that's snagging things, and you simply haven't otherwise detected it yet?

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
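Duncan's snapshots-per-subvolume guideline is easy to check once you know which subvolume each snapshot was taken from. A minimal sketch, assuming you have already collected one parent UUID per snapshot (for example from the parent_uuid column that `btrfs subvolume list -s -q` prints). The function names are hypothetical, treating "-" as "no parent" is an assumption, and the 100/300 limits are simply the numbers from the message above.

```python
from collections import Counter
from typing import Iterable, List, Tuple

SOFT_LIMIT = 100  # "under 100 if possible"
HARD_LIMIT = 300  # "under 300"


def snapshots_per_source(parent_uuids: Iterable[str]) -> Counter:
    """Count snapshots per source subvolume.

    `parent_uuids` holds one entry per snapshot: the UUID of the subvolume
    it was taken from. Empty entries and "-" (assumed here to mean
    "no parent") are skipped.
    """
    return Counter(u for u in parent_uuids if u and u != "-")


def over_limit(counts: Counter, limit: int = HARD_LIMIT) -> List[Tuple[str, int]]:
    """Return (source UUID, snapshot count) pairs at or above `limit`, largest first."""
    return sorted(((u, n) for u, n in counts.items() if n >= limit),
                  key=lambda item: item[1], reverse=True)
```

Checking against SOFT_LIMIT as well as HARD_LIMIT separates subvolumes that are merely above the preferred figure from those past the point Duncan describes as clearly problematic.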
Re: Struggling with file system slowness
> Trying to peg down why I have one server that has
> btrfs-transacti pegged at 100% CPU for most of the time.

Too little information. Is IO happening at the same time? Is compression on? Deduplicated? Lots of subvolumes? SSD? What kind of workload and file size/distribution profile?

Typical causes of high CPU are extents (your defragging did not necessarily work) and 'qgroups', especially with many subvolumes. In some rare cases it could be the free space cache.

https://www.google.ca/search?num=100=images_q=cxpu_epq=btrfs-transaction

Something like this also happens often without being Btrfs-related, triggered for example by near-memory exhaustion in the kernel memory manager.
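Peter's last point, that near-memory exhaustion can produce similar symptoms, can be checked separately from btrfs; Matt reports elsewhere in the thread that memory usage had reached 30G. A minimal sketch, assuming Linux procfs and a kernel that reports MemAvailable in /proc/meminfo (3.14 or later; the 4.10 kernel here qualifies). The 10% threshold and the function names are hypothetical.

```python
import os
from typing import Dict

# Hypothetical threshold for "near exhaustion"; tune it for the host.
LOW_MEMORY_FRACTION = 0.10


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: first numeric value}.

    Most values are in kB; HugePages_* counters are plain counts.
    """
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def available_fraction(info: Dict[str, int]) -> float:
    """Share of RAM the kernel estimates is available without swapping."""
    return info["MemAvailable"] / info["MemTotal"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        frac = available_fraction(parse_meminfo(f.read()))
    status = "LOW" if frac < LOW_MEMORY_FRACTION else "ok"
    print(f"MemAvailable is {frac:.1%} of MemTotal ({status})")
```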
Struggling with file system slowness
Hi All,

Trying to peg down why I have one server that has btrfs-transacti pegged at 100% CPU for most of the time.

I thought this might have to do with fragmentation as mentioned in the Gotchas page in the wiki (btrfs-endio-wri, the thread the wiki mentions, doesn't seem to be involved), but after running a full defrag of the file system, and also enabling the 'autodefrag' mount option, the problem still persists.

What's the best way to figure out what btrfs is chugging away at here?

Kernel: 4.10.13-custom
btrfs-progs: v4.10.2

-Matt
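To put numbers on "pegged at 100% CPU", for example before and after removing snapshots, one low-tech option is to sample the thread's CPU time from /proc. A minimal sketch, assuming Linux procfs: it measures how busy each btrfs-transacti thread is (the kernel truncates thread names to 15 characters, hence the spelling above), not what it is doing; answering that needs a kernel profiler such as perf. All function names are hypothetical.

```python
import os
import time
from typing import Dict, List

THREAD_NAME = "btrfs-transacti"  # comm names are truncated to 15 characters


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The comm field is in parentheses and may contain spaces, so split
    # after the last ')'. The remaining fields start at field 3 (state).
    fields = stat_line[stat_line.rfind(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return utime + stime


def cpu_percent(ticks_before: int, ticks_after: int,
                seconds: float, clk_tck: int) -> float:
    """CPU use over an interval, as a percentage of one core."""
    return (ticks_after - ticks_before) / clk_tck / seconds * 100.0


def find_pids(name: str = THREAD_NAME) -> List[int]:
    """PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while scanning
    return pids


def read_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def sample(seconds: float = 5.0) -> Dict[int, float]:
    """CPU % (of one core) for each matching thread over `seconds`."""
    clk_tck = os.sysconf("SC_CLK_TCK")
    before = {pid: read_ticks(pid) for pid in find_pids()}
    time.sleep(seconds)
    return {pid: cpu_percent(t, read_ticks(pid), seconds, clk_tck)
            for pid, t in before.items()}
```

On the affected server, calling `sample(10)` would return a mapping of PID to percent of one CPU. Each mounted btrfs filesystem has its own transaction thread, so several PIDs may appear.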