Re: Is the checkpoint interval adjustable?
Torbjørn wrote:
>> Just curious: What would be the benefit of increasing the checkpoint
>> interval?
>
> Laptops typically spin down disks to save power. If btrfs forces a write
> every 30 seconds, you have to spin it back up.

I'd expect btrfs not to write to the disk when a checkpoint is reached and no writes have occurred to the filesystem in the meantime... Could some developer shed some light on this? IMHO, if this is true, there is no point in increasing the checkpoint interval...

Thoughts?

Regards,
Kai

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is the checkpoint interval adjustable?
On 08/03/2013 07:28 PM, Kai Krakow wrote:
> Mike Audia wrote:
>> I believe 30 sec is the default for the checkpoint interval. Is this
>> adjustable?
>
> Just curious: What would be the benefit of increasing the checkpoint
> interval?

Laptops typically spin down disks to save power. If btrfs forces a write every 30 seconds, you have to spin it back up.

--
Torbjørn

> As far as I understood it would not decrease write load on the drives
> because it will only update a few pointers and probably increase the
> generation number...
>
> Greetings,
> Kai
Re: Is the checkpoint interval adjustable?
Mike Audia wrote:
> I believe 30 sec is the default for the checkpoint interval. Is this
> adjustable?

Just curious: What would be the benefit of increasing the checkpoint interval? As far as I understood, it would not decrease the write load on the drives, because it will only update a few pointers and probably increase the generation number...

Greetings,
Kai
Re: Is the checkpoint interval adjustable?
Mike Audia posted on Fri, 02 Aug 2013 16:58:42 -0400 as excerpted:

>> From: David Sterba
>> There were a few requests to tune the interval. This finally made me
>> finish the patch; I will send it in a second.
>
> Thank you, David, and the others who kindly replied to my post. I will
> try your patch rather than modifying the code.
>
>> > > Are there any unforeseen side effects of doing this? Thank you
>> > > for the consideration.
>> >
>> > I don't *think* that there should be. One way of looking at it is
>> > that both 30 and 300 seconds are an *eternity* for cpu, memory, and
>> > storage.
>> > Any trouble that you could get into in 300 seconds some other
>> > machine could trivially get into in 30 with beefier hardware.
>>
>> That's a good point and lowers my worries a bit, though it would be
>> interesting to see in what way a beefy machine blows with 300 seconds
>> set.
>
> I have my system booting to a BTRFS root partition. Let's say I'm using
> a value of 300 for my checkpoint interval. Does this mean that if I do
> a TON of filesystem writes (say I update my system, which pulls down a
> bunch of system file updates, for example), and I copy over several gigs
> of data from a backup, all _between_ checkpoints, and for some reason my
> system freezes, forcing me to ungracefully restart... is EVERYTHING since
> the last checkpoint lost?

When I tried btrfs on faulty hardware a bit over a year ago, yes.

And yes, that's the way a btree filesystem such as btrfs generally works, too, because when a change happens, it recurses up the tree until finally the master node is updated. Until the master node is updated, the old master node remains effective. During the time between the first change and the master node update, additional changes may occur, making the final master node update, and likely several below it, more "efficient", since that single write now covers more than a single change.
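To make the commit behaviour described above concrete, here is a deliberately simplified user-space model. It is a toy, not btrfs code: the `toy_fs` structure and every function in it are hypothetical, the "tree" is a flat array of integers, and a commit is reduced to publishing a new root pointer. Writes only touch an in-memory working copy; `toy_commit()` makes all of them durable in one step, and `toy_crash()` shows that anything written after the last commit is simply gone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BLOCKS 4

/* Toy stand-in for a tree: a snapshot of block values. */
struct toy_tree {
    int blocks[TOY_BLOCKS];
};

/* Hypothetical filesystem state: the committed root is what a mount after a
 * crash would find; the working copy collects changes in memory. */
struct toy_fs {
    struct toy_tree *committed;
    struct toy_tree *working;
};

static struct toy_tree *toy_clone(const struct toy_tree *src)
{
    struct toy_tree *t = malloc(sizeof(*t));
    if (t)
        memcpy(t, src, sizeof(*t));
    return t;
}

/* Start from an all-zero committed root plus a working copy of it. */
static int toy_init(struct toy_fs *fs)
{
    fs->committed = calloc(1, sizeof(struct toy_tree));
    fs->working = fs->committed ? toy_clone(fs->committed) : NULL;
    return fs->working ? 0 : -1;
}

/* Writes only touch the working copy; the committed root is unchanged. */
static void toy_write(struct toy_fs *fs, int block, int value)
{
    assert(block >= 0 && block < TOY_BLOCKS);
    fs->working->blocks[block] = value;
}

/* Commit: the working copy becomes the new root in a single pointer
 * update, however many writes it contains, then a new working copy starts. */
static int toy_commit(struct toy_fs *fs)
{
    struct toy_tree *next = toy_clone(fs->working);
    if (!next)
        return -1;
    free(fs->committed);
    fs->committed = fs->working;
    fs->working = next;
    return 0;
}

/* Crash: memory is lost, so the working copy is rebuilt from the last
 * committed root and every write since that commit disappears. */
static int toy_crash(struct toy_fs *fs)
{
    struct toy_tree *fresh = toy_clone(fs->committed);
    if (!fresh)
        return -1;
    free(fs->working);
    fs->working = fresh;
    return 0;
}

static void toy_destroy(struct toy_fs *fs)
{
    free(fs->committed);
    free(fs->working);
}
```

Batching is what makes a longer interval attractive (one root update covers many writes), and it is also exactly what widens the window of loss Mike is asking about.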
However, if the system goes belly up in the meantime, that means you lose everything since the last master node update.

Here's my experience from last year. I had some failing hardware, which turned out to be the mobo, but before I ultimately figured out the problem, I thought it was the disks. Thus, I bought a new one and attempted to replace what I thought was a failing one, copying everything over, and thinking I'd try the new-to-me btrfs while I was at it.

But what was really happening hardware-wise was that my then 8-year-old mobo had some capacitors going bad (I found several bulging and others burst when I finally figured out it was the mobo). That was triggering intermittent I/O errors that I had (wrongly) attributed to the disks dying, thus the replacement attempt. The symptom was SATA retries, downgrading the speed and retrying again, and eventually timing out and resetting the SATA interface. Only sometimes the whole system would lock up before a successful reset, or it would time out and reset enough times that I'd give up and do a full system reset.

The one thing I /did/ notice was that if I kept things cold enough (by the time I was done I had the AC turned down so far I was sitting here in a full winter jacket, long underwear, and a knit hat... in a Phoenix summer with temps of 40-45C/100-115F outside!!), the system would work better, so that's what I was trying to do.

It was in this environment that I was attempting to copy all my old data from what I /thought/ was a failing disk drive (or drives; I was running md/raid1 for most of the system), initially blaming the copy failures on what I thought was the failing drive(s), until I had enough data on the new drive to try disconnecting the old drives and copying data around on the new drive. When that acted up with the old drives entirely disconnected, I realized it wasn't the drives after all, and eventually found the problem.
But meanwhile, when I'd have to reset, what I'd find is that on btrfs, the whole tree I had been trying to copy over, and that I /thought/ had mostly copied fine, was gone. Or worse, part of the metadata had copied (the filesystem tree, or at least part of it) and was still there after a reboot, but all or most of the files were zeroed out!! At least if nothing at all copied, I knew right away where I was at. With the zeroed-out files, I'd have to figure out how much actual data had copied and remained on the new drive, and where it had gone from saving everything to only saving the metadata, with the actual files zeroed out. Then I could delete them and try again.

My previous filesystem (and the one I returned to for a year after I gave up on btrfs for the time being; I'm back on btrfs, with new SSDs, now) was reiserfs. It has actually been *IMPRESSIVELY* reliable for me, even thru various hardware failures, at least since the reiserfs data=ordered by default mode was introduced back in kernel 2.6.6 or some such. (As it turns out, it was the same Chris Mason
Re: Is the checkpoint interval adjustable?
> From: David Sterba
> There were a few requests to tune the interval. This finally made me
> finish the patch; I will send it in a second.

Thank you, David, and the others who kindly replied to my post. I will try your patch rather than modifying the code.

> > > Are there any unforeseen side effects of doing this? Thank you for
> > > the consideration.
> >
> > I don't *think* that there should be. One way of looking at it is that
> > both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> > Any trouble that you could get into in 300 seconds some other machine
> > could trivially get into in 30 with beefier hardware.
>
> That's a good point and lowers my worries a bit, though it would be
> interesting to see in what way a beefy machine blows with 300 seconds
> set.

I have my system booting to a BTRFS root partition. Let's say I'm using a value of 300 for my checkpoint interval. Does this mean that if I do a TON of filesystem writes (say I update my system, which pulls down a bunch of system file updates, for example), and I copy over several gigs of data from a backup, all _between_ checkpoints, and for some reason my system freezes, forcing me to ungracefully restart... is EVERYTHING since the last checkpoint lost? Upon a reboot, will BTRFS just mount up to the last good checkpoint automatically, or will I have a broken system and need to add the `-o recovery` option while I mount it manually from a chroot?

Another naive question: if I shut down the system between checkpoints, systemd should umount my partitions. Does the syncing of cached data occur after the graceful umount?
Re: Is the checkpoint interval adjustable?
> There were a few requests to tune the interval. This finally made me
> finish the patch; I will send it in a second.

Great, thanks.

> That's a good point and lowers my worries a bit, though it would be
> interesting to see in what way a beefy machine blows with 300 seconds
> set.

Agreed. Ideally the transaction machinery decides at some point that a transaction is sufficiently huge that it'll saturate the storage pipeline, and kicks it off.

- z
Re: Is the checkpoint interval adjustable?
On Wed, Jul 31, 2013 at 03:56:40PM -0700, Zach Brown wrote:
> > I am NO programmer by any stretch. Let's say I want them to be once
> > every 5 min (300 sec). Is the attached patch sane to achieve this?
>
> I think it's a reasonable patch to try, yeah.

There were a few requests to tune the interval. This finally made me finish the patch; I will send it in a second.

> > Are there any unforeseen side effects of doing this? Thank you for
> > the consideration.
>
> I don't *think* that there should be. One way of looking at it is that
> both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> Any trouble that you could get into in 300 seconds some other machine
> could trivially get into in 30 with beefier hardware.

That's a good point and lowers my worries a bit, though it would be interesting to see in what way a beefy machine blows with 300 seconds set.

david
Re: Is the checkpoint interval adjustable?
Zach Brown posted on Wed, 31 Jul 2013 15:56:40 -0700 as excerpted:

[Mike Audia wrote...]
>> I am NO programmer by any stretch. Let's say I want them to be once
>> every 5 min (300 sec). Is the attached patch sane to achieve this?
>> Are there any unforeseen side effects of doing this?

> I don't *think* that there should be. One way of looking at it is that
> both 30 and 300 seconds are an *eternity* for cpu, memory, and storage.
> Any trouble that you could get into in 300 seconds some other machine
> could trivially get into in 30 with beefier hardware.

As a sysadmin (not a programmer) who has messed around with, for example, vm.dirty_bytes/ratio, vm.dirty_writeback_centisecs, etc., the concern I'd have is that longer commit periods and larger commit buffers increase the possibility of writeback storms.

While I've not tweaked btrfs, and I probably need to re-examine my current settings since I've switched to SSD and btrfs, for spinning rust and reiserfs I ended up tweaking vm.dirty_* here. The files are /proc/sys/vm/*, and the kernel documentation for them is in Documentation/sysctl/vm.txt. Most distros have an initscript that writes any custom values at boot, using values set in /etc/sysctl.conf and/or /etc/sysctl.d/*, so that's where you'd normally set them once you've settled on values that work for you.

The following are the defaults and what I settled on for a wall-powered system.

vm.dirty_ratio defaults to 10 (percent of RAM). I've read and agree with opinions that 10% of RAM when RAM is, say, half a gig (so 10% is ~50 MB) isn't too bad on spinning rust, but it can be MUCH worse when RAM is, say, my current 16 gig (so 10% is ~1.6 gig), as that's several seconds of writeback on spinning rust. I reset that to 3% (~half a gig), here.

vm.dirty_background_ratio similarly: 5 (% of RAM) by default, reset to 1 (~160 MB).
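The sizing argument above is straightforward arithmetic: the dirty threshold is a percentage of memory, and the time to flush it is that amount divided by the disk's sustained write speed. The sketch below reproduces the 16 GiB numbers under two assumptions: it applies the percentage to total RAM (the kernel actually uses available memory, i.e. free plus reclaimable pages, so real thresholds come out somewhat lower), and the 100 MiB/s write rate in the example is an illustrative figure for spinning rust, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)
#define GIB (1024ULL * MIB)

/* Dirty-data threshold for a vm.dirty_*ratio percentage, using total RAM
 * as a simplification (an upper bound on the kernel's real threshold). */
static uint64_t dirty_threshold_bytes(uint64_t ram_bytes, unsigned ratio_pct)
{
    assert(ratio_pct <= 100);
    return ram_bytes / 100 * ratio_pct;
}

/* Seconds needed to write that much out at an assumed sustained rate. */
static double drain_seconds(uint64_t bytes, uint64_t bytes_per_sec)
{
    assert(bytes_per_sec > 0);
    return (double)bytes / (double)bytes_per_sec;
}
```

At the assumed 100 MiB/s, the default 10% of 16 GiB takes roughly 16 seconds to flush, while the 3% setting takes about 5.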
(The vm.dirty_(background_)bytes knobs parallel the above "ratio" knobs and may be easier to set for those thinking in terms of writeback backlog size, and the corresponding system responsiveness or lack thereof during that writeback, instead of percentage of memory dirty. Set one pair or the other.)

OTOH, vm.dirty_expire_centisecs defaults to 2999 (30 seconds; this is the high-priority foreground value, and it might well be the reason btrfs is coded for a 30-second commit time as well), and vm.dirty_writeback_centisecs defaults to 499 (5 seconds; this is the lower-priority background value). I left expire where it was, but decided that with the stricter ratio settings, writeback could be 10 seconds, doubling the background writeback time.

Before tuning btrfs' hardcoded defaults, I'd suggest tuning these values if you haven't already done so, and keeping them in mind if you do decide to tune btrfs as well.

For battery-powered systems, also take a look at laptop mode (and laptop-mode-tools), which I use here on my laptop (which I don't have at hand to check what I set for vm.dirty_* on it).

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
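Before tuning anything, it helps to check the current values. A minimal sketch that reads the knobs from /proc/sys/vm/, where Duncan says they live; `read_vm_knob()` and `print_dirty_knobs()` are hypothetical helper names, and the code assumes each file holds a single integer, which is true for the knobs discussed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read an integer sysctl from /proc/sys/vm/<name>. Returns 0 on success,
 * -1 if the file cannot be opened or does not start with an integer. */
static int read_vm_knob(const char *name, long *value)
{
    char path[256];
    FILE *f;
    int n, ok;

    assert(name != NULL && value != NULL);
    n = snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%ld", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Print the dirty-writeback knobs, converting centisecs to seconds. */
static void print_dirty_knobs(void)
{
    static const char *const knobs[] = {
        "dirty_ratio", "dirty_background_ratio",
        "dirty_bytes", "dirty_background_bytes",
        "dirty_expire_centisecs", "dirty_writeback_centisecs",
    };
    for (size_t i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
        long v;
        if (read_vm_knob(knobs[i], &v) != 0)
            printf("vm.%s: unavailable\n", knobs[i]);
        else if (strstr(knobs[i], "centisecs") != NULL)
            printf("vm.%s = %ld (%.2f s)\n", knobs[i], v, v / 100.0);
        else
            printf("vm.%s = %ld\n", knobs[i], v);
    }
}
```

When the ratio knobs are in effect, the matching `*_bytes` knobs read back as 0 (and vice versa), which is why only one pair should be set.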
Re: Is the checkpoint interval adjustable?
> Thank you kindly for the prompt reply. My goal is to make them _less_
> frequent.

I assumed as much. I should have added some sympathy smileys :).

> I am NO programmer by any stretch. Let's say I want them to be once
> every 5 min (300 sec). Is the attached patch sane to achieve this?

I think it's a reasonable patch to try, yeah.

> Are there any unforeseen side effects of doing this? Thank you for
> the consideration.

I don't *think* that there should be. One way of looking at it is that both 30 and 300 seconds are an *eternity* for cpu, memory, and storage. Any trouble that you could get into in 300 seconds some other machine could trivially get into in 30 with beefier hardware.

But I reserve the right to be wrong.

- z
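The "eternity" argument can be put in numbers: with a steady write load, the data that accumulates before a time-based commit is roughly write rate times interval. The rates in the example are made-up illustrative figures, and the hypothetical `backlog_bytes()` helper ignores any earlier flushing, such as VM writeback.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on data accumulated in one commit interval, assuming a
 * constant incoming write rate and no earlier flushes. */
static uint64_t backlog_bytes(uint64_t write_rate_bytes_per_sec,
                              uint64_t interval_secs)
{
    /* Guard against overflow for absurd inputs. */
    assert(interval_secs == 0 ||
           write_rate_bytes_per_sec <= UINT64_MAX / interval_secs);
    return write_rate_bytes_per_sec * interval_secs;
}
```

A machine writing 200 MiB/s builds up at 30 seconds the same 6000 MiB backlog that a 20 MiB/s machine builds up at 300 seconds, so a longer interval on a slow machine mostly reproduces what a fast machine already sees at the default.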
Re: Is the checkpoint interval adjustable?
> On Wed, Jul 31, 2013 at 04:02:29PM -0400, Mike Audia wrote:
> > I believe 30 sec is the default for the checkpoint interval. Is this
> > adjustable?
>
> It doesn't look like it. It looks like it's implemented with raw '30's
> in the code.
>
>		delay = HZ * 30;
>		...
>		    (now < cur->start_time || now - cur->start_time < 30)) {
>
> If you want more frequent forced commits you could always syncfs()
> regularly from userspace, I suppose.

Thank you kindly for the prompt reply. My goal is to make them _less_ frequent.

I am NO programmer by any stretch. Let's say I want them to be once every 5 min (300 sec). Is the attached patch sane to achieve this?

Are there any unforeseen side effects of doing this? Thank you for the consideration.

--- a/fs/btrfs/disk-io.c	2013-07-31 18:05:22.581062955 -0400
+++ b/fs/btrfs/disk-io.c	2013-07-31 18:06:15.243201652 -0400
@@ -1713,7 +1713,7 @@
 
 	do {
 		cannot_commit = false;
-		delay = HZ * 30;
+		delay = HZ * 300;
 		mutex_lock(&root->fs_info->transaction_kthread_mutex);
 
 		spin_lock(&root->fs_info->trans_lock);
@@ -1725,7 +1725,7 @@
 
 		now = get_seconds();
 		if (!cur->blocked &&
-		    (now < cur->start_time || now - cur->start_time < 30)) {
+		    (now < cur->start_time || now - cur->start_time < 300)) {
 			spin_unlock(&root->fs_info->trans_lock);
 			delay = HZ * 5;
 			goto sleep;
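The patch above has to change the literal 30 in two places, and the two must stay in agreement. Here is the same wake-up decision with the interval as a single parameter, written as a plain user-space function so it can be tested in isolation. `commit_wakeup_delay()` is hypothetical and is not the kernel implementation; its arguments only mirror the variables in the quoted code (`start_time`, `blocked`, the 5-second re-check). The `have_running_txn` check is an assumption about code outside the patch context: the thread is taken to sleep without committing when no transaction is open, which, if it holds, would mean an idle filesystem is not written to at every interval.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the commit thread's wake-up decision, with the
 * interval passed in once instead of hard-coded twice. Returns how many
 * seconds to sleep before checking again, or 0 to go ahead and commit. */
static unsigned long commit_wakeup_delay(bool have_running_txn, bool blocked,
                                         unsigned long now,
                                         unsigned long start_time,
                                         unsigned long interval)
{
    assert(interval > 0);
    if (!have_running_txn)
        return interval;  /* assumed: nothing open, so nothing to write */
    if (!blocked && (now < start_time || now - start_time < interval))
        return 5;         /* transaction still young: re-check in 5 s */
    return 0;             /* old enough, or blocked: take the commit path */
}
```

With one `interval` argument, moving from 30 to 300 seconds cannot leave the sleep delay and the age check disagreeing. Later kernels (3.12 onward) expose this interval as the btrfs `commit=` mount option, so patching is no longer needed there.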
Re: Is the checkpoint interval adjustable?
On Wed, Jul 31, 2013 at 04:02:29PM -0400, Mike Audia wrote:
> I believe 30 sec is the default for the checkpoint interval. Is this
> adjustable?

It doesn't look like it. It looks like it's implemented with raw '30's in the code.

		delay = HZ * 30;
		...
		    (now < cur->start_time || now - cur->start_time < 30)) {

If you want more frequent forced commits you could always syncfs() regularly from userspace, I suppose.

- z
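The userspace alternative only makes commits *more* frequent, but it is easy to try. A minimal sketch, assuming Linux with glibc 2.14 or later, where `syncfs()` is declared in <unistd.h> when `_GNU_SOURCE` is defined; `sync_fs_at()` and `sync_periodically()` are hypothetical helper names. `syncfs()` flushes the whole filesystem that contains the given file descriptor, not just that file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Flush the filesystem that contains `path` (a file or a directory).
 * Returns 0 on success, -1 on error. */
static int sync_fs_at(const char *path)
{
    int fd, rc;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    rc = syncfs(fd);
    close(fd);
    return rc == 0 ? 0 : -1;
}

/* Call syncfs() `count` times, sleeping `interval_secs` between calls.
 * Stops at the first error and returns -1; returns 0 otherwise. */
static int sync_periodically(const char *path, unsigned interval_secs,
                             unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if (sync_fs_at(path) != 0)
            return -1;
        if (i + 1 < count)
            sleep(interval_secs);
    }
    return 0;
}
```

For example, `sync_periodically("/", 10, 360)` would flush the root filesystem every 10 seconds for an hour. It cannot stretch the interval beyond the built-in 30 seconds, which is why making commits _less_ frequent needed the kernel patch.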