Re: Another ENOSPC situation
On 2016-04-02 01:43, Chris Murphy wrote:
> On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:
>>
>>> [4/502]mh@swivel:~$ sudo btrfs fi usage /
>>> Overall:
>>>     Device size:          600.00GiB
>>>     Device allocated:     600.00GiB
>>>     Device unallocated:     1.00MiB
>>
>> That's the problem right there. The admin didn't do his job and spot
>> the near full allocation issue
>
> I don't yet agree this is an admin problem. This is the 2nd or 3rd case
> we've seen only recently where there's plenty of space in all chunk
> types and yet ENOSPC happens, seemingly only because there's no
> unallocated space remaining. I don't know that this is a regression for
> sure, but it sure seems like one.

I personally don't think it's a regression. I've hit this myself before
(although I make a point not to anymore; having to jump through hoops to
the degree I did to get the FS working again tends to provide a pretty
big incentive to not let it happen again). I know a couple of other
people who have hit it and never reported it here or on IRC, and I'd be
willing to bet that the reason we're seeing it recently is that more
'regular' users (in contrast to system administrators or developers) are
using BTRFS, and they tend to be more likely to hit such issues (because
they're not as likely to know about them in the first place, let alone
how to avoid them).

>>> Data,single: Size:553.93GiB, Used:405.73GiB
>>>    /dev/mapper/swivelbtr 553.93GiB
>>>
>>> Metadata,DUP: Size:23.00GiB, Used:3.83GiB
>>>    /dev/mapper/swivelbtr  46.00GiB
>>>
>>> System,DUP: Size:32.00MiB, Used:112.00KiB
>>>    /dev/mapper/swivelbtr  64.00MiB
>>>
>>> Unallocated:
>>>    /dev/mapper/swivelbtr   1.00MiB
>>> [5/503]mh@swivel:~$
>>
>> Both data and metadata have several GiB free, data ~140 GiB free, and
>> metadata isn't into global reserve, so the system isn't totally wedged,
>> only partially, due to the lack of unallocated space.
>
> Unallocated space alone hasn't ever caused this that I can remember.
> It's most often been totally full metadata chunks, with free space in
> allocated data chunks, with no unallocated space out of which to create
> another metadata chunk to write out changes.
>
> There should be plenty of space for either a -dusage=1 or -musage=1
> balance to free up a bunch of partially allocated chunks. Offhand I
> don't think the profiles filter is helpful in this case.
>
> OK so where I could be wrong is that I'm expecting balance doesn't
> require allocated space to work. I'd expect that it can COW extents from
> one chunk into another existing chunk (of the same type) and then once
> that's successful, free up that chunk, i.e. revert it back to
> unallocated. If balance can only copy into newly allocated chunks, that
> seems like a big problem. I thought that problems had been fixed a very
> long time ago.

Balance has always allocated new chunks. This is IMHO one of the big
issues with the current implementation of it (the other being that it
can't be made asynchronous without some creative userspace work). If we
aren't converting chunk types and we're on a single device FS, we should
be tail-packing existing chunks before we try to allocate new ones.

> And what we don't see from 'usage' that we will see from 'df' is the
> GlobalReserve values. I'd like to see that.
>
> Anyway, in the meantime there is a work around:
>
> btrfs dev add
>
> Just add a device, even if it's an 8GiB flash drive. But it can be a
> spare space on a partition, or it can be a logical volume, or whatever
> you want. That'll add some gigs of unallocated space. Now the balance
> will work, or for absolutely sure there's a bug (and a new one because
> this has always worked in the past). After whatever filtered or full
> balance is done, make sure to 'btrfs dev rem' and confirm it's gone with
> 'btrfs fi show' before removing the device. It's a two device volume
> until that device is successfully removed and is in something of a
> fragile state until then because any loss of data on that 2nd device has
> a good chance of face planting the file system.

If you can ensure with a relative degree of certainty that you won't lose
power or crash, and you have lots of RAM, a small ramdisk (or even zram)
works well for this too. I wouldn't use either personally for a critical
filesystem (I'd pull out the disk and hook it up internally to another
system with spare disk space and handle things there), but both options
should work fine.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
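For anyone following along, the workaround quoted above has a strict order: add a temporary device, run the filtered balance, remove the device, then verify it is gone. The Python sketch below only builds and prints that command sequence; it does not execute anything. The function name `workaround_commands`, the placeholder device `/dev/sdX1`, and the usage value of 2 in the example are assumptions made for illustration, not something taken from the thread's actual recovery.

```python
import shlex


def workaround_commands(mountpoint, temp_device, usage_percent):
    """Return the workaround as an ordered list of argv lists (nothing is run)."""
    return [
        # 1. Temporarily add a device so the filesystem has unallocated space again.
        ["btrfs", "device", "add", temp_device, mountpoint],
        # 2. Filtered balance: only data chunks below usage_percent full.
        ["btrfs", "balance", "start", f"-dusage={usage_percent}", mountpoint],
        # 3. Remove the temporary device again.
        ["btrfs", "device", "remove", temp_device, mountpoint],
        # 4. Confirm the device is gone before physically detaching it.
        ["btrfs", "filesystem", "show", mountpoint],
    ]


if __name__ == "__main__":
    # /dev/sdX1 is a placeholder; substitute the real temporary device.
    for argv in workaround_commands("/", "/dev/sdX1", 2):
        print(shlex.join(argv))
```

Printing rather than running the commands is deliberate: each step should be checked by hand before the next one, since the volume is a two-device filesystem until the remove completes.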
Re: Another ENOSPC situation
Chris Murphy posted on Fri, 01 Apr 2016 23:43:46 -0600 as excerpted:

> On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.dun...@cox.net> wrote:
>> Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted:
>
>>> [4/502]mh@swivel:~$ sudo btrfs fi usage /
>>> Overall:
>>>     Device size:          600.00GiB
>>>     Device allocated:     600.00GiB
>>>     Device unallocated:     1.00MiB
>>
>> That's the problem right there. The admin didn't do his job and spot
>> the near full allocation issue
>
> I don't yet agree this is an admin problem. This is the 2nd or 3rd case
> we've seen only recently where there's plenty of space in all chunk
> types and yet ENOSPC happens, seemingly only because there's no
> unallocated space remaining. I don't know that this is a regression for
> sure, but it sure seems like one.

Notice that he said _balance_ failed with ENOSPC. He did _NOT_ say he was
getting it in ordinary usage, just yet. That would fit a 100% allocated
situation, with plenty of space left in both data and metadata chunks.
The plenty of space left inside the chunks would keep ordinary usage from
running into problems just yet, but balance really /does/ need room to
allocate at least one new chunk in order to properly handle the chunk
rewrite via COW. (At least for data; metadata seems to work a bit
differently. See below.)

Balance has always failed with ENOSPC if there was no unallocated space
left. It used to happen all the time, before btrfs learned how to delete
empty chunks in 3.17, but while that helps, it only works for literally
/empty/ chunks. Chunks with even a single block/node still in use don't
get deleted automatically.
What I think is happening now is that while the empty-chunk deleting from 3.17 on helped, it has been long enough since then, now, that people with particular usage patterns, I'd strongly suspect those with heavy snapshotting, don't tend to fully empty their chunks to the extent that those with other usage patterns do, and it has been just long enough now that we're beginning to see the problem reported again, because deleting empty chunks helped, but they weren't fully emptying enough chunks to keep up with things that way, in their particular use-cases. >>> Data,single: Size:553.93GiB, Used:405.73GiB >>>/dev/mapper/swivelbtr 553.93GiB >>> >>> Metadata,DUP: Size:23.00GiB, Used:3.83GiB >>>/dev/mapper/swivelbtr 46.00GiB >>> >>> System,DUP: Size:32.00MiB, Used:112.00KiB >>>/dev/mapper/swivelbtr 64.00MiB >>> >>> Unallocated: >>>/dev/mapper/swivelbtr 1.00MiB >>> [5/503]mh@swivel:~$ >> >> Both data and metadata have several GiB free, data ~140 GiB free, and >> metadata isn't into global reserve, so the system isn't totally wedged, >> only partially, due to the lack of unallocated space. > > Unallocated space alone hasn't ever caused this that I can remember. > It's most often been totally full metadata chunks, with free space in > allocated data chunks, with no unallocated space out of which to create > another metadata chunk to write out changes. Unallocated space alone doesn't cause ENOSPC with normal operations; for those you're correct, running out of either data or metadata space is required as well. (Normally it's metadata that runs out, but I recall seeing one post from someone who had metadata room but full data. The behavior was.. "interesting", as he could do renames, etc, and even create small files as long as they were small enough to stay in metadata. As soon as he tried to do anything that needed an actual data extent, however, ENOSPC.) 
But balance has always required space to allocate at least one chunk, as COW means the existing chunk can't be released until everything is rewritten into the new one. Tho it seems that btrfs can sometimes either write very small metadata chunks, which don't forget are dup by default on a single device, as they are in this case. He has 1 MiB unallocated. Split in half that's 512 KiB. I'm not sure if btrfs can go that small, but if it can, and it can find a low enough usage metadata chunk to write into it, freeing the larger metadata chunk... Or maybe btrfs can actually use the global reserve for that, since global reserve is part of metadata. If it can, a 512 MiB global reserve would be just large enough to write the two copies of a nominally 256 MiB metadata chunk. Either way, I've seen a number of times now where btrfs was able to balance metadata, when it had less than the 256 (*2 if dup) MiB unallocated that would normally be required. Maybe it /is/ able to use global reserve for that, which would allow it to work, as long as metadata isn't so tight that it's already using global reserve. That's actually what I bet it's doing, now that I think about it. Because as long as the global reserve isn't being used, 512 MiB of global reserve would be exactly 2*256 MiB metadata chunks, and if they're un
Re: Another ENOSPC situation
On Fri, Apr 1, 2016 at 10:55 PM, Duncan <1i5t5.dun...@cox.net> wrote: > Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted: >> [4/502]mh@swivel:~$ sudo btrfs fi usage / >> Overall: >> Device size: 600.00GiB >> Device allocated:600.00GiB >> Device unallocated:1.00MiB > > That's the problem right there. The admin didn't do his job and spot the > near full allocation issue I don't yet agree this is an admin problem. This is the 2nd or 3rd case we've seen only recently where there's plenty of space in all chunk types and yet ENOSPC happens, seemingly only because there's no unallocated space remaining. I don't know that this is a regression for sure, but it sure seems like one. >> >> Data,single: Size:553.93GiB, Used:405.73GiB >>/dev/mapper/swivelbtr 553.93GiB >> >> Metadata,DUP: Size:23.00GiB, Used:3.83GiB >>/dev/mapper/swivelbtr 46.00GiB >> >> System,DUP: Size:32.00MiB, Used:112.00KiB >>/dev/mapper/swivelbtr 64.00MiB >> >> Unallocated: >>/dev/mapper/swivelbtr 1.00MiB >> [5/503]mh@swivel:~$ > > Both data and metadata have several GiB free, data ~140 GiB free, and > metadata isn't into global reserve, so the system isn't totally wedged, > only partially, due to the lack of unallocated space. Unallocated space alone hasn't ever caused this that I can remember. It's most often been totally full metadata chunks, with free space in allocated data chunks, with no unallocated space out of which to create another metadata chunk to write out changes. There should be plenty of space for either a -dusage=1 or -musage=1 balance to free up a bunch of partially allocated chunks. Offhand I don't think the profiles filter is helpful in this case. OK so where I could be wrong is that I'm expecting balance doesn't require allocated space to work. I'd expect that it can COW extents from one chunk into another existing chunk (of the same type) and then once that's successful, free up that chunk, i.e. revert it back to unallocated. 
If balance can only copy into newly allocated chunks, that seems like a big problem. I thought that problems had been fixed a very long time ago.

And what we don't see from 'usage' that we will see from 'df' is the GlobalReserve values. I'd like to see that.

Anyway, in the meantime there is a work around:

btrfs dev add

Just add a device, even if it's an 8GiB flash drive. But it can be a spare space on a partition, or it can be a logical volume, or whatever you want. That'll add some gigs of unallocated space. Now the balance will work, or for absolutely sure there's a bug (and a new one because this has always worked in the past). After whatever filtered or full balance is done, make sure to 'btrfs dev rem' and confirm it's gone with 'btrfs fi show' before removing the device. It's a two device volume until that device is successfully removed and is in something of a fragile state until then because any loss of data on that 2nd device has a good chance of face planting the file system.

--
Chris Murphy
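The condition at the heart of this thread, a fully allocated device with almost nothing unallocated, can be spotted mechanically from the `btrfs fi usage` figures quoted above. The following Python sketch parses the "Overall:" block and reports the unallocated fraction. It assumes the field labels appear exactly as in the quoted output; the function names `parse_size` and `unallocated_fraction` and the 5% warning threshold are hypothetical choices for illustration, not an established rule.

```python
import re

# Binary unit suffixes as they appear in the quoted output.
UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_size(text):
    """Convert a size such as '600.00GiB' or '0.00B' to bytes."""
    match = re.fullmatch(r"([\d.]+)(B|KiB|MiB|GiB|TiB)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


def unallocated_fraction(usage_output):
    """Return 'Device unallocated' divided by 'Device size'."""
    fields = {}
    for line in usage_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep only the first token after the colon, e.g. '600.00GiB'.
            fields[key.strip()] = value.split()[0]
    return parse_size(fields["Device unallocated"]) / parse_size(fields["Device size"])


# Figures copied from the usage output quoted in this thread.
SAMPLE = """\
Overall:
    Device size:                 600.00GiB
    Device allocated:            600.00GiB
    Device unallocated:            1.00MiB
"""

if __name__ == "__main__":
    fraction = unallocated_fraction(SAMPLE)
    if fraction < 0.05:  # hypothetical threshold, tune to taste
        print(f"warning: only {fraction:.6%} of the device is unallocated")
```

Fed from a periodic job, a check like this would flag the situation while there is still room to allocate a chunk for balance to work with.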
Re: Another ENOSPC situation
Marc Haber posted on Fri, 01 Apr 2016 15:40:29 +0200 as excerpted: > Hi, > > just for a change, this is another btrfs on a different host. The host > is also running Debian unstable with mainline kernels, the btrfs in > question was created (not converted) in March 2015 with btrfs-tools > 3.17. It is the root fs of my main work notebook which is under > workstation load, with lots of snapshots being created and deleted. > > Balance immediately fails with ENOSPC > > balance -dprofiles=single -dusage=1 goes through "fine" ("had to > relocate 0 out of 602 chunks") > > balance -dprofiles=single -dusage=2 also ENOSPCes immediately. > > [4/502]mh@swivel:~$ sudo btrfs fi usage / > Overall: > Device size: 600.00GiB > Device allocated:600.00GiB > Device unallocated:1.00MiB That's the problem right there. The admin didn't do his job and spot the near full allocation issue (perhaps with the help of some script set to run periodically and tell him about it) before it got critical, and now there's no room left to balance, to fix the problem. This despite the fact that the admin chose to run a not yet entirely stable filesystem that's well known to run off the rails in precisely this sort of way, occasionally, with specific use-cases such as heavy snapshotting more often than others. > Device missing: 0.00B > Used:413.40GiB > Free (estimated):148.20GiB (min: 148.20GiB) Tho the used vs. free isn't all that bad... it's just that the allocated vs. unallocated was allowed to run off the rails and get the filesystem in a bind. But that does mean it should be possible to do something about it. 
=:^) > Data ratio: 1.00 > Metadata ratio: 2.00 > Global reserve: 512.00MiB (used: 0.00B) > > Data,single: Size:553.93GiB, Used:405.73GiB >/dev/mapper/swivelbtr 553.93GiB > > Metadata,DUP: Size:23.00GiB, Used:3.83GiB >/dev/mapper/swivelbtr 46.00GiB > > System,DUP: Size:32.00MiB, Used:112.00KiB >/dev/mapper/swivelbtr 64.00MiB > > Unallocated: >/dev/mapper/swivelbtr 1.00MiB > [5/503]mh@swivel:~$ Both data and metadata have several GiB free, data ~140 GiB free, and metadata isn't into global reserve, so the system isn't totally wedged, only partially, due to the lack of unallocated space. > btrfs balance -mprofiles seems to do something. one kworked and one > btrfs-transaction process hog one CPU core each for hours, while > blocking the filesystem for minutes apiece, which leads to the host > being nearly unuseable up to the point of "clock and mouse pointer > frozen for nearly ten minutes". > > The btrfs balance cancel I issued after four hours of this state took > eleven minutes alone to complete. It's worth noting as an aside that Linux isn't necessarily tuned for interactivity by default, tho there are definitely ways to make it more so. Additionally, on some mobos at least, it's possible to tweak the BIOS balance between interactivity and thruput. An old Tyan board (PCI not the newer PCIE, which avoids some of the problems with multiple dedicated buses) I had was tilted a bit heavily toward thruput, which did make sense as it was actually a server board, until I tweaked things a bit. That made a LOT of difference, curing the dragging, but also curing occasional audio runouts, etc. Turns out it was simply tuned to do huge bus "packets" (I forgot the proper in-context term, and that board died a few years ago, so...), increasing thruput, but also increasing latency beyond what the sound card and keyboard/mouse (or in that case the human operating them) could reasonably deal with. 
By shortening the PCI "packet length", it reduced thruput a bit but greatly improved latency, letting other users have their turn when they needed it, not some time later. Of course in addition to PCIE putting many of those things on dedicated buses these days, ssds are so much faster that a lot of things that could potentially be problems on spinning rust, simply don't tend to be issues on ssds. As much as anything, I think that's what a lot of users bothered by such problems are turning to, and I'd bet that's a good part of why SSDs are as popular as they are, as well. I know I've simply not had many of the problems here that others had, and while I think part of it is the multiple relatively small but independent filesystems and part of it may be because I don't use snapshotting, I also think a major part of it is simply that the SSDs I'm running btrfs on are simply so much faster than spinning rust that the problems either don't occur, or if they do, they're done before I even notice them. FWIW, I do still use spinning rust, but for my media partition and (second) backups, not for anything speed critical at all. And FWIW, I still use reiserfs on that
Re: Another ENOSPC situation
On Fri, Apr 1, 2016 at 10:40 PM, Marc Haber wrote: > On Fri, Apr 01, 2016 at 09:20:52PM +0200, Henk Slager wrote: >> On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber >> wrote: >> > On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote: >> >> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote: >> >> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber >> >> > wrote: >> >> > > btrfs balance -mprofiles seems to do something. one kworked and one >> >> > > btrfs-transaction process hog one CPU core each for hours, while >> >> > > blocking the filesystem for minutes apiece, which leads to the host >> >> > > being nearly unuseable up to the point of "clock and mouse pointer >> >> > > frozen for nearly ten minutes". >> >> > >> >> > I assume you still have your every 10 minutes snapshotting running >> >> > while balancing? >> >> >> >> No, I disabled the cronjob before trying the balance. I might be >> >> crazy, but not stup^wunexperienced. >> > >> > That being said, I would still expect the code not to allow _this_ >> > kind of effect on the entire system when two alledgely incompatible >> > operations run simultaneously. I mean, Linux is a multi-user, >> > multi-tasking operating system where one simply cannot expect all >> > processes to be cooperative to each other. We have the operating >> > systems to prevent this kind of issues, not to cause them. >> >> Maybe look at it differently: Does user mh have trouble using this >> laptop w.r.t. storing files? > > No. I would have cried murder otherwise. > >> In openSUSE Tumbleweed (the snapshot from end of march), root access >> is needed to change the default snapshotting config, otherwise you >> will have a 10 year history. After that change has been done according >> to needs of the user, there is no need to run manual balance. > > So you are saying the balancing a filesystem should never be > necessary? Or what are you trying to say? 
There is a package btrfsmaintenance which does balancing for the user after it is configured by root according to the user's wishes and needs.

The key thing I want to say is that you should change your snapshotting rate and/or policy. It has been hinted at before, and I think it is more a psychological issue than a technical one.
Re: Another ENOSPC situation
On Fri, Apr 01, 2016 at 09:20:52PM +0200, Henk Slager wrote: > On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber > wrote: > > On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote: > >> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote: > >> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber > >> > wrote: > >> > > btrfs balance -mprofiles seems to do something. one kworked and one > >> > > btrfs-transaction process hog one CPU core each for hours, while > >> > > blocking the filesystem for minutes apiece, which leads to the host > >> > > being nearly unuseable up to the point of "clock and mouse pointer > >> > > frozen for nearly ten minutes". > >> > > >> > I assume you still have your every 10 minutes snapshotting running > >> > while balancing? > >> > >> No, I disabled the cronjob before trying the balance. I might be > >> crazy, but not stup^wunexperienced. > > > > That being said, I would still expect the code not to allow _this_ > > kind of effect on the entire system when two alledgely incompatible > > operations run simultaneously. I mean, Linux is a multi-user, > > multi-tasking operating system where one simply cannot expect all > > processes to be cooperative to each other. We have the operating > > systems to prevent this kind of issues, not to cause them. > > Maybe look at it differently: Does user mh have trouble using this > laptop w.r.t. storing files? No. I would have cried murder otherwise. > In openSUSE Tumbleweed (the snapshot from end of march), root access > is needed to change the default snapshotting config, otherwise you > will have a 10 year history. After that change has been done according > to needs of the user, there is no need to run manual balance. So you are saying the balancing a filesystem should never be necessary? Or what are you trying to say? Greetings Marc -- - Marc Haber | "I don't trust Computers. 
They | Mailadresse im Header Leimen, Germany| lose things."Winona Ryder | Fon: *49 6224 1600402 Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
Re: Another ENOSPC situation
On Fri, Apr 1, 2016 at 6:50 PM, Marc Haber wrote:
> On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
>> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
>> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber wrote:
>> > > btrfs balance -mprofiles seems to do something. one kworker and one
>> > > btrfs-transaction process hog one CPU core each for hours, while
>> > > blocking the filesystem for minutes apiece, which leads to the host
>> > > being nearly unusable up to the point of "clock and mouse pointer
>> > > frozen for nearly ten minutes".
>> >
>> > I assume you still have your every 10 minutes snapshotting running
>> > while balancing?
>>
>> No, I disabled the cronjob before trying the balance. I might be
>> crazy, but not stup^wunexperienced.
>
> That being said, I would still expect the code not to allow _this_
> kind of effect on the entire system when two allegedly incompatible
> operations run simultaneously. I mean, Linux is a multi-user,
> multi-tasking operating system where one simply cannot expect all
> processes to be cooperative to each other. We have operating
> systems to prevent this kind of issue, not to cause them.

Maybe look at it differently: Does user mh have trouble using this
laptop w.r.t. storing files?

In openSUSE Tumbleweed (the snapshot from end of March), root access
is needed to change the default snapshotting config, otherwise you
will have a 10 year history. After that change has been done according
to the needs of the user, there is no need to run a manual balance.
Re: Another ENOSPC situation
On Fri, Apr 01, 2016 at 06:30:20PM +0200, Marc Haber wrote:
> On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> > On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber wrote:
> > > btrfs balance -mprofiles seems to do something. one kworker and one
> > > btrfs-transaction process hog one CPU core each for hours, while
> > > blocking the filesystem for minutes apiece, which leads to the host
> > > being nearly unusable up to the point of "clock and mouse pointer
> > > frozen for nearly ten minutes".
> >
> > I assume you still have your every 10 minutes snapshotting running
> > while balancing?
>
> No, I disabled the cronjob before trying the balance. I might be
> crazy, but not stup^wunexperienced.

That being said, I would still expect the code not to allow _this_
kind of effect on the entire system when two allegedly incompatible
operations run simultaneously. I mean, Linux is a multi-user,
multi-tasking operating system where one simply cannot expect all
processes to be cooperative to each other. We have operating
systems to prevent this kind of issue, not to cause them.

Greetings
Marc

--
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
Re: Another ENOSPC situation
On Fri, Apr 01, 2016 at 05:44:30PM +0200, Henk Slager wrote:
> On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber wrote:
> > btrfs balance -mprofiles seems to do something. one kworker and one
> > btrfs-transaction process hog one CPU core each for hours, while
> > blocking the filesystem for minutes apiece, which leads to the host
> > being nearly unusable up to the point of "clock and mouse pointer
> > frozen for nearly ten minutes".
>
> I assume you still have your every 10 minutes snapshotting running
> while balancing?

No, I disabled the cronjob before trying the balance. I might be
crazy, but not stup^wunexperienced.

Greetings
Marc

--
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
Re: Another ENOSPC situation
On Fri, Apr 1, 2016 at 3:40 PM, Marc Haber wrote: > Hi, > > just for a change, this is another btrfs on a different host. The host > is also running Debian unstable with mainline kernels, the btrfs in > question was created (not converted) in March 2015 with btrfs-tools > 3.17. It is the root fs of my main work notebook which is under > workstation load, with lots of snapshots being created and deleted. > > Balance immediately fails with ENOSPC > > balance -dprofiles=single -dusage=1 goes through "fine" ("had to > relocate 0 out of 602 chunks") > > balance -dprofiles=single -dusage=2 also ENOSPCes immediately. > > [4/502]mh@swivel:~$ sudo btrfs fi usage / > Overall: > Device size: 600.00GiB > Device allocated:600.00GiB > Device unallocated:1.00MiB > Device missing: 0.00B > Used:413.40GiB > Free (estimated):148.20GiB (min: 148.20GiB) > Data ratio: 1.00 > Metadata ratio: 2.00 > Global reserve: 512.00MiB (used: 0.00B) > > Data,single: Size:553.93GiB, Used:405.73GiB >/dev/mapper/swivelbtr 553.93GiB > > Metadata,DUP: Size:23.00GiB, Used:3.83GiB >/dev/mapper/swivelbtr 46.00GiB > > System,DUP: Size:32.00MiB, Used:112.00KiB >/dev/mapper/swivelbtr 64.00MiB > > Unallocated: >/dev/mapper/swivelbtr 1.00MiB > [5/503]mh@swivel:~$ > > btrfs balance -mprofiles seems to do something. one kworked and one > btrfs-transaction process hog one CPU core each for hours, while > blocking the filesystem for minutes apiece, which leads to the host > being nearly unuseable up to the point of "clock and mouse pointer > frozen for nearly ten minutes". I assume you still have your every 10 minutes snapshotting running while balancing? > The btrfs balance cancel I issued after four hours of this state took > eleven minutes alone to complete. 
> > These are all log entries that were obtained after starting btrfs > balance -mprofiles on 09:43 > Apr 1 12:18:21 swivel kernel: [253651.970413] BTRFS info (device dm-14): > found 3523 extents > Apr 1 12:18:21 swivel kernel: [253652.035572] BTRFS info (device dm-14): > relocating block group 1538365849600 flags 36 > Apr 1 13:30:57 swivel kernel: [258007.653597] BTRFS info (device dm-14): > found 3585 extents > Apr 1 13:30:57 swivel kernel: [258007.746541] BTRFS info (device dm-14): > relocating block group 1536755236864 flags 36 > Apr 1 13:49:39 swivel kernel: [259130.296184] BTRFS info (device dm-14): > found 3047 extents > Apr 1 13:49:39 swivel kernel: [259130.357314] BTRFS info (device dm-14): > relocating block group 1528702173184 flags 36 > Apr 1 14:30:00 swivel kernel: [261550.776348] BTRFS info (device dm-14): > found 4200 extents > > This kernel trace from 11:16 is not btrfs-related, is it? I guess it's > bluetooth related since it happened simultaneously to the bluetooth > device popping out an in: > Apr 1 11:16:38 swivel kernel: [249948.993751] usb 1-1.4: USB disconnect, > device number 39 > Apr 1 11:16:38 swivel systemd[1]: Starting Load/Save RF Kill Switch Status... > Apr 1 11:16:38 swivel systemd[1]: Started Load/Save RF Kill Switch Status. > Apr 1 11:16:38 swivel systemd[1]: bluetooth.target: Unit not needed anymore. > Stopping. > Apr 1 11:16:38 swivel systemd[1]: Stopped target Bluetooth. 
> Apr 1 11:16:38 swivel laptop-mode: Laptop mode > Apr 1 11:16:38 swivel laptop-mode: enabled, not active > Apr 1 11:16:39 swivel kernel: [249949.211549] usb 1-1.4: new full-speed USB > device number 40 using ehci-pci > Apr 1 11:16:39 swivel kernel: [249949.308386] usb 1-1.4: New USB device > found, idVendor=0a5c, idProduct=217f > Apr 1 11:16:39 swivel kernel: [249949.308397] usb 1-1.4: New USB device > strings: Mfr=1, Product=2, SerialNumber=3 > Apr 1 11:16:39 swivel kernel: [249949.308402] usb 1-1.4: Product: Broadcom > Bluetooth Device > Apr 1 11:16:39 swivel kernel: [249949.308407] usb 1-1.4: Manufacturer: > Broadcom Corp > Apr 1 11:16:39 swivel kernel: [249949.308412] usb 1-1.4: SerialNumber: > CCAF78F1274F > Apr 1 11:16:39 swivel systemd[1]: Reached target Bluetooth. > Apr 1 11:16:39 swivel kernel: [249949.507794] [ cut here > ] > Apr 1 11:16:39 swivel kernel: [249949.507810] WARNING: CPU: 1 PID: 11 at > arch/x86/kernel/cpu/perf_event_intel_ds.c:325 reserve_ds_buffers+0x102/0x326() > Apr 1 11:16:39 swivel kernel: [249949.507813] alloc_bts_buffer: BTS buffer > allocation failure > Apr 1 11:16:39 swivel kernel: [249949.507816] Modules linked in: cpuid > hid_generic usbhid hid e1000e tun ctr ccm rfcomm bridge stp llc > cpufreq_userspace cpufreq_stats cpufreq_conservative cpufreq_powersave > nf_conntrack_netlink nfnetlink bnep binfmt_misc intel_rapl > x86_pkg_temp_thermal arc4 intel_powerclamp kvm_intel kvm irqbypass iwldvm > snd_hda_codec_conexant snd_hda_codec_generic mac80211 inpu