Re: [markfasheh/duperemove] Why blocksize is limit to 1MB?
Hi,

Before doing that, one way to think about it could be: what would be the probability of two 100M blocks generating the same hash and being treated as identical?

Thanks,
Xin

Sent: Monday, January 02, 2017 at 4:32 AM
From: "Peter Becker" <floyd@gmail.com>
To: "Xin Zhou" <xin.z...@gmx.com>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [markfasheh/duperemove] Why blocksize is limit to 1MB?

> 1M is already a little bit too big in size.

Not in my use case :)
Am I right that this isn't a limit in btrfs? Then I can patch this and try 100M.

The reason is that I must dedupe the whole 8 TB in less than a day, but with 128K and 1M blocksize it will take a week. I don't know why adding extents takes so long. I/O during adding extents is less than 4 MB/s, and CPU (dual core) and memory (8 GB) usage are less than 20%, on bare metal.

2017-01-01 5:38 GMT+01:00 Xin Zhou <xin.z...@gmx.com>:
> Hi,
>
> In general, the larger the block / chunk size is, the less dedup can be
> achieved. 1M is already a little bit too big in size.
>
> Thanks,
> Xin
>
> Sent: Friday, December 30, 2016 at 12:28 PM
> From: "Peter Becker" <floyd@gmail.com>
> To: linux-btrfs <linux-btrfs@vger.kernel.org>
> Subject: [markfasheh/duperemove] Why blocksize is limit to 1MB?
>
> Hello, I have an 8 TB volume with multiple files of hundreds of GB each.
> I am trying to dedupe this because the first hundred GB of many files are
> identical. With 128KB blocksize and the nofiemap and lookup-extents=no
> options, it will take more than a week (dedupe only, already hashed). So I
> tried -b 100M, but this returned an error: "Blocksize is bounded ...".
>
> The reason is that the blocksize is limited to
>
> #define MAX_BLOCKSIZE (1024U*1024)
>
> but I can't find any description of why.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
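Xin's collision question can be made concrete with a quick birthday-bound estimate. A minimal sketch, assuming an ideal 256-bit hash (duperemove's configured hash may differ) and Peter's figure of an 8 TiB volume; the bound is a worst case over all block pairs on the volume:

```shell
#!/usr/bin/env bash
# Back-of-the-envelope birthday bound: the probability that ANY two
# distinct blocks on the volume hash alike under an ideal 256-bit hash
# is at most n*(n-1)/2 / 2^256, where n is the number of blocks.
# bc does the big-integer arithmetic.
vol_bytes=$((8 * 2**40))                                     # 8 TiB (assumed)
for bs in $((128 * 2**10)) $((2**20)) $((100 * 2**20)); do   # 128K, 1M, 100M
    n=$((vol_bytes / bs))
    p=$(echo "scale=80; (${n} * (${n} - 1) / 2) / 2^256" | bc -l)
    echo "blocksize ${bs} B: n=${n} blocks, collision bound ${p}"
done
```

Even at 128K blocks the bound is on the order of 10^-62, so hash collisions are not the practical risk of a larger blocksize; reduced dedupe opportunity (Xin's original point) is.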
Re: [markfasheh/duperemove] Why blocksize is limit to 1MB?
Hi,

In general, the larger the block / chunk size is, the less dedup can be achieved. 1M is already a little bit too big in size.

Thanks,
Xin

Sent: Friday, December 30, 2016 at 12:28 PM
From: "Peter Becker"
To: linux-btrfs
Subject: [markfasheh/duperemove] Why blocksize is limit to 1MB?

Hello, I have an 8 TB volume with multiple files of hundreds of GB each. I am trying to dedupe this because the first hundred GB of many files are identical. With 128KB blocksize and the nofiemap and lookup-extents=no options, it will take more than a week (dedupe only, already hashed). So I tried -b 100M, but this returned an error: "Blocksize is bounded ...". The reason is that the blocksize is limited to

#define MAX_BLOCKSIZE (1024U*1024)

but I can't find any description of why.
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
> Unless there are bugs that would show up in other situations as well (or
> an out-of-space condition is triggered that would likewise show up in
> other situations with a similar amount of data/metadata written),

That is exactly where some bugs come from. For simple cases, it is OK to assume the send/receive always succeeds. And if it errors out, assume the delete always succeeds, that the file system is in a consistent state, and good luck with the data.

Xin

Sent: Sunday, December 25, 2016 at 7:52 PM
From: Duncan <1i5t5.dun...@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

Xin Zhou posted on Mon, 26 Dec 2016 03:36:09 +0100 as excerpted:

> One interesting thing to investigate might be the btrfs send / receive
> result, under a disruptive network environment. If the connection breaks
> in the middle of transfer (at different phase, maybe), see what could be
> the file system status.

Btrfs send sends from a read-only snapshot, so the sending filesystem shouldn't be harmed no matter what happens to the send.

Btrfs receive does all its work in a new subvolume (basically a snapshot in an incremental send, tho I'm not sure it's a full snapshot in the technical sense), modifying the files therein using standard calls used in other contexts as well. So absent bugs that should appear in those other contexts too if they exist, the worst damage a receive should be able to do is an unfaithful replay of the send stream, such that a faithful copy of the sent snapshot doesn't appear on the receiver. Which means even in the case of error, cleanup is as simple as deleting the aborted/incompletely-received subvolume. Unless there are bugs that would show up in other situations as well (or an out-of-space condition is triggered that would likewise show up in other situations with a similar amount of data/metadata written), there should be no effects outside that received subvolume.
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
That is one way to diagnose the issue in the data path. If ssh could guarantee data transfer and retry, then those data protection companies would not need a whole team handling send / receive for remote data backup. In your case, if the connection load is very light, then the issue could be elsewhere.

Xin

Sent: Monday, December 26, 2016 at 3:04 AM
From: "Giuseppe Della Bianca" <b...@adria.it>
To: "Xin Zhou" <xin.z...@gmx.com>, "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

Hi.

I agree with Duncan, and I add:

- For remote transfer, ssh is used. ssh is designed to ensure the integrity of data.
- The remote transfer uses Gigabit Ethernet; it is never congested.
- I had the same problems with a local btrfs receive.
- The script currently has 907 lines of code, many of which are there to ensure the detection and display of btrfs tools errors.
- The script stops executing when the btrfs tools return an error code.
- It is not possible that the script does not display error messages or ignores the error codes of the btrfs tools.

An example from today:

(2016-12-26 10:53:51) Start btrfsManage . . .
Start managing SEND ' / ' filesystem ' root ' snapshot in ' /dev/sda2 '
Sending ' root-2016-12-04_18:13:57.35 ' source snapshot to ' btrfsreceive ' subvolume . . .
btrfs send -p /tmp/tmp.xJWkEN1U23/btrfssnapshot/root/root-2016-12-03_18:07:09.34 /tmp/tmp.xJWkEN1U23/btrfssnapshot/root/root-2016-12-04_18:13:57.35 | btrfs receive /tmp/tmp.pWWKP4vfAy/btrfsreceive/root/.part/ . . .
At subvol /tmp/tmp.xJWkEN1U23/btrfssnapshot/root/root-2016-12-04_18:13:57.35 . . .
ERROR: truncate usr/share/locale/it/LC_MESSAGES/kio4.mo failed: Read-only file system . . .
At snapshot root-2016-12-04_18:13:57.35 . . .
_EC_ERR_ 1 . . .
_EC_ERR_ 141
(2016-12-26 10:54:28) End btrfsManage . . .
End managing SEND ' / ' filesystem ' root ' snapshot in ' /dev/sda2 ' WITH ERRORS

Checking filesystem on /dev/sda2
UUID: 44f1de7e-a65b-41ce-8ff4-20f7ed83e106
checking extents
ref mismatch on [62408097792 16384] extent item 0, found 1
Backref 62408097792 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [62408097792 16384]
owner ref check failed [62408097792 16384]
ref mismatch on [77565509632 16384] extent item 0, found 1
Backref 77565509632 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [77565509632 16384]
]zac[
Backref 77826916352 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [77826916352 16384]
owner ref check failed [77826916352 16384]
ref mismatch on [77853933568 16384] extent item 0, found 1
Backref 77853933568 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [77853933568 16384]
owner ref check failed [77853933568 16384]
checking free space cache
checking fs roots
warning line 3822
checking csums
checking root refs
found 135128678400 bytes used err is 0
total csum bytes: 126946572
total tree bytes: 5132206080
total fs tree bytes: 4744757248
total extent tree bytes: 240795648
btree space waste bytes: 914832832
file data blocks allocated: 3311786532864
 referenced 703616266240

It is likely that mine is a special case. But a special case, with a code change at other points, can become a problem for many. It's not nice to say, but it seems I have to hope that my problem becomes a problem for many. Meanwhile, I'll find my own workaround for a probable serious btrfs bug.

Thank you.

Gdb

> Hi,
>
> Probably you can try to use "-v" to enable more output.
> A quick look at the send / receive code: it seems a little bit risky. It
> seems to lack specific error handling and, in most cases, returns the
> same error code. I think it might be helpful if, when a transfer succeeds,
> the command printed the transfer id, source / dest, and a specific
> "success" string.
> Such output could help the script figure out whether a transfer really
> succeeded.
>
> The code is relatively new to me; I did not see retry logic in the stream
> handling, please correct me if I am wrong about this. So, I am not quite
> sure about the transfer behavior if the system is subject to network issues
> under heavy workload, in which missing packets or connection issues are
> not rare.
>
> Since the test mentioned at the beginning deletes the snapshots after a
> transfer, while most users keep the middle snapshot even in a cascading
> transfer, probably the current btrfs and commands still work for regular
> users.
>
> Thanks,
> Xin
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Hi,

For free software with open source code, that is quite good. Most commercial products have very robust error handling in transport, to guarantee no corruption due to transfer issues.

One interesting thing to investigate might be the btrfs send / receive result under a disruptive network environment. If the connection breaks in the middle of a transfer (at different phases, maybe), see what the file system status could be.

Thanks,
Xin

Sent: Sunday, December 25, 2016 at 2:57 PM
From: Duncan <1i5t5.dun...@cox.net>
To: linux-btrfs@vger.kernel.org
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

Xin Zhou posted on Sat, 24 Dec 2016 21:15:40 +0100 as excerpted:

> The code is relatively new to me, I did not see retry logic in stream
> handling, please correct me if I am wrong about this.
> So, I am not quite sure about the transfer behavior, if the system
> subject to network issues in heavy workload,
> in which packets missing or connect issues are not rare.

As you likely know, I'm not a dev, just a list regular and btrfs user myself, and I'm not particularly familiar with send/receive as I don't use it myself, but...

AFAIK, the send and receive sides are specifically designed to be separate and to work with STDOUT/STDIN, so it's possible with STDOUT redirection to "send" to a local file instead of directly to receive, and then to replay that file on the other end by cat-ing it to receive. As such, transfer behavior isn't really a factor at the btrfs layer: handling problems in the transfer layer is the responsibility of whatever method the user is using to do that transfer, and the user is presumed to use a transfer method with whatever reliability guarantees they deem necessary. So network behavior isn't really a factor at the btrfs level, as that's the transfer layer; btrfs isn't worrying about that, simply assuming it to have the necessary reliability.

--
Duncan - List replies preferred.
No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
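The file indirection Duncan describes can be sketched as follows; the paths and snapshot names are hypothetical, and the cleanup step reflects his "delete the partial subvolume" point rather than anything the thread actually ran:

```shell
#!/usr/bin/env bash
# Stage 1: serialize the read-only snapshot to an ordinary file.
# The sending filesystem is not modified by this.
btrfs send /mnt/pool/snapshots/root-2016-12-25 > /var/tmp/root.send

# Transfer /var/tmp/root.send by any means with whatever integrity
# guarantees you require (scp, rsync --checksum, checksummed media, ...).

# Stage 2: replay the stream on the receiver. On failure, cleanup is
# deleting the partial subvolume; nothing outside it should be touched.
cat /var/tmp/root.send | btrfs receive /mnt/backup/snapshots/ \
    || btrfs subvolume delete /mnt/backup/snapshots/root-2016-12-25
```

The indirection makes the reliability split explicit: btrfs only guarantees the stream it emits and replays; getting the bytes across intact is the transfer tool's job.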
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Hi,

Would you like to show the "btrfs send/receive" command the script is using, including all the parameters, and how the script waits for the completion of a transfer? From the beginning of the thread, it seems the transfer tests are going through different network environments.

Thanks,
Xin

Sent: Friday, December 23, 2016 at 9:48 AM
From: b...@adria.it
To: "Xin Zhou" <xin.z...@gmx.com>
Cc: "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

Yes. It is through the btrfs-tools error messages that the script printed that I realized the filesystem was corrupted.

P.S. The various messages that you see in the working examples of the script are emitted directly by the btrfs-tools.

Gdb

Xin Zhou <xin.z...@gmx.com>:

> Hi,
>
> Does the script check the transfer status, and is there a transfer that
> returns an error code?
>
> Thanks,
> Xin
>
> Sent: Thursday, December 22, 2016 at 11:28 PM
> From: "Giuseppe Della Bianca" <b...@adria.it>
> To: "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
> Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system
> during the snapshot receive
>
> (synthetic resend)
>
> Hi.
>
> It is possible that there are transfers, cancellations and other operations
> at the same time, but not in the same subvolume.
>
> My script checks that there are no transfers in progress on the same
> subvolume.
>
> It is possible that the same subvolume is mounted several times (temporary
> mount at the beginning, and unmount at the end, in my script).
>
> Thanks for all.
>
> P.S. Sorry for my bad English.
>
> Gdb
>
> On Wednesday, 21 December 2016 at 23:14:44, Xin Zhou wrote:
> > Hi,
> > A race condition can happen if running multiple transfers to the same
> > destination. Would you like to tell how many transfers the scripts are
> > running at a time to a specific hdd?
> >
> > Thanks,
> > Xin
> >
> > Sent: Wednesday, December 21, 2016 at 1:11 PM
> > From: "Chris Murphy" <li...@colorremedies.com>
> > To: No recipient address
> > Cc: "Giuseppe Della Bianca" <b...@adria.it>, "Xin Zhou" <xin.z...@gmx.com>,
> > "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
> > Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file
> > system during the snapshot receive
> >
> > On Wed, Dec 21, 2016 at 2:09 PM, Chris Murphy <li...@colorremedies.com> wrote:
> > > What about CONFIG_BTRFS_FS_CHECK_INTEGRITY? And then using the check_int
> > > mount option?
> >
> > This slows things down, and in that case it might avoid the problem if
> > it's the result of a race condition.
> >
> > --
> > Chris Murphy

This mail has been sent using Alpikom webmail system http://www.alpikom.it
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Hi,

Does the script check the transfer status, and is there a transfer that returns an error code?

Thanks,
Xin

Sent: Thursday, December 22, 2016 at 11:28 PM
From: "Giuseppe Della Bianca" <b...@adria.it>
To: "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

(synthetic resend)

Hi.

It is possible that there are transfers, cancellations and other operations at the same time, but not in the same subvolume.

My script checks that there are no transfers in progress on the same subvolume.

It is possible that the same subvolume is mounted several times (temporary mount at the beginning, and unmount at the end, in my script).

Thanks for all.

P.S. Sorry for my bad English.

Gdb

On Wednesday, 21 December 2016 at 23:14:44, Xin Zhou wrote:
> Hi,
> A race condition can happen if running multiple transfers to the same
> destination. Would you like to tell how many transfers the scripts are
> running at a time to a specific hdd?
>
> Thanks,
> Xin
>
> Sent: Wednesday, December 21, 2016 at 1:11 PM
> From: "Chris Murphy" <li...@colorremedies.com>
> To: No recipient address
> Cc: "Giuseppe Della Bianca" <b...@adria.it>, "Xin Zhou" <xin.z...@gmx.com>,
> "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
> Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file
> system during the snapshot receive
>
> On Wed, Dec 21, 2016 at 2:09 PM, Chris Murphy <li...@colorremedies.com> wrote:
> > What about CONFIG_BTRFS_FS_CHECK_INTEGRITY? And then using the check_int
> > mount option?
>
> This slows things down, and in that case it might avoid the problem if
> it's the result of a race condition.
>
> --
> Chris Murphy
Re: btrfs_log2phys: cannot lookup extent mapping
Hi,

If the change of the disk format between versions is precisely documented, it is plausible to create a utility to convert the old volume to a new one, trigger the workflow, upgrade the kernel and boot up to mount the new volume. Currently, the btrfs wiki shows only partial content of the on-disk format.

Thanks,
Xin

Sent: Wednesday, December 21, 2016 at 6:50 AM
From: "David Hanke"
To: linux-btrfs@vger.kernel.org
Subject: Re: btrfs_log2phys: cannot lookup extent mapping

Hi Duncan,

Thank you for your reply. If I've emailed the wrong list, please let me know. What I hear you saying, in short, is that btrfs is not yet fully stable but current 4.x versions may work better. I'm willing to upgrade, but I'm told that the upgrade process may result in total failure, and I'm not sure I can trust the contents of the volume either way. Given that, it seems I must back up the backup, erase and start over. What would you do?

Thank you,

David

On 12/20/16 17:24, Duncan wrote:
> David Hanke posted on Tue, 20 Dec 2016 09:52:25 -0600 as excerpted:
>
>> I've been using a btrfs-based volume for backups, but lately the
>> system's been filling the syslog with errors like "btrfs_log2phys:
>> cannot lookup extent mapping for 7129125486592" at the rate of hundreds
>> per second. (Please see output below for more details.) Despite the
>> errors, the files I've looked at appear to be written and read
>> successfully.
>>
>> I'm wondering if the contents of the volume are trustworthy and whether
>> this problem is resolvable without backing up, erasing and starting
>> over?
>>
>> Thank you!
>>
>> David
>>
>> # uname -a
>> Linux backup2 3.0.101.RNx86_64.3 #1 SMP Wed Apr 1 16:02:14 PDT 2015
>> x86_64 GNU/Linux
>>
>> # btrfs --version
>> Btrfs v3.17.3
>
> FWIW...
>
> [TL;DR: see the four bottom line choices, at the bottom.]
>
> This is the upstream btrfs development and discussion list for a
> filesystem that's still stabilizing (that is, not fully stable and
> mature) and that remains under heavy development and bug fixing. As
> such, list focus is heavily forward looking, with an extremely strong
> recommendation to use current kernels (and to a lesser extent btrfs
> userspace) if you're going to be running btrfs, as these have all the
> latest bugfixes.
>
> Put a different way, the general view and strong recommendation of the
> list is that because btrfs is still under heavy development, with bug
> fixes, some more major than others, every kernel cycle, while we
> recognize that choosing to run old and stale^H^Hble kernels and userspace
> is a legitimate choice on its own, that choice of stability over support
> for the latest and greatest is viewed as incompatible with choosing to
> run a still-under-heavy-development filesystem. Choosing one OR the
> other is strongly recommended.
>
> For list purposes, we recommend and best support the last two kernel
> release series in two tracks, LTS/long-term-stable, or current release
> track. On the LTS track, that's the LTS 4.4 and 4.1 series. On the
> current track, 4.9 is the latest release, so 4.9 and 4.8 are best
> supported.
>
> Meanwhile, it's worth keeping in mind that the experimental label and
> accompanying extremely strong "eat your babies" level warnings weren't
> peeled off until IIRC 3.12 or so, meaning anything before that is not
> only ancient history in list terms, but also still labeled as "eat your
> babies" level experimental. Why anyone would choose to run an ancient
> eat-your-babies level experimental version of a filesystem that's now
> rather more stable and mature, tho not yet fully stabilized, is beyond me.
> If they're interested in newer filesystems, running newer and less buggy
> versions is reasonable; if they're interested in years-stale levels of
> stability, then running such filesystems, especially when still labeled
> eat-your-babies level experimental back then, seems an extremely odd
> choice indeed.
>
> Of course, on-list we do recognize that various distros did and do offer
> support at some level for older-than-list-recommended btrfs versions, in
> part because they backport fixes from newer versions. However, because
> we're forward-development focused, we don't track what patches these
> distros may or may not have backported and thus aren't in a good position
> to provide good support for them. Instead, users choosing to use such
> kernels are generally asked to choose between upgrading to something
> reasonably supportable on-list if they wish to go that route, or are
> referred back to their distros for the support they're in a far better
> position to offer, since they know what they've backported and what they
> haven't, while we don't.
>
> As for btrfs userspace, the way btrfs works, during normal runtime,
> userspace primarily calls the kernel to do the real work, so the userspace
> version isn't as big a deal unless you're trying to use a feature only
> supported by newer versions, except that if it's /too/ old, the
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Hi,

A race condition can happen if running multiple transfers to the same destination. Would you like to tell how many transfers the scripts are running at a time to a specific hdd?

Thanks,
Xin

Sent: Wednesday, December 21, 2016 at 1:11 PM
From: "Chris Murphy" <li...@colorremedies.com>
To: No recipient address
Cc: "Giuseppe Della Bianca" <b...@adria.it>, "Xin Zhou" <xin.z...@gmx.com>, "Btrfs BTRFS" <linux-btrfs@vger.kernel.org>
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

On Wed, Dec 21, 2016 at 2:09 PM, Chris Murphy <li...@colorremedies.com> wrote:
> What about CONFIG_BTRFS_FS_CHECK_INTEGRITY? And then using the check_int
> mount option?

This slows things down, and in that case it might avoid the problem if it's the result of a race condition.

--
Chris Murphy
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Hi,

The system seems to be running some customized scripts that continuously back up data from an NVME drive to HDDs. If the 3 backup HDDs have the same btrfs config, and there is a bug in the btrfs code, they should all fail after the same operation sequence. Otherwise, probably one of the HDDs might have an issue, or there is a bug in a layer below btrfs.

For the customized script, it might be helpful to check the file system consistency after each transfer. That might be useful to figure out which step generates a corruption, and whether there is error propagation.

Regards,
Xin

Sent: Monday, December 19, 2016 at 10:55 AM
From: "Giuseppe Della Bianca" <b...@adria.it>
To: "Xin Zhou" <xin.z...@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

A concrete example:

SNAPSHOT

/dev/nvme0n1p2 on /tmp/tmp.X3vU6dLLVI type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)

btrfsManage SNAPSHOT /

(2016-12-19 19:44:00) Start btrfsManage . . .
Start managing SNAPSHOT ' / ' filesystem ' root ' snapshot
In ' btrfssnapshot ' latest source snapshot ' root-2016-12-18_15:10:01.40 ' . . .
date ' 2016-12-18_15:10:01 ' number ' 40 '
Creation ' root-2016-12-19_19:44:00.part ' snapshot from ' root ' subvolume . . .
Create a readonly snapshot of '/tmp/tmp.X3vU6dLLVI/root' in '/tmp/tmp.X3vU6dLLVI/btrfssnapshot/root/root-2016-12-19_19:44:00.part'
Renaming ' root-2016-12-19_19:44:00.part ' into ' root-2016-12-19_19:44:00.41 ' snapshot
Source snapshot list of ' root ' subvolume . . .
btrfssnapshot/root/root-2016-08-28-12-35-01.1
]zac[
. . . btrfssnapshot/root/root-2016-12-19_19:44:00.41
(2016-12-19 19:44:05) End btrfsManage . . .
End managing SNAPSHOT ' / ' filesystem ' root ' snapshot CORRECTLY

SEND and RECEIVE

/dev/nvme0n1p2 on /tmp/tmp.o78czE0Bo6 type btrfs (rw,relatime,ssd,space_cache,subvolid=5,subvol=/)
/dev/sda2 on /tmp/tmp.XcwqQCKq09 type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)

btrfsManage SEND / /dev/sda2

(2016-12-19 19:47:24) Start btrfsManage . . .
Start managing SEND ' / ' filesystem ' root ' snapshot in ' /dev/sda2 '
Sending ' root-2016-12-19_19:44:00.41 ' source snapshot to ' btrfsreceive ' subvolume . . .
btrfs send -p /tmp/tmp.o78czE0Bo6/btrfssnapshot/root/root-2016-12-18_15:10:01.40 /tmp/tmp.o78czE0Bo6/btrfssnapshot/root/root-2016-12-19_19:44:00.41 | btrfs receive /tmp/tmp.XcwqQCKq09/btrfsreceive/root/.part/ . . .
At subvol /tmp/tmp.o78czE0Bo6/btrfssnapshot/root/root-2016-12-19_19:44:00.41 . . .
At snapshot root-2016-12-19_19:44:00.41
Creation ' root-2016-12-19_19:44:00.41 ' snapshot from ' .part/root-2016-12-19_19:44:00.41 ' subvolume . . .
Create a readonly snapshot of '/tmp/tmp.XcwqQCKq09/btrfsreceive/root/.part/root-2016-12-19_19:44:00.41' in '/tmp/tmp.XcwqQCKq09/btrfsreceive/root/root-2016-12-19_19:44:00.41' . . .
Delete subvolume (commit): '/tmp/tmp.XcwqQCKq09/btrfsreceive/root/.part/root-2016-12-19_19:44:00.41'
Snapshot list in ' /dev/sda2 ' device . . .
btrfsreceive/data_backup/data_backup-2016-12-17_12:07:00.1
. . . btrfsreceive/data_storage/data_storage-2016-12-10_17:05:51.1
. . . btrfsreceive/root/root-2016-08-28-12-35-01.1
]zac[
. . . btrfsreceive/root/root-2016-12-19_19:44:00.41
(2016-12-19 19:48:37) End btrfsManage . . .
End managing SEND ' / ' filesystem ' root ' snapshot in ' /dev/sda2 ' CORRECTLY

> Hi Giuseppe,
>
> Would you like to tell some details about:
> 1. which subvolume the XYZ snapshot was taken from
> 2. where the base (initial) snapshot is stored
> 3. the 3 partitions receiving the same snapshot: are they in the same
> btrfs configuration and subvol structure?
>
> Also, would you send the link to the post that reports the "two files
> unreadable" error mentioned in step 2? I hope to see the message and
> figure out whether the issue first comes from the sender or the receiver
> side.
>
> Thanks,
> Xin
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Hi Jari,

The message shows:

> [ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors

So according to this info, before trying to run the repair / rescue procedure, would you like to show the 0, 1, 2 superblock status?

Regards,
Xin

Sent: Monday, December 19, 2016 at 2:32 AM
From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
To: linux-btrfs@vger.kernel.org
Cc: "Xin Zhou" <xin.z...@gmx.com>
Subject: Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

Xin Zhou <xin.z...@gmx.com> wrote on 17.12.2016 at 22.27:
>
> Hi Jari,
>
> Similar to other file systems, btrfs has copies of its super blocks.
> Try to run "man btrfs check", "man btrfs rescue" and related commands for
> more details.
> Regards,
> Xin

Hi Xin,

I did follow all the recovery procedures from the man and wiki pages. The tools do not help, as they think there is no BTRFS fs anymore. However, if I try to reformat the device I get:

btrfs-progs v4.4
See http://btrfs.wiki.kernel.org for more information.

/dev/sdb1 appears to contain an existing filesystem (btrfs).

So, the recovery tools seem to think there is no btrfs filesystem. Mkfs seems to think there is.

What I have tried:

btrfsck /dev/sdb1
mount -t btrfs -o ro /dev/sdb1 /mnt/share/
mount -t btrfs -o ro,recovery /dev/sdb1 /mnt/share/
mount -t btrfs -o roootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o rootflags=recovery,nospace_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
mount -t btrfs -o ro,rootflags=recovery,nospace_cache,clear_cache /dev/sdb1 /mnt/share/
btrfs restore /dev/sdb1 /target/device
btrfs rescue zero-log /dev/sdb1
btrfsck --init-csum-tree /dev/sdb1
btrfsck --fix-crc /dev/sdb1
btrfsck --check-data-csum /dev/sdb1
btrfs rescue chunk-recover /dev/sdb1
btrfs rescue super-recover /dev/sdb1
btrfs rescue zero-log /dev/sdb1

No help whatsoever.
Jari

> Sent: Saturday, December 17, 2016 at 2:06 AM
> From: "Jari Seppälä" <lihamakaroonilaati...@gmail.com>
> To: linux-btrfs@vger.kernel.org
> Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no
> help from recovery procedures
>
> Syslog tells:
> [ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
> [ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
> [ 135.462544] BTRFS error (device sdb1): open_ctree failed
>
> What has been done:
> * All "btrfs rescue" options
>
> Info on the system:
> * fs on external SSD via USB
> * kernel 4.9.0 (tried with 4.8.13)
> * btrfs-tools 4.4
> * Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2012-12-16
>
> Any help appreciated. Around 300G of TV recordings on the drive, which of
> course will eventually come again as replays.
>
> Jari
> --
> *** Jari Seppälä

--
*** Jari Seppälä
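The superblock status Xin asks for could be dumped along these lines; this is a sketch, not something run in the thread. `btrfs inspect-internal dump-super` is the subcommand in newer btrfs-progs (older releases shipped the equivalent standalone `btrfs-show-super`), and `/dev/sdb1` is the device from Jari's report:

```shell
#!/usr/bin/env bash
# Print each of the three superblock mirrors so their state can be compared.
for copy in 0 1 2; do
    echo "=== superblock copy ${copy} ==="
    btrfs inspect-internal dump-super -s "${copy}" /dev/sdb1
done
```

Comparing the mirrors shows whether only the primary superblock is damaged (in which case `btrfs rescue super-recover` has intact copies to work from) or all three are gone.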
Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Hi Giuseppe,

Would you like to give some details about:
1. Which subvolume the XYZ snapshot was taken from
2. Where the base (initial) snapshot is stored
3. The 3 partitions that receive the same snapshot: are they in the same btrfs configuration and subvolume structure?

Also, would you send the link to the post mentioned in step 2 that reports the "two files unreadable" error? Hopefully we can see the message and figure out whether the issue first comes from the sender or the receiver side.

Thanks,
Xin

Sent: Sunday, December 18, 2016 at 11:59 AM
From: "Giuseppe Della Bianca"
To: linux-btrfs@vger.kernel.org
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

> Same problem, this time on a local subvolume.
>
> kernel-4.8.8-100.fc23.x86_64
>
> btrfs-progs v4.8.5

]zac[

I have had three filesystem corruptions. The point at which the problem appeared is similar in all three cases.

Subvolume structure and operation sequence:

btrfsreceive/
btrfsreceive/root/
btrfsreceive/root/.part/

1) Send the XYZ differential snapshot into 'btrfsreceive/root/.part/'.
2) Create a snapshot from 'btrfsreceive/root/.part/XYZ' to 'btrfsreceive/root/XYZ'.
3) Delete the snapshot 'btrfsreceive/root/.part/XYZ'.

Always in step 2) I got the two-files-unreadable error (see previous posts), and one "object already exists" error (see below). All three times I had to re-create the various partitions from scratch (on different disks and systems).

Can I help you in some way to find the problem? Or is it useless to continue reporting it?
dic 18 18:29:58 exnetold.gdb.it kernel: ------------[ cut here ]------------
dic 18 18:29:58 exnetold.gdb.it kernel: WARNING: CPU: 1 PID: 4325 at fs/btrfs/extent-tree.c:2960 btrfs_run_delayed_refs+0x283/0x2b0 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: BTRFS: Transaction aborted (error -17)
dic 18 18:29:58 exnetold.gdb.it kernel: Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_br
dic 18 18:29:58 exnetold.gdb.it kernel: soundcore acpi_cpufreq tpm_tis tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ata_generic nouveau vide
dic 18 18:29:58 exnetold.gdb.it kernel: CPU: 1 PID: 4325 Comm: umount Tainted: G W 4.8.8-100.fc23.x86_64 #1
dic 18 18:29:58 exnetold.gdb.it kernel: Hardware name: System manufacturer System Product Name/M2N, BIOS 0902 02/16/2009
dic 18 18:29:58 exnetold.gdb.it kernel: 0286 dd260fac 8ffa0d25bb60 bc3e493e
dic 18 18:29:58 exnetold.gdb.it kernel: 8ffa0d25bbb0 8ffa0d25bba0 bc0a0ecb
dic 18 18:29:58 exnetold.gdb.it kernel: 0b900049 8ff9e61b40a0 8ffa2da77800
dic 18 18:29:58 exnetold.gdb.it kernel: Call Trace:
dic 18 18:29:58 exnetold.gdb.it kernel: [] dump_stack+0x63/0x85
dic 18 18:29:58 exnetold.gdb.it kernel: [] __warn+0xcb/0xf0
dic 18 18:29:58 exnetold.gdb.it kernel: [] warn_slowpath_fmt+0x5f/0x80
dic 18 18:29:58 exnetold.gdb.it kernel: [] btrfs_run_delayed_refs+0x283/0x2b0 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] ? btrfs_cow_block+0x10c/0x1e0 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] commit_cowonly_roots+0xae/0x2e0 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] ? btrfs_run_delayed_refs+0x206/0x2b0 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] ? btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] btrfs_commit_transaction+0x547/0xa40 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] btrfs_commit_super+0x8f/0xa0 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] close_ctree+0x2db/0x380 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] ? evict_inodes+0x15a/0x180
dic 18 18:29:58 exnetold.gdb.it kernel: [] btrfs_put_super+0x19/0x20 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] generic_shutdown_super+0x6f/0xf0
dic 18 18:29:58 exnetold.gdb.it kernel: [] kill_anon_super+0x12/0x20
dic 18 18:29:58 exnetold.gdb.it kernel: [] btrfs_kill_super+0x18/0x110 [btrfs]
dic 18 18:29:58 exnetold.gdb.it kernel: [] deactivate_locked_super+0x43/0x70
dic 18 18:29:58 exnetold.gdb.it kernel: [] deactivate_super+0x5c/0x60
dic 18 18:29:58 exnetold.gdb.it kernel: [] cleanup_mnt+0x3f/0x90
dic 18 18:29:58 exnetold.gdb.it kernel: [] __cleanup_mnt+0x12/0x20
dic 18 18:29:58 exnetold.gdb.it kernel: [] task_work_run+0x7e/0xa0
dic 18 18:29:58 exnetold.gdb.it kernel: [] exit_to_usermode_loop+0xc2/0xd0
dic 18 18:29:58 exnetold.gdb.it kernel: [] syscall_return_slowpath+0xa1/0xb0
dic 18 18:29:58 exnetold.gdb.it kernel: [] entry_SYSCALL_64_fastpath+0xa2/0xa4
dic 18 18:29:58 exnetold.gdb.it kernel: ---[ end trace f7eb2e818f727168 ]---
dic 18 18:29:58 exnetold.gdb.it kernel: BTRFS: error (device sda3) in btrfs_run_delayed_refs:2960: errno=-17 Object already exists
dic 18 18:29:58 exnetold.gdb.it kernel: BTRFS info (device sda3): forced readonly
dic 18 18:29:58 exnetold.gdb.it kernel: BTRFS warning (device sda3): Skipping commit of aborted
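The three-step sequence described in the report can be sketched as plain btrfs commands. This is an illustration, not the poster's exact script: the paths follow the layout above, while PARENT and XYZ are placeholder snapshot names.

```shell
# Step 1: send the XYZ differential snapshot into the staging subvolume
btrfs send -p /snapshots/PARENT /snapshots/XYZ | btrfs receive /btrfsreceive/root/.part/

# Step 2: snapshot the received subvolume into its final location
btrfs subvolume snapshot -r /btrfsreceive/root/.part/XYZ /btrfsreceive/root/XYZ

# Step 3: delete the staging snapshot
btrfs subvolume delete /btrfsreceive/root/.part/XYZ
```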
Re: OOM: Better, but still there on
Hi,

The system is supposed to have a special memory reservation for coredump and other debug info when encountering a panic; the size seems configurable.

Thanks,
Xin

Sent: Saturday, December 17, 2016 at 6:44 AM
From: "Tetsuo Handa"
To: "Nils Holland", "Michal Hocko"
Cc: linux-ker...@vger.kernel.org, linux...@kvack.org, "Chris Mason", "David Sterba", linux-btrfs@vger.kernel.org
Subject: Re: OOM: Better, but still there on

On 2016/12/17 21:59, Nils Holland wrote:
> On Sat, Dec 17, 2016 at 01:02:03AM +0100, Michal Hocko wrote:
>> mount -t tracefs none /debug/trace
>> echo 1 > /debug/trace/events/vmscan/enable
>> cat /debug/trace/trace_pipe > trace.log
>>
>> should help
>> [...]
>
> No problem! I enabled writing the trace data to a file and then tried
> to trigger another OOM situation. That worked, this time without a
> complete kernel panic, but with only my processes being killed and the
> system becoming unresponsive. When that happened, I let it run for
> another minute or two so that, in case it was still logging something
> to the trace file, it could continue to do so for some time longer. Then I
> rebooted with the only thing that still worked, i.e. by means of magic
> SysRequest.

Under an OOM situation, writing to a file on disk is unlikely to work. Maybe logging via the network ("cat /debug/trace/trace_pipe > /dev/udp/$ip/$port" if you are using bash) works better. (I wish we could do it from the kernel, so that /bin/cat is not disturbed by delays due to page faults.) If you can configure netconsole for logging OOM killer messages and a UDP socket for logging trace_pipe messages, udplogger at https://osdn.net/projects/akari/scm/svn/tree/head/branches/udplogger/ might fit for logging both outputs with timestamps into a single file.
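The logging setup suggested in this thread could be sketched roughly as below. Treat this as an assumption-laden sketch rather than a tested recipe: $ip and $port are placeholders for the collector host, eth0 for the outgoing interface, and the netconsole parameter details should be checked against the kernel's netconsole documentation for your version.

```shell
# Ship OOM-killer messages via netconsole, and vmscan trace events via
# bash's /dev/udp redirection, to the same collector host.
modprobe netconsole netconsole=@/eth0,$port@$ip/

mount -t tracefs none /debug/trace
echo 1 > /debug/trace/events/vmscan/enable
cat /debug/trace/trace_pipe > /dev/udp/$ip/$port
```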
Re: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures
Hi Jari,

Similar to other file systems, btrfs keeps copies of its super blocks. Try to run "man btrfs check", "man btrfs rescue" and related commands for more details.

Regards,
Xin

Sent: Saturday, December 17, 2016 at 2:06 AM
From: "Jari Seppälä"
To: linux-btrfs@vger.kernel.org
Subject: Help please: BTRFS fs crashed due to bad removal of USB drive, no help from recovery procedures

Syslog tells:

[ 135.446222] BTRFS error (device sdb1): system chunk array too small 0 < 97
[ 135.446260] BTRFS error (device sdb1): superblock contains fatal errors
[ 135.462544] BTRFS error (device sdb1): open_ctree failed

What has been done:
* All "btrfs rescue" options

Info on system:
* fs on external SSD via USB
* kernel 4.9.0 (tried with 4.8.13)
* btrfs-tools 4.4
* Mythbuntu (Ubuntu) 16.04.1 LTS with latest fixes 2016-12-16

Any help appreciated. Around 300G of TV recordings on the drive, which of course will eventually come back as replays.

Jari
--
*** Jari Seppälä
Re: Server hangs when mount BTRFS filesystem.
Hi Кравцов,

From the log messages, it seems dm-22 has run out of space; probably some checksums did not get committed to disk. And when trying to repair, it reports checksums missing:

merge_reloc_roots:2426: errno=-28 No space left
Dec 15 00:05:47 OraCI2 kernel: BTRFS warning (device dm-22): Skipping commit of aborted transaction.
Dec 15 00:05:47 OraCI2 kernel: BTRFS: error (device dm-22) in cleanup_transaction:1854: errno=-28 No space left
Dec 15 00:05:57 OraCI2 kernel: pending csums is 34287616
...
ERROR: errors found in extent allocation tree or chunk allocation
Fixed 0 roots.
checking free space cache [.]
root 5 inode 28350 errors 1000, some csum missing
root 5 inode 28351 errors 1000, some csum missing

Thanks,
Xin

Sent: Thursday, December 15, 2016 at 12:58 AM
From: "Кравцов Роман Владимирович"
To: linux-btrfs@vger.kernel.org
Subject: Server hangs when mount BTRFS filesystem.

Hello.

First, the server hangs while btrfs balance is working (see logs below). After a server reset the filesystem can't be mounted. When trying to execute the command

# mount -t btrfs /dev/OraCI2/pes.isuse_bp.stands /var/lib/docker/db/pes.isuse_bp.stands/pes.isuse_bp.standby.base/

the server hangs without any messages or log records.

# btrfs --version
btrfs-progs v4.8.3

# btrfs fi show /dev/mapper/OraCI2-pes.isuse_bp.stands
Label: 'pes.isuse_bp.stands' uuid: ada5d777-565b-48e7-87dc-c58c8ad13466
Total devices 1 FS bytes used 2.24TiB
devid 1 size 3.49TiB used 2.35TiB path /dev/mapper/OraCI2-pes.isuse_bp.stands

# btrfsck --repair -p /dev/OraCI2/pes.isuse_bp.stands
enabling repair mode
Checking filesystem on /dev/OraCI2/pes.isuse_bp.stands
UUID: ada5d777-565b-48e7-87dc-c58c8ad13466
parent transid verify failed on 2651226128384 wanted 136007 found 136176
parent transid verify failed on 2651226128384 wanted 136007 found 136176
Ignoring transid failure
leaf parent key incorrect 2651226128384
bad block 2651226128384
ERROR: errors found in extent allocation tree or chunk allocation
Fixed 0 roots.
checking free space cache [.]
root 5 inode 28350 errors 1000, some csum missing
root 5 inode 28351 errors 1000, some csum missing
root 5 inode 28354 errors 1000, some csum missing
root 5 inode 28358 errors 1000, some csum missing
root 5 inode 28360 errors 1000, some csum missing
root 5 inode 28361 errors 1000, some csum missing
root 5 inode 28368 errors 1000, some csum missing
root 5 inode 28369 errors 1000, some csum missing
root 5 inode 28370 errors 1000, some csum missing
root 5 inode 28371 errors 1000, some csum missing
root 5 inode 28372 errors 1000, some csum missing
root 5 inode 28373 errors 1000, some csum missing
root 5 inode 28376 errors 1000, some csum missing
root 5 inode 28377 errors 1000, some csum missing
root 5 inode 28378 errors 1000, some csum missing
root 5 inode 28379 errors 1000, some csum missing
root 5 inode 28380 errors 1000, some csum missing
root 5 inode 28381 errors 1000, some csum missing
root 5 inode 28382 errors 1000, some csum missing
root 5 inode 28383 errors 1000, some csum missing
root 5 inode 28384 errors 1000, some csum missing
root 5 inode 28385 errors 1000, some csum missing
root 5 inode 28386 errors 1000, some csum missing
root 5 inode 28387 errors 1000, some csum missing
root 5 inode 28388 errors 1000, some csum missing
root 5 inode 28389 errors 1000, some csum missing
root 5 inode 28390 errors 1000, some csum missing
root 5 inode 28391 errors 1000, some csum missing
root 5 inode 28392 errors 1000, some csum missing
root 5 inode 28393 errors 1000, some csum missing
root 5 inode 28394 errors 1000, some csum missing
root 5 inode 28395 errors 1000, some csum missing
root 5 inode 28396 errors 1000, some csum missing
root 5 inode 55108 errors 1000, some csum missing
root 5 inode 55313 errors 1000, some csum missing
root 5 inode 55314 errors 1000, some csum missing
root 5 inode 55315 errors 1000, some csum missing
root 5 inode 55316 errors 1000, some csum missing
root 5 inode 55317 errors 1000, some csum missing
root 5 inode 55318 errors 1000, some csum missing
checking csums
checking root refs
Recowing metadata block 2651226128384
found 2462630760448 bytes used err is 0
total csum bytes: 2398866488
total tree bytes: 5910593536
total fs tree bytes: 1679392768
total extent tree bytes: 1436450816
btree space waste bytes: 887715010
file data blocks allocated: 459312458981376
 referenced 2199769403392
extent buffer leak: start 2651226128384 len 16384

# cat /var/log/messages | grep 'Dec 15 00'
Dec 15 00:02:35 OraCI2 kernel: BTRFS info (device dm-22): found 41156 extents
Dec 15 00:02:35 OraCI2 kernel: BTRFS info (device dm-22): relocating block group 2568411414528 flags 1
Dec 15 00:02:37 OraCI2 kernel: BTRFS info (device dm-22): found 34939 extents
Dec 15 00:05:47 OraCI2 kernel: use_block_rsv: 20 callbacks suppressed
Dec 15 00:05:47 OraCI2 kernel: ------------[ cut here ]------------
Dec 15 00:05:47 OraCI2 kernel: WARNING: CPU: 35 PID: 30215 at fs/btrfs/extent-tree.c:8321
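One common way to get past errno=-28 (ENOSPC) during balance/relocation is to temporarily add a spare device so transactions can commit again, then migrate data off it and remove it. This is a hedged sketch, not advice from the thread: /dev/sdX and /mnt are placeholders, and it assumes the filesystem still mounts read-write.

```shell
# Lend the filesystem some free space from a spare device
btrfs device add /dev/sdX /mnt

# Compact mostly-empty data chunks to free allocated-but-unused space
btrfs balance start -dusage=10 /mnt

# Migrate data off the spare and shrink back to the original device
btrfs device delete /dev/sdX /mnt
```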
Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Hi,

There is a large amount of dirty data, probably unable to commit to disk. And this seems to happen when copying from 7200 rpm to 5600 rpm disks, according to the previous post. Probably the I/Os are buffered and pending, unable to finish in time.

It might be helpful to know whether this only happens for specific types of 5600 rpm disks. And are these disks in RAID groups?

Thanks,
Xin

Sent: Wednesday, December 14, 2016 at 3:38 AM
From: admin
To: "Michal Hocko"
Cc: linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org, "David Sterba", "Chris Mason"
Subject: Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

Hi,

I verified the log files and see no prior oom killer invocation. Unfortunately the machine has been rebooted since. Next time it happens, I will also look in dmesg.

Thanks,
David Arendt

Michal Hocko – Wed., 14. December 2016 11:31
> Btw. the stall should be preceded by the OOM killer invocation. Could
> you share the OOM report please? I am asking because such an OOM killer
> would be clearly premature as per your meminfo. I am trying to change
> that code and seeing your numbers might help me.
>
> Thanks!
>
> On Wed 14-12-16 11:17:43, Michal Hocko wrote:
> > On Tue 13-12-16 18:11:01, David Arendt wrote:
> > > Hi,
> > >
> > > I receive the following page allocation stall while copying lots of
> > > large files from one btrfs hdd to another.
> > >
> > > Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
> > > Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
> > [...]
> > > Dec 13 13:04:29 server kernel: Call Trace:
> > > Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
> > > Dec 13 13:04:29 server kernel: [] ? warn_alloc+0x111/0x130
> > > Dec 13 13:04:33 server kernel: [] ? __alloc_pages_nodemask+0xbe8/0xd30
> > > Dec 13 13:04:33 server kernel: [] ? pagecache_get_page+0xe4/0x230
> > > Dec 13 13:04:33 server kernel: [] ? alloc_extent_buffer+0x10b/0x400
> > > Dec 13 13:04:33 server kernel: [] ? btrfs_alloc_tree_block+0x125/0x560
> >
> > OK, so this is
> >
> >   find_or_create_page(mapping, index, GFP_NOFS|__GFP_NOFAIL)
> >
> > The main question is whether this really needs to be a NOFS request...
> >
> > > Dec 13 13:04:33 server kernel: [] ? read_extent_buffer_pages+0x21f/0x280
> > > Dec 13 13:04:33 server kernel: [] ? __btrfs_cow_block+0x141/0x580
> > > Dec 13 13:04:33 server kernel: [] ? btrfs_cow_block+0x100/0x150
> > > Dec 13 13:04:33 server kernel: [] ? btrfs_search_slot+0x1e9/0x9c0
> > > Dec 13 13:04:33 server kernel: [] ? __set_extent_bit+0x512/0x550
> > > Dec 13 13:04:33 server kernel: [] ? lookup_inline_extent_backref+0xf5/0x5e0
> > > Dec 13 13:04:34 server kernel: [] ? set_extent_bit+0x24/0x30
> > > Dec 13 13:04:34 server kernel: [] ? update_block_group.isra.34+0x114/0x380
> > > Dec 13 13:04:34 server kernel: [] ? __btrfs_free_extent.isra.35+0xf4/0xd20
> > > Dec 13 13:04:34 server kernel: [] ? btrfs_merge_delayed_refs+0x61/0x5d0
> > > Dec 13 13:04:34 server kernel: [] ? __btrfs_run_delayed_refs+0x902/0x10a0
> > > Dec 13 13:04:34 server kernel: [] ? btrfs_run_delayed_refs+0x90/0x2a0
> > > Dec 13 13:04:34 server kernel: [] ? delayed_ref_async_start+0x84/0xa0
> >
> > What would cause the reclaim recursion?
> >
> > > Dec 13 13:04:34 server kernel: Mem-Info:
> > > Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34 isolated_anon:0
> > >  active_file:7370032 inactive_file:450105 isolated_file:320
> > >  unevictable:0 dirty:522748 writeback:189 unstable:0
> > >  slab_reclaimable:178255 slab_unreclaimable:124617
> > >  mapped:4236 shmem:0 pagetables:1163 bounce:0
> > >  free:38224 free_pcp:241 free_cma:0
> >
> > This speaks for itself. There is a lot of dirty data, basically no
> > anonymous memory and GFP_NOFS cannot do much to reclaim obviously. This
> > is either a configuration bug as somebody noted down the thread (setting
> > the dirty_ratio) or suboptimality of the btrfs code which might request
> > NOFS even though it is not strictly necessary. This would be more for
> > btrfs developers.
> > --
> > Michal Hocko
> > SUSE Labs
> --
> Michal Hocko
> SUSE Labs
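The dirty_ratio configuration mentioned in the thread is tuned through sysctl; lowering the thresholds caps how much dirty page cache can accumulate before writeback kicks in. The values below are illustrative assumptions, not a recommendation from the thread:

```shell
# Inspect the current writeback thresholds (percent of reclaimable memory)
sysctl vm.dirty_background_ratio vm.dirty_ratio

# Lower them so background writeback starts earlier and the hard limit
# is reached sooner (illustrative values; requires root)
sysctl vm.dirty_background_ratio=5 vm.dirty_ratio=10
```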
Re: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another
Hi David,

The allocation has the GFP_NOFS flag; according to its definition, the issue might have happened during initial disk I/O. By the way, did you get a chance to dump the meminfo and run "top" before the system hang? It seems more info about the system's running state is needed to understand the issue.

Thanks,
Xin

Sent: Tuesday, December 13, 2016 at 9:11 AM
From: "David Arendt"
To: linux-btrfs@vger.kernel.org, linux-ker...@vger.kernel.org
Subject: page allocation stall in kernel 4.9 when copying files from one btrfs hdd to another

Hi,

I receive the following page allocation stall while copying lots of large files from one btrfs hdd to another.

Dec 13 13:04:29 server kernel: kworker/u16:8: page allocation stalls for 12260ms, order:0, mode:0x2400840(GFP_NOFS|__GFP_NOFAIL)
Dec 13 13:04:29 server kernel: CPU: 0 PID: 24959 Comm: kworker/u16:8 Tainted: P O 4.9.0 #1
Dec 13 13:04:29 server kernel: Hardware name: ASUS All Series/H87M-PRO, BIOS 2102 10/28/2014
Dec 13 13:04:29 server kernel: Workqueue: btrfs-extent-refs btrfs_extent_refs_helper
Dec 13 13:04:29 server kernel: 813f3a59 81976b28 c90011093750
Dec 13 13:04:29 server kernel: 81114fc1 02400840f39b6bc0 81976b28 c900110936f8
Dec 13 13:04:29 server kernel: 88070010 c90011093760 c90011093710 02400840
Dec 13 13:04:29 server kernel: Call Trace:
Dec 13 13:04:29 server kernel: [] ? dump_stack+0x46/0x5d
Dec 13 13:04:29 server kernel: [] ? warn_alloc+0x111/0x130
Dec 13 13:04:33 server kernel: [] ? __alloc_pages_nodemask+0xbe8/0xd30
Dec 13 13:04:33 server kernel: [] ? pagecache_get_page+0xe4/0x230
Dec 13 13:04:33 server kernel: [] ? alloc_extent_buffer+0x10b/0x400
Dec 13 13:04:33 server kernel: [] ? btrfs_alloc_tree_block+0x125/0x560
Dec 13 13:04:33 server kernel: [] ? read_extent_buffer_pages+0x21f/0x280
Dec 13 13:04:33 server kernel: [] ? __btrfs_cow_block+0x141/0x580
Dec 13 13:04:33 server kernel: [] ? btrfs_cow_block+0x100/0x150
Dec 13 13:04:33 server kernel: [] ? btrfs_search_slot+0x1e9/0x9c0
Dec 13 13:04:33 server kernel: [] ? __set_extent_bit+0x512/0x550
Dec 13 13:04:33 server kernel: [] ? lookup_inline_extent_backref+0xf5/0x5e0
Dec 13 13:04:34 server kernel: [] ? set_extent_bit+0x24/0x30
Dec 13 13:04:34 server kernel: [] ? update_block_group.isra.34+0x114/0x380
Dec 13 13:04:34 server kernel: [] ? __btrfs_free_extent.isra.35+0xf4/0xd20
Dec 13 13:04:34 server kernel: [] ? btrfs_merge_delayed_refs+0x61/0x5d0
Dec 13 13:04:34 server kernel: [] ? __btrfs_run_delayed_refs+0x902/0x10a0
Dec 13 13:04:34 server kernel: [] ? btrfs_run_delayed_refs+0x90/0x2a0
Dec 13 13:04:34 server kernel: [] ? delayed_ref_async_start+0x84/0xa0
Dec 13 13:04:34 server kernel: [] ? process_one_work+0x11d/0x3b0
Dec 13 13:04:34 server kernel: [] ? worker_thread+0x42/0x4b0
Dec 13 13:04:34 server kernel: [] ? process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel: [] ? process_one_work+0x3b0/0x3b0
Dec 13 13:04:34 server kernel: [] ? do_group_exit+0x2e/0xa0
Dec 13 13:04:34 server kernel: [] ? kthread+0xb9/0xd0
Dec 13 13:04:34 server kernel: [] ? kthread_park+0x50/0x50
Dec 13 13:04:34 server kernel: [] ? ret_from_fork+0x22/0x30
Dec 13 13:04:34 server kernel: Mem-Info:
Dec 13 13:04:34 server kernel: active_anon:20 inactive_anon:34 isolated_anon:0
 active_file:7370032 inactive_file:450105 isolated_file:320
 unevictable:0 dirty:522748 writeback:189 unstable:0
 slab_reclaimable:178255 slab_unreclaimable:124617
 mapped:4236 shmem:0 pagetables:1163 bounce:0
 free:38224 free_pcp:241 free_cma:0
Dec 13 13:04:34 server kernel: Node 0 active_anon:80kB inactive_anon:136kB active_file:29480128kB inactive_file:1800420kB unevictable:0kB isolated(anon):0kB isolated(file):1280kB mapped:16944kB dirty:2090992kB writeback:756kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:258821 all_unreclaimable? no
Dec 13 13:04:34 server kernel: DMA free:15868kB min:8kB low:20kB high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15892kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 3428 32019 32019
Dec 13 13:04:34 server kernel: DMA32 free:116800kB min:2448kB low:5956kB high:9464kB active_anon:0kB inactive_anon:0kB active_file:3087928kB inactive_file:191336kB unevictable:0kB writepending:221828kB present:3590832kB managed:3513936kB mlocked:0kB slab_reclaimable:93252kB slab_unreclaimable:20520kB kernel_stack:48kB pagetables:212kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB
Dec 13 13:04:34 server kernel: lowmem_reserve[]: 0 0 0 0
Dec 13 13:04:34 server kernel: DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
Dec 13 13:04:34 server kernel: DMA32: 940*4kB (UME) 4006*8kB (UME) 3308*16kB (UME) 791*32kB
Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents
Hi Zygo,

Since the corruption happens after I/O and checksum verification, would it be possible to add some bug-catcher code in that code path for debug builds, to help narrow down the issue?

Thanks,
Xin

Sent: Saturday, December 10, 2016 at 9:16 PM
From: "Zygo Blaxell"
To: "Roman Mamedov", "Filipe Manana"
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: fix hole read corruption for compressed inline extents

Ping?

I know at least two people have read this patch, but it hasn't appeared in the usual integration branches yet, and I've seen no actionable suggestion to improve it. I've provided two non-overlapping rationales for it. Is there something else you are looking for?

This patch is a fix for a simple data corruption bug. It (or some equivalent fix for the same bug) should be on its way to all stable kernels starting from 2.6.32.

Thanks

On Mon, Nov 28, 2016 at 05:27:10PM +0500, Roman Mamedov wrote:
> On Mon, 28 Nov 2016 00:03:12 -0500
> Zygo Blaxell wrote:
>
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index 8e3a5a2..b1314d6 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -6803,6 +6803,12 @@ static noinline int uncompress_inline(struct btrfs_path *path,
> >  	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
> >  	ret = btrfs_decompress(compress_type, tmp, page,
> >  	                       extent_offset, inline_size, max_size);
> > +	WARN_ON(max_size > PAGE_SIZE);
> > +	if (max_size < PAGE_SIZE) {
> > +		char *map = kmap(page);
> > +		memset(map + max_size, 0, PAGE_SIZE - max_size);
> > +		kunmap(page);
> > +	}
> >  	kfree(tmp);
> >  	return ret;
> >  }
>
> Wasn't this already posted as:
>
> btrfs: fix silent data corruption while reading compressed inline extents
> https://patchwork.kernel.org/patch/9371971/
>
> but you don't indicate that's a V2 or something, and in fact the patch seems
> exactly the same, just the subject and commit message are entirely different.
> Quite confusing.
>
> --
> With respect,
> Roman
Re: btrfs-find-root duration?
Hi Markus,

Some file systems automatically generate snapshots and create a hidden folder for recovery in case the user accidentally deletes some files. It seems btrfs also has an autosnap feature, so if this option was enabled before the deletion, or snapshots of the volume were created manually, then it might be possible to perform a fast recovery.

Regards,
Xin

Sent: Saturday, December 10, 2016 at 4:12 PM
From: "Markus Binsteiner"
To: linux-btrfs@vger.kernel.org
Subject: btrfs-find-root duration?

It seems I've accidentally deleted all files in my home directory, which sits in its own btrfs partition (lvm on luks). Now I'm trying to find the roots to be able to use btrfs restore later on. btrfs-find-root seems to be taking ages though. I've run it like so:

btrfs-find-root /dev/mapper/think--big-home -o 5 > roots.txt

After 16 hours there is still no output, but it's still running, utilizing 100% of one core. Is there any way to gauge how much longer it'll take? Should there have been output already while it's running?

When I run it without redirecting stdout, I get:

$ btrfs-find-root /dev/mapper/think--big-home -o 5
Superblock doesn't contain generation info for root 5
Superblock doesn't contain the level info for root 5

When I omit the '-o 5', it says:

$ btrfs-find-root /dev/mapper/think--big-home
Superblock thinks the generation is 593818
Superblock thinks the level is 0

Is the latter the way to run it? I did that initially, but it didn't return any results in a reasonable timeframe either.

The filesystem was created with Debian Jessie, but I'm using Ubuntu (btrfs-progs v4.7.3) to try to restore the files at the moment.

Thanks!
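Once btrfs-find-root prints candidate tree-root byte numbers, each candidate can be fed to btrfs restore. A sketch, where BYTENR is a placeholder for an offset from the find-root output and /mnt/recovery must be on a different disk than the damaged one:

```shell
# Dry-run: list what would be restored from a candidate tree root
btrfs restore -t BYTENR -D -v /dev/mapper/think--big-home /mnt/recovery

# If the listing looks sane, run again without -D to actually copy files out
btrfs restore -t BYTENR -v /dev/mapper/think--big-home /mnt/recovery
```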
Re: [PATCH 0/6] btrfs dax IO
Hi Liu,

From the patch, is the snapshot disabled by disabling COW in the mounting path? It seems the create_snapshot() in ioctl.c does not get changed. I have experience with some similar systems, but am a bit new to the btrfs code.

Thanks,
Xin

Subject: [PATCH 0/6] btrfs dax IO
From: Liu Bo
Date: Wed, 7 Dec 2016 13:45:04 -0800
Cc: Chris Mason, Jan Kara, David Sterba

This is a preliminary patch set to add dax support for btrfs. With this we can do normal read/write to dax files, and can mmap dax files to userspace so that applications have the ability to access persistent memory directly.

Please note that currently this is limited to nocow, i.e. all dax inodes do not have COW behaviour.

COW: no
multiple device: no
clone/reflink: no
snapshot: no
compression: no
checksum: no

Right now snapshot is disabled while mounting with -odax, but a snapshot can be created without -odax, and writing to a dax file in that snapshot will get -EIO. Clone/reflink is dealt with the same way as snapshot: -EIO will be returned when writing to shared extents.

This has adopted the latest iomap framework for dax read/write and dax mmap.
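Under the proposed patch set, the dax behaviour is chosen at mount time via -odax. A minimal sketch, assuming a persistent-memory block device at the placeholder path /dev/pmem0:

```shell
# Mount with dax enabled; per the cover letter, snapshot creation is
# disabled on such a mount, and all dax inodes behave as nocow.
mount -t btrfs -o dax /dev/pmem0 /mnt
```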