Re: Battling an issue with btrfs quota
On 01/31/2017 01:15 AM, Philipp Kern wrote:
[...]
>> 149 00 RW [btrfs-transacti]
>
> So there's always a running btrfs-transaction. The kernel messages start
> off like this:
[...]

As it turns out, it also OOMs the complete machine after 2h while consuming
the available 4 GB RAM (w/o swap):

> [rootfs ]# [ 6347.417391] Out of memory: Kill process 228 (sh) score 0 or sacrifice child
> [ 6347.450652] Killed process 228 (sh) total-vm:6724kB, anon-rss:112kB, file-rss:1624kB, shmem-rss:0kB
> [ 6347.500015] Kernel panic - not syncing: Out of memory and no killable processes...
> [ 6347.500015]
> [ 6347.544684] CPU: 0 PID: 149 Comm: btrfs-transacti Not tainted 4.9.6-1-ARCH #1
> [ 6347.580157] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
> [ 6347.614874] c9d77738 81305440 c9d77800 81930520
> [ 6347.651089] c9d777c0 8117eed5 0008 c9d777d0
> [ 6347.687037] c9d77768 09689dbd 0003 8801032adcc0
> [ 6347.723589] Call Trace:
> [ 6347.735614] [] dump_stack+0x63/0x83
> [ 6347.760845] [] panic+0xe4/0x22d
> [ 6347.784187] [] out_of_memory+0x333/0x480
> [ 6347.811209] [] __alloc_pages_nodemask+0xda6/0xe80
> [ 6347.842223] [] alloc_pages_current+0x95/0x140
> [ 6347.871859] [] __page_cache_alloc+0xc4/0xe0
> [ 6347.900582] [] pagecache_get_page+0xe7/0x290
> [ 6347.929238] [] alloc_extent_buffer+0x113/0x480 [btrfs]
> [ 6347.962291] [] read_tree_block+0x20/0x60 [btrfs]
> [ 6347.992642] [] read_block_for_search.isra.16+0x138/0x300 [btrfs]
> [ 6348.030866] [] btrfs_search_slot+0x3be/0x9b0 [btrfs]
> [ 6348.063286] [] find_parent_nodes+0x116/0x14b0 [btrfs]
> [ 6348.096113] [] __btrfs_find_all_roots+0xbe/0x130 [btrfs]
> [ 6348.131093] [] btrfs_find_all_roots+0x55/0x70 [btrfs]
> [ 6348.164002] [] btrfs_qgroup_prepare_account_extents+0x58/0xa0 [btrfs]
> [ 6348.203555] [] btrfs_commit_transaction.part.12+0x3e4/0xa90 [btrfs]
> [ 6348.241927] [] ? wake_atomic_t_function+0x60/0x60
> [ 6348.273073] [] btrfs_commit_transaction+0x3b/0x70 [btrfs]
> [ 6348.307196] [] transaction_kthread+0x1ab/0x1e0 [btrfs]
> [ 6348.340041] [] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
> [ 6348.376634] [] kthread+0xd9/0xf0
> [ 6348.400423] [] ? __switch_to+0x2d2/0x630
> [ 6348.428162] [] ? kthread_park+0x60/0x60
> [ 6348.455055] [] ret_from_fork+0x25/0x30
> [ 6348.481684] Kernel Offset: disabled
> [ 6348.505391] ---[ end Kernel panic - not syncing: Out of memory and no killable processes...
> [ 6348.505391]
> Killed

Kind regards
Philipp Kern
Re: btrfs recovery
Oliver Freyermuth posted on Sat, 28 Jan 2017 17:46:24 +0100 as excerpted:

>> Just don't count on restore to save your *** and always treat what it
>> can often bring to current as a pleasant surprise, and having it fail
>> won't be a down side, while having it work, if it does, will always be
>> up side.
>> =:^)
>
> I'll keep that in mind, and I think that in the future, before trying
> any "btrfs check" (or even repair) I will always try restore first if
> my backup was not fresh enough :-).

That's a wise idea, as long as you have the resources to actually be able
to write the files somewhere (as people running btrfs really should,
because it's /not/ fully stable yet).

One of the great things about restore is that all the writing it does is
to the destination filesystem -- it doesn't attempt to actually write or
repair anything on the filesystem it's trying to restore /from/, so it's
far lower risk than anything that /does/ actually attempt to write to or
repair the potentially damaged filesystem. That makes it /extremely/
useful as a "first, to the extent possible, make sure the backups are
safely freshened" tool. =:^)

Meanwhile, FWIW, restore can also be used as a sort of undelete tool.
Remember, btrfs is COW and writes any changes to a new location. The old
location tends to stick around, no longer referenced by anything "live",
but still there until some other change happens to overwrite it. Just
like undelete on a more conventional filesystem, therefore, as long as
you notice the problem before the old location has been overwritten
again, it's often possible to recover it, altho the mechanisms involved
are rather different on btrfs. Basically, you use btrfs-find-root to get
a list of old roots, then point restore at them using the -t option.
There's a page on the wiki that goes into some detail in a more desperate
"restore anything" context, but here, once you found a root that looked
promising, you'd use restore's regex option to restore /just/ the file
you're interested in, as it existed at the time that root was written.

There's actually a btrfs-undelete script on github that turns the
otherwise multiple manual steps into a nice, smooth, undelete operation.
Or at least it's supposed to. I've never actually used it, tho I have
examined the script out of curiosity to see what it did and how, and it
/looks/ like it should work. I've kept that trick (and knowledge of where
to look for the script) filed away in the back of my head in case I need
it someday. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
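[Editor's sketch of the undelete sequence described above. The device
name, root bytenr, target path, and destination directory are all
hypothetical placeholders; restore only writes to the destination, never
to the source filesystem.]

```shell
# List candidate old tree roots; each line reports a bytenr and generation.
btrfs-find-root /dev/sdX

# Point restore at a promising root (bytenr taken from the output above)
# and pull out just the one file. The nested empty alternations make the
# regex also match each ancestor directory of the target path, which
# --path-regex requires.
btrfs restore -t 123456789 -ivv \
    --path-regex '^/(|home(|/user(|/file\.txt)))$' \
    /dev/sdX /mnt/rescue
```

If one root doesn't yield the file, repeat with the next-newest bytenr
from the btrfs-find-root listing.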
Re: btrfs recovery
Michael Born posted on Mon, 30 Jan 2017 22:07:00 +0100 as excerpted:

> On 30.01.2017 at 21:51, Chris Murphy wrote:
>> On Mon, Jan 30, 2017 at 1:02 PM, Michael Born wrote:
>>> Hi btrfs experts.
>>>
>>> Hereby I apply for the stupidity of the month award.
>>
>> There's still another day :-D
>>
>>> Before switching from Suse 13.2 to 42.2, I copied my / partition with
>>> dd to an image file - while the system was online/running.
>>> Now, I can't mount the image.
>>
>> That won't ever work for any file system. It must be unmounted.
>
> I could mount and copy the data out of my /home image.dd (encrypted
> xfs). That was also online while dd-ing it.

There's another angle with btrfs that makes block device image copies
such as that a big problem, even if the dd was done with the filesystem
unmounted. This hasn't yet been mentioned in this thread, that I've seen,
anyway.

* Btrfs takes the filesystem UUID, universally unique ID, at face value,
considering it *UNIQUE*, and actually identifies the various components
of a possibly multi-device filesystem by that UUID. Again, this is
because btrfs, unlike normal filesystems, can be composed of multiple
devices, so btrfs needs a way to detect which devices form parts of which
filesystems, and it does this by tracking the UUID and considering
anything with that UUID (which is supposed to be unique to that
filesystem, remember -- it actually _says_ "unique" in the label, after
all) to be part of that filesystem.

Now you dd the block device somewhere else, making another copy, and
btrfs suddenly has more devices with UUIDs saying they belong to this
filesystem than it should! That has actually triggered corruption in some
cases, because btrfs gets mixed up and writes changes to the wrong
device -- after all, it *HAS* to be part of the same filesystem, because
it has the same universally *unique* ID.
Only the supposedly "unique" ID isn't so "unique" any more, because
someone copied the block device, and now there are two copies of the
filesystem claiming to be the same one! "Unique" is no longer "unique"
and it has created the all too predictable problems as a result.

There are ways to work around the problem. Basically, don't let btrfs see
both copies at the same time, and *definitely* don't let it see both
copies when one is mounted or an attempt is being made to mount it.

(Btrfs "sees" a new device when btrfs device scan is run. Unfortunately
for this case, udev tends to run btrfs device scan automatically whenever
it detects a new device that seems to have btrfs on it. So it can be
rather difficult to keep btrfs from seeing it, because udev tends to
monitor for new devices and see it right away, and tell btrfs about it
when it does. But it's possible to avoid damage if you're careful to only
dd the unmounted filesystem device(s) and to safely hide one copy before
attempting to mount the other.)

Of course that wasn't the case here. With the dd of a live-mounted btrfs
device, it's quite possible that btrfs detected and started writing to
the dd-destination device instead of the original at some point, screwing
things up even more than they would have been for a normal filesystem
live-mounted dd.

In turn, it's quite possible that's why the old xfs /home still mounted,
but the btrfs / didn't: the xfs, while potentially damaged a bit, didn't
suffer the abuse of writes to the wrong device that btrfs may well have
suffered, due to the non-uniqueness of the supposedly universally unique
IDs and the very confused btrfs that may well have caused.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
Richard Stallman
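[Editor's sketch of one way to make a dd copy safe to keep around, per
the UUID discussion above. The image path and loop device are
hypothetical; btrfstune rewrites metadata, must run on an unmounted
filesystem, and should only ever be pointed at the copy, never the
original.]

```shell
# Attach the image to a loop device (prints the device it picked,
# e.g. /dev/loop0).
losetup -f --show /backup/root.img

# Give the copy a new random filesystem UUID so btrfs no longer treats
# it as part of the original filesystem (btrfs-progs >= 4.1; the tool
# may ask for confirmation, and the rewrite takes a while).
btrfstune -u /dev/loop0

# With distinct UUIDs, original and copy can safely be visible, and even
# mounted, at the same time.
mount -o ro /dev/loop0 /mnt/copy
```

Until the UUID has been changed, the only safe policy is the one Duncan
gives: never let the kernel see both devices at once.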
Re: File system is oddly full after kernel upgrade, balance doesn't help
MegaBrutal posted on Sat, 28 Jan 2017 19:15:01 +0100 as excerpted:

> Of course I can't retrieve the data from before the balance, but here is
> the data from now:

FWIW, if it's available, btrfs fi usage tends to yield the richest
information. But it's also a (relatively) new addition to the btrfs-tools
suite, and the results of btrfs fi show combined with btrfs fi df are the
older version, together displaying the same critical information, tho
without quite as much multi-device information.

Meanwhile, both btrfs fi usage and btrfs fi df require a mounted btrfs,
so when it won't mount, btrfs fi show is about the best that can be done,
at least staying within the normal admin-user targeted commands (there
are developer-diagnostics targeted commands, but I'm not a dev, just a
btrfs list regular and btrfs user myself, and to date have left those
commands for the devs to play with). But since usage is available, that's
all I'm quoting, here:

> root@vmhost:~# btrfs fi usage /tmp/mnt/curlybrace
> Overall:
>     Device size:           2.00GiB
>     Device allocated:      1.90GiB
>     Device unallocated:  103.38MiB
>     Device missing:          0.00B
>     Used:                789.94MiB
>     Free (estimated):    162.18MiB  (min: 110.50MiB)
>     Data ratio:               1.00
>     Metadata ratio:           2.00
>     Global reserve:      512.00MiB  (used: 0.00B)
>
> Data,single: Size:773.62MiB, Used:714.82MiB
>    /dev/mapper/vmdata--vg-lxc--curlybrace 773.62MiB
>
> Metadata,DUP: Size:577.50MiB, Used:37.55MiB
>    /dev/mapper/vmdata--vg-lxc--curlybrace 1.13GiB
>
> System,DUP: Size:8.00MiB, Used:16.00KiB
>    /dev/mapper/vmdata--vg-lxc--curlybrace 16.00MiB
>
> Unallocated:
>    /dev/mapper/vmdata--vg-lxc--curlybrace 103.38MiB
>
> So... if I sum the data, metadata, and the global reserve, I see why
> only ~170 MB is left. I have no idea, however, why the global reserve
> sneaked up to 512 MB for such a small file system, and how could I
> resolve this situation. Any ideas?
That's an interesting issue I've not seen before, tho my experience is
relatively limited compared to say Chris (Murphy)'s or Hugo's, as other
than my own systems, my experience is limited to the list, while they do
the IRC channels, etc.

I've no idea how to resolve it, unless by some chance balance removes
excess global reserve as well (I simply don't know; it has never come up
that I've seen before). But IIRC one of the devs (or possibly Hugo)
mentioned something about global reserve being dynamic, based on...
something, IDR what. Given my far lower global reserve on multiple
relatively small btrfs, and the fact that my own use-case doesn't use
subvolumes or snapshots, if yours does and you have quite a few, that
/might/ be the explanation.

FWIW, while I tend to use rather small btrfs as well, in my case they're
nearly all btrfs dual-device raid1. However, a usage comparison based on
my closest sized filesystem can still be useful, particularly the global
reserve. Here's my /, as you can see 8 GiB per device raid1, so one copy
on each device (comparable to single mode if it were a single device; no
dup mode metadata, as it's a copy on each device):

# btrfs fi u /
Overall:
    Device size:          16.00GiB
    Device allocated:      7.06GiB
    Device unallocated:    8.94GiB
    Device missing:          0.00B
    Used:                  4.38GiB
    Free (estimated):      5.51GiB  (min: 5.51GiB)
    Data ratio:               2.00
    Metadata ratio:           2.00
    Global reserve:       16.00MiB  (used: 0.00B)

Data,RAID1: Size:3.00GiB, Used:1.96GiB
   /dev/sda5   3.00GiB
   /dev/sdb5   3.00GiB

Metadata,RAID1: Size:512.00MiB, Used:232.77MiB
   /dev/sda5 512.00MiB
   /dev/sdb5 512.00MiB

System,RAID1: Size:32.00MiB, Used:16.00KiB
   /dev/sda5  32.00MiB
   /dev/sdb5  32.00MiB

Unallocated:
   /dev/sda5   4.47GiB
   /dev/sdb5   4.47GiB

It is worth noting that global reserve actually comes from metadata.
That's why metadata never reports fully used: global reserve isn't
included in the used count, but can't normally be used for normal
metadata.
Also note that under normal conditions, global reserve always shows 0
used, as btrfs is quite reluctant to use it for routine metadata storage,
and will normally only use it for getting out of COW-based jams: because
of COW, even deleting something means temporarily allocating additional
space to write the new metadata, without the deleted stuff, into.
Normally, btrfs will only write to global reserve if metadata space is
all used and it thinks that by doing so it can end up actually freeing
space. In normal operations it will simply see the lack of regular
metadata space available and will error out, without using the global
reserve. So if at any time btrfs reports
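[Editor's note: for reference, the older command pair mentioned earlier
in this message, which together approximate btrfs fi usage. The mount
point and device are hypothetical.]

```shell
# Per-device totals; with a device argument this also works when the
# filesystem cannot be mounted.
btrfs filesystem show /mnt/point

# Per-profile data/metadata/system breakdown; requires a mounted
# filesystem, like btrfs fi usage.
btrfs filesystem df /mnt/point
```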
Re: [PATCH] Btrfs: bulk delete checksum items in the same leaf
On Mon, Jan 30, 2017 at 05:02:45PM -0800, Omar Sandoval wrote: > On Sat, Jan 28, 2017 at 06:06:32AM +, fdman...@kernel.org wrote: > > From: Filipe Manana> > > > Very often we have the checksums for an extent spread in multiple items > > in the checksums tree, and currently the algorithm to delete them starts > > by looking for them one by one and then deleting them one by one, which > > is not optimal since each deletion involves shifting all the other items > > in the leaf and when the leaf reaches some low threshold, to move items > > off the leaf into its left and right neighbor leafs. Also, after each > > item deletion we release our search path and start a new search for other > > checksums items. > > > > So optimize this by deleting in bulk all the items in the same leaf that > > contain checksums for the extent being freed. > > > > Signed-off-by: Filipe Manana > > --- > > fs/btrfs/file-item.c | 28 +++- > > 1 file changed, 27 insertions(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c > > index e97e322..d7d6d4a 100644 > > --- a/fs/btrfs/file-item.c > > +++ b/fs/btrfs/file-item.c > > @@ -643,7 +643,33 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans, > > > > /* delete the entire item, it is inside our range */ > > if (key.offset >= bytenr && csum_end <= end_byte) { > > - ret = btrfs_del_item(trans, root, path); > > + int del_nr = 1; > > + > > + /* > > +* Check how many csum items preceding this one in this > > +* leaf correspond to our range and then delete them all > > +* at once. 
> > +*/ > > + if (key.offset > bytenr && path->slots[0] > 0) { > > + int slot = path->slots[0] - 1; > > + > > + while (slot >= 0) { > > + struct btrfs_key pk; > > + > > + btrfs_item_key_to_cpu(leaf, , slot); > > + if (pk.offset < bytenr || > > + pk.type != BTRFS_EXTENT_CSUM_KEY || > > + pk.objectid != > > + BTRFS_EXTENT_CSUM_OBJECTID) > > + break; > > + path->slots[0] = slot; > > + del_nr++; > > + key.offset = pk.offset; > > + slot--; > > + } > > + } > > + ret = btrfs_del_items(trans, root, path, > > + path->slots[0], del_nr); > > if (ret) > > goto out; > > if (key.offset == bytenr) > > Hmm, this seems like the kind of operation that could use a helper. > btrfs_del_item_range() or something like that, which takes the maximum > key to delete. What do you think? Err, or in this case, the minimum key. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs: bulk delete checksum items in the same leaf
On Sat, Jan 28, 2017 at 06:06:32AM +, fdman...@kernel.org wrote:
> From: Filipe Manana
>
> Very often we have the checksums for an extent spread in multiple items
> in the checksums tree, and currently the algorithm to delete them starts
> by looking for them one by one and then deleting them one by one, which
> is not optimal since each deletion involves shifting all the other items
> in the leaf and, when the leaf reaches some low threshold, moving items
> off the leaf into its left and right neighbor leafs. Also, after each
> item deletion we release our search path and start a new search for
> other checksum items.
>
> So optimize this by deleting in bulk all the items in the same leaf that
> contain checksums for the extent being freed.
>
> Signed-off-by: Filipe Manana
> ---
>  fs/btrfs/file-item.c | 28 +++-
>  1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
> index e97e322..d7d6d4a 100644
> --- a/fs/btrfs/file-item.c
> +++ b/fs/btrfs/file-item.c
> @@ -643,7 +643,33 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,
>
>  		/* delete the entire item, it is inside our range */
>  		if (key.offset >= bytenr && csum_end <= end_byte) {
> -			ret = btrfs_del_item(trans, root, path);
> +			int del_nr = 1;
> +
> +			/*
> +			 * Check how many csum items preceding this one in this
> +			 * leaf correspond to our range and then delete them all
> +			 * at once.
> +			 */
> +			if (key.offset > bytenr && path->slots[0] > 0) {
> +				int slot = path->slots[0] - 1;
> +
> +				while (slot >= 0) {
> +					struct btrfs_key pk;
> +
> +					btrfs_item_key_to_cpu(leaf, &pk, slot);
> +					if (pk.offset < bytenr ||
> +					    pk.type != BTRFS_EXTENT_CSUM_KEY ||
> +					    pk.objectid !=
> +					    BTRFS_EXTENT_CSUM_OBJECTID)
> +						break;
> +					path->slots[0] = slot;
> +					del_nr++;
> +					key.offset = pk.offset;
> +					slot--;
> +				}
> +			}
> +			ret = btrfs_del_items(trans, root, path,
> +					      path->slots[0], del_nr);
>  			if (ret)
>  				goto out;
>  			if (key.offset == bytenr)

Hmm, this seems like the kind of operation that could use a helper.
btrfs_del_item_range() or something like that, which takes the maximum
key to delete. What do you think?
Re: btrfs recovery
Hello, Michael,

Yes, you would certainly run the risk of doing more damage with dd, so if
you have an alternative, use that, and avoid dd. If nothing else works
and you need the files, you might try it as a last resort.

My guess (and it is only a guess) is that if the image is close to the
same size as the root partition, the file data is there. But that doesn't
do you much good if btrfs cannot read the "container" or the specific
partition and file system information, which btrfs send provides.

Does someone on the list know if ext3/4 data recovery tools can also
search btrfs data? That's another option.

Gordon

On Mon, Jan 30, 2017 at 4:37 PM, Michael Born wrote:
> Hi Gordon,
>
> I'm quite sure this is not a good idea.
> I do understand that dd-ing a running system will miss some changes
> done to the file system while copying. I'm surprised that I didn't end
> up with some corrupted files, but with no files at all.
> Also, I'm not interested in restoring the old Suse 13.2 system. I just
> want some configuration files from it.
>
> Cheers,
> Michael
>
> On 30.01.2017 at 23:24, GWB wrote:
>> <<
>> Hi btrfs experts.
>>
>> Hereby I apply for the stupidity of the month award.
>> >>
>>
>> I have no doubt that I will mount a serious challenge to you for
>> that title, so you haven't won yet.
>>
>> Why not dd the image back onto the original partition (or another
>> partition identical in size) and see if that is readable?
>>
>> My limited experience with btrfs (I am not an expert) is that read
>> only snapshots work well in this situation, but the initial hurdle is
>> using dd to get the image back onto a partition. So I wonder if you
>> could dd the image back onto the original media (the hd sdd), then
>> make a read only snapshot, and then send the snapshot with btrfs send
>> to another storage medium. With any luck, the machine might boot, and
>> you might find other snapshots which you may be able to turn into read
>> only snaps for btrfs send.
>>
>> This has worked for me on Ubuntu 14 for quite some time, but luckily I
>> have not had to restore the image file sent from btrfs send yet. I
>> say luckily, because I realise now that the image created from btrfs
>> send should be tested, but so far no catastrophic failures with my
>> root partition have occurred (knock on wood).
>>
>> dd is (like dumpfs, ddrescue, and the bsd variations) good for what it
>> tries to do, but not so great for some file systems for more
>> intricate uses. But why not try:
>>
>> dd if=imagefile.dd of=/dev/sdaX
>>
>> and see if it boots? If it does not, then perhaps you have another
>> shot at the one time mount for btrfs rw if that works.
>>
>> Or is your root partition now running fine under Suse 42.2, and you
>> are just looking to recover a few files from the image? If so, you
>> might try to dd from the image to a partition of original size as the
>> previous root, then adjust with gparted or fpart, and see if it is
>> readable.
>>
>> So instead of trying to restore a btrfs file structure, why not just
>> restore a partition with dd that happens to contain a btrfs file
>> structure, and then adjust the partition size to match the original?
>> btrfs cares about the tree structures, etc. dd does not.
>>
>> What you did is not unusual, and can work fine with a number of file
>> structures, but the potential for disaster with dd is also great. The
>> only thing I know of in btrfs that does a similar thing is:
>>
>> btrfs send -f btrfs-send-image-file /mount/read-write-snapshot
>>
>> Chances are, of course, good that without having current backups dd
>> could potentially ruin the rest of your file system setup, so maybe
>> transfer the image over to another machine that is expendable and test
>> this out. I use btrfs on root and zfs for data, and make lots of
>> snapshots and send them to incremental backups (mostly zfs, but btrfs
>> works nicely with Ubuntu on root, with the occasional balance
>> problem).
>>
>> If dd did it, dd might be able to fix it. Do that first before you
>> try to restore btrfs file structures.
>>
>> Or is this a terrible idea? Someone else on the list should say so if
>> they know otherwise.
>>
>> Gordon
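[Editor's sketch of a less risky alternative to dd-ing the image back: a
read-only loop mount reads the same data without touching the original
disk. Paths and loop device names are hypothetical, and it will still
fail if the image itself is inconsistent.]

```shell
# Attach the image read-only; prints the loop device it picked,
# e.g. /dev/loop0.
losetup -r -f --show /data/imagefile.dd

# Try a read-only mount of the copy.
mkdir -p /mnt/rescue
mount -o ro /dev/loop0 /mnt/rescue    # may fail on an inconsistent image

# If mounting fails, btrfs restore can often still pull files out of the
# unmountable image, writing only to the destination directory.
btrfs restore -ivv /dev/loop0 /mnt/recovered
```

This sidesteps the risk of overwriting a now-working partition with the
old image, and leaves the image untouched for further attempts.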
Battling an issue with btrfs quota
Hi,

my btrfs-based system (~2.5 TiB stored in the filesystem, replicated onto
two disks, running kernel 4.9.6-1-ARCH) locked up after I enabled quotas
and had a btrfs-size tool running. Now the question is how to recover
from that.

Whenever I mount the filesystem I end up with btrfs-cleaner and a kworker
hanging:

> [ 491.154603] INFO: task kworker/u128:3:105 blocked for more than 120 seconds.
> [ 491.188559] Not tainted 4.9.6-1-ARCH #1
> [ 491.209443] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 491.247188] kworker/u128:3 D 0 105 2 0x
> [ 491.247208] Workqueue: btrfs-qgroup-rescan btrfs_qgroup_rescan_helper [btrfs]
> [ 491.247210] 880103bc8800 8801034ba7c0 8801062580c0
> [ 491.247213] 880105fe8d40 c9c63c30 81605cdf 8801034ba7c0
> [ 491.247215] 0001 8801062580c0 810aa490 8801034ba7c0
> [ 491.247217] Call Trace:
> [ 491.247222] [] ? __schedule+0x22f/0x6e0
> [ 491.247224] [] ? wake_up_q+0x80/0x80
> [ 491.247226] [] schedule+0x3d/0x90
> [ 491.247237] [] wait_current_trans.isra.8+0xbe/0x110 [btrfs]
> [ 491.247240] [] ? wake_atomic_t_function+0x60/0x60
> [ 491.247249] [] start_transaction+0x2bc/0x4a0 [btrfs]
> [ 491.247258] [] btrfs_start_transaction+0x18/0x20 [btrfs]
> [ 491.247267] [] btrfs_qgroup_rescan_worker+0x7a/0x610 [btrfs]
> [ 491.247278] [] btrfs_scrubparity_helper+0x7d/0x350 [btrfs]
> [ 491.247288] [] btrfs_qgroup_rescan_helper+0xe/0x10 [btrfs]
> [ 491.247291] [] process_one_work+0x1e5/0x470
> [ 491.247292] [] worker_thread+0x48/0x4e0
> [ 491.247294] [] ? process_one_work+0x470/0x470
> [ 491.247296] [] kthread+0xd9/0xf0
> [ 491.247298] [] ? __switch_to+0x2d2/0x630
> [ 491.247299] [] ? kthread_park+0x60/0x60
> [ 491.247301] [] ret_from_fork+0x25/0x30
> [ 491.247306] INFO: task btrfs-cleaner:148 blocked for more than 120 seconds.
> [ 491.280723] Not tainted 4.9.6-1-ARCH #1
> [ 491.302026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 491.340471] btrfs-cleaner D 0 148 2 0x
> [ 491.340475] 880103bc8800 8801032acf80 8801062580c0
> [ 491.340478] 8801032a8d40 c9cc3cf0 81605cdf 8801032acf80
> [ 491.340480] 0001 8801062580c0 810aa490 8801032acf80
> [ 491.340482] Call Trace:
> [ 491.340487] [] ? __schedule+0x22f/0x6e0
> [ 491.340489] [] ? wake_up_q+0x80/0x80
> [ 491.340491] [] schedule+0x3d/0x90
> [ 491.340505] [] wait_current_trans.isra.8+0xbe/0x110 [btrfs]
> [ 491.340508] [] ? wake_atomic_t_function+0x60/0x60
> [ 491.340517] [] start_transaction+0x2bc/0x4a0 [btrfs]
> [ 491.340525] [] btrfs_start_transaction+0x18/0x20 [btrfs]
> [ 491.340534] [] btrfs_drop_snapshot+0x4e9/0x880 [btrfs]
> [ 491.340542] [] btrfs_clean_one_deleted_snapshot+0xbb/0x110 [btrfs]
> [ 491.340552] [] cleaner_kthread+0x141/0x1b0 [btrfs]
> [ 491.340560] [] ? btrfs_destroy_pinned_extent+0x120/0x120 [btrfs]
> [ 491.340562] [] kthread+0xd9/0xf0
> [ 491.340564] [] ? __switch_to+0x2d2/0x630
> [ 491.340565] [] ? kthread_park+0x60/0x60
> [ 491.340566] [] ret_from_fork+0x25/0x30

Unfortunately, whenever I try to execute a btrfs command against the
mounted filesystem -- e.g. to disable quota -- the command hangs. And
unfortunately that's in a shell without job control over a serial
console.
Relevant output from ps:

> 105 00 DW  [kworker/u128:3]
> 107 00 SW  [kworker/u128:5]
> 111 00 SW< [bioset]
> 112 00 SW< [bioset]
> 113 00 SW< [bioset]
> 115 00 SW  [kworker/1:2]
> 117 00 SW< [kworker/0:1H]
> 118 00 SW< [kworker/1:1H]
> 122 00 SW< [bioset]
> 123 0 6724 S sh -i
> 128 00 SW< [btrfs-worker]
> 129 00 SW< [kworker/u129:0]
> 130 00 SW< [btrfs-worker-hi]
> 131 00 SW< [btrfs-delalloc]
> 132 00 SW< [btrfs-flush_del]
> 133 00 SW< [btrfs-cache]
> 134 00 SW< [btrfs-submit]
> 135 00 SW< [btrfs-fixup]
> 136 00 SW< [btrfs-endio]
> 137 00 SW< [btrfs-endio-met]
> 138 00 SW< [btrfs-endio-met]
> 139 00 SW< [btrfs-endio-rai]
> 140 00 SW< [btrfs-endio-rep]
> 141 00 SW< [btrfs-rmw]
> 142 00 SW< [btrfs-endio-wri]
> 143 00 SW< [btrfs-freespace]
> 144 00 SW< [btrfs-delayed-m]
> 145 00 SW< [btrfs-readahead]
> 146 00 SW< [btrfs-qgroup-re]
> 147 00 SW< [btrfs-extent-re]
> 148 00 DW  [btrfs-cleaner]
> 149 00 RW  [btrfs-transacti]

So there's always a running btrfs-transaction. The kernel
Re: btrfs recovery
Hi Gordon,

I'm quite sure this is not a good idea.
I do understand that dd-ing a running system will miss some changes done
to the file system while copying. I'm surprised that I didn't end up with
some corrupted files, but with no files at all.
Also, I'm not interested in restoring the old Suse 13.2 system. I just
want some configuration files from it.

Cheers,
Michael

On 30.01.2017 at 23:24, GWB wrote:
> <<
> Hi btrfs experts.
>
> Hereby I apply for the stupidity of the month award.
> >>
>
> I have no doubt that I will mount a serious challenge to you for
> that title, so you haven't won yet.
>
> Why not dd the image back onto the original partition (or another
> partition identical in size) and see if that is readable?
>
> My limited experience with btrfs (I am not an expert) is that read
> only snapshots work well in this situation, but the initial hurdle is
> using dd to get the image back onto a partition. So I wonder if you
> could dd the image back onto the original media (the hd sdd), then
> make a read only snapshot, and then send the snapshot with btrfs send
> to another storage medium. With any luck, the machine might boot, and
> you might find other snapshots which you may be able to turn into read
> only snaps for btrfs send.
>
> This has worked for me on Ubuntu 14 for quite some time, but luckily I
> have not had to restore the image file sent from btrfs send yet. I
> say luckily, because I realise now that the image created from btrfs
> send should be tested, but so far no catastrophic failures with my
> root partition have occurred (knock on wood).
>
> dd is (like dumpfs, ddrescue, and the bsd variations) good for what it
> tries to do, but not so great for some file systems for more
> intricate uses. But why not try:
>
> dd if=imagefile.dd of=/dev/sdaX
>
> and see if it boots? If it does not, then perhaps you have another
> shot at the one time mount for btrfs rw if that works.
>
> Or is your root partition now running fine under Suse 42.2, and you
> are just looking to recover a few files from the image? If so, you
> might try to dd from the image to a partition of original size as the
> previous root, then adjust with gparted or fpart, and see if it is
> readable.
>
> So instead of trying to restore a btrfs file structure, why not just
> restore a partition with dd that happens to contain a btrfs file
> structure, and then adjust the partition size to match the original?
> btrfs cares about the tree structures, etc. dd does not.
>
> What you did is not unusual, and can work fine with a number of file
> structures, but the potential for disaster with dd is also great. The
> only thing I know of in btrfs that does a similar thing is:
>
> btrfs send -f btrfs-send-image-file /mount/read-write-snapshot
>
> Chances are, of course, good that without having current backups dd
> could potentially ruin the rest of your file system setup, so maybe
> transfer the image over to another machine that is expendable and test
> this out. I use btrfs on root and zfs for data, and make lots of
> snapshots and send them to incremental backups (mostly zfs, but btrfs
> works nicely with Ubuntu on root, with the occasional balance
> problem).
>
> If dd did it, dd might be able to fix it. Do that first before you
> try to restore btrfs file structures.
>
> Or is this a terrible idea? Someone else on the list should say so if
> they know otherwise.
>
> Gordon
Re: btrfs recovery
<<
Hi btrfs experts.

Hereby I apply for the stupidity of the month award.
>>

I have no doubt that I will mount a serious challenge to you for that
title, so you haven't won yet.

Why not dd the image back onto the original partition (or another
partition identical in size) and see if that is readable?

My limited experience with btrfs (I am not an expert) is that read only
snapshots work well in this situation, but the initial hurdle is using dd
to get the image back onto a partition. So I wonder if you could dd the
image back onto the original media (the hd sdd), then make a read only
snapshot, and then send the snapshot with btrfs send to another storage
medium. With any luck, the machine might boot, and you might find other
snapshots which you may be able to turn into read only snaps for btrfs
send.

This has worked for me on Ubuntu 14 for quite some time, but luckily I
have not had to restore the image file sent from btrfs send yet. I say
luckily, because I realise now that the image created from btrfs send
should be tested, but so far no catastrophic failures with my root
partition have occurred (knock on wood).

dd is (like dumpfs, ddrescue, and the bsd variations) good for what it
tries to do, but not so great for some file systems for more intricate
uses. But why not try:

dd if=imagefile.dd of=/dev/sdaX

and see if it boots? If it does not, then perhaps you have another shot
at the one time mount for btrfs rw if that works.

Or is your root partition now running fine under Suse 42.2, and you are
just looking to recover a few files from the image? If so, you might try
to dd from the image to a partition of original size as the previous
root, then adjust with gparted or fpart, and see if it is readable.

So instead of trying to restore a btrfs file structure, why not just
restore a partition with dd that happens to contain a btrfs file
structure, and then adjust the partition size to match the original?
btrfs cares about the tree structures, etc.
dd does not.

What you did is not unusual, and can work fine with a number of file
structures, but the potential for disaster with dd is also great. The
only thing I know of in btrfs that does a similar thing is:

btrfs send -f btrfs-send-image-file /mount/read-only-snapshot

Chances are, of course, good that without current backups dd could
ruin the rest of your file system setup, so maybe transfer the image
over to another machine that is expendable and test this out. I use
btrfs on root and zfs for data, and make lots of snapshots and send
them to incremental backups (mostly zfs, but btrfs works nicely with
Ubuntu on root, with the occasional balance problem).

If dd did it, dd might be able to fix it. Do that first before you try
to restore btrfs file structures. Or is this a terrible idea? Someone
else on the list should say so if they know otherwise.

Gordon

On Mon, Jan 30, 2017 at 3:16 PM, Hans van Kranenburg wrote:
> On 01/30/2017 10:07 PM, Michael Born wrote:
>> Am 30.01.2017 um 21:51 schrieb Chris Murphy:
>>> On Mon, Jan 30, 2017 at 1:02 PM, Michael Born wrote:
>>>> Hi btrfs experts. Hereby I apply for the stupidity of the month award.
>>>
>>> There's still another day :-D
>>>
>>>> Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
>>>> to an image file - while the system was online/running.
>>>> Now, I can't mount the image.
>>>
>>> That won't ever work for any file system. It must be unmounted.
>>
>> I could mount and copy the data out of my /home image.dd (encrypted
>> xfs). That was also online while dd-ing it.
>>
>>>> Could you give me some instructions how to repair the file system or
>>>> extract some files from it?
>>>
>>> Not possible. The file system was being modified while dd was
>>> happening, so the image you've taken is inconsistent.
>>
>> The files I'm interested in (fstab, NetworkManager.conf, ...) didn't
>> change for months. Why would they change in the moment I copy their
>> blocks with dd?
> The metadata of btrfs is organized in a bunch of tree structures. The
> top of the trees (the smallest parts; trees are upside-down here /\ )
> and the superblock get modified quite often. Every time a tree gets
> modified, the new modified parts are written as a modified copy in
> unused space.
>
> So even if the files themselves do not change... if you miss those new
> writes which are being done in space that your dd already left behind...
> you end up with old and new parts of trees all over the place.
>
> In other words, a big puzzle with parts that do not connect with each
> other any more.
>
> And that's exactly what you see in all the errors. E.g. "parent transid
> verify failed on 32869482496 wanted 550112 found 550121" <- a part of a
> tree points to another part, but suddenly something else is found which
> should not be there. In this case wanted 550112 found 550121
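The dd-back idea above can be sanity-checked on throwaway files before touching a real partition. This is a minimal sketch, not from the thread itself: all paths and sizes here are invented for illustration, and `restored.img` stands in for the target `/dev/sdaX`.

```shell
#!/bin/sh
# Sketch: confirm that dd reproduces an image bit-for-bit on plain
# files before risking the real partition. Paths/sizes are examples.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Stand-in for the saved image (the real case would be imagefile.dd).
dd if=/dev/urandom of="$workdir/imagefile.dd" bs=1024 count=64 2>/dev/null

# Copy it back, as one would do onto /dev/sdaX.
dd if="$workdir/imagefile.dd" of="$workdir/restored.img" bs=65536 2>/dev/null

# The checksums must match, or the copy cannot be trusted.
a=$(sha256sum "$workdir/imagefile.dd" | cut -d' ' -f1)
b=$(sha256sum "$workdir/restored.img" | cut -d' ' -f1)
[ "$a" = "$b" ] && echo "checksums match"
```

The same sha256sum comparison works against a block device, as long as the target partition is exactly the image's size.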
Re: btrfs recovery
Am 30.01.2017 um 22:20 schrieb Chris Murphy:
> On Mon, Jan 30, 2017 at 2:07 PM, Michael Born wrote:
>> The files I'm interested in (fstab, NetworkManager.conf, ...) didn't
>> change for months. Why would they change in the moment I copy their
>> blocks with dd?
>
> They didn't change. The file system changed. While dd is reading, it
> might be minutes between capturing different parts of the file system,
> and each superblock is in a different location on the disk,
> guaranteeing that if the dd takes more than 30 seconds, your dd image
> has different generation super blocks. Btrfs notices this at mount
> time and will refuse to mount because the file system is inconsistent.
>
> It is certainly possible to fix this, but it's likely to be really,
> really tedious. The existing tools don't take this use case into
> account.
>
> Maybe btrfs-find-root can come up with some suggestions and you can use
> btrfs restore -t with the bytenr from find-root, to see if you can get
> this old data, ignoring the changes that don't affect the old data.
>
> What you do with this is run btrfs-find-root and see what it comes up
> with. And work with the most recent (highest) generation going
> backward, plugging the bytenr into btrfs restore with the -t option.
> You'll also want to use the dry run to see if you're getting what you
> want. It's best to use the exact path if you know it; this takes much
> less time than searching all files in a given tree. If you don't
> know the exact path, but you know part of a file name, then you'll
> need to use the regex option; or just let it dump everything it can
> from the image and go dumpster diving...

I really want to try the "btrfs-find-root / btrfs restore -t" method.
But btrfs-find-root gives me just three lines of output and then nothing
for 16 hours. I think I saw a similar report in the mailing list archive
that the tool just doesn't report back (btrfs-find-root duration?
Markus Binsteiner, Sat, 10 Dec 2016 16:12:25 -0800).

./btrfs-find-root /dev/loop0
Couldn't read tree root
Superblock thinks the generation is 550114
Superblock thinks the level is 1

Hans, also thank you for the explanation, even though I'm not sure I
understand. I would be happy with older parts of the tree, which then
have lower numbers than the 550112.

Michael
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs recovery
On Mon, Jan 30, 2017 at 2:20 PM, Chris Murphy <li...@colorremedies.com> wrote:
> What people do with huge databases, which have this same problem, is
> take a volume snapshot. This first commits everything in flight,
> freezes the fs so no more changes can happen, then takes a snapshot,
> then unfreezes the original so the database can stay online. The
> freeze takes maybe a second, or a bit longer depending on how much
> stuff needs to be committed to stable media. Then back up the snapshot
> as a read-only volume. Once the backup is done, delete the snapshot.

In Btrfs land, the way to do it is snapshot a subvolume, and then rsync
or 'btrfs send' the contents of the snapshot somewhere. I actually often
use this for whole volume backups:

## This will capture /boot and /boot/efi on separate file systems and
## put the tar on the Btrfs root.
cd /
tar -acf boot.tar.gz boot/

## My subvolumes are at the top level; fstab mounts them specifically,
## so mount the top level to get access.
sudo mount -o noatime /mnt

## Take a read-only snapshot of rootfs.
sudo btrfs sub snap -r /mnt/root /mnt/root.20170130

## Send it to a remote server.
sudo btrfs send /mnt/root.20170130 | ssh chris@server "cat - > ~/root.20170130.btrfs"

## Restore it from the server; assumes the subvolume/snapshot does not exist.
ssh chris@server "cat ~/root.20170130.btrfs" | sudo btrfs receive /mnt/

The same can be done with incremental images, but of course you need all
of the files, named in a sane way, so you know in what order to restore
them, since those incrementals are parent/child specific. The other
thing this avoids, critically, is the corruption Btrfs suffers whenever
two or more copies of the same volume (by UUID) appear to the kernel at
the same time and one of them is mounted. See the gotchas of block level
copies.
https://btrfs.wiki.kernel.org/index.php/Gotchas

--
Chris Murphy
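The point about naming incremental stream files sanely can be made concrete: with date-stamped names like `root.20170130` in the commands above, lexicographic order equals chronological order, so a plain `sort` gives the sequence in which the streams must be received (parent before child). A small sketch; the file names here are hypothetical:

```shell
#!/bin/sh
# Sketch: YYYYMMDD-stamped stream files sort lexically into the order
# in which 'btrfs receive' would have to replay them.
set -eu
printf '%s\n' root.20170201.btrfs root.20170130.btrfs root.20170131.btrfs | sort
```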
Re: btrfs recovery
On Mon, Jan 30, 2017 at 2:07 PM, Michael Born wrote:
>
> Am 30.01.2017 um 21:51 schrieb Chris Murphy:
>> On Mon, Jan 30, 2017 at 1:02 PM, Michael Born wrote:
>>> Hi btrfs experts.
>>>
>>> Hereby I apply for the stupidity of the month award.
>>
>> There's still another day :-D
>>
>>> Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
>>> to an image file - while the system was online/running.
>>> Now, I can't mount the image.
>>
>> That won't ever work for any file system. It must be unmounted.
>
> I could mount and copy the data out of my /home image.dd (encrypted
> xfs). That was also online while dd-ing it.

If there are no substantial writes happening, it's possible it'll behave
like a power failure: the journal is read and replay continues, possibly
with the most recent commits being lost. But any substantial amount of
writes means some part of the volume is changed while the update
reflecting that change is elsewhere, and meanwhile the dd is capturing
the volume at different points in time rather than exactly as it is.
It's just not workable.

What people do with huge databases, which have this same problem, is
take a volume snapshot. This first commits everything in flight, freezes
the fs so no more changes can happen, then takes a snapshot, then
unfreezes the original so the database can stay online. The freeze takes
maybe a second, or a bit longer depending on how much stuff needs to be
committed to stable media. Then back up the snapshot as a read-only
volume. Once the backup is done, delete the snapshot.

>>> Could you give me some instructions how to repair the file system or
>>> extract some files from it?
>>
>> Not possible. The file system was being modified while dd was
>> happening, so the image you've taken is inconsistent.
>
> The files I'm interested in (fstab, NetworkManager.conf, ...) didn't
> change for months. Why would they change in the moment I copy their
> blocks with dd?

They didn't change.
The file system changed. While dd is reading, minutes might pass between
capturing different parts of the file system, and each superblock is in
a different location on the disk, guaranteeing that if the dd takes more
than 30 seconds, your dd image has different generation super blocks.
Btrfs notices this at mount time and will refuse to mount because the
file system is inconsistent.

It is certainly possible to fix this, but it's likely to be really,
really tedious. The existing tools don't take this use case into
account.

Maybe btrfs-find-root can come up with some suggestions and you can use
btrfs restore -t with the bytenr from find-root, to see if you can get
this old data, ignoring the changes that don't affect the old data.

What you do with this is run btrfs-find-root and see what it comes up
with. And work with the most recent (highest) generation going backward,
plugging the bytenr into btrfs restore with the -t option. You'll also
want to use the dry run to see if you're getting what you want. It's
best to use the exact path if you know it; this takes much less time
than searching all files in a given tree. If you don't know the exact
path, but you know part of a file name, then you'll need to use the
regex option; or just let it dump everything it can from the image and
go dumpster diving...

--
Chris Murphy
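The try-each-candidate-root loop described above can be scripted. This sketch only builds the dry-run command lines rather than running them against a real image: the image name, destination directory, and the second bytenr are placeholders (the first bytenr is borrowed from the transid error earlier in the thread), and the real candidate list would come from btrfs-find-root output.

```shell
#!/bin/sh
# Sketch: walk candidate tree-root bytenrs (highest generation first)
# and emit a dry-run 'btrfs restore' command for each. Nothing here
# touches a real image; bytenrs and paths are placeholders.
set -eu

restore_cmd() {
    # -t <bytenr>: use this tree root; -D: dry run; -v: verbose
    printf 'btrfs restore -t %s -D -v imagefile.dd /tmp/restored' "$1"
}

# In a real run these would be parsed out of btrfs-find-root output.
for bytenr in 32869482496 32869465600; do
    restore_cmd "$bytenr"
    echo
done
```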
Re: btrfs recovery
On 01/30/2017 10:07 PM, Michael Born wrote:
>
> Am 30.01.2017 um 21:51 schrieb Chris Murphy:
>> On Mon, Jan 30, 2017 at 1:02 PM, Michael Born wrote:
>>> Hi btrfs experts.
>>>
>>> Hereby I apply for the stupidity of the month award.
>>
>> There's still another day :-D
>>
>>> Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
>>> to an image file - while the system was online/running.
>>> Now, I can't mount the image.
>>
>> That won't ever work for any file system. It must be unmounted.
>
> I could mount and copy the data out of my /home image.dd (encrypted
> xfs). That was also online while dd-ing it.
>
>>> Could you give me some instructions how to repair the file system or
>>> extract some files from it?
>>
>> Not possible. The file system was being modified while dd was
>> happening, so the image you've taken is inconsistent.
>
> The files I'm interested in (fstab, NetworkManager.conf, ...) didn't
> change for months. Why would they change in the moment I copy their
> blocks with dd?

The metadata of btrfs is organized in a bunch of tree structures. The
top of the trees (the smallest parts; trees are upside-down here /\ )
and the superblock get modified quite often. Every time a tree gets
modified, the new modified parts are written as a modified copy in
unused space.

So even if the files themselves do not change... if you miss those new
writes which are being done in space that your dd already left behind...
you end up with old and new parts of trees all over the place.

In other words, a big puzzle with parts that do not connect with each
other any more.

And that's exactly what you see in all the errors. E.g. "parent transid
verify failed on 32869482496 wanted 550112 found 550121" <- a part of a
tree points to another part, but suddenly something else is found which
should not be there. In this case, "wanted 550112 found 550121" means
it's bumping into something "from the future". Whaaa..
--
Hans van Kranenburg
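The old-parts/new-parts puzzle described above can be demonstrated without btrfs at all: copy half of a file, let the file change, then copy the rest. The assembled copy matches neither the old nor the new state. A minimal sketch; the file contents are arbitrary stand-ins for filesystem blocks:

```shell
#!/bin/sh
# Sketch: a copy taken while the source changes is a mix of old and
# new blocks -- the same effect dd has on a live filesystem, in
# miniature.
set -eu
d=$(mktemp -d)
trap 'rm -rf "$d"' EXIT

printf 'AAAABBBB' > "$d/fs"                                  # state 1
dd if="$d/fs" of="$d/img" bs=4 count=1 2>/dev/null           # copy 1st half
printf 'CCCCDDDD' > "$d/fs"                                  # changed mid-copy
dd if="$d/fs" of="$d/img" bs=4 skip=1 seek=1 count=1 \
   conv=notrunc 2>/dev/null                                  # copy 2nd half
echo "old=AAAABBBB new=CCCCDDDD copy=$(cat "$d/img")"        # copy=AAAADDDD
```

The resulting `AAAADDDD` is internally inconsistent, which is exactly why btrfs trips over the mismatched transids in such an image.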
Re: btrfs recovery
Am 30.01.2017 um 21:51 schrieb Chris Murphy:
> On Mon, Jan 30, 2017 at 1:02 PM, Michael Born wrote:
>> Hi btrfs experts.
>>
>> Hereby I apply for the stupidity of the month award.
>
> There's still another day :-D
>
>> Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
>> to an image file - while the system was online/running.
>> Now, I can't mount the image.
>
> That won't ever work for any file system. It must be unmounted.

I could mount and copy the data out of my /home image.dd (encrypted
xfs). That was also online while dd-ing it.

>> Could you give me some instructions how to repair the file system or
>> extract some files from it?
>
> Not possible. The file system was being modified while dd was
> happening, so the image you've taken is inconsistent.

The files I'm interested in (fstab, NetworkManager.conf, ...) didn't
change for months. Why would they change in the moment I copy their
blocks with dd?

Michael
Re: [PATCH] Btrfs: fix leak of subvolume writers counter
On Sat, Jan 28, 2017 at 06:06:54AM +, fdman...@kernel.org wrote:
> From: Robbie Ko
>
> When falling back from a nocow write to a regular cow write, we were
> leaking the subvolume writers counter in 2 situations, preventing
> snapshot creation from ever completing in the future, as it waits
> for that counter to go down to zero before the snapshot creation
> starts.
>
> In run_delalloc_nocow, failing to release the subv_writers counter
> leads to snapshot creation hanging.

Reviewed-by: Liu Bo

Thanks,
-liubo

> Signed-off-by: Robbie Ko
> Reviewed-by: Filipe Manana
> [Improved changelog and subject]
> Signed-off-by: Filipe Manana
> ---
>  fs/btrfs/inode.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index a713d9d..7221d66 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1404,10 +1404,16 @@ static noinline int run_delalloc_nocow(struct inode *inode,
>  			 * either valid or do not exist.
>  			 */
>  			if (csum_exist_in_range(fs_info, disk_bytenr,
> -						num_bytes))
> +						num_bytes)) {
> +				if (!nolock)
> +					btrfs_end_write_no_snapshoting(root);
>  				goto out_check;
> -			if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr))
> +			}
> +			if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
> +				if (!nolock)
> +					btrfs_end_write_no_snapshoting(root);
>  				goto out_check;
> +			}
>  			nocow = 1;
>  		} else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>  			extent_end = found_key.offset +
> --
> 2.7.0.rc3
Hard crash on 4.9.5, part 2
I have an error on this file system I've had in the distant past where
the mount would fail with a "file exists" error. Running a btrfs check
gives the following over and over again:

Found file extent holes:
        start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing
Found file extent holes:
        start: 0, len: 290816
root 257 inode 28472371 errors 1000, some csum missing
root 257 inode 28472416 errors 1000, some csum missing
root 257 inode 9182183 errors 1000, some csum missing
root 257 inode 9182186 errors 1000, some csum missing
root 257 inode 28419536 errors 1100, file extent discount, some csum missing

Are these found once per subvolume snapshot I have, and will they
eventually end? Here is the crash after the mount (with
recovery/usebackuproot):

[  627.233213] BTRFS warning (device sda1): 'recovery' is deprecated, use 'usebackuproot' instead
[  627.233216] BTRFS info (device sda1): trying to use backup root at mount time
[  627.233218] BTRFS info (device sda1): disk space caching is enabled
[  627.233220] BTRFS info (device sda1): has skinny extents
[  709.234688] ------------[ cut here ]------------
[  709.234734] WARNING: CPU: 5 PID: 3468 at fs/btrfs/file.c:546 btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[  709.234735] Modules linked in: ipmi_devintf nfsd auth_rpcgss nfs_acl nfs lockd grace sunrpc fscache lp parport intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel xt_tcpudp kvm nf_conntrack_ipv4 nf_defrag_ipv4 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel xt_conntrack aesni_intel btrfs nf_conntrack aes_x86_64 lrw gf128mul iptable_filter glue_helper ip_tables ablk_helper cryptd x_tables dm_multipath joydev mei_me ioatdma mei lpc_ich wmi ipmi_si ipmi_msghandler shpchp mac_hid ses enclosure scsi_transport_sas raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic megaraid_sas raid6_pq ahci libcrc32c libahci igb usbhid raid1 hid i2c_algo_bit raid0 dca ptp multipath pps_core linear dm_mirror dm_region_hash dm_log
[  709.234812] CPU: 5 PID: 3468 Comm: mount Not tainted 4.9.5-custom #1
[  709.234813] Hardware name: Supermicro X9DRH-7TF/7F/iTF/iF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0b 04/28/2014
[  709.234816] bd3784bb7568 8e3c8e7c
[  709.234820] bd3784bb75a8 8e07d3d1 02220070 9e5f0ae4d150
[  709.234823] 0002d000 9e5f0bc91f78 9e5f0bc91da8 0002c000
[  709.234827] Call Trace:
[  709.234837] [] dump_stack+0x63/0x87
[  709.234846] [] __warn+0xd1/0xf0
[  709.234850] [] warn_slowpath_null+0x1d/0x20
[  709.234874] [] btrfs_drop_extent_cache+0x3e8/0x400 [btrfs]
[  709.234895] [] __btrfs_drop_extents+0x5b2/0xd30 [btrfs]
[  709.234914] [] ? generic_bin_search.constprop.36+0x8b/0x1e0 [btrfs]
[  709.234931] [] ? btrfs_set_path_blocking+0x36/0x70 [btrfs]
[  709.234942] [] ? kmem_cache_alloc+0x194/0x1a0
[  709.234958] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[  709.234977] [] btrfs_drop_extents+0x79/0xa0 [btrfs]
[  709.235002] [] replay_one_extent+0x414/0x7b0 [btrfs]
[  709.235007] [] ? autoremove_wake_function+0x40/0x40
[  709.235030] [] replay_one_buffer+0x4cc/0x7c0 [btrfs]
[  709.235053] [] ? mark_extent_buffer_accessed+0x4f/0x70 [btrfs]
[  709.235074] [] walk_down_log_tree+0x1ba/0x3b0 [btrfs]
[  709.235094] [] walk_log_tree+0xb4/0x1a0 [btrfs]
[  709.235114] [] btrfs_recover_log_trees+0x20e/0x460 [btrfs]
[  709.235133] [] ? replay_one_extent+0x7b0/0x7b0 [btrfs]
[  709.235154] [] open_ctree+0x2640/0x27f0 [btrfs]
[  709.235171] [] btrfs_mount+0xca4/0xec0 [btrfs]
[  709.235176] [] ? find_next_zero_bit+0x1e/0x20
[  709.235180] [] ? pcpu_next_unpop+0x3e/0x50
[  709.235184] [] ? find_next_bit+0x19/0x20
[  709.235190] [] mount_fs+0x39/0x160
[  709.235193] [] ? __alloc_percpu+0x15/0x20
[  709.235196] [] vfs_kern_mount+0x67/0x110
[  709.235213] [] btrfs_mount+0x18b/0xec0 [btrfs]
[  709.235216] [] ? find_next_zero_bit+0x1e/0x20
[  709.235220] [] mount_fs+0x39/0x160
[  709.235223] [] ? __alloc_percpu+0x15/0x20
[  709.235225] [] vfs_kern_mount+0x67/0x110
[  709.235228] [] do_mount+0x1bb/0xc80
[  709.235232] [] ? kmem_cache_alloc_trace+0x14b/0x1b0
[  709.235235] [] SyS_mount+0x83/0xd0
[  709.235240] [] entry_SYSCALL_64_fastpath+0x1e/0xad
[  709.235243] ---[ end trace d4e5dcddb432b7d3 ]---
[  709.354972] BTRFS: error (device sda1) in btrfs_replay_log:2506: errno=-17 Object already exists (Failed to recover log tree)
[  709.355570] BTRFS error (device sda1): cleaner transaction attach returned -30
[  709.548919] BTRFS error (device sda1): open_ctree failed

-Matt
Re: btrfs recovery
On Mon, Jan 30, 2017 at 1:02 PM, Michael Born wrote:
> Hi btrfs experts.
>
> Hereby I apply for the stupidity of the month award.

There's still another day :-D

> Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
> to an image file - while the system was online/running.
> Now, I can't mount the image.

That won't ever work for any file system. It must be unmounted.

> Could you give me some instructions how to repair the file system or
> extract some files from it?

Not possible. The file system was being modified while dd was happening,
so the image you've taken is inconsistent.

--
Chris Murphy
[PATCH] Btrfs: remove duplicated find_get_pages_contig
This creates a helper to manipulate page bits to avoid duplicate uses. Signed-off-by: Liu Bo--- fs/btrfs/extent_io.c | 202 --- fs/btrfs/extent_io.h | 3 +- 2 files changed, 98 insertions(+), 107 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d5f3edb..136fe96 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1549,94 +1549,122 @@ static noinline u64 find_delalloc_range(struct extent_io_tree *tree, return found; } -static noinline void __unlock_for_delalloc(struct inode *inode, - struct page *locked_page, - u64 start, u64 end) +/* + * index_ret: record where we stop + * This only returns errors when page_ops has PAGE_LOCK. + */ +static int +__process_pages_contig(struct address_space *mapping, struct page *locked_page, + pgoff_t start_index, pgoff_t end_index, + unsigned long page_ops, pgoff_t *index_ret) { - int ret; + unsigned long nr_pages = end_index - start_index + 1; + pgoff_t index = start_index; struct page *pages[16]; - unsigned long index = start >> PAGE_SHIFT; - unsigned long end_index = end >> PAGE_SHIFT; - unsigned long nr_pages = end_index - index + 1; + unsigned pages_locked = 0; + unsigned ret; + int err = 0; int i; - if (index == locked_page->index && end_index == index) - return; + /* +* Do NOT skip locked_page since we may need to set PagePrivate2 on it. +*/ - while (nr_pages > 0) { - ret = find_get_pages_contig(inode->i_mapping, index, -min_t(unsigned long, nr_pages, -ARRAY_SIZE(pages)), pages); - for (i = 0; i < ret; i++) { - if (pages[i] != locked_page) - unlock_page(pages[i]); - put_page(pages[i]); - } - nr_pages -= ret; - index += ret; - cond_resched(); + /* PAGE_LOCK should not be mixed with other ops. 
*/ + if (page_ops & PAGE_LOCK) { + ASSERT(page_ops == PAGE_LOCK); + ASSERT(index_ret); + ASSERT(*index_ret == start_index); } -} -static noinline int lock_delalloc_pages(struct inode *inode, - struct page *locked_page, - u64 delalloc_start, - u64 delalloc_end) -{ - unsigned long index = delalloc_start >> PAGE_SHIFT; - unsigned long start_index = index; - unsigned long end_index = delalloc_end >> PAGE_SHIFT; - unsigned long pages_locked = 0; - struct page *pages[16]; - unsigned long nrpages; - int ret; - int i; - - /* the caller is responsible for locking the start index */ - if (index == locked_page->index && index == end_index) - return 0; + if ((page_ops & PAGE_SET_ERROR) && nr_pages > 0) + mapping_set_error(mapping, -EIO); - /* skip the page at the start index */ - nrpages = end_index - index + 1; - while (nrpages > 0) { - ret = find_get_pages_contig(inode->i_mapping, index, + while (nr_pages > 0) { + ret = find_get_pages_contig(mapping, index, min_t(unsigned long, -nrpages, ARRAY_SIZE(pages)), pages); +nr_pages, ARRAY_SIZE(pages)), pages); if (ret == 0) { - ret = -EAGAIN; - goto done; - } - /* now we have an array of pages, lock them all */ - for (i = 0; i < ret; i++) { /* -* the caller is taking responsibility for -* locked_page +* Only if we're going to lock these pages, can we find +* nothing at @index. */ + ASSERT(page_ops & PAGE_LOCK); + goto out; + } + + for (i = 0; i < ret; i++) { + if (page_ops & PAGE_SET_PRIVATE2) + SetPagePrivate2(pages[i]); + if (pages[i] != locked_page) { - lock_page(pages[i]); - if (!PageDirty(pages[i]) || - pages[i]->mapping != inode->i_mapping) { - ret = -EAGAIN; + if (page_ops & PAGE_CLEAR_DIRTY) + clear_page_dirty_for_io(pages[i]); +
Re: btrfs recovery
On 01/30/2017 09:02 PM, Michael Born wrote:
> Hi btrfs experts.
>
> Hereby I apply for the stupidity of the month award.
> But, maybe you can help me restoring my dd backup or extracting some
> files from it?
>
> Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
> to an image file - while the system was online/running.
> Now, I can't mount the image.

Making a block level copy of a filesystem while it is online and being
modified has a near 100% chance of producing a corrupt result. Simply
think of the fact that something gets written somewhere at the end of
the disk which relates to something that gets written at the beginning
of the disk, while your dd copy is doing its thing somewhere in
between...

--
Hans van Kranenburg
[PATCH] Btrfs: create a helper to create em for IO
We have similar codes to create and insert extent mapping around IO path, this merges them into a single helper. Signed-off-by: Liu Bo--- fs/btrfs/inode.c | 187 +-- 1 file changed, 72 insertions(+), 115 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 3d3753a..5e28355 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -108,11 +108,10 @@ static noinline int cow_file_range(struct inode *inode, u64 start, u64 end, u64 delalloc_end, int *page_started, unsigned long *nr_written, int unlock, struct btrfs_dedupe_hash *hash); -static struct extent_map *create_pinned_em(struct inode *inode, u64 start, - u64 len, u64 orig_start, - u64 block_start, u64 block_len, - u64 orig_block_len, u64 ram_bytes, - int type); +static struct extent_map * +create_io_em(struct inode *inode, u64 start, u64 len, u64 orig_start, +u64 block_start, u64 block_len, u64 orig_block_len, +u64 ram_bytes, int compress_type, int type); static int btrfs_dirty_inode(struct inode *inode); @@ -690,7 +689,6 @@ static noinline void submit_compressed_extents(struct inode *inode, struct btrfs_key ins; struct extent_map *em; struct btrfs_root *root = BTRFS_I(inode)->root; - struct extent_map_tree *em_tree = _I(inode)->extent_tree; struct extent_io_tree *io_tree; int ret = 0; @@ -778,46 +776,19 @@ static noinline void submit_compressed_extents(struct inode *inode, * here we're doing allocation and writeback of the * compressed pages */ - btrfs_drop_extent_cache(inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, 0); - - em = alloc_extent_map(); - if (!em) { - ret = -ENOMEM; - goto out_free_reserve; - } - em->start = async_extent->start; - em->len = async_extent->ram_size; - em->orig_start = em->start; - em->mod_start = em->start; - em->mod_len = em->len; - - em->block_start = ins.objectid; - em->block_len = ins.offset; - em->orig_block_len = ins.offset; - em->ram_bytes = async_extent->ram_size; - em->bdev = fs_info->fs_devices->latest_bdev; - em->compress_type = 
async_extent->compress_type; - set_bit(EXTENT_FLAG_PINNED, >flags); - set_bit(EXTENT_FLAG_COMPRESSED, >flags); - em->generation = -1; - - while (1) { - write_lock(_tree->lock); - ret = add_extent_mapping(em_tree, em, 1); - write_unlock(_tree->lock); - if (ret != -EEXIST) { - free_extent_map(em); - break; - } - btrfs_drop_extent_cache(inode, async_extent->start, - async_extent->start + - async_extent->ram_size - 1, 0); - } - - if (ret) + em = create_io_em(inode, async_extent->start, + async_extent->ram_size, /* len */ + async_extent->start, /* orig_start */ + ins.objectid, /* block_start */ + ins.offset, /* block_len */ + ins.offset, /* orig_block_len */ + async_extent->ram_size, /* ram_bytes */ + async_extent->compress_type, + BTRFS_ORDERED_COMPRESSED); + if (IS_ERR(em)) + /* ret value is not necessary due to void function */ goto out_free_reserve; + free_extent_map(em); ret = btrfs_add_ordered_extent_compress(inode, async_extent->start, @@ -952,7 +923,6 @@ static noinline int cow_file_range(struct inode *inode, u64 blocksize = fs_info->sectorsize; struct btrfs_key ins; struct extent_map *em; - struct extent_map_tree *em_tree = _I(inode)->extent_tree; int ret = 0; if (btrfs_is_free_space_inode(inode)) { @@ -1008,39 +978,18 @@ static noinline int cow_file_range(struct inode *inode, if (ret < 0) goto
[PATCH] Btrfs: kill trans in run_delalloc_nocow and btrfs_cross_ref_exist
run_delalloc_nocow has used trans in two places where they don't actually need @trans. For btrfs_lookup_file_extent, we search for file extents without COWing anything, and for btrfs_cross_ref_exist, the only place where we need @trans is deferencing it in order to get running_transaction which we could easily get from the global fs_info. Signed-off-by: Liu Bo--- fs/btrfs/ctree.h | 3 +-- fs/btrfs/extent-tree.c | 22 -- fs/btrfs/inode.c | 38 +++--- 3 files changed, 16 insertions(+), 47 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 6a82371..73b2d51 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -2577,8 +2577,7 @@ int btrfs_pin_extent_for_log_replay(struct btrfs_fs_info *fs_info, u64 bytenr, u64 num_bytes); int btrfs_exclude_logged_extents(struct btrfs_fs_info *fs_info, struct extent_buffer *eb); -int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, - struct btrfs_root *root, +int btrfs_cross_ref_exist(struct btrfs_root *root, u64 objectid, u64 offset, u64 bytenr); struct btrfs_block_group_cache *btrfs_lookup_block_group( struct btrfs_fs_info *info, diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index ed254b8..097fa4a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3025,8 +3025,7 @@ int btrfs_set_disk_extent_flags(struct btrfs_trans_handle *trans, return ret; } -static noinline int check_delayed_ref(struct btrfs_trans_handle *trans, - struct btrfs_root *root, +static noinline int check_delayed_ref(struct btrfs_root *root, struct btrfs_path *path, u64 objectid, u64 offset, u64 bytenr) { @@ -3034,9 +3033,14 @@ static noinline int check_delayed_ref(struct btrfs_trans_handle *trans, struct btrfs_delayed_ref_node *ref; struct btrfs_delayed_data_ref *data_ref; struct btrfs_delayed_ref_root *delayed_refs; + struct btrfs_transaction *cur_trans; int ret = 0; - delayed_refs = >transaction->delayed_refs; + cur_trans = root->fs_info->running_transaction; + if (!cur_trans) + return 0; + + delayed_refs = 
_trans->delayed_refs; spin_lock(_refs->lock); head = btrfs_find_delayed_ref_head(delayed_refs, bytenr); if (!head) { @@ -3087,8 +3091,7 @@ static noinline int check_delayed_ref(struct btrfs_trans_handle *trans, return ret; } -static noinline int check_committed_ref(struct btrfs_trans_handle *trans, - struct btrfs_root *root, +static noinline int check_committed_ref(struct btrfs_root *root, struct btrfs_path *path, u64 objectid, u64 offset, u64 bytenr) { @@ -3159,9 +3162,8 @@ static noinline int check_committed_ref(struct btrfs_trans_handle *trans, return ret; } -int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, - struct btrfs_root *root, - u64 objectid, u64 offset, u64 bytenr) +int btrfs_cross_ref_exist(struct btrfs_root *root, u64 objectid, u64 offset, + u64 bytenr) { struct btrfs_path *path; int ret; @@ -3172,12 +3174,12 @@ int btrfs_cross_ref_exist(struct btrfs_trans_handle *trans, return -ENOENT; do { - ret = check_committed_ref(trans, root, path, objectid, + ret = check_committed_ref(root, path, objectid, offset, bytenr); if (ret && ret != -ENOENT) goto out; - ret2 = check_delayed_ref(trans, root, path, objectid, + ret2 = check_delayed_ref(root, path, objectid, offset, bytenr); } while (ret2 == -EAGAIN); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 082b968..3d3753a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1250,7 +1250,6 @@ static noinline int run_delalloc_nocow(struct inode *inode, { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_root *root = BTRFS_I(inode)->root; - struct btrfs_trans_handle *trans; struct extent_buffer *leaf; struct btrfs_path *path; struct btrfs_file_extent_item *fi; @@ -1286,30 +1285,10 @@ static noinline int run_delalloc_nocow(struct inode *inode, nolock = btrfs_is_free_space_inode(inode); - if (nolock) - trans = btrfs_join_transaction_nolock(root); - else - trans = btrfs_join_transaction(root); - - if (IS_ERR(trans)) { -
[PATCH] Btrfs: pass delayed_refs directly to btrfs_find_delayed_ref_head
All we need is @delayed_refs, and all callers have already gotten it
before calling btrfs_find_delayed_ref_head, since the lock needs to be
acquired first; there is no reason to dereference @trans again inside
the function.

Signed-off-by: Liu Bo
---
 fs/btrfs/backref.c     | 2 +-
 fs/btrfs/delayed-ref.c | 5 +
 fs/btrfs/delayed-ref.h | 3 ++-
 fs/btrfs/extent-tree.c | 6 +++---
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index 8299601..db70659 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1284,7 +1284,7 @@ static int find_parent_nodes(struct btrfs_trans_handle *trans,
          */
         delayed_refs = &trans->transaction->delayed_refs;
         spin_lock(&delayed_refs->lock);
-        head = btrfs_find_delayed_ref_head(trans, bytenr);
+        head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
         if (head) {
                 if (!mutex_trylock(&head->mutex)) {
                         atomic_inc(&head->node.refs);
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ef724a5..aebb48c 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -911,11 +911,8 @@ int btrfs_add_delayed_extent_op(struct btrfs_fs_info *fs_info,
  * the head node if any where found, or NULL if not.
  */
 struct btrfs_delayed_ref_head *
-btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr)
+btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs, u64 bytenr)
 {
-        struct btrfs_delayed_ref_root *delayed_refs;
-
-        delayed_refs = &trans->transaction->delayed_refs;
         return find_ref_head(&delayed_refs->href_root, bytenr, 0);
 }

diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index 50947b5..22ca93b 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -262,7 +262,8 @@ void btrfs_merge_delayed_refs(struct btrfs_trans_handle *trans,
                               struct btrfs_delayed_ref_head *head);
 struct btrfs_delayed_ref_head *
-btrfs_find_delayed_ref_head(struct btrfs_trans_handle *trans, u64 bytenr);
+btrfs_find_delayed_ref_head(struct btrfs_delayed_ref_root *delayed_refs,
+                            u64 bytenr);
 int btrfs_delayed_ref_lock(struct btrfs_trans_handle *trans,
                            struct btrfs_delayed_ref_head *head);
 static inline void btrfs_delayed_ref_unlock(struct btrfs_delayed_ref_head *head)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e97302f..ed254b8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -888,7 +888,7 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle *trans,
                 delayed_refs = &trans->transaction->delayed_refs;
                 spin_lock(&delayed_refs->lock);
-                head = btrfs_find_delayed_ref_head(trans, bytenr);
+                head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
                 if (head) {
                         if (!mutex_trylock(&head->mutex)) {
                                 atomic_inc(&head->node.refs);
@@ -3038,7 +3038,7 @@ static noinline int check_delayed_ref(struct btrfs_trans_handle *trans,
         delayed_refs = &trans->transaction->delayed_refs;
         spin_lock(&delayed_refs->lock);
-        head = btrfs_find_delayed_ref_head(trans, bytenr);
+        head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
         if (!head) {
                 spin_unlock(&delayed_refs->lock);
                 return 0;
@@ -7092,7 +7092,7 @@ static noinline int check_ref_cleanup(struct btrfs_trans_handle *trans,
         delayed_refs = &trans->transaction->delayed_refs;
         spin_lock(&delayed_refs->lock);
-        head = btrfs_find_delayed_ref_head(trans, bytenr);
+        head = btrfs_find_delayed_ref_head(delayed_refs, bytenr);
         if (!head)
                 goto out_delayed_unlock;
--
2.5.5
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
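The pattern in this patch — pass the already-derived, already-locked structure instead of re-deriving it inside the callee — can be sketched in a few lines of Python (a simulation only; the dict stands in for the rbtree of ref heads):

```python
import threading

class DelayedRefRoot:
    def __init__(self):
        self.lock = threading.Lock()
        self.href_root = {}  # stand-in for the rbtree: bytenr -> head

def find_delayed_ref_head(delayed_refs, bytenr):
    """Patched form: callers already hold delayed_refs.lock, so the function
    only does the lookup instead of re-deriving delayed_refs from a
    transaction handle it otherwise has no use for."""
    return delayed_refs.href_root.get(bytenr)

refs = DelayedRefRoot()
refs.href_root[4096] = "head@4096"
with refs.lock:                                  # caller takes the lock...
    head = find_delayed_ref_head(refs, 4096)     # ...then passes refs directly
assert head == "head@4096"
assert find_delayed_ref_head(refs, 8192) is None
```

Besides saving a pointer chase, the narrower signature documents the real locking contract: whoever calls this must already hold the delayed-refs lock.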
[PATCH] Btrfs: remove unused trans in read_block_for_search
@trans is not used at all, so remove it.

Signed-off-by: Liu Bo
---
 fs/btrfs/ctree.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a426dc8..dd8014a 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2437,10 +2437,9 @@ noinline void btrfs_unlock_up_safe(struct btrfs_path *path, int level)
  * reada.  -EAGAIN is returned and the search must be repeated.
  */
 static int
-read_block_for_search(struct btrfs_trans_handle *trans,
-                      struct btrfs_root *root, struct btrfs_path *p,
-                      struct extent_buffer **eb_ret, int level, int slot,
-                      struct btrfs_key *key, u64 time_seq)
+read_block_for_search(struct btrfs_root *root, struct btrfs_path *p,
+                      struct extent_buffer **eb_ret, int level, int slot,
+                      struct btrfs_key *key, u64 time_seq)
 {
         struct btrfs_fs_info *fs_info = root->fs_info;
         u64 blocknr;
@@ -2870,8 +2869,8 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root
                         goto done;
                 }

-                err = read_block_for_search(trans, root, p,
-                                            &b, level, slot, key, 0);
+                err = read_block_for_search(root, p, &b, level,
+                                            slot, key, 0);
                 if (err == -EAGAIN)
                         goto again;
                 if (err) {
@@ -3014,7 +3013,7 @@ int btrfs_search_old_slot(struct btrfs_root *root, struct btrfs_key *key,
                         goto done;
                 }

-                err = read_block_for_search(NULL, root, p, &b, level,
+                err = read_block_for_search(root, p, &b, level,
                                             slot, key, time_seq);
                 if (err == -EAGAIN)
                         goto again;
@@ -5784,7 +5783,7 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
                 next = c;
                 next_rw_lock = path->locks[level];
-                ret = read_block_for_search(NULL, root, path, &next, level,
+                ret = read_block_for_search(root, path, &next, level,
                                             slot, &key, 0);
                 if (ret == -EAGAIN)
                         goto again;
@@ -5834,7 +5833,7 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
                 if (!level)
                         break;

-                ret = read_block_for_search(NULL, root, path, &next, level,
+                ret = read_block_for_search(root, path, &next, level,
                                             0, &key, 0);
                 if (ret == -EAGAIN)
                         goto again;
--
2.5.5
Re: [PATCH] Btrfs: add another missing end_page_writeback on submit_extent_page failure
On Fri, Jan 13, 2017 at 03:12:31PM +0900, takafumi-sslab wrote:
> Thanks for your reply.
>
> I understand this bug is more complicated than I expected.
> I classify the error cases under submit_extent_page() below.
>
> A: ENOMEM error at btrfs_bio_alloc() in submit_extent_page()
> I first assumed this case and sent the mail.
> When bio_ret is NULL, submit_extent_page() calls btrfs_bio_alloc().
> Then, btrfs_bio_alloc() may fail and submit_extent_page() returns
> -ENOMEM.
> In this case, bio_endio() is not called and the page's writeback bit
> remains.
> So, there is a need to call end_page_writeback() in the error handling.
>
> B: errors under submit_one_bio() of submit_extent_page()
> Errors that occur under submit_one_bio() are handled at bio_endio(), and
> bio_endio() would call end_page_writeback().
>
> Therefore, as you mentioned in the last mail, simply adding
> end_page_writeback() like my last email and commit 55e3bd2e0c2e1 can
> conflict in the case of B.
> To avoid such a conflict, one easy solution is adding a PageWriteback()
> check too.
>
> How do you think of this solution?

(sorry for the late reply.)

I think its caller, "__extent_writepage", has covered the above case by
setting page writeback again.

Thanks,

-liubo

> Sincerely,
>
> On 2016/12/22 15:20, Liu Bo wrote:
> > On Fri, Dec 16, 2016 at 03:41:50PM +0900, Takafumi Kubota wrote:
> > > This is actually inspired by Filipe's patch (55e3bd2e0c2e1).
> > >
> > > When submit_extent_page() in __extent_writepage_io() fails,
> > > Btrfs misses clearing the writeback bit of the failed page.
> > > This causes a false under-writeback page.
> > > Then, another sync task hangs in filemap_fdatawait_range(),
> > > because it waits on the false under-writeback page.
> > >
> > > CPU0                          CPU1
> > >
> > > __extent_writepage_io()
> > >   ret = submit_extent_page() // fail
> > >
> > >   if (ret)
> > >     SetPageError(page)
> > >     // miss clearing the writeback bit
> > >
> > >                               sync()
> > >                                 ...
> > >                                 filemap_fdatawait_range()
> > >                                   wait_on_page_writeback(page);
> > >                                   // wait the false under-writeback
> > >                                   // page
> > >
> > > Signed-off-by: Takafumi Kubota
> > > ---
> > >  fs/btrfs/extent_io.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > > index 1e67723..ef9793b 100644
> > > --- a/fs/btrfs/extent_io.c
> > > +++ b/fs/btrfs/extent_io.c
> > > @@ -3443,8 +3443,10 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
> > >                                          bdev, &epd->bio, max_nr,
> > >                                          end_bio_extent_writepage,
> > >                                          0, 0, 0, false);
> > > -               if (ret)
> > > +               if (ret) {
> > >                         SetPageError(page);
> > > +                       end_page_writeback(page);
> > > +               }
> >
> > OK...this could be complex as we don't know which part in
> > submit_extent_page gets the error; if the page has been added into a
> > bio, bio_end would call end_page_writepage(page) as well, so whichever
> > comes later, the BUG() in end_page_writeback() would complain.
> >
> > Looks like commit 55e3bd2e0c2e1 also has the same problem although I
> > gave it my reviewed-by.
> >
> > Thanks,
> >
> > -liubo
> >
> > >                 cur = cur + iosize;
> > >                 pg_offset += iosize;
> > > --
> > > 1.9.3
>
> --
> Keio University
> System Software Laboratory
> Takafumi Kubota
> takafumi.kubota1...@sslab.ics.keio.jp
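The hazard discussed in this thread — calling end_page_writeback() twice on the same page — and the proposed PageWriteback() guard can be simulated outside the kernel. This is a hedged Python sketch (the Page class and helpers are stand-ins, not real kernel APIs; the assert models the kernel's BUG() when the writeback bit is already clear):

```python
class Page:
    def __init__(self):
        self.writeback = False

def set_page_writeback(page):
    page.writeback = True

def end_page_writeback(page):
    # models the kernel helper, which BUGs if the bit is already clear
    assert page.writeback, "end_page_writeback on page not under writeback"
    page.writeback = False

page = Page()
set_page_writeback(page)
end_page_writeback(page)          # e.g. done by bio_endio on the failed bio

# An unconditional second end_page_writeback() here would trip the assert.
# The PageWriteback()-style check proposed in the thread avoids that:
if page.writeback:
    end_page_writeback(page)
assert not page.writeback
```

The guard makes the error path idempotent: whichever of the two cleanup sites (the bio completion or the submit-failure handling) runs second becomes a no-op instead of a crash.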
btrfs recovery
Hi btrfs experts. Hereby I apply for the stupidity of the month award.
But maybe you can help me restore my dd backup or extract some files
from it?

Before switching from Suse 13.2 to 42.2, I copied my / partition with dd
to an image file - while the system was online/running. Now, I can't
mount the image. I tried many commands (some output is below) that are
suggested in the wiki or blog pages, without any success.

Unfortunately, the promising tool btrfs-find-root seems not to work. I
let it run on backup1.dd for 16 hours with the only output being:

./btrfs-find-root /dev/loop0
Couldn't read tree root
Superblock thinks the generation is 550114
Superblock thinks the level is 1

I then stopped it manually. (The 60GB dd file is on a ssd and one cpu
core was at 100% load all night.)

I also tried the git clone of btrfs-progs, which I checked out (the
tagged versions 4.9, 4.7, 4.4, 4.1) and compiled. I always got the
btrfs-find-root output as shown above.

Could you give me some instructions how to repair the file system or
extract some files from it?

Thank you,
Michael

PS: could you please CC me, as I'm not subscribed to the list.

Some commands and their output:

mount -t btrfs -o recovery,ro /dev/loop0 /mnt/oldroot/
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
missing codepage or other error

dmesg -T says:
[Mo Jan 30 01:08:20 2017] BTRFS info (device loop0): enabling auto recovery
[Mo Jan 30 01:08:20 2017] BTRFS info (device loop0): disk space caching is enabled
[Mo Jan 30 01:08:20 2017] BTRFS error (device loop0): bad tree block start 0 32865271808
[Mo Jan 30 01:08:20 2017] BTRFS: failed to read tree root on loop0
[Mo Jan 30 01:08:20 2017] BTRFS error (device loop0): bad tree block start 0 32865271808
[Mo Jan 30 01:08:20 2017] BTRFS: failed to read tree root on loop0
[Mo Jan 30 01:08:20 2017] BTRFS error (device loop0): bad tree block start 0 32862011392
[Mo Jan 30 01:08:20 2017] BTRFS: failed to read tree root on loop0
[Mo Jan 30 01:08:20 2017] BTRFS error (device loop0): parent transid verify failed on 32869482496 wanted 550112 found 550121
[Mo Jan 30 01:08:20 2017] BTRFS: failed to read tree root on loop0
[Mo Jan 30 01:08:20 2017] BTRFS error (device loop0): bad tree block start 0 32865353728
[Mo Jan 30 01:08:20 2017] BTRFS: failed to read tree root on loop0
[Mo Jan 30 01:08:20 2017] BTRFS: open_ctree failed

---

btrfs fi show
Label: none  uuid: 1c203c00-2768-4ea8-9e00-94aba5825394
        Total devices 1 FS bytes used 29.28GiB
        devid 1 size 60.00GiB used 32.07GiB path /dev/sda2

Label: none  uuid: 91a79eeb-08e0-470e-beab-916b38e09aca
        Total devices 1 FS bytes used 44.23GiB
        devid 1 size 60.00GiB used 60.00GiB path /dev/loop0

The 1st one is my now running Suse 42.2 /

---

btrfs check /dev/loop0
checksum verify failed on 32865271808 found E4E3BDB6 wanted
checksum verify failed on 32865271808 found E4E3BDB6 wanted
bytenr mismatch, want=32865271808, have=0
Couldn't read tree root
ERROR: cannot open file system

---

./btrfs restore -l /dev/loop0
checksum verify failed on 32865271808 found E4E3BDB6 wanted
checksum verify failed on 32865271808 found E4E3BDB6 wanted
bytenr mismatch, want=32865271808, have=0
Couldn't read tree root
Could not open root, trying backup super
checksum verify failed on 32865271808 found E4E3BDB6 wanted
checksum verify failed on 32865271808 found E4E3BDB6 wanted
bytenr mismatch, want=32865271808, have=0
Couldn't read tree root
Could not open root, trying backup super
ERROR: superblock bytenr 274877906944 is larger than device size 64428703744
Could not open root, trying backup super

---

uname -a
Linux linux-azo5 4.4.36-8-default #1 SMP Fri Dec 9 16:18:38 UTC 2016 (3ec5648) x86_64 x86_64 x86_64 GNU/Linux
Re: [PATCH] Btrfs: fix -EINVEL in tree log recovering
On Tue, Oct 11, 2016 at 10:01 AM, robbieko wrote:
> From: Robbie Ko
>
> when tree log recovery, space_cache rebuild or dirty maybe save the cache.
> and then replay extent with disk_bytenr and disk_num_bytes,
> but disk_bytenr and disk_num_bytes maybe had been use for free space inode,
> will lead to -EINVEL.

-EINVEL -> -EINVAL

More importantly, and sorry to say, but I can't parse nor make sense of
your change log. It kind of seems you're saying that replaying an extent
from the log tree can collide somehow with the space reserved for a free
space cache, or the other way around: writing a space cache attempts to
use an extent that overlaps an extent that was replayed during log
recovery (presumably during the transaction commit done at the end of
the log recovery).

Now honestly, think of how you would explain the problem in your native
tongue. Do you think a single short sentence like that is enough to
explain such a non-trivial problem? I doubt it, no matter what language
we pick... Or think that in a few months or maybe years (or whatever
time frame) even you will have forgotten what the problem was and will
be trying to remember the details by reading the change log - do you
think this change log would help at all?

At least tell us what (function) is returning -EINVAL, make a function
call graph, give a sample scenario, or better yet, send a test case
(fstests) to reproduce this, since it seems to be a fully deterministic
and 100% reproducible case.

Thanks.

>
> BTRFS: error in btrfs_replay_log:2446: errno=-22 unknown (Failed to recover
> log tree)
>
> therefore, we not save cache when tree log recovering.
>
> Signed-off-by: Robbie Ko
> ---
>  fs/btrfs/extent-tree.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 665da8f..38b932c 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -3434,6 +3434,7 @@ again:
>
>         spin_lock(&block_group->lock);
>         if (block_group->cached != BTRFS_CACHE_FINISHED ||
> +           block_group->fs_info->log_root_recovering ||
>             !btrfs_test_opt(root->fs_info, SPACE_CACHE)) {
>                 /*
>                  * don't bother trying to write stuff out _if_
> --
> 1.9.1

--
Filipe David Manana,

"People will forget what you said,
 people will forget what you did,
 but people will never forget how you made them feel."
Re: [PATCH] Btrfs: fix leak subvol subv_writers conter
On Fri, Oct 7, 2016 at 3:01 AM, robbieko wrote:
> From: Robbie Ko
>
> In run_delalloc_nocow, maybe not release subv_writers conter,
> will lead to create snapshot hang.
>
> Signed-off-by: Robbie Ko

I've picked this into my integration branch for 4.11 and rewrote the
changelog and subject.

Thanks.

> ---
>  fs/btrfs/inode.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index e6811c4..9722554 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -1386,11 +1386,17 @@ next_slot:
>                          * this ensure that csum for a given extent are
>                          * either valid or do not exist.
>                          */
> -                        if (csum_exist_in_range(root, disk_bytenr, num_bytes))
> +                        if (csum_exist_in_range(root, disk_bytenr,
> +                                                num_bytes)) {
> +                                if (!nolock)
> +                                        btrfs_end_write_no_snapshoting(root);
>                                  goto out_check;
> +                        }
> -                        if (!btrfs_inc_nocow_writers(root->fs_info,
> -                                                     disk_bytenr))
> +                        if (!btrfs_inc_nocow_writers(root->fs_info,
> +                                                     disk_bytenr)) {
> +                                if (!nolock)
> +                                        btrfs_end_write_no_snapshoting(root);
>                                  goto out_check;
> +                        }
>                          nocow = 1;
>                  } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
>                          extent_end = found_key.offset +
> --
> 1.9.1

--
Filipe David Manana,

"People will forget what you said,
 people will forget what you did,
 but people will never forget how you made them feel."
[PATCH] Btrfs: fix leak of subvolume writers counter
From: Robbie Ko

When falling back from a nocow write to a regular cow write, we were
leaking the subvolume writers counter in 2 situations, preventing
snapshot creation from ever completing in the future, as it waits for
that counter to go down to zero before the snapshot creation starts.

Signed-off-by: Robbie Ko
Reviewed-by: Filipe Manana
[Improved changelog and subject]
Signed-off-by: Filipe Manana
---
 fs/btrfs/inode.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a713d9d..7221d66 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1404,10 +1404,16 @@ static noinline int run_delalloc_nocow(struct inode *inode,
                          * either valid or do not exist.
                          */
                         if (csum_exist_in_range(fs_info, disk_bytenr,
-                                                num_bytes))
+                                                num_bytes)) {
+                                if (!nolock)
+                                        btrfs_end_write_no_snapshoting(root);
                                 goto out_check;
-                        if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr))
+                        }
+                        if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) {
+                                if (!nolock)
+                                        btrfs_end_write_no_snapshoting(root);
                                 goto out_check;
+                        }
                         nocow = 1;
                 } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
                         extent_end = found_key.offset +
--
2.7.0.rc3
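The invariant this fix restores — every early bail-out after taking the writers counter must drop it — can be shown with a grossly simplified Python model (all names are illustrative, not the real btrfs API; the counter stands in for root->subv_writers, which snapshot creation waits on):

```python
class Root:
    def __init__(self):
        self.subv_writers = 0  # snapshot creation waits for this to reach 0

def start_write_no_snapshoting(root):
    root.subv_writers += 1

def end_write_no_snapshoting(root):
    root.subv_writers -= 1

def nocow_check(root, csum_exists):
    """Sketch of the fixed control flow in run_delalloc_nocow: once the
    counter has been taken, every early exit (the 'goto out_check' paths)
    must release it before falling back to a cow write."""
    start_write_no_snapshoting(root)
    if csum_exists:                        # can't nocow; fall back to cow
        end_write_no_snapshoting(root)     # the fix: release before bailing
        return "cow"
    end_write_no_snapshoting(root)
    return "nocow"

root = Root()
assert nocow_check(root, True) == "cow"
assert root.subv_writers == 0              # no leak: snapshots can proceed
assert nocow_check(root, False) == "nocow"
assert root.subv_writers == 0
```

Without the release on the fallback path, subv_writers stays above zero forever and any later snapshot of the subvolume hangs waiting for it.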
[PATCH] Btrfs: bulk delete checksum items in the same leaf
From: Filipe Manana

Very often we have the checksums for an extent spread in multiple items
in the checksums tree, and currently the algorithm to delete them starts
by looking for them one by one and then deleting them one by one, which
is not optimal since each deletion involves shifting all the other items
in the leaf and, when the leaf reaches some low threshold, moving items
off the leaf into its left and right neighbor leafs. Also, after each
item deletion we release our search path and start a new search for
other checksum items.

So optimize this by deleting in bulk all the items in the same leaf that
contain checksums for the extent being freed.

Signed-off-by: Filipe Manana
---
 fs/btrfs/file-item.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index e97e322..d7d6d4a 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -643,7 +643,33 @@ int btrfs_del_csums(struct btrfs_trans_handle *trans,

                 /* delete the entire item, it is inside our range */
                 if (key.offset >= bytenr && csum_end <= end_byte) {
-                        ret = btrfs_del_item(trans, root, path);
+                        int del_nr = 1;
+
+                        /*
+                         * Check how many csum items preceding this one in this
+                         * leaf correspond to our range and then delete them all
+                         * at once.
+                         */
+                        if (key.offset > bytenr && path->slots[0] > 0) {
+                                int slot = path->slots[0] - 1;
+
+                                while (slot >= 0) {
+                                        struct btrfs_key pk;
+
+                                        btrfs_item_key_to_cpu(leaf, &pk, slot);
+                                        if (pk.offset < bytenr ||
+                                            pk.type != BTRFS_EXTENT_CSUM_KEY ||
+                                            pk.objectid !=
+                                            BTRFS_EXTENT_CSUM_OBJECTID)
+                                                break;
+                                        path->slots[0] = slot;
+                                        del_nr++;
+                                        key.offset = pk.offset;
+                                        slot--;
+                                }
+                        }
+                        ret = btrfs_del_items(trans, root, path,
+                                              path->slots[0], del_nr);
                         if (ret)
                                 goto out;
                         if (key.offset == bytenr)
--
2.7.0.rc3
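The core of the optimization — walk backwards through the leaf collecting contiguous items that belong to the freed range, then delete them in one splice instead of one btrfs_del_item call each — can be sketched in Python (a simulation with a plain list standing in for a leaf; names are illustrative):

```python
def bulk_delete_range(leaf, slot, bytenr):
    """Starting from the item at 'slot' (known to be inside the freed
    range), walk left while the preceding items are csum items whose
    offset is still >= bytenr, then delete the whole run at once -
    mirroring the btrfs_del_items() call in the patch above."""
    del_nr = 1
    start = slot
    while start > 0:
        prev_type, prev_offset = leaf[start - 1]
        if prev_type != "EXTENT_CSUM" or prev_offset < bytenr:
            break
        start -= 1
        del_nr += 1
    del leaf[start:start + del_nr]   # one bulk deletion, one item shift
    return del_nr

leaf = [("EXTENT_CSUM", 0), ("EXTENT_CSUM", 4096),
        ("EXTENT_CSUM", 8192), ("EXTENT_CSUM", 12288)]
# freeing the extent covering offsets >= 4096; the search landed on slot 3
assert bulk_delete_range(leaf, 3, 4096) == 3
assert leaf == [("EXTENT_CSUM", 0)]
```

With N items to delete in one leaf, this does one item shift instead of N, and keeps the search path alive instead of restarting the tree search after every deletion.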
Re: [PATCH v3 5/6] btrfs-progs: convert: Switch to new rollback function
On Wed, Jan 25, 2017 at 08:42:01AM +0800, Qu Wenruo wrote:
> >> So this implies the current implementation is not good enough for review.
> >
> > I'd say the code hasn't been cleaned up for a long time so it's not good
> > enough for adding new features and doing broader fixes. The v2 rework
> > has fixed quite an important issue, but for other issues I'd rather get
> > smaller patches that eg. prepare the code for the final change.
> > Something that I can review without needing to reread the whole convert
> > and refresh memories about all details.
> >
> >> I'll try to extract more more set operation and make the core part more
> >> refined, with more ascii art comment for it.
> >
> > The ascii diagrams help, the overall convert design could be also better
> > documented etc. At the moment I'd rather spend some time on cleaning up
> > the sources but also don't want to block the fixes you've been sending.
> > I need to think about that more.
>
> Feel free to block the rework.
>
> I'll start from sending out basic documentation explaining the logic
> behind convert/rollback, which should help review.

FYI, I've reorganized the convert files a bit, so this patchset does not
apply anymore, but I'm expecting some more changes to it, so please
adapt it to the new file structure.
Re: Fresh Raid-1 setup, dump-tree shows invalid owner id
> Yes, the owner is the number of the tree.
>
> DATA_RELOC_TREE is -9, but then unsigned 64 bits.
>
> >>> -9 + 2**64
> 18446744073709551607L
>
> So the result is a number that's close to the max of 64 bits.
>
> You can find those numbers in the kernel source in
> include/uapi/linux/btrfs_tree.h
>
> e.g.:
>
> #define BTRFS_DATA_RELOC_TREE_OBJECTID -9ULL

Thanks for the details. This owner number looked different from other
owner ids, so I wanted to check on the same; now understood.

Cheers.
Lakshmipathi.G
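The wraparound described above is just two's-complement interpretation: negative tree objectids are stored and printed as unsigned 64-bit values. A quick Python check (illustrative helper name):

```python
def to_u64(objectid):
    """View a (possibly negative) objectid as an unsigned 64-bit value,
    the way dump-tree prints the 'owner' field."""
    return objectid % 2**64

# BTRFS_DATA_RELOC_TREE_OBJECTID is -9, hence the huge owner id:
assert to_u64(-9) == 18446744073709551607
# and the mapping inverts cleanly:
assert 18446744073709551607 - 2**64 == -9
```

So every "impossibly large" owner near 2**64 in dump-tree output is just one of the small negative well-known tree ids from btrfs_tree.h.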
Re: btrfs recovery
On 2017-01-28 00:00, Duncan wrote:

Austin S. Hemmelgarn posted on Fri, 27 Jan 2017 07:58:20 -0500 as
excerpted:

On 2017-01-27 06:01, Oliver Freyermuth wrote:

I'm also running 'memtester 12G' right now, which at least tests 2/3 of
the memory. I'll leave that running for a day or so, but of course it
will not provide a clear answer...

A small update: while the online memtester is without any errors still,
I checked old syslogs from the machine and found something intriguing.

kernel: Corrupted low memory at 88009000 (9000 phys) = 00098d39
kernel: Corrupted low memory at 88009000 (9000 phys) = 00099795
kernel: Corrupted low memory at 88009000 (9000 phys) = 000dd64e

0x9000 = 36K...

This seems to be consistently happening from time to time (I have low
memory corruption checking compiled in). The numbers always consistently
increase, and after a reboot, start fresh from a small number again. I
suppose this is a BIOS bug and it's storing some counter in low memory.
I am unsure whether this could have triggered the BTRFS corruption, nor
do I know what to do about it (are there kernel quirks for that?). The
vendor does not provide any updates, as usual.

If someone could confirm whether this might cause corruption for btrfs
(and maybe direct me to the correct place to ask for a kernel quirk for
this device - do I ask on MM, or somewhere else?), that would be much
appreciated.

It is a firmware bug, Linux doesn't use stuff in that physical address
range at all. I don't think it's likely that this specific bug caused
the corruption, but given that the firmware doesn't have its allocations
listed correctly in the e820 table (if they were listed correctly, you
wouldn't be seeing this message), it would not surprise me if the
firmware was involved somehow.

Correct me if I'm wrong (I'm no kernel expert, but I've been building my
own kernel for well over a decade now, so I have a working familiarity
with the kernel options, of which the following is my possibly incorrect
read), but I believe that's only "fact check: mostly correct" (mostly as
in yes it's the default, but there's a mainline kernel option to change
it).

I was just going over the related kernel options again a couple days
ago, so they're fresh in my head, and AFAICT...

There are THREE semi-related kernel options (config UI option location
is based on the mainline 4.10-rc5+ git kernel I'm presently running):

DEFAULT_MMAP_MIN_ADDR
Config location: Processor type and features: Low address space to
protect from user allocation

This one is virtual memory according to config help, so likely not
directly related, but similar idea.

Yeah, it really only affects userspace. In effect, it's the lowest
virtual address that a userspace program can allocate memory at. By
default on most systems it only covers the first page (which is to
protect against NULL pointer bugs). Most distros set it at 64k to
provide a bit of extra protection. There are a handful that set it to 0
so that vm86 stuff works, but the number of such distros is going down
over time because vm86 is not a common use case, and this can be
configured at runtime through /proc/sys/vm/mmap_min_addr.

X86_CHECK_BIOS_CORRUPTION
Location: Same section, a few lines below the first one: Check for low
memory corruption

I guess this is the option you (OF) have enabled. Note that according to
help, in addition to enabling this in options, a runtime kernel
commandline option must be given as well, to actually enable the checks.

There's another option that controls the default (I forget the config
option and I'm too lazy right now to check), but he obviously either has
that option enabled or has it enabled at run-time, otherwise there
wouldn't be any messages in the kernel log about the check failing.

FWIW, the reason this defaults to being off is that it runs every 60
seconds, and therefore has a significant impact on power usage on mobile
systems.

X86_RESERVE_LOW
Location: Same section, immediately below the check option: Amount of
low memory, in kilobytes, to reserve for the BIOS

Help for this one suggests enabling the check bios corruption option
above if there are any doubts, so the two are directly related.

Yes. This specifies both the kernel equivalent of DEFAULT_MMAP_MIN_ADDR
(so the kernel won't use anything with a physical address between 0 and
this range), and the upper bound for the corruption check.

All three options apparently default to 64K (as that's what I see here
and I don't believe I've changed them), but can be changed. See the
kernel options help and where it points for more.

My read of the above is that yes, by default the kernel won't use
physical 0x9000 (36K), as it's well within the 64K default reserve area,
but a blanket "Linux doesn't use stuff in that physical address range at
all" is incorrect, as if the defaults have been changed it /could/ use
that space (#3's minimum is 1 page, 4K, leaving that 36K address
Re: raid1: cannot add disk to replace faulty because can only mount fs as read-only.
On 2017-01-28 04:17, Andrei Borzenkov wrote:
27.01.2017 23:03, Austin S. Hemmelgarn wrote:
On 2017-01-27 11:47, Hans Deragon wrote:
On 2017-01-24 14:48, Adam Borowski wrote:
On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:

If I remove 'ro' from the options, I cannot get the filesystem mounted
because of the following error:

BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not
allowed

So I am stuck. I can only mount the filesystem as read-only, which
prevents me from adding a disk.

A known problem: you get only one shot at fixing the filesystem, but
that's not because of some damage but because the check of whether the
fs is in good enough shape to mount is oversimplistic. Here's a patch;
if you apply it and recompile, you'll be able to mount degraded rw.

Note that it removes a safety harness: here, the harness got tangled up
and keeps you from recovering when it shouldn't, but it _has_ valid
uses.

Meow!

Greetings,

Ok, that solution will solve my problem in the short run, i.e. getting
my raid1 up again. However, as a user, I am seeking an easy,
no-maintenance raid solution. I wish that if a drive fails, the btrfs
filesystem still mounts rw and leaves the OS running, but warns the user
of the failing disk and easily allows the addition of a new drive to
reintroduce redundancy.

Are there any plans within the btrfs community to implement such a
feature? In a year from now, when the other drive fails, will I hit this
problem again, i.e. my OS failing to start, booting into a terminal, and
being unable to reintroduce a new drive without recompiling the kernel?

Before I make any suggestions regarding this, I should point out that
mounting read-write when a device is missing is what caused this issue
in the first place.

How do you replace a device when the filesystem is mounted read-only?

I'm saying that the use case you're asking to have supported is the
reason stuff like this happens.

If you're mounting read-write degraded and fixing the filesystem
_immediately_, then it's not an issue; that's exactly what read-write
degraded mounts are for. If you're mounting read-write degraded and
then having the system run as if nothing was wrong, then I have zero
sympathy, because that's _dangerous_, even with LVM, MD-RAID, or even
hardware RAID (actually, especially with hardware RAID; LVM and MD are
smart enough to automatically re-sync, most hardware RAID controllers
aren't).

That said, as I mentioned further down in my initial reply, you
absolutely should be monitoring the filesystem and not letting things
get this bad if at all possible. It's actually very rare that a storage
device fails catastrophically with no warning (at least, on the scale
that most end users are operating). At a minimum, even if you're using
ext4 on top of LVM, you should be monitoring SMART attributes on the
storage devices (or whatever the SCSI equivalent is if you use
SCSI/SAS/FC devices). While not 100% reliable (they are getting better
though), they're generally a pretty good way to tell if a disk is likely
to fail in the near future.
Re: Fresh Raid-1 setup, dump-tree shows invalid owner id
On 01/30/2017 02:54 AM, Lakshmipathi.G wrote:
> After creating raid1:
> $ ./mkfs.btrfs -f -d raid1 -m raid1 /dev/sda6 /dev/sda7
>
> and using
> $ ./btrfs inspect-internal dump-tree /dev/sda6  # ./btrfs-debug-tree /dev/sda6
>
> it shows a possibly wrong value for 'owner':
> --
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29425664 items 0 free space 16283 generation 4 owner 7
> fs uuid 94fee00b-00aa-4d69-b947-347f743117f2
> chunk uuid 6477561c-cbca-45e4-980d-56727a8dc9d9
> data reloc tree key (DATA_RELOC_TREE ROOT_ITEM 0)
> leaf 29442048 items 2 free space 16061 generation 4 owner
> 18446744073709551607   <<< owner id?
> fs uuid 94fee00b-00aa-4d69-b947-347f743117f2
> chunk uuid 6477561c-cbca-45e4-980d-56727a8dc9d9
> --
>
> or is that expected output?

Yes, the owner is the number of the tree. DATA_RELOC_TREE is -9, but stored unsigned in 64 bits:

>>> -9 + 2**64
18446744073709551607L

So the result is a number close to the maximum of 64 bits. You can find those numbers in the kernel source in include/uapi/linux/btrfs_tree.h, e.g.:

#define BTRFS_DATA_RELOC_TREE_OBJECTID -9ULL

--
Hans van Kranenburg
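[Editor's illustration] The wraparound Hans shows can be reproduced directly: interpreting the signed objectid as an unsigned 64-bit value yields exactly the "owner" number printed by dump-tree. A short sketch (the -9 constant is the `BTRFS_DATA_RELOC_TREE_OBJECTID` value quoted from btrfs_tree.h above):

```python
# Interpret btrfs's negative special tree objectids as the unsigned
# 64-bit values that dump-tree prints.
import ctypes

BTRFS_DATA_RELOC_TREE_OBJECTID = -9  # -9ULL in include/uapi/linux/btrfs_tree.h

def as_u64(n: int) -> int:
    """Reinterpret a (possibly negative) Python int as an unsigned 64-bit value."""
    return ctypes.c_uint64(n).value

print(as_u64(BTRFS_DATA_RELOC_TREE_OBJECTID))  # 18446744073709551607
```

Equivalently, `n % 2**64` gives the same result; either form confirms that owner 18446744073709551607 is just tree -9 and the output is expected.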
Re: Fresh Raid-1 setup, dump-tree shows invalid owner id
Raid1 is irrelevant; it looks like this happens in the simple case too:

$ ./mkfs.btrfs tests/test.img
$ ./btrfs-debug-tree tests/test.img

Possible issue with ./btrfs-debug-tree stdout?

On Mon, Jan 30, 2017 at 7:24 AM, Lakshmipathi.G wrote:
> After creating raid1:
> $ ./mkfs.btrfs -f -d raid1 -m raid1 /dev/sda6 /dev/sda7
>
> and using
> $ ./btrfs inspect-internal dump-tree /dev/sda6  # ./btrfs-debug-tree /dev/sda6
>
> it shows a possibly wrong value for 'owner':
> --
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29425664 items 0 free space 16283 generation 4 owner 7
> fs uuid 94fee00b-00aa-4d69-b947-347f743117f2
> chunk uuid 6477561c-cbca-45e4-980d-56727a8dc9d9
> data reloc tree key (DATA_RELOC_TREE ROOT_ITEM 0)
> leaf 29442048 items 2 free space 16061 generation 4 owner
> 18446744073709551607   <<< owner id?
> fs uuid 94fee00b-00aa-4d69-b947-347f743117f2
> chunk uuid 6477561c-cbca-45e4-980d-56727a8dc9d9
> --
>
> or is that expected output?
>
> Cheers.
> Lakshmipathi.G
Re: [PATCH 8/8] Revert "ext4: fix wrong gfp type under transaction"
On Fri 27-01-17 11:40:42, Theodore Ts'o wrote:
> On Fri, Jan 27, 2017 at 10:37:35AM +0100, Michal Hocko wrote:
>> If this ever turns out to be a problem, then with vmapped stacks we
>> have a good chance of getting proper stack traces on a potential
>> overflow, and we can add the scope API around the problematic code
>> path with an explanation of why it is needed.
>
> Yeah, or maybe we can automate it? Can the reclaim code check how
> much stack space is left and do the right thing automatically?

I am not sure how to do that. Checking for some magic value sounds quite fragile to me. It also sounds a bit strange to focus only on reclaim while other code paths might suffer from the same problem. What is actually the deepest possible call chain from the slab reclaim? This is where I stopped: I have tried to follow that path but hit the callback wall quite early.

> The reason why I'm nervous is that nojournal mode is not a common
> configuration, and "wait until production systems start failing" is
> not a strategy that I or many SRE-types find comforting.

I understand that, but I would be much happier if we made the decision based on actual data rather than fear that something would break down.

--
Michal Hocko
SUSE Labs