Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
Marc MERLIN posted on Wed, 02 Jul 2014 13:41:52 -0700 as excerpted:

This got triggered by an rsync I think. I'm not sure which of my btrfs FS has the issue yet since BUG_ON isn't very helpful as discussed earlier.

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
[160562.950297] [ cut here ]
[160562.965904] kernel BUG at fs/btrfs/locking.c:269!

But shouldn't messages like 'parent transid verify failed' print which device this happened on to give the operator a hint on where the problem is? Could someone do a pass at those and make sure they all print the device ID/name?

kernel 3.16 series here, rc2+ when this happened, rc3+ now. IOW it's 3.15 series (Marc) and 3.16 series (me) both known affected.

FWIW, I'm not sure what originally triggered it, but I recently had a couple bad shutdowns -- systemd would say it was turning off the system but it wouldn't shut off, and on manual shutoff and later reboot, the read-write-mounted btrfs (two separate filesystems, home and log; root, including the rest of the system, is read-only by default here) failed to mount.

The mount failures triggered on the above parent transid failed errors with kernel BUG at fs/btrfs/locking.c -- I /believe/ 269, but I couldn't swear to it.

The biggest difference here (other than the fact that it happened on mount of critical filesystems at boot, so probably double-digit seconds in since boot at most) was that the parent transid numbers were only a few off, something like wanted , found -2.

[End of technical stuff. The rest is discussion of my recovery experience, both because it might be of help to others and because it lets me tell my experience. =:^) ]

I have backups but they weren't as current as I would have liked, so I decided to try recovery.

My rootfs is btrfs as well, a /separate/ btrfs, but it remains mounted read-only by default and is only mounted read-write for updates, so wasn't damaged.
That includes the bits of /var that I can get away with being read-only, with various /var/lib/* subdirs being symlinks to /home/var/lib/* subdirs, where they need to be writable, and of course /var/log being a separate dedicated filesystem -- one of the two that was damaged, and the usual /var/run and /var/lock symlinks to /run and /run/lock (with /run being a tmpfs mount on standard systemd configurations).

As a result, the rootfs mounted from the initramfs and systemd on it was invoked as the new PID 1 init in the transfer from initramfs. Systemd was in turn able to start early boot services and anything that didn't have a dependency on home (including the bits of /var/lib symlinked into home) or log being mounted. But that of course left a number of critical services failing due to dependency on the home and/or log mounts, since those mounts were failing.

Fortunately, while some of the time the errors would trigger a full kernel lockup with the above parent transid and locking BUG, other times the mount attempt would simply error out, and systemd would drop me to the emergency-mode root-login prompt. (If it hadn't, I'd have had to switch to booting the backup.)

Since the main rootfs, including /usr, /etc and much of /var, was already mounted and safely read-only, I wasn't too worried about damaging it. That left me with only a partly working system, but with access to all the normal recovery tools, manpages, etc, I'd normally have. The only big thing missing (other than X/kde, of course) initially was network access, due to dependencies on the unmountable filesystems for local DNS and I think iptables logging. I could have reconfigured that if I had to, but after I got log back up, I found I had network access (presumably with fallback to the ISP DNS), and was able to get to the wiki to research recovery of home a bit further.
I decided to tackle log (/var/log) before home since it was smaller and I figured I could use anything I learned in that process to help me save more of home. My policy is no backup log partition, since I don't do backups regularly enough for the logs thereon to be of much likely usefulness. That left me trying to repair or recover what I could and then doing a mkfs.btrfs on it.

The various repair options I tried didn't help -- they either died without helpful output or triggered the same lockup. Mounting with the recovery or recovery,ro options wouldn't work, and neither would btrfs check. Btrfs rescue didn't look useful, as I couldn't find useful documentation on chunk-recover and the supers looked fine (btrfs-show-super) so super-recover was unnecessary. I tried btrfs-zero-log on the log partition, but it didn't make the problem better and might have made it worse, so I didn't try it on home.

That left btrfs restore. I used it on log without really understanding what I was doing, and lost an entire directory worth of logs. =:^(

Fortunately, I was able to learn a bit,
Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
[List CCd. I hate Gmail.]

Noob alert.

On 3 July 2014 02:28, Qu Wenruo quwen...@cn.fujitsu.com wrote:

Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
From: Goffredo Baroncelli kreij...@libero.it
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
Date: 2014-07-03 01:48

On 07/01/2014 11:30 AM, Qu Wenruo wrote:

This commit has the following problem:
1) Break the ro mount rule. When users mount the whole btrfs ro, it is still possible to mount subvol rw and change the contents. Which make the whole fs ro mount non-sense.

Where is the problem ? I see an use case when I want a conservative default: mount all ro except some subvolumes. In any case it is not a security problem because if the user has the capability to mount a subvolume, also he has the capability to remount,rw the whole filesystem.

Not security problem but behavior not consistent. If user mount the whole disk ro, he or she want the fs read only and nothing will change in it. If you mount a subvol rw, then the whole disk ro expectation is broken. Things will change even the whole disk is readonly.

This assumption seems wrong and untenable if considered from a different angle: one doesn't mount the whole disk ro, merely the default subvolume.

    # mount -o ro /dev/sda1 /mnt

is merely convenient short-hand for

    # mount -o ro,subvol=@ [or whatever] /dev/sda1 /mnt

and anyone who expects this to magically protect the whole disk is, frankly, confused.

Substituting partitions for subvolumes: mounting /dev/sda2 read-only should have no effect on /dev/sda3. Even if you went a bit batty and decided to make /dev/sda2 the default partition:

    # ln -sf /dev/sda2 /dev/sda
    # mount -o ro /dev/sda /mnt/this/is/silly

syntactic sugar doesn't change anything. Subvolumes are logically discrete entities, the fact that they share trees on-disk is merely a (very nice) implementation detail.

It is impossible to mount a whole disk under btrfs.

Tobias

The problem also happens when a parent subvol is mounted rw but child subvol is mounted ro. User can still modify the child subvol through parent subvol, still broke the readonly rule.

This makes sense, though.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:

This got triggered by an rsync I think. I'm not sure which of my btrfs FS has the issue yet since BUG_ON isn't very helpful as discussed earlier.

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
[160562.950297] [ cut here ]
[160562.965904] kernel BUG at fs/btrfs/locking.c:269!

But shouldn't messages like 'parent transid verify failed' print which device this happened on to give the operator a hint on where the problem is? Could someone do a pass at those and make sure they all print the device ID/name?

Bug below:

Full log before the crash:

INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0 3358 2 0x
 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
 8800c6914390 000141c0 7fff 8801433b8f10
 0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
 [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
 [8161d3c6] schedule+0x73/0x75
 [8161c9e9] schedule_timeout+0x39/0x129
 [8107653d] ? get_parent_ip+0xd/0x3c
 [8162338f] ? preempt_count_add+0x7a/0x8d
 [8161dbac] __wait_for_common+0x11a/0x159
 [8107810f] ? wake_up_state+0x12/0x12
 [8161dc0f] wait_for_completion+0x24/0x26
 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
 [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
 [81227b41] transaction_kthread+0xf8/0x1ab
 [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
 [8106b4b4] kthread+0xae/0xb6
 [8106b406] ? __kthread_parkme+0x61/0x61
 [8162667c] ret_from_fork+0x7c/0xb0
 [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:13 D 0 13157 2 0x0080
Workqueue: btrfs-flush_delalloc normal_work_helper
 8800041cfc00 0046 8800041cfbd0 8800041cffd8
 8800034f40d0 000141c0 88021f2941c0 8800034f40d0
 8800041cfca0 810fdc2f 0002 8800041cfc10
Call Trace:
 [810fdc2f] ? wait_on_page_read+0x3c/0x3c
 [8161d3c6] schedule+0x73/0x75
 [8161d56b] io_schedule+0x60/0x7a
 [810fdc3d] sleep_on_page+0xe/0x12
 [8161d7fd] __wait_on_bit+0x48/0x7a
 [810fdbdd] wait_on_page_bit+0x7a/0x7c
 [81084821] ? autoremove_wake_function+0x34/0x34
 [810fef03] filemap_fdatawait_range+0x7e/0x126
 [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
 [8122d7aa] ? btrfs_writepages+0x28/0x2a
 [811084c6] ? do_writepages+0x1e/0x2c
 [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
 [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
 [8122fe01] btrfs_run_delalloc_work+0x27/0x69
 [812508db] normal_work_helper+0xfe/0x240
 [81065d7e] process_one_work+0x195/0x2d2
 [81066020] worker_thread+0x136/0x205
 [81065eea] ? process_scheduled_works+0x2f/0x2f
 [8106b4b4] kthread+0xae/0xb6
 [8106b406] ? __kthread_parkme+0x61/0x61
 [8162667c] ret_from_fork+0x7c/0xb0
 [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0 3358 2 0x
 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
 8800c6914390 000141c0 7fff 8801433b8f10
 0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
 [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
 [8161d3c6] schedule+0x73/0x75
 [8161c9e9] schedule_timeout+0x39/0x129
 [8107653d] ? get_parent_ip+0xd/0x3c
 [8162338f] ? preempt_count_add+0x7a/0x8d
 [8161dbac] __wait_for_common+0x11a/0x159
 [8107810f] ? wake_up_state+0x12/0x12
 [8161dc0f] wait_for_completion+0x24/0x26
 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
 [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
 [81227b41] transaction_kthread+0xf8/0x1ab
 [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/03/2014 04:13 PM, Liu Bo wrote:

On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:

This got triggered by an rsync I think. I'm not sure which of my btrfs FS has the issue yet since BUG_ON isn't very helpful as discussed earlier.

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
[160562.950297] [ cut here ]
[160562.965904] kernel BUG at fs/btrfs/locking.c:269!

But shouldn't messages like 'parent transid verify failed' print which device this happened on to give the operator a hint on where the problem is? Could someone do a pass at those and make sure they all print the device ID/name?

Bug below:

Full log before the crash:

INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0 3358 2 0x
 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
 8800c6914390 000141c0 7fff 8801433b8f10
 0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
 [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
 [8161d3c6] schedule+0x73/0x75
 [8161c9e9] schedule_timeout+0x39/0x129
 [8107653d] ? get_parent_ip+0xd/0x3c
 [8162338f] ? preempt_count_add+0x7a/0x8d
 [8161dbac] __wait_for_common+0x11a/0x159
 [8107810f] ? wake_up_state+0x12/0x12
 [8161dc0f] wait_for_completion+0x24/0x26
 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
 [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
 [81227b41] transaction_kthread+0xf8/0x1ab
 [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
 [8106b4b4] kthread+0xae/0xb6
 [8106b406] ? __kthread_parkme+0x61/0x61
 [8162667c] ret_from_fork+0x7c/0xb0
 [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:13 D 0 13157 2 0x0080
Workqueue: btrfs-flush_delalloc normal_work_helper
 8800041cfc00 0046 8800041cfbd0 8800041cffd8
 8800034f40d0 000141c0 88021f2941c0 8800034f40d0
 8800041cfca0 810fdc2f 0002 8800041cfc10
Call Trace:
 [810fdc2f] ? wait_on_page_read+0x3c/0x3c
 [8161d3c6] schedule+0x73/0x75
 [8161d56b] io_schedule+0x60/0x7a
 [810fdc3d] sleep_on_page+0xe/0x12
 [8161d7fd] __wait_on_bit+0x48/0x7a
 [810fdbdd] wait_on_page_bit+0x7a/0x7c
 [81084821] ? autoremove_wake_function+0x34/0x34
 [810fef03] filemap_fdatawait_range+0x7e/0x126
 [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
 [8122d7aa] ? btrfs_writepages+0x28/0x2a
 [811084c6] ? do_writepages+0x1e/0x2c
 [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
 [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
 [8122fe01] btrfs_run_delalloc_work+0x27/0x69
 [812508db] normal_work_helper+0xfe/0x240
 [81065d7e] process_one_work+0x195/0x2d2
 [81066020] worker_thread+0x136/0x205
 [81065eea] ? process_scheduled_works+0x2f/0x2f
 [8106b4b4] kthread+0xae/0xb6
 [8106b406] ? __kthread_parkme+0x61/0x61
 [8162667c] ret_from_fork+0x7c/0xb0
 [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0 3358 2 0x
 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
 8800c6914390 000141c0 7fff 8801433b8f10
 0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
 [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
 [8161d3c6] schedule+0x73/0x75
 [8161c9e9] schedule_timeout+0x39/0x129
 [8107653d] ? get_parent_ip+0xd/0x3c
 [8162338f] ? preempt_count_add+0x7a/0x8d
 [8161dbac] __wait_for_common+0x11a/0x159
 [8107810f] ? wake_up_state+0x12/0x12
 [8161dc0f] wait_for_completion+0x24/0x26
 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
 [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
 [81227b41] transaction_kthread+0xf8/0x1ab
 [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
Re: [PATCH v2] Btrfs: fix crash when starting transaction
On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:

Often when starting a transaction we commit the currently running transaction, which can end up writing block group caches when the current process has its journal_info set to NULL (and not to a transaction). This makes our assertion at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

[51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
[51502.242213] [ cut here ]
[51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
[51502.242669] invalid opcode: [#1] SMP DEBUG_PAGEALLOC
(...)
[51502.244010] Call Trace:
[51502.244010] [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
[51502.244010] [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
[51502.244010] [a0357a6a] commit_cowonly_roots+0x164/0x226 [btrfs]
[51502.244010] [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
[51502.244010] [8168ec7b] ? _raw_spin_unlock+0x2b/0x40
[51502.244010] [a02d6259] start_transaction+0x459/0x620 [btrfs]
[51502.244010] [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
[51502.244010] [a02d73e1] __unlink_start_trans+0x31/0xe0 [btrfs]
[51502.244010] [a02dea67] btrfs_unlink+0x37/0xc0 [btrfs]
[51502.244010] [811bb054] ? do_unlinkat+0x114/0x2a0
[51502.244010] [811baebc] vfs_unlink+0xcc/0x150
[51502.244010] [811bb1a0] do_unlinkat+0x260/0x2a0
[51502.244010] [811a9ef4] ? filp_close+0x64/0x90
[51502.244010] [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
[51502.244010] [81349cab] ? trace_hardirqs_on_thunk+0x3a/0x3f
[51502.244010] [811be9eb] SyS_unlinkat+0x1b/0x40
[51502.244010] [81698452] system_call_fastpath+0x16/0x1b
[51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
[51502.244010] RIP [a03575da] assfail.constprop.88+0x1e/0x20 [btrfs]

2)

[25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
[25405.097488] [ cut here ]
[25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
[25405.097940] invalid opcode: [#1] SMP DEBUG_PAGEALLOC
(...)
[25405.18] Call Trace:
[25405.18] [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
[25405.18] [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
[25405.18] [a035755a] commit_cowonly_roots+0x164/0x226 [btrfs]
[25405.18] [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
[25405.18] [8109c170] ? bit_waitqueue+0xc0/0xc0
[25405.18] [a02d6259] start_transaction+0x459/0x620 [btrfs]
[25405.18] [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
[25405.18] [a02e3407] btrfs_create+0x47/0x210 [btrfs]
[25405.18] [a02d74cc] ? btrfs_permission+0x3c/0x80 [btrfs]
[25405.18] [811bc63b] vfs_create+0x9b/0x130
[25405.18] [811bcf19] do_last+0x849/0xe20
[25405.18] [811b9409] ? link_path_walk+0x79/0x820
[25405.18] [811bd5b5] path_openat+0xc5/0x690
[25405.18] [810ab07d] ? trace_hardirqs_on+0xd/0x10
[25405.18] [811cdcd2] ? __alloc_fd+0x32/0x1d0
[25405.18] [811be2a3] do_filp_open+0x43/0xa0
[25405.18] [811cddf1] ? __alloc_fd+0x151/0x1d0
[25405.18] [811abcfc] do_sys_open+0x13c/0x230
[25405.18] [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
[25405.18] [811abe12] SyS_open+0x22/0x30
[25405.18] [81698452] system_call_fastpath+0x16/0x1b
[25405.18] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
[25405.18] RIP [a03570ca] assfail.constprop.88+0x1e/0x20 [btrfs]

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---
V2: Removed test for current->journal_info == NULL. At this point it's always expected to be NULL.

Reviewed-by: Miao Xie mi...@cn.fujitsu.com

 fs/btrfs/transaction.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index ac984a3..614eac3 100644
--- a/fs/btrfs/transaction.c
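The patch body is cut off in this archive, so what follows is only a rough user-space model of the failure described in the commit message, not the kernel change itself. Every name in it (toy_trans, toy_journal_info, toy_commit and so on) is a hypothetical stand-in, and the check returns -1 at the point where the kernel would trip its assertion, so that both paths can be observed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a transaction handle. */
struct toy_trans {
	int id;
};

/* Hypothetical stand-in for the per-task current->journal_info pointer. */
static struct toy_trans *toy_journal_info;

/* Models the space check: it expects to run inside a transaction context. */
static int toy_check_data_free_space(void)
{
	/* The kernel asserts here; the toy returns -1 instead of crashing. */
	return toy_journal_info != NULL ? 0 : -1;
}

/* Models a commit that writes block group caches, then leaves the context. */
static int toy_commit(struct toy_trans *handle)
{
	int ret = toy_check_data_free_space();

	(void)handle;
	toy_journal_info = NULL;
	return ret;
}

/* Models starting a transaction that first commits the running one. */
static int toy_start_and_commit(struct toy_trans *handle, int apply_fix)
{
	if (apply_fix)
		toy_journal_info = handle; /* the idea of the fix: set it first */
	return toy_commit(handle);
}
```

With apply_fix set to 0 the model reports the failure, and with 1 the commit goes through, which is the behaviour the commit message describes in prose.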
Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
Original Message
Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
From: Tobias Geerinckx-Rice tobias.geerinckx.r...@gmail.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-03 16:06

[List CCd. I hate Gmail.]

Noob alert.

On 3 July 2014 02:28, Qu Wenruo quwen...@cn.fujitsu.com wrote:

Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
From: Goffredo Baroncelli kreij...@libero.it
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
Date: 2014-07-03 01:48

On 07/01/2014 11:30 AM, Qu Wenruo wrote:

This commit has the following problem:
1) Break the ro mount rule. When users mount the whole btrfs ro, it is still possible to mount subvol rw and change the contents. Which make the whole fs ro mount non-sense.

Where is the problem ? I see an use case when I want a conservative default: mount all ro except some subvolumes. In any case it is not a security problem because if the user has the capability to mount a subvolume, also he has the capability to remount,rw the whole filesystem.

Not security problem but behavior not consistent. If user mount the whole disk ro, he or she want the fs read only and nothing will change in it. If you mount a subvol rw, then the whole disk ro expectation is broken. Things will change even the whole disk is readonly.

This assumption seems wrong and untenable if considered from a different angle: one doesn't mount the whole disk ro, merely the default subvolume.

    # mount -o ro /dev/sda1 /mnt

is merely convenient short-hand for

    # mount -o ro,subvol=@ [or whatever] /dev/sda1 /mnt

and anyone who expects this to magically protect the whole disk is, frankly, confused.

Substituting partitions for subvolumes: mounting /dev/sda2 read-only should have no effect on /dev/sda3. Even if you went a bit batty and decided to make /dev/sda2 the default partition:

    # ln -sf /dev/sda2 /dev/sda
    # mount -o ro /dev/sda /mnt/this/is/silly

syntactic sugar doesn't change anything. Subvolumes are logically discrete entities, the fact that they share trees on-disk is merely a (very nice) implementation detail.

It is impossible to mount a whole disk under btrfs.

Oh, sorry for my confusing words. To make it clear, when mentioning 'the whole disk (or partition, whatever)' I mean the FS_TREE. (Of course not the default subvolume.)

The problem is that, even if you mount a subvolume ro, you can still change contents in the subvolume through its rw parent subvolume. And if a subvolume can still be modified, the ro mount loses its meaning. So we need special rules to prevent such things.

Thanks,
Qu

Tobias

The problem also happens when a parent subvol is mounted rw but child subvol is mounted ro. User can still modify the child subvol through parent subvol, still broke the readonly rule.

This makes sense, though.
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On Thu, Jul 03, 2014 at 04:20:47PM +0800, Wang Shilong wrote:

On 07/03/2014 04:13 PM, Liu Bo wrote:

On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:

This got triggered by an rsync I think. I'm not sure which of my btrfs FS has the issue yet since BUG_ON isn't very helpful as discussed earlier.

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
[160562.950297] [ cut here ]
[160562.965904] kernel BUG at fs/btrfs/locking.c:269!

But shouldn't messages like 'parent transid verify failed' print which device this happened on to give the operator a hint on where the problem is? Could someone do a pass at those and make sure they all print the device ID/name?

Bug below:

Full log before the crash:

INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0 3358 2 0x
 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
 8800c6914390 000141c0 7fff 8801433b8f10
 0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
 [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
 [8161d3c6] schedule+0x73/0x75
 [8161c9e9] schedule_timeout+0x39/0x129
 [8107653d] ? get_parent_ip+0xd/0x3c
 [8162338f] ? preempt_count_add+0x7a/0x8d
 [8161dbac] __wait_for_common+0x11a/0x159
 [8107810f] ? wake_up_state+0x12/0x12
 [8161dc0f] wait_for_completion+0x24/0x26
 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
 [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
 [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
 [81227b41] transaction_kthread+0xf8/0x1ab
 [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
 [8106b4b4] kthread+0xae/0xb6
 [8106b406] ? __kthread_parkme+0x61/0x61
 [8162667c] ret_from_fork+0x7c/0xb0
 [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:13 D 0 13157 2 0x0080
Workqueue: btrfs-flush_delalloc normal_work_helper
 8800041cfc00 0046 8800041cfbd0 8800041cffd8
 8800034f40d0 000141c0 88021f2941c0 8800034f40d0
 8800041cfca0 810fdc2f 0002 8800041cfc10
Call Trace:
 [810fdc2f] ? wait_on_page_read+0x3c/0x3c
 [8161d3c6] schedule+0x73/0x75
 [8161d56b] io_schedule+0x60/0x7a
 [810fdc3d] sleep_on_page+0xe/0x12
 [8161d7fd] __wait_on_bit+0x48/0x7a
 [810fdbdd] wait_on_page_bit+0x7a/0x7c
 [81084821] ? autoremove_wake_function+0x34/0x34
 [810fef03] filemap_fdatawait_range+0x7e/0x126
 [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
 [8122d7aa] ? btrfs_writepages+0x28/0x2a
 [811084c6] ? do_writepages+0x1e/0x2c
 [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
 [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
 [8122fe01] btrfs_run_delalloc_work+0x27/0x69
 [812508db] normal_work_helper+0xfe/0x240
 [81065d7e] process_one_work+0x195/0x2d2
 [81066020] worker_thread+0x136/0x205
 [81065eea] ? process_scheduled_works+0x2f/0x2f
 [8106b4b4] kthread+0xae/0xb6
 [8106b406] ? __kthread_parkme+0x61/0x61
 [8162667c] ret_from_fork+0x7c/0xb0
 [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
      Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D 0 3358 2 0x
 8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
 8800c6914390 000141c0 7fff 8801433b8f10
 0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
 [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
 [8161d3c6] schedule+0x73/0x75
 [8161c9e9] schedule_timeout+0x39/0x129
 [8107653d] ? get_parent_ip+0xd/0x3c
 [8162338f] ? preempt_count_add+0x7a/0x8d
 [8161dbac] __wait_for_common+0x11a/0x159
 [8107810f] ? wake_up_state+0x12/0x12
 [8161dc0f] wait_for_completion+0x24/0x26
 [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
 [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
 [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
Re: [PATCH] btrfs: only unlock block in verify_parent_transid if we locked it
On Wed, Jun 25, 2014 at 01:45:41PM -0700, Josef Bacik wrote:

This is a regression from my patch a26e8c9f75b0bfd89e8f110737b136eb5994, we need to only unlock the block if we were the one who locked it. Otherwise this will trip BUG_ON()'s in locking.c. Thanks,

Reviewed-by: Liu Bo bo.li@oracle.com

-liubo

cc: sta...@vger.kernel.org
Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/disk-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8bb4aa1..f00165d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -369,7 +369,8 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
 out:
 	unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1,
 			     &cached_state, GFP_NOFS);
-	btrfs_tree_read_unlock_blocking(eb);
+	if (need_lock)
+		btrfs_tree_read_unlock_blocking(eb);
 	return ret;
 }
--
2.0.0
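The rule in the patch, release a lock only if this call took it, can be sketched outside the kernel. This is a toy model under stated assumptions: toy_lock, toy_read_lock, toy_read_unlock and toy_verify are hypothetical names, the lock is a bare holder count, and how need_lock is derived here (from a caller_holds_lock flag) is an assumption of the sketch, not something the patch shows.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent buffer read lock: counts holders. */
struct toy_lock {
	int readers;
};

static void toy_read_lock(struct toy_lock *lock)
{
	lock->readers++;
}

static void toy_read_unlock(struct toy_lock *lock)
{
	/* Plays the role of the BUG_ON() checks: unlocking with no holder is a bug. */
	assert(lock->readers > 0);
	lock->readers--;
}

/* Toy verify step: take the lock only when needed, release only what was taken. */
static int toy_verify(struct toy_lock *lock, bool caller_holds_lock)
{
	bool need_lock = !caller_holds_lock; /* assumption made for this sketch */

	if (need_lock)
		toy_read_lock(lock);

	/* ... the wanted/found transid comparison would happen here ... */

	if (need_lock)
		toy_read_unlock(lock);
	return 0;
}
```

Whichever way toy_verify is entered, the holder count on exit equals the count on entry. An unconditional unlock would instead drop a hold that belongs to the caller, which is the situation the patch guards against.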
[PATCH 0/4] Add superblock checksum check for btrfs-progs
Before this patchset, btrfs-progs ignored the superblock checksum entirely and carried on regardless. Sometimes this causes disasters: for example, checking a btrfs with a corrupted superblock can crash btrfs-progs.

This patchset introduces a superblock checksum check into btrfs_read_dev_super(), making btrfs-progs much stricter and more robust. To allow super-recover to open devices, it adds an option to scan all 3 superblocks when using super-recover.

It also updates the related error strings and fixes a bug in chunk-recover that is not triggered until the superblock csum is calculated.

Qu Wenruo (4):
  btrfs-progs: Check superblock's checksum when read dev super
  btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.
  btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.
  btrfs-progs: Fix size for malloc for superblock checksum.

 btrfs-find-root.c |  9 --
 chunk-recover.c   | 18 +++
 cmds-filesystem.c |  9 --
 disk-io.c         | 91 +--
 disk-io.h         |  5 +--
 super-recover.c   |  2 +-
 utils.c           | 16 ++
 volumes.c         |  8 ++---
 volumes.h         |  2 +-
 9 files changed, 104 insertions(+), 56 deletions(-)

--
2.0.1
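For readers unfamiliar with what such a check involves, here is a minimal standalone sketch, not the btrfs-progs code. It assumes a 4096-byte superblock whose first 32 bytes hold the checksum field, with a CRC-32C over the remaining bytes stored little-endian in the first four bytes. Those layout details and all toy_* names are assumptions of the sketch and should be checked against the on-disk format documentation before being relied on.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes assumed for this sketch, modelled on the btrfs on-disk format. */
#define TOY_SUPER_SIZE 4096
#define TOY_CSUM_SIZE  32

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum everything after the checksum field itself. */
static uint32_t toy_super_csum(const uint8_t *sb)
{
	uint32_t crc = crc32c_update(0xFFFFFFFFu, sb + TOY_CSUM_SIZE,
				     TOY_SUPER_SIZE - TOY_CSUM_SIZE);

	return crc ^ 0xFFFFFFFFu;
}

/* Store the checksum little-endian in the first four bytes. */
static void toy_super_csum_write(uint8_t *sb)
{
	uint32_t csum = toy_super_csum(sb);

	for (int i = 0; i < 4; i++)
		sb[i] = (uint8_t)(csum >> (8 * i));
}

/* Return 1 when the stored checksum matches the contents, 0 otherwise. */
static int toy_super_csum_ok(const uint8_t *sb)
{
	uint32_t stored = 0;

	for (int i = 0; i < 4; i++)
		stored |= (uint32_t)sb[i] << (8 * i);
	return stored == toy_super_csum(sb);
}
```

A reader function built on this would refuse a buffer for which toy_super_csum_ok() returns 0 instead of parsing it, which is the kind of behaviour the cover letter asks for.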
[PATCH 2/4] btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.
The btrfs-progs superblock checksum check is too strict for super-recover: current btrfs-progs only reads the 1st superblock, and if you need super-recover the 1st superblock is possibly already damaged.

The fix introduces a super_recover parameter for btrfs_read_dev_super() and its callers, allowing the backup superblocks to be scanned if needed.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-find-root.c |  4 ++--
 chunk-recover.c   |  6 +++---
 cmds-filesystem.c |  2 +-
 disk-io.c         | 17 ++---
 disk-io.h         |  5 +++--
 super-recover.c   |  2 +-
 utils.c           | 11 ++-
 volumes.c         |  4 ++--
 volumes.h         |  2 +-
 9 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index 25d79f1..e31a9b5 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -82,7 +82,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
 		return NULL;
 	}
-	ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0, 1);
+	ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0, 1, 1);
 	if (ret)
 		goto out;
@@ -94,7 +94,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
 	disk_super = fs_info->super_copy;
 	ret = btrfs_read_dev_super(fs_devices->latest_bdev,
-				   disk_super, fs_info->super_bytenr);
+				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
 		printk("No valid btrfs found\n");
 		goto out_devices;
diff --git a/chunk-recover.c b/chunk-recover.c
index 613d715..9baedd7 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1283,7 +1283,7 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 	disk_super = fs_info->super_copy;
 	ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
-				   disk_super, fs_info->super_bytenr);
+				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
 		fprintf(stderr, "No valid btrfs found\n");
 		goto out_devices;
@@ -1349,7 +1349,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
 		goto fail_close_fd;
 	}
-	ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET);
+	ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
 	if (ret) {
 		fprintf(stderr, "read super block error\n");
 		goto fail_free_sb;
@@ -1368,7 +1368,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
 		goto fail_free_sb;
 	}
-	ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0, 1);
+	ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0, 1, 1);
 	if (ret)
 		goto fail_free_sb;
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 306f715..d2e46dc 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -513,7 +513,7 @@ static int dev_to_fsid(char *dev, __u8 *fsid)
 	disk_super = (struct btrfs_super_block *)buf;
 	ret = btrfs_read_dev_super(fd, disk_super,
-				   BTRFS_SUPER_INFO_OFFSET);
+				   BTRFS_SUPER_INFO_OFFSET, 0);
 	if (ret)
 		goto out;
diff --git a/disk-io.c b/disk-io.c
index e447af8..1bd9fae 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -980,7 +980,7 @@ void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info)
 int btrfs_scan_fs_devices(int fd, const char *path,
 			  struct btrfs_fs_devices **fs_devices,
-			  u64 sb_bytenr, int run_ioctl)
+			  u64 sb_bytenr, int run_ioctl, int super_recover)
 {
 	u64 total_devs;
 	int ret;
@@ -988,7 +988,7 @@ int btrfs_scan_fs_devices(int fd, const char *path,
 		sb_bytenr = BTRFS_SUPER_INFO_OFFSET;
 	ret = btrfs_scan_one_device(fd, path, fs_devices,
-				    &total_devs, sb_bytenr);
+				    &total_devs, sb_bytenr, super_recover);
 	if (ret) {
 		fprintf(stderr, "No valid Btrfs found on %s\n", path);
 		return ret;
@@ -1076,7 +1076,8 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 		fs_info->on_restoring = 1;
 	ret = btrfs_scan_fs_devices(fp, path, &fs_devices, sb_bytenr,
-				    !(flags & OPEN_CTREE_RECOVER_SUPER));
+				    !(flags & OPEN_CTREE_RECOVER_SUPER),
+				    (flags & OPEN_CTREE_RECOVER_SUPER));
 	if (ret)
 		goto out;
@@ -1096,9 +1097,9 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 	disk_super = fs_info->super_copy;
 	if (!(flags & OPEN_CTREE_RECOVER_SUPER))
 		ret = btrfs_read_dev_super(fs_devices->latest_bdev,
-
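As a rough illustration of what scanning the backup superblocks means, the sketch below computes where the three copies are assumed to live (64 KiB, 64 MiB and 256 GiB) and probes them in order. It is not the patch's implementation: the toy_* names, the probe callback and the exact meaning given to super_recover (0 tries only the primary copy, non-zero tries all three) are assumptions made for this example.

```c
#include <stdint.h>

/* Layout assumed for this sketch: primary copy at 64 KiB, up to 3 copies. */
#define TOY_SUPER_INFO_OFFSET  (64 * 1024ULL)
#define TOY_SUPER_MIRROR_MAX   3
#define TOY_SUPER_MIRROR_SHIFT 12

/* Byte offset of superblock copy number `mirror` (0 is the primary). */
static uint64_t toy_sb_offset(int mirror)
{
	uint64_t start = 16 * 1024ULL;

	if (mirror)
		return start << (TOY_SUPER_MIRROR_SHIFT * mirror);
	return TOY_SUPER_INFO_OFFSET;
}

/* Hypothetical probe callback: returns 0 when the copy at `offset` is valid. */
typedef int (*toy_probe_fn)(uint64_t offset);

/* Try the primary copy, and the backups too when super_recover is set. */
static int toy_find_super(toy_probe_fn probe, int super_recover, uint64_t *found)
{
	int max = super_recover ? TOY_SUPER_MIRROR_MAX : 1;

	for (int i = 0; i < max; i++) {
		uint64_t offset = toy_sb_offset(i);

		if (probe(offset) == 0) {
			*found = offset;
			return 0;
		}
	}
	return -1;
}

/* Example probe: pretends the primary copy is damaged and the backups are fine. */
static int toy_probe_primary_damaged(uint64_t offset)
{
	return offset == TOY_SUPER_INFO_OFFSET ? -1 : 0;
}
```

With the example probe, a normal read fails because only the primary copy is consulted, while a super_recover read falls through to the copy at 64 MiB. That is the situation the commit message describes for a damaged 1st superblock.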
[PATCH 3/4] btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.
Thanks to the newly introduced superblock csum check,
btrfs_read_dev_super() can now distinguish a non-btrfs filesystem from a
corrupted superblock, so the return value and the corresponding error
strings should be updated to print more meaningful errors for end users.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 btrfs-find-root.c | 5 -
 chunk-recover.c | 10 --
 cmds-filesystem.c | 7 ++-
 disk-io.c | 36 ++--
 utils.c | 5 -
 volumes.c | 4 +---
 6 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index e31a9b5..f3bf452 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -96,7 +96,10 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
 	ret = btrfs_read_dev_super(fs_devices->latest_bdev,
 				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
-		printk("No valid btrfs found\n");
+		if (ret == -ENOENT)
+			printk("No valid btrfs found\n");
+		if (ret == -EIO)
+			printk("Superblock is corrupted\n");
 		goto out_devices;
 	}
 
diff --git a/chunk-recover.c b/chunk-recover.c
index 9baedd7..c8badf9 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1285,7 +1285,10 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 	ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
 				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
-		fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -ENOENT)
+			printk("No valid btrfs found\n");
+		if (ret == -EIO)
+			printk("Superblock is corrupted\n");
 		goto out_devices;
 	}
 
@@ -1351,7 +1354,10 @@ static int recover_prepare(struct recover_control *rc, char *path)
 
 	ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
 	if (ret) {
-		fprintf(stderr, "read super block error\n");
+		if (ret == -ENOENT)
+			printk("No valid btrfs found\n");
+		if (ret == -EIO)
+			printk("Superblock is corrupted\n");
 		goto fail_free_sb;
 	}
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index d2e46dc..d58397d 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -604,9 +604,14 @@ static int cmd_show(int argc, char **argv)
 		} else {
 			ret = dev_to_fsid(search, fsid);
 			if (ret) {
-				fprintf(stderr,
+				if (ret == -ENOENT)
+					fprintf(stderr,
 					"ERROR: No btrfs on %s\n",
 					search);
+				if (ret == -EIO)
+					fprintf(stderr,
+					"Superblock is corrupted on %s\n",
+					search);
 				return 1;
 			}
 			uuid_unparse(fsid, uuid_buf);
diff --git a/disk-io.c b/disk-io.c
index 1bd9fae..4cc831b 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -990,7 +990,11 @@ int btrfs_scan_fs_devices(int fd, const char *path,
 	ret = btrfs_scan_one_device(fd, path, fs_devices,
 				    &total_devs, sb_bytenr, super_recover);
 	if (ret) {
-		fprintf(stderr, "No valid Btrfs found on %s\n", path);
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid Btrfs found on %s\n", path);
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted on %s\n",
+				path);
 		return ret;
 	}
 
@@ -1101,7 +1105,10 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 	else
 		ret = btrfs_read_dev_super(fp, disk_super, sb_bytenr, 0);
 	if (ret) {
-		printk("No valid btrfs found\n");
+		if (ret == -ENOENT)
+			printk("No valid btrfs found\n");
+		if (ret == -EIO)
+			printk("Superblock is corrupted\n");
 		goto out_devices;
 	}
 
@@ -1201,11 +1208,11 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 	if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
 		ret = pread64(fd, data, sizeof(data), sb_bytenr);
 		if (ret < sizeof(data))
-			return -1;
+
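The same two-way mapping is repeated at every call site in the diff above. As a rough sketch of that mapping in one place, the helper below returns the message for each return value; super_read_strerror() is a hypothetical name and is not part of this patch or of btrfs-progs.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: translate the return value of a superblock read
 * into the message the patch prints. Returns NULL when the patch would
 * print nothing (success or any other error code).
 */
static const char *super_read_strerror(int ret)
{
	if (ret == -ENOENT)
		return "No valid btrfs found";
	if (ret == -EIO)
		return "Superblock is corrupted";
	return NULL;
}
```

Note that, as in the patch, any other negative return value produces no message at all, which a caller may want to handle separately.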
[PATCH 0/4] Add superblock checksum check for btrfs-progs
Before this patchset, btrfs-progs ignored the superblock checksum entirely
and carried on regardless. This can be disastrous: for example, checking a
btrfs with a corrupted superblock can crash btrfs-progs.

This patchset adds a superblock checksum check to btrfs_read_dev_super(),
making btrfs-progs stricter and more robust. So that super-recover can
still open devices, an option is added to scan all 3 superblocks when
super-recover is used.

It also updates the related error strings and fixes a bug in chunk-recover
that is not triggered until the superblock csum is calculated.

Qu Wenruo (4):
  btrfs-progs: Check superblock's checsum when read dev super
  btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for
    super_recover.
  btrfs-progs: Add more meaningful return value for
    btrfs_read_dev_super() and corresponding error string.
  btrfs-progs: Fix size for malloc for superblock checksum.

 btrfs-find-root.c | 9 --
 chunk-recover.c | 18 +++
 cmds-filesystem.c | 9 --
 disk-io.c | 91 +--
 disk-io.h | 5 +--
 super-recover.c | 2 +-
 utils.c | 16 ++
 volumes.c | 8 ++---
 volumes.h | 2 +-
 9 files changed, 104 insertions(+), 56 deletions(-)

-- 
2.0.1

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] btrfs-progs: Fix malloc size for superblock.
recover_prepare() in chunk-recover.c allocates only
sizeof(struct btrfs_super_block) bytes for the superblock buffer. This
causes a glibc malloc error once the superblock csum is calculated.

Use BTRFS_SUPER_INFO_SIZE to fix the bug.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
 chunk-recover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chunk-recover.c b/chunk-recover.c
index c8badf9..7dfaf82 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1345,7 +1345,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
 		return -1;
 	}
 
-	sb = malloc(sizeof(struct btrfs_super_block));
+	sb = malloc(BTRFS_SUPER_INFO_SIZE);
 	if (!sb) {
 		fprintf(stderr, "allocating memory for sb failed.\n");
 		ret = -ENOMEM;
-- 
2.0.1
[PATCH v2 1/4] btrfs-progs: Check superblock's checsum when read dev super
Btrfs-progs reads the superblock without checking its checksum. When all
superblocks are corrupted, continuing will cause disaster, so this patch
adds a checksum check to btrfs-progs when reading superblocks.

Also fix a bug where btrfs_read_dev_super() only reads
sizeof(struct btrfs_super_block); the correct size is
BTRFS_SUPER_INFO_SIZE.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
v2:
 Use the correct memcmp src.
 Read the whole superblock size (sectorsize) rather than
 sizeof(btrfs_super_block).
---
 disk-io.c | 46 +-
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 8db0335..e447af8 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1186,22 +1186,25 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
 {
 	u8 fsid[BTRFS_FSID_SIZE];
 	int fsid_is_initialized = 0;
-	struct btrfs_super_block buf;
+	u8 data[BTRFS_SUPER_INFO_SIZE];
+	struct btrfs_super_block *buf = (struct btrfs_super_block *) data;
 	int i;
 	int ret;
 	u64 transid = 0;
 	u64 bytenr;
+	u32 crc;
+	char crc_result[BTRFS_CSUM_SIZE];
 
 	if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
-		ret = pread64(fd, &buf, sizeof(buf), sb_bytenr);
-		if (ret < sizeof(buf))
+		ret = pread64(fd, data, sizeof(data), sb_bytenr);
+		if (ret < sizeof(data))
 			return -1;
 
-		if (btrfs_super_bytenr(&buf) != sb_bytenr ||
-		    btrfs_super_magic(&buf) != BTRFS_MAGIC)
+		if (btrfs_super_bytenr(buf) != sb_bytenr ||
+		    btrfs_super_magic(buf) != BTRFS_MAGIC)
 			return -1;
 
-		memcpy(sb, &buf, sizeof(*sb));
+		memcpy(sb, data, sizeof(data));
 		return 0;
 	}
 
@@ -1214,22 +1217,31 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
 
 	for (i = 0; i < 1; i++) {
 		bytenr = btrfs_sb_offset(i);
-		ret = pread64(fd, &buf, sizeof(buf), bytenr);
-		if (ret < sizeof(buf))
+		ret = pread64(fd, data, sizeof(data), bytenr);
+		if (ret < sizeof(data))
 			break;
 
-		if (btrfs_super_bytenr(&buf) != bytenr )
+		if (btrfs_super_bytenr(buf) != bytenr)
 			continue;
-		/* if magic is NULL, the device was removed */
-		if (btrfs_super_magic(&buf) == 0 && i == 0)
+		/* if first super block is not btrfs, the device was removed */
+		if (btrfs_super_magic(buf) != BTRFS_MAGIC && i == 0)
 			return -1;
-		if (btrfs_super_magic(&buf) != BTRFS_MAGIC)
+		if (btrfs_super_magic(buf) != BTRFS_MAGIC)
+			continue;
+
+		/* check if the superblock is damaged */
+		crc = ~(u32)0;
+		crc = btrfs_csum_data(NULL, (char *)buf + BTRFS_CSUM_SIZE,
+				      crc, BTRFS_SUPER_INFO_SIZE -
+				      BTRFS_CSUM_SIZE);
+		btrfs_csum_final(crc, crc_result);
+		if (memcmp(crc_result, buf, btrfs_super_csum_size(buf)))
 			continue;
 
 		if (!fsid_is_initialized) {
-			memcpy(fsid, buf.fsid, sizeof(fsid));
+			memcpy(fsid, buf->fsid, sizeof(fsid));
 			fsid_is_initialized = 1;
-		} else if (memcmp(fsid, buf.fsid, sizeof(fsid))) {
+		} else if (memcmp(fsid, buf->fsid, sizeof(fsid))) {
 			/*
 			 * the superblocks (the original one and
 			 * its backups) contain data of different
@@ -1238,9 +1250,9 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
 			continue;
 		}
 
-		if (btrfs_super_generation(&buf) > transid) {
-			memcpy(sb, &buf, sizeof(*sb));
-			transid = btrfs_super_generation(&buf);
+		if (btrfs_super_generation(buf) > transid) {
+			memcpy(sb, data, sizeof(data));
+			transid = btrfs_super_generation(buf);
 		}
 	}
 
-- 
2.0.1
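For readers who want to see the check in isolation, here is a self-contained sketch of the idea, under stated assumptions: a 4096-byte superblock whose first 32 bytes are the csum field, and a CRC32C stored little-endian in the first 4 bytes of that field. It swaps btrfs_csum_data()/btrfs_csum_final() for a plain bitwise CRC32C; super_csum() and super_csum_ok() are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define SUPER_INFO_SIZE 4096  /* assumed size of the on-disk superblock */
#define CSUM_SIZE       32    /* assumed size of the leading csum field */

/* Bitwise CRC32C (Castagnoli, reflected polynomial), no table. */
static uint32_t crc32c(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return crc;
}

/* Checksum everything after the csum field; write 4 LE bytes to out. */
static void super_csum(const uint8_t *sb, uint8_t out[4])
{
	uint32_t crc = crc32c(~(uint32_t)0, sb + CSUM_SIZE,
			      SUPER_INFO_SIZE - CSUM_SIZE);

	crc = ~crc;
	out[0] = crc & 0xff;
	out[1] = (crc >> 8) & 0xff;
	out[2] = (crc >> 16) & 0xff;
	out[3] = (crc >> 24) & 0xff;
}

/* Nonzero if the stored checksum matches the recomputed one. */
static int super_csum_ok(const uint8_t *sb)
{
	uint8_t want[4];

	super_csum(sb, want);
	return memcmp(want, sb, sizeof(want)) == 0;
}
```

The point this illustrates is the one the patch fixes: the checksum covers the whole superblock area, not just sizeof(struct btrfs_super_block), so the reader has to pull in the full buffer before it can verify anything.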
[PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted
From: Anand Jain <anand.j...@oracle.com>

device_list_add() is called when the user runs btrfs dev scan, which adds
any btrfs device to the btrfs_fs_devices list.

Now consider a mounted btrfs and a new device that contains a SB from one
of the mounted btrfs devices.

In this situation, when the user runs btrfs dev scan, the current code
simply replaces the existing device with the new device. Note that the old
device is neither closed nor gracefully removed from the btrfs.

The FS is still operational with the old bdev; however, the device name in
the btrfs_device is the new one provided by btrfs dev scan.

reproducer:

devmgt[1] detach /dev/sdc

replace the missing disk /dev/sdc

btrfs rep start -f 1 /dev/sde /btrfs
Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
 Total devices 2 FS bytes used 32.00KiB
 devid 1 size 958.94MiB used 115.88MiB path /dev/sde
 devid 2 size 958.94MiB used 103.88MiB path /dev/sdd

make /dev/sdc to reappear

devmgt attach host2

btrfs dev scan

btrfs fi show -m
Label: none uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
 Total devices 2 FS bytes used 32.00KiB
 devid 1 size 958.94MiB used 115.88MiB path /dev/sdc <- Wrong.
 devid 2 size 958.94MiB used 103.88MiB path /dev/sdd

Since /dev/sdc has been replaced with /dev/sde, /dev/sdc shouldn't be part
of the btrfs-fsid when it reappears. If the user wants it to be part of
the FS again, the sys admin should use btrfs device add instead.

[1] github.com/anajain/devmgt.git

Signed-off-by: Anand Jain <anand.j...@oracle.com>
Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
Changelog v3->v4:
- Fix the over-80-character problem
---
 fs/btrfs/volumes.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a9c11a0..16e71a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -508,6 +508,33 @@ static noinline int device_list_add(const char *path,
 		ret = 1;
 		device->fs_devices = fs_devices;
 	} else if (!device->name || strcmp(device->name->str, path)) {
+		/*
+		 * When FS is already mounted.
+		 * 1. If you are here and if the device->name is NULL that
+		 *    means this device was missing at time of FS mount.
+		 * 2. If you are here and if the device->name is different
+		 *    from 'path' that means either
+		 *      a. The same device disappeared and reappeared with
+		 *         different name. or
+		 *      b. The missing-disk-which-was-replaced, has
+		 *         reappeared now.
+		 *
+		 * We must allow 1 and 2a above. But 2b would be a spurious
+		 * and unintentional.
+		 *
+		 * Further in case of 1 and 2a above, the disk at 'path'
+		 * would have missed some transaction when it was away and
+		 * in case of 2a the stale bdev has to be updated as well.
+		 * 2b must not be allowed at all time.
+		 */
+
+		/*
+		 * As of now don't allow update to btrfs_fs_device through
+		 * the btrfs dev scan cli, after FS has been mounted.
+		 */
+		if (fs_devices->opened)
+			return -EBUSY;
+
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name)
 			return -ENOMEM;
-- 
1.9.3
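The branch this patch touches boils down to a three-way decision for a device that is already in the list. The sketch below models only that decision in userspace; the enum and scan_action_for() are hypothetical names, and cur_name stands in for device->name (NULL when the device was missing at mount time).

```c
#include <stddef.h>
#include <string.h>

enum scan_action {
	SCAN_NO_CHANGE,    /* same path already recorded */
	SCAN_UPDATE_NAME,  /* FS not mounted: record the new path */
	SCAN_REJECT_BUSY,  /* FS mounted: refuse, like returning -EBUSY */
};

/* Decide what a scan does to an already-known device. */
static enum scan_action scan_action_for(const char *cur_name,
					const char *path, int fs_opened)
{
	if (cur_name && strcmp(cur_name, path) == 0)
		return SCAN_NO_CHANGE;
	if (fs_opened)
		return SCAN_REJECT_BUSY;
	return SCAN_UPDATE_NAME;
}
```

This also shows the cost the comment in the patch admits to: cases 1 and 2a are rejected on a mounted FS along with 2b, since the check does not tell them apart.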
[PATCH RESEND 3/9] Btrfs: make defragment work with nodatacow option
From: Wang Shilong <wangsl.f...@cn.fujitsu.com>

Btrfs defragment relies on the COW feature, which means it did not work
with the nodatacow option. The problem was detected by xfstests
generic/018 with the nodatacow mount option.

Fix it by forcing COW for an extent that has the @EXTENT_DEFRAG state set.

Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/btrfs_inode.h | 6 ++
 fs/btrfs/inode.c | 39 ---
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a0cf3e5..01cfcba 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -127,6 +127,12 @@ struct btrfs_inode {
 	u64 delalloc_bytes;
 
 	/*
+	 * total number of bytes pending defrag, used by stat to check whether
+	 * it needs COW.
+	 */
+	u64 defrag_bytes;
+
+	/*
 	 * the size of the file stored in the metadata on disk. data=ordered
 	 * means the in-memory i_size might be larger than the size on disk
 	 * because not all the blocks are written yet.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6b65fab..a616fa4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1425,6 +1425,26 @@ error:
 	return ret;
 }
 
+static inline int need_force_cow(struct inode *inode, u64 start, u64 end)
+{
+
+	if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
+	    !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC))
+		return 0;
+
+	/*
+	 * @defrag_bytes is a hint value, no spinlock held here,
+	 * if is not zero, it means the file is defragging.
+	 * Force cow if given extent needs to be defragged.
+	 */
+	if (BTRFS_I(inode)->defrag_bytes &&
+	    test_range_bit(&BTRFS_I(inode)->io_tree, start, end,
+			   EXTENT_DEFRAG, 0, NULL))
+		return 1;
+
+	return 0;
+}
+
 /*
  * extent_io.c call back to do delayed allocation processing
  */
@@ -1434,11 +1454,12 @@ static int run_delalloc_range(struct inode *inode, struct page *locked_page,
 {
 	int ret;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
+	int force_cow = need_force_cow(inode, start, end);
 
-	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) {
+	if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 1, nr_written);
-	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC) {
+	} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
 	} else if (!btrfs_test_opt(root, COMPRESS) &&
@@ -1535,6 +1556,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
 			       struct extent_state *state, unsigned long *bits)
 {
 
+	if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
+		WARN_ON(1);
 	/*
 	 * set_bit and clear bit hooks normally require _irqsave/restore
 	 * but in this case, we are only testing for the DELALLOC
@@ -1557,6 +1580,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
 				     root->fs_info->delalloc_batch);
 		spin_lock(&BTRFS_I(inode)->lock);
 		BTRFS_I(inode)->delalloc_bytes += len;
+		if (*bits & EXTENT_DEFRAG)
+			BTRFS_I(inode)->defrag_bytes += len;
 		if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
 					 &BTRFS_I(inode)->runtime_flags))
 			btrfs_add_delalloc_inodes(root, inode);
@@ -1571,6 +1596,13 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 				 struct extent_state *state,
 				 unsigned long *bits)
 {
+	u64 len = state->end + 1 - state->start;
+
+	spin_lock(&BTRFS_I(inode)->lock);
+	if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG))
+		BTRFS_I(inode)->defrag_bytes -= len;
+	spin_unlock(&BTRFS_I(inode)->lock);
+
 	/*
 	 * set_bit and clear bit hooks normally require _irqsave/restore
 	 * but in this case, we are only testing for the DELALLOC
@@ -1578,7 +1610,6 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 	 */
 	if ((state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
 		struct btrfs_root *root = BTRFS_I(inode)->root;
-		u64 len = state->end + 1 - state->start;
 		bool do_list = !btrfs_is_free_space_inode(inode);
 
 		if (*bits & EXTENT_FIRST_DELALLOC) {
@@ -8089,6 +8120,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->last_sub_trans = 0;
 	ei->logged_trans = 0;
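The decision added by need_force_cow() can be looked at without any kernel structures. The following is a simplified userspace model, with the caveat that the flag bit values are made up and range_has_defrag stands in for the test_range_bit(..., EXTENT_DEFRAG, ...) lookup; takes_nocow_path() is a hypothetical helper that mirrors the first two branches of run_delalloc_range().

```c
#include <stdint.h>

#define INODE_NODATACOW (1u << 0)  /* illustrative bit values only */
#define INODE_PREALLOC  (1u << 1)

/* 1 if a nocow/prealloc inode must still COW this range for defrag. */
static int need_force_cow(unsigned int flags, uint64_t defrag_bytes,
			  int range_has_defrag)
{
	if (!(flags & INODE_NODATACOW) && !(flags & INODE_PREALLOC))
		return 0;
	if (defrag_bytes && range_has_defrag)
		return 1;
	return 0;
}

/* 1 if the write goes down the nocow path, 0 if it is COWed. */
static int takes_nocow_path(unsigned int flags, int force_cow)
{
	return (flags & (INODE_NODATACOW | INODE_PREALLOC)) && !force_cow;
}
```

The defrag_bytes counter is only a cheap pre-filter here, as the comment in the patch says: when it is zero, the more expensive range lookup is skipped entirely.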
[PATCH RESEND 6/9] Btrfs: cleanup the read failure record after write or when the inode is freeing
After the data is written successfully, we should clean up the read
failure record in that range because
- If data COW is set for the file, the range that the failure record
  pointed to is mapped to a new place, so the record is invalid.
- If no data COW is set for the file and there is no error during
  writing, the corrupted data is corrected, so the failure record can be
  removed. If some errors happen on the mirrors, we needn't worry about
  that either, because the failure record will be recreated when we read
  the same place again.

Sometimes we may fail to correct the data, so the failure records are left
in the tree. We need to free them when we free the inode, or memory leaks.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/extent_io.c | 34 ++
 fs/btrfs/extent_io.h | 1 +
 fs/btrfs/inode.c | 6 ++
 3 files changed, 41 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3a64354..d67bb4f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2003,6 +2003,40 @@ static int free_io_failure(struct inode *inode, struct io_failure_record *rec,
 }
 
 /*
+ * Can be called when
+ * - hold extent lock
+ * - under ordered extent
+ * - the inode is freeing
+ */
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end)
+{
+	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+	struct io_failure_record *failrec;
+	struct extent_state *state, *next;
+
+	if (RB_EMPTY_ROOT(&failure_tree->state))
+		return;
+
+	spin_lock(&failure_tree->lock);
+	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
+	while (state) {
+		if (state->start > end)
+			break;
+
+		ASSERT(state->end <= end);
+
+		next = next_state(state);
+
+		failrec = (struct io_failure_record *)state->private;
+		free_extent_state(state);
+		kfree(failrec);
+
+		state = next;
+	}
+	spin_unlock(&failure_tree->lock);
+}
+
+/*
  * this bypasses the standard btrfs submit functions deliberately, as
  * the standard behavior is to write all copies in a raid setup. here we only
  * want to write the one bad copy. so we do the mapping for ourselves and issue
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ccc264e..d06780b 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -347,6 +347,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
 int end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int repair_eb_io_failure(struct btrfs_root *root, struct extent_buffer *eb,
 			 int mirror_num);
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 noinline u64 find_lock_delalloc_range(struct inode *inode,
 				      struct extent_io_tree *tree,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 15902eb..b431c58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2670,6 +2670,10 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent)
 		goto out;
 	}
 
+	btrfs_free_io_failure_record(inode, ordered_extent->file_offset,
+				     ordered_extent->file_offset +
+				     ordered_extent->len - 1);
+
 	if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
 		truncated = true;
 		logical_len = ordered_extent->truncated_len;
@@ -4745,6 +4749,8 @@ void btrfs_evict_inode(struct inode *inode)
 	/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
 	btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
+	btrfs_free_io_failure_record(inode, 0, (u64)-1);
+
 	if (root->fs_info->log_root_recovering) {
 		BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 				 &BTRFS_I(inode)->runtime_flags));
-- 
1.9.3
[PATCH RESEND 5/9] Btrfs: fix missing error handler if submiting re-read bio fails
We forgot to free the failure record and the bio when submitting the
re-read bio fails. Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/extent_io.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 23398ad..3a64354 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2345,6 +2345,11 @@ static int bio_readpage_error(struct bio *failed_bio, u64 phy_offset,
 	ret = tree->ops->submit_bio_hook(inode, read_mode, bio,
 					 failrec->this_mirror,
 					 failrec->bio_flags, 0);
+	if (ret) {
+		free_io_failure(inode, failrec, 0);
+		bio_put(bio);
+	}
+
 	return ret;
 }
 
-- 
1.9.3
[PATCH 9/9] Btrfs: fix writing data into the seed filesystem
If we mount a seed filesystem with the degraded option and then add a new
device to it, adding the device fails because of an IO failure.

Steps to reproduce:
 # mkfs.btrfs -d raid1 -m raid1 dev0 dev1
 # btrfstune -S 1 dev0
 # mount dev0 -o degraded mnt
 # btrfs device add -f dev2 mnt

This happens because the original code didn't set the chunk on the seed
device to be read-only if the degraded flag was set. That was introduced
by patch f48b90756, which fixed the problem that a raid1 filesystem became
read-only after one of its devices went missing. But that fix was not
right: we should set the read-only flag according to the number of missing
devices, not the degraded mount option. If the number of missing devices
does not exceed the max error number that the profile of the chunk
tolerates, we don't set it to be read-only.

Cc: Josef Bacik <jba...@fb.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 52 
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73a82e5..daecfa5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4584,12 +4584,31 @@ out:
 	return ret;
 }
 
+static inline int btrfs_chunk_max_errors(struct map_lookup *map)
+{
+	int max_errors;
+
+	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+			 BTRFS_BLOCK_GROUP_RAID10 |
+			 BTRFS_BLOCK_GROUP_RAID5 |
+			 BTRFS_BLOCK_GROUP_DUP)) {
+		max_errors = 1;
+	} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
+		max_errors = 2;
+	} else {
+		max_errors = 0;
+	}
+
+	return max_errors;
+}
+
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
 {
 	struct extent_map *em;
 	struct map_lookup *map;
 	struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree;
 	int readonly = 0;
+	int miss_ndevs = 0;
 	int i;
 
 	read_lock(&map_tree->map_tree.lock);
@@ -4598,18 +4617,27 @@ int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
 	if (!em)
 		return 1;
 
-	if (btrfs_test_opt(root, DEGRADED)) {
-		free_extent_map(em);
-		return 0;
-	}
-
 	map = (struct map_lookup *)em->bdev;
 	for (i = 0; i < map->num_stripes; i++) {
+		if (map->stripes[i].dev->missing) {
+			miss_ndevs++;
+			continue;
+		}
+
 		if (!map->stripes[i].dev->writeable) {
 			readonly = 1;
-			break;
+			goto end;
 		}
 	}
+
+	/*
+	 * If the number of missing devices is larger than max errors,
+	 * we can not write the data into that chunk successfully, so
+	 * set it readonly.
+	 */
+	if (miss_ndevs > btrfs_chunk_max_errors(map))
+		readonly = 1;
+end:
 	free_extent_map(em);
 	return readonly;
 }
@@ -5220,16 +5248,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
 		}
 	}
 
-	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
-		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
-				 BTRFS_BLOCK_GROUP_RAID10 |
-				 BTRFS_BLOCK_GROUP_RAID5 |
-				 BTRFS_BLOCK_GROUP_DUP)) {
-			max_errors = 1;
-		} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
-			max_errors = 2;
-		}
-	}
+	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS))
+		max_errors = btrfs_chunk_max_errors(map);
 
 	if (dev_replace_is_ongoing && (rw & (REQ_WRITE | REQ_DISCARD)) &&
 	    dev_replace->tgtdev != NULL) {
-- 
1.9.3
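The read-only rule from this patch can be tried out in userspace. What follows is a small model under stated assumptions, not the kernel code: the profile bit values are illustrative, struct stripe_dev is a hypothetical stand-in for the per-stripe device state, and chunk_readonly() takes the stripes directly instead of looking up an extent map.

```c
#define BG_RAID1  (1u << 4)  /* illustrative profile bits */
#define BG_DUP    (1u << 5)
#define BG_RAID10 (1u << 6)
#define BG_RAID5  (1u << 7)
#define BG_RAID6  (1u << 8)

struct stripe_dev {
	int missing;    /* device absent at mount time */
	int writeable;  /* 0 for a seed device */
};

/* Number of device failures the chunk profile tolerates. */
static int chunk_max_errors(unsigned int type)
{
	if (type & (BG_RAID1 | BG_RAID10 | BG_RAID5 | BG_DUP))
		return 1;
	if (type & BG_RAID6)
		return 2;
	return 0;
}

/* 1 if the chunk must be treated as read-only. */
static int chunk_readonly(unsigned int type,
			  const struct stripe_dev *devs, int n)
{
	int miss_ndevs = 0;

	for (int i = 0; i < n; i++) {
		if (devs[i].missing) {
			miss_ndevs++;
			continue;
		}
		/* A present but non-writeable device forces read-only. */
		if (!devs[i].writeable)
			return 1;
	}
	return miss_ndevs > chunk_max_errors(type);
}
```

In the reproducer above, the raid1 chunk has one seed (non-writeable) stripe and one missing stripe, so this model reports it read-only regardless of the degraded option, which is the behaviour the patch restores.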
[PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail
The caller of btrfs_submit_direct_hook() will put the original dio bio when btrfs_submit_direct_hook() returns an error number, so we needn't put the original bio in btrfs_submit_direct_hook().

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a616fa4..15902eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7325,10 +7325,8 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	map_length = orig_bio->bi_iter.bi_size;
 	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
 			      &map_length, NULL, 0);
-	if (ret) {
-		bio_put(orig_bio);
+	if (ret)
 		return -EIO;
-	}
 
 	if (map_length >= orig_bio->bi_iter.bi_size) {
 		bio = orig_bio;
@@ -7345,6 +7343,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
 	bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
 	if (!bio)
 		return -ENOMEM;
+
 	bio->bi_private = dip;
 	bio->bi_end_io = btrfs_end_dio_bio;
 	atomic_inc(&dip->pending_bios);
--
1.9.3
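The ownership rule behind this fix can be shown with a toy reference count. This is a sketch only, assuming a made-up struct buf and made-up helper names. It is not the bio API, but it shows why the callee must not drop a reference that the caller will drop on error anyway.

```c
#include <errno.h>

/* Hypothetical refcounted object standing in for the original dio bio. */
struct buf {
	int refs;
};

static void buf_put(struct buf *b)
{
	b->refs--;
}

/*
 * Callee: reports the failure but leaves the caller's reference alone.
 * Putting the buffer here as well would drop the reference twice.
 */
static int submit_hook(struct buf *b, int fail)
{
	(void)b;
	if (fail)
		return -EIO;
	return 0;
}

/* Caller: owns the reference and drops it exactly once on error. */
static int submit(struct buf *b, int fail)
{
	int ret = submit_hook(b, fail);

	if (ret)
		buf_put(b);
	return ret;
}
```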
[PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null
From: Anand Jain <anand.j...@oracle.com>

When one of the device paths is missing, the btrfs_device name is NULL, so this patch checks for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [812e18c0] strlen+0x0/0x30
[a01cd92a] ? clone_fs_devices+0xaa/0x160 [btrfs]
[a01cdcf7] btrfs_init_new_device+0x317/0xca0 [btrfs]
[81155bca] ? __kmalloc_track_caller+0x15a/0x1a0
[a01d6473] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[81132a6c] ? handle_mm_fault+0x48c/0x9c0
[81192a61] ? __blkdev_put+0x171/0x180
[817a784c] ? __do_page_fault+0x4ac/0x590
[81193426] ? blkdev_put+0x106/0x110
[81179175] ? mntput+0x35/0x40
[8116d4b0] do_vfs_ioctl+0x460/0x4a0
[8115c72e] ? fput+0xe/0x10
[81068033] ? task_work_run+0xb3/0xd0
[8116d547] SyS_ioctl+0x57/0x90
[817a793e] ? do_page_fault+0xe/0x10
[817abe52] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs
modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain <anand.j...@oracle.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
Changelog v1 -> v2:
- Fix the problem that we forgot to set the missing flag for the cloned device
---
 fs/btrfs/volumes.c | 25
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1891541..4731bd6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig)
 		if (IS_ERR(device))
 			goto error;
 
-		/*
-		 * This is ok to do without rcu read locked because we hold the
-		 * uuid mutex so nothing we touch in here is going to disappear.
-		 */
-		name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
-		if (!name) {
-			kfree(device);
-			goto error;
+		if (orig_dev->missing) {
+			device->missing = 1;
+			fs_devices->missing_devices++;
+		} else {
+			ASSERT(orig_dev->name);
+			/*
+			 * This is ok to do without rcu read locked because
+			 * we hold the uuid mutex so nothing we touch in here
+			 * is going to disappear.
+			 */
+			name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
+			if (!name) {
+				kfree(device);
+				goto error;
+			}
+			rcu_assign_pointer(device->name, name);
 		}
-		rcu_assign_pointer(device->name, name);
 
 		list_add(&device->dev_list, &fs_devices->devices);
 		device->fs_devices = fs_devices;
--
1.9.3
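The shape of the fix, copying the name only when the device is actually present, can be sketched in plain C. The struct and function below are hypothetical simplifications (no RCU, no device list); they only illustrate why a missing device must be handled before its name is touched.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified device record; not struct btrfs_device. */
struct dev {
	char *name;	/* NULL when the device path is missing */
	int missing;
};

/*
 * Clone one device record. Returns 0 on success, -1 on failure.
 * A missing device has no name, so we must not call strlen() on it;
 * we carry the missing flag over instead.
 */
static int clone_dev(const struct dev *orig, struct dev *copy)
{
	size_t len;

	copy->name = NULL;
	copy->missing = 0;

	if (orig->missing) {
		copy->missing = 1;
		return 0;
	}

	/* The kernel patch asserts this; the sketch just fails. */
	if (!orig->name)
		return -1;

	len = strlen(orig->name) + 1;
	copy->name = malloc(len);
	if (!copy->name)
		return -1;
	memcpy(copy->name, orig->name, len);
	return 0;
}
```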
[PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs
We forgot to zero some members of fs_devices when creating a new fs_devices from that of the seed fs. As a result we could get the wrong chunk profile when allocating chunks. Fix it.

Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4731bd6..73a82e5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1993,6 +1993,9 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
 	fs_devices->seeding = 0;
 	fs_devices->num_devices = 0;
 	fs_devices->open_devices = 0;
+	fs_devices->missing_devices = 0;
+	fs_devices->num_can_discard = 0;
+	fs_devices->rotating = 0;
 	fs_devices->seed = seed_devices;
 
 	generate_random_uuid(fs_devices->fsid);
--
1.9.3
[PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid
From: Anand Jain <anand.j...@oracle.com>

When the FS is unmounted we need to check the generation number as well, since the devid+uuid combination could match the missing replaced disk when it reappears; without this patch the FS might pair with the replaced disk again.

The device_list_add() function is called in the following paths:
  mount device option
  mount argument
  ioctl BTRFS_IOC_SCAN_DEV (btrfs dev scan)
  ioctl BTRFS_IOC_DEVICES_READY (btrfs dev ready dev)
They have been unit tested to work fine with this patch.

If the user knows what they are doing and really wants to pair with the replaced disk (which is not a standard operation), they should first clear the kernel's in-memory btrfs device list by unloading and reloading the module, and then mount with the -o device option.

Signed-off-by: Anand Jain <anand.j...@oracle.com>
Signed-off-by: Wang Shilong <wangsl.f...@cn.fujitsu.com>
Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
---
Changelog v1 -> v2:
- Fix the over-80-character problem and unreasonable error number
---
 fs/btrfs/volumes.c | 22
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 16e71a1..1891541 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -532,8 +532,19 @@ static noinline int device_list_add(const char *path,
 		 * As of now don't allow update to btrfs_fs_device through
 		 * the btrfs dev scan cli, after FS has been mounted.
 		 */
-		if (fs_devices->opened)
+		if (fs_devices->opened) {
 			return -EBUSY;
+		} else {
+			/*
+			 * That is if the FS is _not_ mounted and if you
+			 * are here, that means there is more than one
+			 * disk with same uuid and devid.We keep the one
+			 * with larger generation number or the last-in if
+			 * generation are equal.
+			 */
+			if (found_transid < device->generation)
+				return -EEXIST;
+		}
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name)
@@ -546,6 +557,15 @@ static noinline int device_list_add(const char *path,
 		}
 	}
 
+	/*
+	 * Unmount does not free the btrfs_device struct but would zero
+	 * generation along with most of the other members. So just update
+	 * it back. We need it to pick the disk with largest generation
+	 * (as above).
+	 */
+	if (!fs_devices->opened)
+		device->generation = found_transid;
+
 	if (found_transid > fs_devices->latest_trans) {
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
--
1.9.3
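The selection rule described above (refuse updates while mounted, otherwise keep the disk with the larger generation, last-in wins on a tie) can be sketched as a small user-space function. The struct and function names are hypothetical and the model is deliberately simplified to one already-known device; it is not the kernel's device_list_add().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the device already known for a devid+uuid. */
struct dev_entry {
	uint64_t generation;
	const char *path;
};

/*
 * A second disk with the same devid+uuid shows up at 'path' with
 * generation 'found_transid'. Returns 0 if it replaces the known
 * entry, -EBUSY if the filesystem is mounted, -EEXIST if the new
 * disk is older than the one we already have.
 */
static int device_rescan(struct dev_entry *known, bool mounted,
			 const char *path, uint64_t found_transid)
{
	if (mounted)
		return -EBUSY;

	/* Strictly older disks lose; equal generation means last-in wins. */
	if (found_transid < known->generation)
		return -EEXIST;

	known->path = path;
	known->generation = found_transid;
	return 0;
}
```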
Re: [PATCH v2] Btrfs: fix crash when starting transaction
(2014/07/03 17:30), Miao Xie wrote:
> On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:
>> Often when starting a transaction we commit the currently running transaction, which can end up writing block group caches when the current process has its journal_info set to NULL (and not to a transaction). This makes our assertion at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting in a crash/hang. Therefore fix it by setting journal_info.
>>
>> Two different traces of this issue follow below.
>>
>> 1)
>>
>> [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
>> [51502.242213] ------------[ cut here ]------------
>> [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
>> [51502.242669] invalid opcode: [#1] SMP DEBUG_PAGEALLOC
>> (...)
>> [51502.244010] Call Trace:
>> [51502.244010] [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
>> [51502.244010] [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
>> [51502.244010] [a0357a6a] commit_cowonly_roots+0x164/0x226 [btrfs]
>> [51502.244010] [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
>> [51502.244010] [8168ec7b] ? _raw_spin_unlock+0x2b/0x40
>> [51502.244010] [a02d6259] start_transaction+0x459/0x620 [btrfs]
>> [51502.244010] [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
>> [51502.244010] [a02d73e1] __unlink_start_trans+0x31/0xe0 [btrfs]
>> [51502.244010] [a02dea67] btrfs_unlink+0x37/0xc0 [btrfs]
>> [51502.244010] [811bb054] ? do_unlinkat+0x114/0x2a0
>> [51502.244010] [811baebc] vfs_unlink+0xcc/0x150
>> [51502.244010] [811bb1a0] do_unlinkat+0x260/0x2a0
>> [51502.244010] [811a9ef4] ? filp_close+0x64/0x90
>> [51502.244010] [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
>> [51502.244010] [81349cab] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> [51502.244010] [811be9eb] SyS_unlinkat+0x1b/0x40
>> [51502.244010] [81698452] system_call_fastpath+0x16/0x1b
>> [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
>> [51502.244010] RIP [a03575da] assfail.constprop.88+0x1e/0x20 [btrfs]
>>
>> 2)
>>
>> [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
>> [25405.097488] ------------[ cut here ]------------
>> [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
>> [25405.097940] invalid opcode: [#1] SMP DEBUG_PAGEALLOC
>> (...)
>> [25405.18] Call Trace:
>> [25405.18] [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
>> [25405.18] [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
>> [25405.18] [a035755a] commit_cowonly_roots+0x164/0x226 [btrfs]
>> [25405.18] [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
>> [25405.18] [8109c170] ? bit_waitqueue+0xc0/0xc0
>> [25405.18] [a02d6259] start_transaction+0x459/0x620 [btrfs]
>> [25405.18] [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
>> [25405.18] [a02e3407] btrfs_create+0x47/0x210 [btrfs]
>> [25405.18] [a02d74cc] ? btrfs_permission+0x3c/0x80 [btrfs]
>> [25405.18] [811bc63b] vfs_create+0x9b/0x130
>> [25405.18] [811bcf19] do_last+0x849/0xe20
>> [25405.18] [811b9409] ? link_path_walk+0x79/0x820
>> [25405.18] [811bd5b5] path_openat+0xc5/0x690
>> [25405.18] [810ab07d] ? trace_hardirqs_on+0xd/0x10
>> [25405.18] [811cdcd2] ? __alloc_fd+0x32/0x1d0
>> [25405.18] [811be2a3] do_filp_open+0x43/0xa0
>> [25405.18] [811cddf1] ? __alloc_fd+0x151/0x1d0
>> [25405.18] [811abcfc] do_sys_open+0x13c/0x230
>> [25405.18] [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
>> [25405.18] [811abe12] SyS_open+0x22/0x30
>> [25405.18] [81698452] system_call_fastpath+0x16/0x1b
>> [25405.18] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
>> [25405.18] RIP [a03570ca] assfail.constprop.88+0x1e/0x20 [btrfs]
>>
>> Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>
>> ---
>> V2: Removed test for current->journal_info == NULL. At this point it's always expected to be NULL.
>
> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>

Let me clarify my understanding since I'm not good at the transaction code.

* What is the root cause?

When start_transaction() is called with current->journal_transaction == NULL, we
Re: [PATCH v2] Btrfs: fix crash when starting transaction
On Thu, 3 Jul 2014 19:32:18 +0900, Satoru Takeuchi wrote:
> (2014/07/03 17:30), Miao Xie wrote:
>> On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:
>>> [...]
>>
>> Reviewed-by: Miao Xie <mi...@cn.fujitsu.com>
>
> Let me clarify my understanding
Re: [PATCH] Btrfs: fix wrong uevent target
CC Anand Jain

Sorry, please ignore this patch. Anand wrote the same patch several days ago, so this bug fix belongs to Anand, though he NACKed his patch at that time.

Thanks
Miao

On Wed, 2 Jul 2014 17:03:54 +0800, Miao Xie wrote:
> block_device's bd_disk points to the disk, not the object which block_device is actually corresponding to (the whole disk or a partition), so we would send uevent to the wrong target. Fix it.
>
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> ---
>  fs/btrfs/volumes.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 95828b0..e8b9214 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -123,7 +123,7 @@ static void btrfs_kobject_uevent(struct block_device *bdev,
>  {
>  	int ret;
>  
> -	ret = kobject_uevent(&disk_to_dev(bdev->bd_disk)->kobj, action);
> +	ret = kobject_uevent(&part_to_dev(bdev->bd_part)->kobj, action);
>  	if (ret)
>  		pr_warn("BTRFS: Sending event '%d' to kobject: '%s' (%p): failed\n",
>  			action,
Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
On 3 July 2014 10:33, Qu Wenruo <quwen...@cn.fujitsu.com> wrote:
> Oh, sorry for my confusing words.

And I probably should have waited for my frustration with my mail client/device/public transport to subside before panicking^Creplying. I use a combination of ro & rw (not insanely nested) subvolumes on a few pseudo-embedded home/office servers and would like to keep that arrangement working if possible. I'm also aware that it doesn't protect against all possible bugs.

> To make it clear, when mentioning 'the whole disk(or partition whatever)' I mean the FS_TREE. (Of course not the default subvolume)
> The problem is that, even if you mount a subvolume ro, you can still change contents in the subvolume through its rw parent subvolume. And if a subvolume can still be modified, the ro mount loses its meaning.

That makes so much more sense than my original reading, which was weird and wrong and implied strange subvol-5-only magic. Sorry!

> So we need special rules to prevent such things.

Not that it matters, but: agreed.

Tobias
Re: [PATCH] Btrfs: implement support for fallocate collapse range
On Thu, Jul 3, 2014 at 5:46 AM, Chandan Rajendra <chan...@linux.vnet.ibm.com> wrote:
> On Monday 23 Jun 2014 11:25:47 Filipe David Borba Manana wrote:
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index aeab453..8f1a371 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -2825,12 +2825,12 @@ cow_done:
>>  		 * It is safe to drop the lock on our parent before we
>>  		 * go through the expensive btree search on b.
>>  		 *
>> -		 * If we're inserting or deleting (ins_len != 0), then we might
>> -		 * be changing slot zero, which may require changing the parent.
>> -		 * So, we can't drop the lock until after we know which slot
>> -		 * we're operating on.
>> +		 * If we're inserting, deleting or updating a key (cow != 0),
>> +		 * then we might be changing slot zero, which may require
>> +		 * changing the parent. So, we can't drop the lock until after
>> +		 * we know which slot we're operating on.
>>  		 */
>> -		if (!ins_len && !p->keep_locks) {
>> +		if (!cow && !p->keep_locks) {
>>  			int u = level + 1;
>
> For the key update case (i.e. (ins_len == 0) and (cow == 1)), maybe we could optimize by having a variable to hold the return value of should_cow_block(), i.e. it keeps track of whether the current metadata block was COWed. If the variable indicates that the block was *not* COWed, then we could release the lock on the parent block since the update operation (even on slot 0) isn't going to change the corresponding entry in the parent block.

Hi Chandan,

Just because a node wasn't cowed doesn't mean it isn't going to be updated. Further, updating the key works bottom-up: we don't know during btrfs_search_slot() whether our caller intends to update a key in the leaf or just update the item, so we need to return a path here with all nodes with slot == 0 having a write lock on them.

I'm actually for now basically just undoing this:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=eb653de15987612444b6cde3b0e67b1edd94625f

Anyway, right now I have a functional issue to tackle first.

Thanks.

> Thanks,
> chandan

--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
Thanks for the patch. Hopefully this will make it to the next 3.15.x kernel.

I also went back to 3.14 anyway, since the 'blocked for 120 seconds' messages look like another instance of the deadlocks we've been discussing here.

But just curious:
[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120

What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem?

Also, is it possible to have all these messages state which devid they occurred on? I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway.

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: [PATCH] Btrfs: fix wrong uevent target
On 07/03/2014 07:09 AM, Miao Xie wrote:
> CC Anand Jain
>
> Sorry, please ignore this patch. Anand wrote the same patch several days ago, so this bug fix belongs to Anand though he NACKed his patch at that time.

It certainly looks right, but Anand had mentioned that he had a few questions on testing. I've pulled it out for now, but I'll take Anand's version when you're both happy.

-chris
Re: [PATCH 1/2] btrfs-progs: add ask_user confirmation for btrfstune clear seeding flag
On Thu, Jul 03, 2014 at 10:06:33AM +0800, Gui Hecheng wrote:
> Clear the seeding flag may cause the original filesystem to be writable, which is dangerous.

Can you please describe the dangerous scenario a bit more? This would also go to the documentation so it's not only to satisfy my curiosity.

Dropping the seeding flag could be dangerous if the filesystem starts in seeding mode, a new device is added, some writes are done, and then the filesystem is unmounted. Now it's a 2-device filesystem, where the original device holds some data and, without the seeding flag, would accept new writes. Still ok for me, though this is probably the point where some user assumptions may break.

> In this case, add user confirmation check when clearing seeding flag. Also warn the user that the fs is in a dangerous condition when the seeding flag is cleared if it is forced to.

The -y option is tied only to the seeding option, but it should IMO be more general and called --force.

> Signed-off-by: Gui Hecheng <guihc.f...@cn.fujitsu.com>
> ---
>  btrfstune.c | 24
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/btrfstune.c b/btrfstune.c
> index 3f2f0cd..0e18088 100644
> --- a/btrfstune.c
> +++ b/btrfstune.c
> @@ -103,6 +104,7 @@ static void print_usage(void)
>  	fprintf(stderr, "\t-S value\tpositive value will enable seeding, zero to disable, negative is not allowed\n");
>  	fprintf(stderr, "\t-r \t\tenable extended inode refs\n");
>  	fprintf(stderr, "\t-x \t\tenable skinny metadata extent refs\n");
> +	fprintf(stderr, "\t-y \t\tsay yes to clear the seeding flag, make sure that you are aware of the danger\n");

The help text could say something like

	"--force\tallow dangerous changes\n"

btrfstune only allows setting the bit for extref and skinny-metadata; unsetting would be dangerous as well.
Re: [PATCH] btrfs-progs: add mount status check for btrfs-image
On Thu, Jul 03, 2014 at 10:06:34AM +0800, Gui Hecheng wrote:
> The btrfs-image tool should not be run on a mounted filesystem.

Should not, but for some values of sometimes it makes sense, e.g. capturing an image of an otherwise quiescent filesystem, a read-only mount, or after a crash.

This utility is used for debugging, so I'd prefer to let the user do as he likes, though printing a warning about the mount status is a good improvement.

> The undergoing fs operations may change what you have imaged a while ago, this makes the image meaningless.

I'm not familiar with the image format, but maybe we can set a bit in the header when the filesystem was not captured cleanly.
Re: [PATCH] Btrfs: fix wrong uevent target
Chris,

This fix is theoretically correct, but my guess that it would solve the problem reported by Qu Wenruo was wrong [1]. The patch is good to integrate.

Thanks, Anand

[1] Re: [PATCH RFC] btrfs: Add ctime/mtime update for btrfs device add/remove.

On 03/07/2014 22:00, Chris Mason wrote:
> On 07/03/2014 07:09 AM, Miao Xie wrote:
>> CC Anand Jain
>>
>> Sorry, please ignore this patch. Anand wrote the same patch several days ago, so this bug fix belongs to Anand though he NACKed his patch at that time.
>
> It certainly looks right, but Anand had mentioned that he had a few questions on testing. I've pulled it out for now, but I'll take Anand's version when you're both happy.
>
> -chris
Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
On 07/03/2014 02:28 AM, Qu Wenruo wrote:
> -------- Original Message --------
> Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
> From: Goffredo Baroncelli <kreij...@libero.it>
> To: Qu Wenruo <quwen...@cn.fujitsu.com>, linux-btrfs@vger.kernel.org
> Date: 2014-07-03 01:48
>> On 07/01/2014 11:30 AM, Qu Wenruo wrote:
>>> This commit has the following problem:
>>> 1) Break the ro mount rule.
>>> When users mount the whole btrfs ro, it is still possible to mount subvol rw and change the contents. Which make the whole fs ro mount non-sense.
>>
>> Where is the problem? I see a use case where I want a conservative default: mount everything ro except some subvolumes.
>>
>> In any case it is not a security problem, because if the user has the capability to mount a subvolume, he also has the capability to remount,rw the whole filesystem.
>
> Not security problem but behavior not consistent.
>
> If user mount the whole disk ro, he or she want the fs read only and nothing will change in it. If you mount a subvol rw, then the whole disk ro expectation is broken. Things will change even the whole disk is readonly.

Sorry to bother you again, but one thing is not clear to me. If

 # mount -o subvolid=5,ro /dev/sda1 /mnt/root
 # mount -o subvol=subvolname,rw /dev/sda1 /mnt/subvolname

I suppose that

 # touch /mnt/root/touch-test        # 1

fails, and

 # touch /mnt/subvolname/touch-test  # 2

succeeds. Did I understand correctly? If so, this behaviour seems correct to me. It would be different if, after mounting the subvolume subvolname, the whole filesystem also became rw (e.g. #1 succeeded).

G.Baroncelli

> The problem also happens when a parent subvol is mounted rw but child subvol is mounted ro. User can still modify the child subvol through parent subvol, still broke the readonly rule.
>
> Thanks,
> Qu

--
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [PATCH 2/4] btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.
On Thu, Jul 03, 2014 at 05:36:36PM +0800, Qu Wenruo wrote:
> @@ -1182,7 +1183,8 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
>  	return info->fs_root;
>  }
>  
> -int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
> +int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
> +			 int recover_super)

> +	int max_super = recover_super ? BTRFS_SUPER_MIRROR_MAX : 1;

Minor tweak: I've renamed it to super_recover, as this is used everywhere else. No need to resend the patch.
Re: [PATCH 3/4] btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.
On Thu, Jul 03, 2014 at 05:36:37PM +0800, Qu Wenruo wrote:
> --- a/btrfs-find-root.c
> +++ b/btrfs-find-root.c
> @@ -96,7 +96,10 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
>  	ret = btrfs_read_dev_super(fs_devices->latest_bdev,
>  				   disk_super, fs_info->super_bytenr, 1);
>  	if (ret) {
> -		printk("No valid btrfs found\n");
> +		if (ret == -ENOENT)
> +			printk("No valid btrfs found\n");

Please use fprintf(stderr, ...) for error messages, until we have better message logging helpers.
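A possible shape for what is being asked for here, as a hedged sketch: map the return value to a string and print it with fprintf(stderr, ...). Only the -ENOENT message comes from the quoted hunk; the helper names, the -EIO case and its wording are assumptions made up for illustration, not what the patch actually returns.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: translate a superblock read result into a
 * message. Only the -ENOENT text appears in the quoted patch; the
 * other cases are placeholders.
 */
static const char *super_read_strerror(int ret)
{
	switch (ret) {
	case 0:
		return "ok";
	case -ENOENT:
		return "No valid btrfs found";
	case -EIO:
		return "superblock checksum mismatch";	/* assumed mapping */
	default:
		return "failed to read superblock";
	}
}

/* Report an error on the given stream; returns 0 if there was none. */
static int report_super_error(FILE *out, int ret)
{
	if (!ret)
		return 0;
	return fprintf(out, "%s\n", super_read_strerror(ret));
}
```

A caller would then write report_super_error(stderr, ret) instead of calling printk() directly.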
Re: [PATCH 0/4] Add superblock checksum check for btrfs-progs
On Thu, Jul 03, 2014 at 05:36:34PM +0800, Qu Wenruo wrote:
> Before this patchset, btrfs-progs will overall ignore the superblock checksum and continue the routine. Sometimes this may cause disasters like checking a btrfs with corrupted superblock will lead to crash in btrfs-progs.
>
> This patch introduces superblock checksum check into btrfs_read_dev_super(), making btrfs-progs much more restricted and robust.
>
> To allow super-recover to open devices, add options to scan all 3 superblocks when using super-recover.
>
> Also updated the related error string and fix a bug in chunk-recover that will not be triggered until superblock csum is calculated.
>
> Qu Wenruo (4):
>   btrfs-progs: Check superblock's checksum when read dev super
>   btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.
>   btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.
>   btrfs-progs: Fix size for malloc for superblock checksum.

Nice work. I've added 1, 2 and 4 to integration. Please update patch 3 (printf/fprintf).
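The idea behind the first patch in this series, refusing a superblock whose stored checksum does not match its contents, can be sketched as below. This is an illustration only: the function names and the toy layout (a 4-byte little-endian CRC32C at offset 0 covering the rest of the buffer) are assumptions for the example, not the btrfs-progs code or the real on-disk format.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

#define TOY_CSUM_SIZE 4	/* assumed size of the checksum field */

/* Store the checksum of everything after the checksum field. */
static void toy_super_seal(uint8_t *sb, size_t size)
{
	uint32_t crc = crc32c(sb + TOY_CSUM_SIZE, size - TOY_CSUM_SIZE);

	sb[0] = crc & 0xff;
	sb[1] = (crc >> 8) & 0xff;
	sb[2] = (crc >> 16) & 0xff;
	sb[3] = (crc >> 24) & 0xff;
}

/* Return 1 if the stored checksum matches the contents, else 0. */
static int toy_super_csum_ok(const uint8_t *sb, size_t size)
{
	uint32_t stored = (uint32_t)sb[0] |
			  ((uint32_t)sb[1] << 8) |
			  ((uint32_t)sb[2] << 16) |
			  ((uint32_t)sb[3] << 24);

	return stored == crc32c(sb + TOY_CSUM_SIZE, size - TOY_CSUM_SIZE);
}
```

A reader that scans several superblock copies, as super-recover does, would run such a check on each copy and skip the ones that fail.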
Re: [PATCH] btrfs-progs: prevent select invalid dev super after dev replace
On Thu, Jul 03, 2014 at 10:06:35AM +0800, Gui Hecheng wrote:
> After dev replace, we should not select the superblock of the replaced dev. Otherwise, all the superblocks will be overwritten by this invalid superblock.
>
> To prevent this case, let btrfs-select-super check the first superblock on the selected dev. If the magic doesn't match, then the dev is a replaced dev and error message will show up.

Is this patch needed if Qu's superblock checksum patches are applied?

Thanks.
Re: [PATCH v2] Btrfs: fix crash when starting transaction
(2014/07/03 20:07), Miao Xie wrote: On Thu, 3 Jul 2014 19:32:18 +0900, Satoru Takeuchi wrote: (2014/07/03 17:30), Miao Xie wrote: On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote: Often when starting a transaction we commit the currently running transaction, which can end up writing block group caches when the current process has its journal_info set to NULL (and not to a transaction). This makes our assertion at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting in a crash/hang. Therefore fix it by setting journal_info. Two different traces of this issue follow below. 1) [51502.241936] BTRFS: assertion failed: current-journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [51502.242213] [ cut here ] [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964! [51502.242669] invalid opcode: [#1] SMP DEBUG_PAGEALLOC (...) [51502.244010] Call Trace: [51502.244010] [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [51502.244010] [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [51502.244010] [a0357a6a] commit_cowonly_roots+0x164/0x226 [btrfs] [51502.244010] [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [51502.244010] [8168ec7b] ? _raw_spin_unlock+0x2b/0x40 [51502.244010] [a02d6259] start_transaction+0x459/0x620 [btrfs] [51502.244010] [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs] [51502.244010] [a02d73e1] __unlink_start_trans+0x31/0xe0 [btrfs] [51502.244010] [a02dea67] btrfs_unlink+0x37/0xc0 [btrfs] [51502.244010] [811bb054] ? do_unlinkat+0x114/0x2a0 [51502.244010] [811baebc] vfs_unlink+0xcc/0x150 [51502.244010] [811bb1a0] do_unlinkat+0x260/0x2a0 [51502.244010] [811a9ef4] ? filp_close+0x64/0x90 [51502.244010] [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0 [51502.244010] [81349cab] ? 
trace_hardirqs_on_thunk+0x3a/0x3f [51502.244010] [811be9eb] SyS_unlinkat+0x1b/0x40 [51502.244010] [81698452] system_call_fastpath+0x16/0x1b [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [51502.244010] RIP [a03575da] assfail.constprop.88+0x1e/0x20 [btrfs] 2) [25405.097230] BTRFS: assertion failed: current-journal_info, file: fs/btrfs/extent-tree.c, line: 3670 [25405.097488] [ cut here ] [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964! [25405.097940] invalid opcode: [#1] SMP DEBUG_PAGEALLOC (...) [25405.18] Call Trace: [25405.18] [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs] [25405.18] [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs] [25405.18] [a035755a] commit_cowonly_roots+0x164/0x226 [btrfs] [25405.18] [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs] [25405.18] [8109c170] ? bit_waitqueue+0xc0/0xc0 [25405.18] [a02d6259] start_transaction+0x459/0x620 [btrfs] [25405.18] [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs] [25405.18] [a02e3407] btrfs_create+0x47/0x210 [btrfs] [25405.18] [a02d74cc] ? btrfs_permission+0x3c/0x80 [btrfs] [25405.18] [811bc63b] vfs_create+0x9b/0x130 [25405.18] [811bcf19] do_last+0x849/0xe20 [25405.18] [811b9409] ? link_path_walk+0x79/0x820 [25405.18] [811bd5b5] path_openat+0xc5/0x690 [25405.18] [810ab07d] ? trace_hardirqs_on+0xd/0x10 [25405.18] [811cdcd2] ? __alloc_fd+0x32/0x1d0 [25405.18] [811be2a3] do_filp_open+0x43/0xa0 [25405.18] [811cddf1] ? __alloc_fd+0x151/0x1d0 [25405.18] [811abcfc] do_sys_open+0x13c/0x230 [25405.18] [810aaea6] ? 
trace_hardirqs_on_caller+0x16/0x1e0 [25405.18] [811abe12] SyS_open+0x22/0x30 [25405.18] [81698452] system_call_fastpath+0x16/0x1b [25405.18] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 [25405.18] RIP [a03570ca] assfail.constprop.88+0x1e/0x20 [btrfs] Signed-off-by: Filipe David Borba Manana fdman...@gmail.com --- V2: Removed test for current-journal_info == NULL. At this point it's always expected to be NULL. Reviewed-by: Miao Xie mi...@cn.fujitsu.com Let me clarify my understanding since
Re: [PATCH] Btrfs: assert send doesn't attempt to start transactions
(2014/06/25 1:48), Filipe David Borba Manana wrote:
> When starting a transaction just assert that current->journal_info
> doesn't contain a send transaction stub, since send isn't supposed
> to start transactions and when it finishes (either successfully or
> not) it's supposed to set current->journal_info to NULL.
>
> This is motivated by the change titled:
>
>     Btrfs: fix crash when starting transaction
>
> Signed-off-by: Filipe David Borba Manana <fdman...@gmail.com>

Reviewed-by: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>

Thanks,
Satoru

> ---
>  fs/btrfs/transaction.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
> index 614eac3..47870ca 100644
> --- a/fs/btrfs/transaction.c
> +++ b/fs/btrfs/transaction.c
> @@ -386,11 +386,13 @@ start_transaction(struct btrfs_root *root, u64 num_items, unsigned int type,
>  	bool reloc_reserved = false;
>  	int ret;
>  
> +	/* Send isn't supposed to start transactions. */
> +	ASSERT(current->journal_info != (void *)BTRFS_SEND_TRANS_STUB);
> +
>  	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
>  		return ERR_PTR(-EROFS);
>  
> -	if (current->journal_info &&
> -	    current->journal_info != (void *)BTRFS_SEND_TRANS_STUB) {
> +	if (current->journal_info) {
>  		WARN_ON(type & TRANS_EXTWRITERS);
>  		h = current->journal_info;
>  		h->use_count++;
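For readers following along, the decision start_transaction() makes on current->journal_info after this patch can be sketched in plain userspace C. This is only an illustration, not kernel code: classify_start() and the enum are hypothetical names, and the sentinel value used for the send stub is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BTRFS_SEND_TRANS_STUB; the value is an assumption. */
#define SEND_TRANS_STUB ((void *)1)

enum start_action {
	START_NEW,     /* journal_info is NULL: start or attach to a transaction */
	START_NESTED,  /* journal_info holds a handle: reuse it, bump use_count */
	START_INVALID  /* send stub present: the new ASSERT would fire here */
};

/* Mirrors the order of checks in the patched function: assert first, then nest. */
static enum start_action classify_start(const void *journal_info)
{
	if (journal_info == SEND_TRANS_STUB)
		return START_INVALID;
	if (journal_info != NULL)
		return START_NESTED;
	return START_NEW;
}
```

The point of the patch is that the stub case moves from being silently skipped in the nesting test to being caught up front.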
Re: [PATCH 0/4] Add superblock checksum check for btrfs-progs
Original Message Subject: Re: [PATCH 0/4] Add superblock checksum check for btrfs-progs From: David Sterba dste...@suse.cz To: Qu Wenruo quwen...@cn.fujitsu.com Date: 2014年07月04日 01:57 On Thu, Jul 03, 2014 at 05:36:34PM +0800, Qu Wenruo wrote: Before this patchset, btrfs-progs will overall ignore the superblock checksum and continue the routine. Sometimes this may cause disasters like checking a btrfs with corrupted superblock will lead to crash in btrfs-progs. This patch introduces superblock checksum check into btrfs_read_dev_super(), making btrfs-progs much more restricted and robust. To allow super-recover to open devices, add options to scan all 3 superblocks when using super-recover. Also updated the related error string and fix a bug in chunk-recover that will not be triggered until superblock csum is calculated. Qu Wenruo (4): btrfs-progs: Check superblock's checsum when read dev super btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover. btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string. btrfs-progs: Fix size for malloc for superblock checksum. Nice work. I've added 1, 2 and 4 it to integration. Please update the patch 3 (printf/fprintf). Thanks for the review and minor tweak for patch 2. I'll send v2 version of patch 3 soon. Thank, Qu -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] btrfs-progs: prevent select invalid dev super after dev replace
On Thu, 2014-07-03 at 20:10 +0200, David Sterba wrote:
> On Thu, Jul 03, 2014 at 10:06:35AM +0800, Gui Hecheng wrote:
> > After dev replace, we should not select the superblock of the replaced dev.
> > Otherwise, all the superblocks will be overwritten by this invalid
> > superblock.
> >
> > To prevent this, let btrfs-select-super check the first superblock on the
> > selected dev. If the magic doesn't match, the dev is a replaced dev and an
> > error message is shown.
>
> Is this patch needed if Qu's superblock checksum patches are applied?
>
> Thanks.

Hmm... I think it is not necessary then, please *ignore* this one.
Thanks, David.
Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been running out of memory and deadlocking (panic= doesn't even work). I downgraded back to 3.14, but I already had the problem once since then. OOM comes in, even though I have 0 swap used and AFAIK all my RAM isn't gone, it then fails to kill enough stuff and eventually it dies like this: [80943.542209] Swap cache stats: add 814596, delete 814595, find 2567491/2808869 [80943.565106] Free swap = 15612448kB [80943.577607] Total swap = 15616764kB [80943.589766] 2021665 pages RAM [80943.600281] 0 pages HighMem/MovableOnly [80943.613284] 28468 pages reserved [80943.624330] 0 pages hwpoisoned [80943.634824] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name [80943.659669] [ 918] 0 918 8550 5 236 -1000 udevd [80943.684789] [ 8022] 0 8022 30740 5 89 -1000 auditd [80943.710154] [ 8253] 0 8253 18130 6 123 -1000 sshd [80943.735024] [12001] 0 12001 8540 5 241 -1000 udevd [80943.760152] [18969] 0 18969 8540 5 223 -1000 udevd [80943.785293] Kernel panic - not syncing: Out of memory and no killable processes... Here is my more recent capture on 3.14 when I was able to catch it before the panic and dump a bunch of sysrq data. http://marc.merlins.org/tmp/btrfs-oom.txt Things to note in that log: [90621.895715] 2962 total pagecache pages [90621.895716] 5 pages in swap cache [90621.895717] Swap cache stats: add 145004, delete 144999, find 3314901/3316382 [90621.895718] Free swap = 15230536kB [90621.895718] Total swap = 15616764kB [90621.895718] Total swap = 15616764kB [90621.895719] 2021665 pages RAM [90621.895720] 0 pages HighMem/MovableOnly [90621.895720] 28468 pages reserved I'm not a VM person so I don't know how to read this, but am I out of RAM but not out of swap (since clearly none was used), or am I out of a specific memory region that is causing me problems? 
I'm not 100% certain btrfs is to blame, but it does look suspect: I upgraded to 3.15, got btrfs problems, and after that my 3.14.0 kernel, which had been running fine for 3 months, also died with the same OOM problems. Then again, I understand it could be a red herring.

Suggestions either way are appreciated :)

I tried raising this:
gargamel:~# echo 100 > /proc/sys/vm/swappiness

But so far I have too much unused RAM for any swap to be touched.

gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:       7894792    4938596    2956196          0       2396    2909204
-/+ buffers/cache:    2026996    5867796
Swap:     15616764          0   15616764

The log is too big to paste here, but you can grep it for:
[90817.715833] SysRq : Show Memory
[90817.715833] SysRq : Show Memory
[90817.715833] SysRq : Show Memory
[90893.571151] SysRq : Show backtrace of all active CPUs
[90921.781599] SysRq : Show Blocked State
[91075.976611] SysRq : Show State
[91406.972046] SysRq : Terminate All Tasks
[91410.771584] SysRq : Emergency Remount R/O
[91413.222483] SysRq : Emergency Sync
[91430.316955] SysRq : Power Off
^^^ note the kernel was wedged enough that Power Off didn't work, apparently because it failed to swap:
[91447.490142] CPU: 3 PID: 48 Comm: kswapd0 Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[91447.490143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
[91447.490145] task: 8802126b6490 ti: 8802126e4000 task.ti: 8802126e4000
[91447.490146] RIP: 0010:[810898f5] [810898f5] do_raw_spin_lock+0x23/0x27

Right after OOM started kicking in, the console showed apparent deadlocks in btrfs. But is it possible that btrfs is then also eating all my memory somehow?

You can find the long details here: http://marc.merlins.org/tmp/btrfs-oom.txt

[90801.680821] INFO: task btrfs-transacti:3433 blocked for more than 120 seconds.
[90801.712345] Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[90801.734394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90801.882691] btrfs-transacti D 88021387e800 0 3433 2 0x [90801.904863] 88020b20de10 0046 88020b20dfd8 88021387e2d0 [90801.928448] 000141c0 88021387e2d0 880211e94800 880029c00dc0 [90801.952015] 8802009d1f28 8802009d1ed0 88020b20de20 [90801.975701] Call Trace: [90801.984443] [8160d2a1] schedule+0x73/0x75 [90802.000438] [8122b575] btrfs_commit_transaction+0x330/0x849 [90802.021140] [81085116] ? finish_wait+0x65/0x65 [90802.038438] [81227c48] transaction_kthread+0xf8/0x1ab [90802.057571] [81227b50] ? btrfs_cleanup_transaction+0x43f/0x43f [90802.079092] [8106bc62] kthread+0xae/0xb6 [90802.094838] [8106bbb4] ? __kthread_parkme+0x61/0x61
Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
Original Message Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options From: Goffredo Baroncelli kreij...@inwind.it To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2014年07月04日 01:37 On 07/03/2014 02:28 AM, Qu Wenruo wrote: Original Message Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options From: Goffredo Baroncelli kreij...@libero.it To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org Date: 2014年07月03日 01:48 On 07/01/2014 11:30 AM, Qu Wenruo wrote: This commit has the following problem: 1) Break the ro mount rule. When users mount the whole btrfs ro, it is still possible to mount subvol rw and change the contents. Which make the whole fs ro mount non-sense. Where is the problem ? I see an use case when I want a conservative default: mount all ro except some subvolumes. In any case it is not a security problem because if the user has the capability to mount a subvolume, also he has the capability to remount,rw the whole filesystem. Not security problem but behavior not consistent. If user mount the whole disk ro, he or she want the fs read only and nothing will change in it. If you mount a subvol rw, then the whole disk ro expectation is broken. Things will change even the whole disk is readonly. Sorry for bother you again, but there is a thing not clear to me: If # mount -o subvolid=5,ro /dev/sda1 /mnt/root # mount -o subvol=subvolname,rw /dev/sda1 /mnt/subvolname I suppose that # touch /mnt/root/touch-test # 1 fails, and # touch /mnt/subvolname/touch-test # 2 succeeded. I understood correctly ? Your understanding is right and that is current behavior. But that should not be the correct behavior. If you mount fs_tree ro, btrfs should ensure the whole fs_tree(including all the subvolumes) ro. 
Otherwise the fs_tree is not strictly read-only, since you can modify contents inside the rw subvolume, and that subvolume is part of the fs_tree (a partly ro, partly rw state).

IMO the ideal logic would be the following:

1) An ro-mounted subvolume forces all of its child subvolumes to be mountable ro only.

   subvol 5 (mounted ro at /)
   ├── subvol 257 (mounted rw at /mnt/btrfrs)

   So the mount above should not be allowed. But the following mount should be OK:

   subvol 5 (mounted rw at /)
   ├── subvol 257 (mounted ro at /mnt/btrfrs)

2) An ro-mounted subvolume will not be modified, even through the rw-mounted parent subvolume.

Only this will ensure a strict ro mount option.
If anyone has any other ideas about it, I'm happy to listen.

Thanks,
Qu

> If so, this behaviour seems correct to me. It would be different if, after
> mounting the subvolume subvolname, the whole filesystem also became rw
> (eg: #1 succeeded).
>
> G.Baroncelli
>
> > The problem also happens when a parent subvol is mounted rw but a child
> > subvol is mounted ro. The user can still modify the child subvol through
> > the parent subvol, which still breaks the readonly rule.
> >
> > Thanks,
> > Qu
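Rule 1) of the proposal above can be sketched as a small ancestor walk. This is a userspace illustration of the proposed policy only, not how the kernel tracks mounts: struct subvol and mount_allowed() are hypothetical names invented for the example, and rule 2) (protecting an ro child from writes through an rw parent) is not covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a subvolume and its current mount state. */
struct subvol {
	const struct subvol *parent; /* NULL for the top-level subvolume (id 5) */
	bool mounted;
	bool read_only;              /* meaningful only when mounted is true */
};

/* Proposed rule 1: refuse an rw mount if any ancestor is mounted ro. */
static bool mount_allowed(const struct subvol *sv, bool want_rw)
{
	if (!want_rw)
		return true; /* an ro mount never weakens an ancestor's guarantee */

	for (const struct subvol *p = sv->parent; p != NULL; p = p->parent)
		if (p->mounted && p->read_only)
			return false;

	return true;
}
```

With subvol 5 mounted ro, an rw mount of subvol 257 is refused; with subvol 5 mounted rw, both ro and rw mounts of subvol 257 pass this check.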
[PATCH v2 3/4] btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.
Since btrfs_read_dev_super() can now distinguish a non-btrfs filesystem from a
corrupted superblock, thanks to the newly introduced super csum check, the
return value and corresponding error strings should also be updated to print
more meaningful errors for end users.

Signed-off-by: Qu Wenruo <quwen...@cn.fujitsu.com>
---
v2:
  Use fprintf(stderr,) to replace the old printk.
---
 btrfs-find-root.c |  5 ++++-
 chunk-recover.c   | 10 ++++++++--
 cmds-filesystem.c |  7 ++++++-
 disk-io.c         | 36 ++++++++++++++++++++++++++----------
 utils.c           |  5 ++++-
 volumes.c         |  4 +---
 6 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index e31a9b5..7932308 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -96,7 +96,10 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
 	ret = btrfs_read_dev_super(fs_devices->latest_bdev,
 				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
-		printk("No valid btrfs found\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto out_devices;
 	}
 
diff --git a/chunk-recover.c b/chunk-recover.c
index 9baedd7..4a16110 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1285,7 +1285,10 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 	ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
 				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
-		fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto out_devices;
 	}
 
@@ -1351,7 +1354,10 @@ static int recover_prepare(struct recover_control *rc, char *path)
 
 	ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
 	if (ret) {
-		fprintf(stderr, "read super block error\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto fail_free_sb;
 	}
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index d2e46dc..d58397d 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -604,9 +604,14 @@ static int cmd_show(int argc, char **argv)
 	} else {
 		ret = dev_to_fsid(search, fsid);
 		if (ret) {
-			fprintf(stderr,
+			if (ret == -ENOENT)
+				fprintf(stderr,
 				"ERROR: No btrfs on %s\n",
 				search);
+			if (ret == -EIO)
+				fprintf(stderr,
+					"Superblock is corrupted on %s\n",
+					search);
 			return 1;
 		}
 		uuid_unparse(fsid, uuid_buf);
diff --git a/disk-io.c b/disk-io.c
index 1bd9fae..5ee7edd 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -990,7 +990,11 @@ int btrfs_scan_fs_devices(int fd, const char *path,
 	ret = btrfs_scan_one_device(fd, path, fs_devices,
 				    total_devs, sb_bytenr, super_recover);
 	if (ret) {
-		fprintf(stderr, "No valid Btrfs found on %s\n", path);
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid Btrfs found on %s\n", path);
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted on %s\n",
+				path);
 		return ret;
 	}
 
@@ -1101,7 +1105,10 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 	else
 		ret = btrfs_read_dev_super(fp, disk_super, sb_bytenr, 0);
 	if (ret) {
-		printk("No valid btrfs found\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto out_devices;
 	}
 
@@ -1201,11 +1208,11 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 	if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
 		ret =
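Every call site in this patch repeats the same pair of checks, so one way to picture the mapping is a single lookup helper. The sketch below is not part of the posted patch: super_read_strerror() is a hypothetical name, and the default branch is an addition of mine, since the hunks as posted print nothing when ret is a nonzero value other than -ENOENT or -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not in the patch): translate the return value of a
 * superblock read into the message the patch prints at each call site.
 */
static const char *super_read_strerror(int ret)
{
	switch (ret) {
	case 0:
		return "OK";
	case -ENOENT:
		return "No valid btrfs found";
	case -EIO:
		return "Superblock is corrupted";
	default:
		/* Assumption: some fallback is wanted for other error codes. */
		return "Unknown error reading superblock";
	}
}
```

A caller could then write something like fprintf(stderr, "%s\n", super_read_strerror(ret)) in place of the two if statements.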
Re: [PATCH] Btrfs: fix wrong uevent target
-------- Original Message --------
Subject: Re: [PATCH] Btrfs: fix wrong uevent target
From: Anand Jain <anand.j...@oracle.com>
To: Chris Mason <c...@fb.com>
Date: 2014-07-04 01:32

> Chris,
>
> This fix is theoretically correct, but my guess that it would solve the
> problem reported by Qu Wenruo was wrong [1]. The patch is good to integrate.
>
> Thanks, Anand
>
> [1] Re: [PATCH RFC] btrfs: Add ctime/mtime update for btrfs device add/remove.

Yes, no uevent improvement will solve the problem of
'btrfs dev scan; btrfs dev del; btrfs dev scan'.

The uevent is sent from the kernel and received by ueventd, and ueventd then
updates the device file ctime/mtime. The uevent procedure is always
asynchronous, so it will not fix the 'btrfs dev scan' libblkid cache problem.

Although my RFC patch is ugly, it provides a synchronous method to update
ctime/mtime from the kernel.

Thanks,
Qu

> On 03/07/2014 22:00, Chris Mason wrote:
> > On 07/03/2014 07:09 AM, Miao Xie wrote:
> > > CC Anand Jain
> > >
> > > Sorry, please ignore this patch. Anand wrote the same patch several days
> > > ago, so this bug fix belongs to Anand though he NACKed his patch at that
> > > time.
> >
> > It certainly looks right, but Anand had mentioned that he had a few
> > questions on testing. I've pulled it out for now, but I'll take Anand's
> > version when you're both happy.
> >
> > -chris
Re: [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail
Hi Miao,

(2014/07/03 19:22), Miao Xie wrote:
> The caller of btrfs_submit_direct_hook() will put the original dio bio
> when btrfs_submit_direct_hook() returns an error number, so we needn't put
> the original bio in btrfs_submit_direct_hook().
>
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>

Reviewed-by: Satoru Takeuchi <takeuchi_sat...@jp.fujitsu.com>

Here is the review result, CMIIW.

call trace:
  btrfs_submit_direct
    -> btrfs_submit_direct_hook

fs/btrfs/inode.c:
===
static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
				    int skip_sum)
{
	...
	struct bio *orig_bio = dip->orig_bio;
	...
	map_length = orig_bio->bi_iter.bi_size;
	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
			      &map_length, NULL, 0);
	if (ret) {
		bio_put(orig_bio);	# (1)
		return -EIO;
	}
	...
}
...
static void btrfs_submit_direct(int rw, struct bio *dio_bio,
				struct inode *inode, loff_t file_offset)
{
	dip = kmalloc(sizeof(*dip) + sum_len, GFP_NOFS);
	if (!dip) {
		ret = -ENOMEM;
		goto free_io_bio;	# (2)
	}
	...
	ret = btrfs_submit_direct_hook(rw, dip, skip_sum);
	if (!ret)
		return;
free_io_bio:
	bio_put(io_bio);		# (3)
	...
}
===

If btrfs_map_block() fails in btrfs_submit_direct_hook(), it puts orig_bio
at (1) and returns -EIO. Then the caller, btrfs_submit_direct(), frees the
same bio at (3).

Since (3) is also used for the other error handling at (2), I consider your
way, removing (1), to be better.

Thanks, Satoru --- fs/btrfs/inode.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index a616fa4..15902eb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7325,10 +7325,8 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip, map_length = orig_bio-bi_iter.bi_size; ret = btrfs_map_block(root-fs_info, rw, start_sector 9, map_length, NULL, 0); - if (ret) { - bio_put(orig_bio); + if (ret) return -EIO; - } if (map_length = orig_bio-bi_iter.bi_size) { bio = orig_bio; @@ -7345,6 +7343,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip, bio = btrfs_dio_bio_alloc(orig_bio-bi_bdev, start_sector, GFP_NOFS); if (!bio) return -ENOMEM; + bio-bi_private = dip; bio-bi_end_io = btrfs_end_dio_bio; atomic_inc(dip-pending_bios); -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
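The ownership rule discussed in this review (the callee reports the error, the caller drops the one reference) can be shown with a toy reference count. This is a userspace sketch under stated assumptions, not the btrfs code: struct toy_bio, toy_bio_put(), submit_hook() and submit() are hypothetical names, and the map failure is simulated by a flag.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-in for a refcounted bio. */
struct toy_bio {
	int refs;
};

/* Dropping a reference that is not held is the bug being avoided. */
static void toy_bio_put(struct toy_bio *b)
{
	assert(b->refs > 0);
	b->refs--;
}

/* Callee after the fix: on failure, return the error and leave the put to the caller. */
static int submit_hook(struct toy_bio *orig, int map_fails)
{
	(void)orig;
	if (map_fails)
		return -EIO;
	return 0;
}

/* Caller: owns the single put on the error path, like label (3) in the review. */
static int submit(struct toy_bio *orig, int map_fails)
{
	int ret = submit_hook(orig, map_fails);

	if (!ret)
		return 0;

	toy_bio_put(orig);
	return ret;
}
```

Had submit_hook() also called toy_bio_put() on failure, the second put in submit() would trip the assertion, which is the double put the patch removes.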
Re: [PATCH] btrfs-progs: add mount status check for btrfs-image
On Thu, 2014-07-03 at 18:58 +0200, David Sterba wrote:
> On Thu, Jul 03, 2014 at 10:06:34AM +0800, Gui Hecheng wrote:
> > The btrfs-image tool should not be run on a mounted filesystem.
>
> Should not, but for some values of sometimes it makes sense, eg. capturing
> image of an otherwise quiescent filesystem, a read-only mount or after a
> crash.

Ah, a read-only mount really is such a case.

> This utility is used for debugging so I'd prefer to let the user do as he
> likes, though printing the warning about the mount status is a good
> improvement.

I agree. I'll keep the check_mounted call, just give a prompt, and let it
continue.

> > The undergoing fs operations may change what you have imaged a while ago,
> > this makes the image meaningless.
>
> I'm not familiar with the image format, but maybe we can set a bit in the
> header when the filesystem was not captured cleanly.

Hmm... This is a good point, I think I'll give it a try in another patch.
Re: [PATCH 1/2] btrfs-progs: add ask_user confirmation for btrfstune clear seeding flag
On Thu, 2014-07-03 at 18:51 +0200, David Sterba wrote:
> On Thu, Jul 03, 2014 at 10:06:33AM +0800, Gui Hecheng wrote:
> > Clear the seeding flag may cause the original filesystem to be writable,
> > which is dangerous.
>
> Can you please describe the dangerous scenario a bit more? This would also
> go to the documentation so it's not only to satisfy my curiosity.

Yes, I'll include a concrete scenario in the changelog of a v2 patch.

> Dropping the seeding flag could be dangerous if the filesystem starts in
> seeding mode, a new device is added, some writes are done, then filesystem
> is unmounted.
>
> Now it's a 2 device filesystem, where the original holds some data and
> without the seeding flag it would accept new writes. Still ok for me,
> though this is probably the time where some user assumptions may break.
>
> > In this case, add user confirmation check when clearing seeding flag.
> > Also warn the user that the fs is in a dangerous condition when the
> > seeding flag is cleared if it is forced to.
>
> The -y option is tied only to the seeding option, but it should IMO be
> more general and called --force.

I agree.

> > Signed-off-by: Gui Hecheng <guihc.f...@cn.fujitsu.com>
> > ---
> >  btrfstune.c | 24 +++++++++++++++++++++++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/btrfstune.c b/btrfstune.c
> > index 3f2f0cd..0e18088 100644
> > --- a/btrfstune.c
> > +++ b/btrfstune.c
> > @@ -103,6 +104,7 @@ static void print_usage(void)
> >  	fprintf(stderr, "\t-S value\tpositive value will enable seeding, zero to disable, negative is not allowed\n");
> >  	fprintf(stderr, "\t-r \t\tenable extended inode refs\n");
> >  	fprintf(stderr, "\t-x \t\tenable skinny metadata extent refs\n");
> > +	fprintf(stderr, "\t-y \t\tsay yes to clear the seeding flag, make sure that you are aware of the danger\n");
>
> The help text could say something like
>
>   "--force\tallow dangerous changes\n"
>
> btrfstune only allows setting the bit for extref and skinny-metadata,
> unsetting would be dangerous as well.

For my part, I can't think of any scenarios for these two. Could you please
remind me of some?
Quota Ignored On write
Based on the latest for-linus branch, I found I can write way more than the
quota allows:

btrfs quota enable
btrfs subvolume create test
btrfs qgroup limit 1G test
dd if=/dev/zero of=test/file bs=1024 count=1500000

output:
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

That's a full half gig over the quota limit.
I noticed some changes to the quota accounting in the logs; what changed
that could cause this?

-Kevin Brandstatter
Re: Quota Ignored On write
Hi Kevin,

(2014/07/04 11:13), Kevin Brandstatter wrote:
> Based on the latest for-linus branch, I found I can write way more than the
> quota allows:
>
> btrfs quota enable
> btrfs subvolume create test
> btrfs qgroup limit 1G test
> dd if=/dev/zero of=test/file bs=1024 count=1500000
>
> output:
> 1500000+0 records in
> 1500000+0 records out
> 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s
>
> That's a full half gig over the quota limit.
> I noticed some changes to the quota accounting in the logs; what changed
> that could cause this?

Do you remember which kernel version quota worked correctly on?

Thanks,
Satoru

> -Kevin Brandstatter
Re: Quota Ignored On write
3.15.3 via arch/ and from linux-git

-Kevin

On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
> Hi Kevin,
>
> (2014/07/04 11:13), Kevin Brandstatter wrote:
> > Based on the latest for-linus branch, I found I can write way more than
> > the quota allows:
> >
> > btrfs quota enable
> > btrfs subvolume create test
> > btrfs qgroup limit 1G test
> > dd if=/dev/zero of=test/file bs=1024 count=1500000
> >
> > output:
> > 1500000+0 records in
> > 1500000+0 records out
> > 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s
> >
> > That's a full half gig over the quota limit.
> > I noticed some changes to the quota accounting in the logs; what changed
> > that could cause this?
>
> Do you remember which kernel version quota worked correctly on?
>
> Thanks,
> Satoru
>
> > -Kevin Brandstatter
Re: Quota Ignored On write
(2014/07/04 11:25), Kevin Brandstatter wrote:
> 3.15.3 via arch/ and from linux-git

OK, I'll bisect it.

Satoru

> -Kevin
>
> On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
> > Hi Kevin,
> >
> > (2014/07/04 11:13), Kevin Brandstatter wrote:
> > > Based on the latest for-linus branch, I found I can write way more than
> > > the quota allows:
> > >
> > > btrfs quota enable
> > > btrfs subvolume create test
> > > btrfs qgroup limit 1G test
> > > dd if=/dev/zero of=test/file bs=1024 count=1500000
> > >
> > > output:
> > > 1500000+0 records in
> > > 1500000+0 records out
> > > 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s
> > >
> > > That's a full half gig over the quota limit.
> > > I noticed some changes to the quota accounting in the logs; what
> > > changed that could cause this?
> >
> > Do you remember which kernel version quota worked correctly on?
> >
> > Thanks,
> > Satoru
> >
> > > -Kevin Brandstatter
Re: Quota Ignored On write
Hmm, is it possible that btrfs is doing some deduplication or compression?
The expected behavior works fine with small quotas like 10/20MB, but at 1GB
I can overwrite quite a bit from /dev/zero.

I also tried to dd from /dev/urandom (to get some variety other than zeros):

dd if=/dev/urandom of=meow bs=1024 count=1500000

output:
dd: error writing ‘meow’: Disk quota exceeded
1163330+0 records in
1163329+0 records out
1191248896 bytes (1.2 GB) copied, 110.25 s, 10.8 MB/s

So it looks like it's stopping the write, but with a 1GB quota, that's 20%
over quota.

-Kevin

On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
> Hi Kevin,
>
> (2014/07/04 11:13), Kevin Brandstatter wrote:
> > Based on the latest for-linus branch, I found I can write way more than
> > the quota allows:
> >
> > btrfs quota enable
> > btrfs subvolume create test
> > btrfs qgroup limit 1G test
> > dd if=/dev/zero of=test/file bs=1024 count=1500000
> >
> > output:
> > 1500000+0 records in
> > 1500000+0 records out
> > 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s
> >
> > That's a full half gig over the quota limit.
> > I noticed some changes to the quota accounting in the logs; what changed
> > that could cause this?
>
> Do you remember which kernel version quota worked correctly on?
>
> Thanks,
> Satoru
>
> > -Kevin Brandstatter
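For anyone checking the numbers in this report, the size of the overrun depends on how the 1G limit is interpreted. The sketch below only does the arithmetic; quota_overrun_bytes() and quota_overrun_percent() are hypothetical helper names, and whether the limit is stored as 2^30 or 10^9 bytes is an assumption on my part, so both are worked out in the note that follows.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes written beyond the limit; 0 if the write stayed within it. */
static uint64_t quota_overrun_bytes(uint64_t written, uint64_t limit)
{
	return written > limit ? written - limit : 0;
}

/* Overrun expressed as a percentage of the limit. */
static double quota_overrun_percent(uint64_t written, uint64_t limit)
{
	assert(limit > 0);
	return 100.0 * (double)quota_overrun_bytes(written, limit) / (double)limit;
}
```

For the /dev/urandom run, 1191248896 bytes against a limit of 2^30 = 1073741824 bytes is an overrun of 117507072 bytes, about 10.9%; against 10^9 bytes it is 191248896 bytes, about 19.1%, which matches the "20% over" estimate. For the /dev/zero run, 1536000000 bytes exceeds a 2^30 limit by 462258176 bytes.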
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote:
> Thanks for the patch. Hopefully this will make it to the next 3.15.x kernel.
> I also went back to 3.14 anyway since the 'blocked for 120 seconds' look like another instance of deadlocks we've been discussing here.
>
> But just curious:
> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
>
> What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem?

If there is another copy of the block (RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one; it's all we can do about it.

> Also, is it possible to have all these messages state which devid they occurred on?
> I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway.

Yes we can, but it'd need a bit more effort. For now, all device msg we've seen in panic info comes from sb->s_id, which points to @fs_info->latest_device.

thanks,
-liubo

> Thanks,
> Marc
> --
> A mouse is a device used to point at the xterm you want to type in - A.S.R.
> Microsoft is to operating systems what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
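For anyone triaging these messages by hand, a small sketch that pulls the block address and the two generation numbers out of the log line and shows how far apart they are. `transid_gap` is a hypothetical helper name; it only parses text in the format quoted above and says nothing about which device or file is affected.

```shell
#!/bin/bash
# transid_gap: read kernel log lines on stdin. For every
# "parent transid verify failed" line, print the block address, the
# generation the parent expected (wanted), the generation found in the
# block header (found), and the difference between the two.
transid_gap() {
    sed -n 's/.*parent transid verify failed on \([0-9]*\) wanted \([0-9]*\) found \([0-9]*\).*/\1 \2 \3/p' |
    while read -r block wanted found; do
        echo "block=$block wanted=$wanted found=$found gap=$((wanted - found))"
    done
}
```

Typical use would be `dmesg | transid_gap`. For the line in this thread the gap is 22895 generations, which is a very different picture from a block that is only one or two transactions behind.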
Re: [PATCH 9/9] Btrfs: fix writing data into the seed filesystem
On Thu, Jul 03, 2014 at 06:22:13PM +0800, Miao Xie wrote:
> If we mounted a seed filesystem with degraded option, and then added a new
> device into the seed filesystem, then we found adding device failed because
> of the IO failure.
>
> Steps to reproduce:
>  # mkfs.btrfs -d raid1 -m raid1 dev0 dev1
>  # btrfstune -S 1 dev0
>  # mount dev0 -o degraded mnt
>  # btrfs device add -f dev2 mnt
>
> It is because the original code didn't set the chunk on the seed device to be
> read-only if the degraded flag was set. It was introduced by patch f48b90756,
> which fixed the problem that the raid1 filesystem became read-only after one
> device of it was missing. But this fix method was not right: we should set the
> read-only flag according to the number of the missing devices, not the
> degraded mount option. If the number of the missing devices is less than the
> max error number that the profile of the chunk tolerates, we don't set it to
> be read-only.

Reviewed-by: Liu Bo <bo.li@oracle.com>

-liubo

> Cc: Josef Bacik <jba...@fb.com>
> Signed-off-by: Miao Xie <mi...@cn.fujitsu.com>
> ---
>  fs/btrfs/volumes.c | 52 ++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 73a82e5..daecfa5 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -4584,12 +4584,31 @@ out:
>  	return ret;
>  }
>
> +static inline int btrfs_chunk_max_errors(struct map_lookup *map)
> +{
> +	int max_errors;
> +
> +	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
> +			 BTRFS_BLOCK_GROUP_RAID10 |
> +			 BTRFS_BLOCK_GROUP_RAID5 |
> +			 BTRFS_BLOCK_GROUP_DUP)) {
> +		max_errors = 1;
> +	} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
> +		max_errors = 2;
> +	} else {
> +		max_errors = 0;
> +	}
> +
> +	return max_errors;
> +}
> +
>  int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
>  {
>  	struct extent_map *em;
>  	struct map_lookup *map;
>  	struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree;
>  	int readonly = 0;
> +	int miss_ndevs = 0;
>  	int i;
>
>  	read_lock(&map_tree->map_tree.lock);
> @@ -4598,18 +4617,27 @@ int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
>  	if (!em)
>  		return 1;
>
> -	if (btrfs_test_opt(root, DEGRADED)) {
> -		free_extent_map(em);
> -		return 0;
> -	}
> -
>  	map = (struct map_lookup *)em->bdev;
>  	for (i = 0; i < map->num_stripes; i++) {
> +		if (map->stripes[i].dev->missing) {
> +			miss_ndevs++;
> +			continue;
> +		}
> +
>  		if (!map->stripes[i].dev->writeable) {
>  			readonly = 1;
> -			break;
> +			goto end;
>  		}
>  	}
> +
> +	/*
> +	 * If the number of missing devices is larger than max errors,
> +	 * we can not write the data into that chunk successfully, so
> +	 * set it readonly.
> +	 */
> +	if (miss_ndevs > btrfs_chunk_max_errors(map))
> +		readonly = 1;
> +end:
>  	free_extent_map(em);
>  	return readonly;
>  }
> @@ -5220,16 +5248,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
>  		}
>  	}
>
> -	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
> -		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
> -				 BTRFS_BLOCK_GROUP_RAID10 |
> -				 BTRFS_BLOCK_GROUP_RAID5 |
> -				 BTRFS_BLOCK_GROUP_DUP)) {
> -			max_errors = 1;
> -		} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
> -			max_errors = 2;
> -		}
> -	}
> +	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS))
> +		max_errors = btrfs_chunk_max_errors(map);
>
>  	if (dev_replace_is_ongoing && (rw & (REQ_WRITE | REQ_DISCARD)) &&
>  	    dev_replace->tgtdev != NULL) {
> --
> 1.9.3
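The decision the patch describes can be modelled outside the kernel. The following is a user-space sketch only, not kernel code: the function names mirror the patch, but the string profile names and the per-stripe states ("ok", "missing", "ro") are illustrative assumptions.

```shell
#!/bin/bash
# chunk_max_errors PROFILE
# Number of failed devices a chunk profile tolerates for writes,
# following the table in btrfs_chunk_max_errors() from the patch.
chunk_max_errors() {
    case "$1" in
        raid1|raid10|raid5|dup) echo 1 ;;
        raid6)                  echo 2 ;;
        *)                      echo 0 ;;
    esac
}

# chunk_readonly PROFILE STATE...
# Each STATE describes the device behind one stripe:
#   ok       present and writeable
#   missing  device is absent (degraded)
#   ro       present but not writeable, e.g. a seed device
# Prints 1 if the chunk must be treated as read-only, 0 otherwise.
chunk_readonly() {
    local profile=$1 missing=0 state
    shift
    for state in "$@"; do
        case "$state" in
            missing) missing=$((missing + 1)) ;;   # count it and keep scanning
            ro)      echo 1; return 0 ;;           # any non-writeable device wins
        esac
    done
    if [ "$missing" -gt "$(chunk_max_errors "$profile")" ]; then
        echo 1
    else
        echo 0
    fi
}
```

With this model, `chunk_readonly raid1 ro missing` (the seed case from the reproducer) prints 1, while `chunk_readonly raid1 ok missing` prints 0, which is the degraded raid1 case that f48b90756 was meant to keep writeable. The mount option no longer enters into the decision at all.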
Re: Quota Ignored On write
Hi Chris and Kevin,

On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
> Hi Kevin,
>
> (2014/07/04 11:13), Kevin Brandstatter wrote:
>> Basing off the latest for-linus branch, I found I can write way more than the quota:
>>
>> btrfs quota enable
>> btrfs subvolume create test
>> btrfs qgroup limit 1G test
>> dd if=/dev/zero of=test/file bs=1024 count=1500000
>> output:
>> 1500000+0 records in
>> 1500000+0 records out
>> 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s
>>
>> That's a full half gig over the quota limit.
>> I noticed some changes to the quota accounting in the logs; what changed that could cause this?
>
> Do you remember what kernel version quota worked correctly?

(2014/07/04 11:32), Satoru Takeuchi wrote:
> (2014/07/04 11:25), Kevin Brandstatter wrote:
>> 3.15.3 via arch/ and from linux-git
>
> OK, I'll bisect it.

I made the following reproducer based on your operation. It succeeded with 3.15 and failed with 3.16-rc3. So the problematic patch is not in the mason/for-linus branch, but somewhere between 3.15 and 3.16-rc3.

Please wait a while for my bisect to finish...

===
#!/bin/bash -x

TEST_DEV=/dev/vdb
TEST_MNT=/home/sat/mnt

# Start from a freshly made filesystem with quotas enabled.
umount $TEST_MNT
mkfs.btrfs -f $TEST_DEV
mount $TEST_DEV $TEST_MNT
btrfs quota enable $TEST_MNT

# Limit a new subvolume to 100 KiB.
SUBVOLPATH=$TEST_MNT/quota_test
LIMIT=$((1024*100))
btrfs subvolume create $SUBVOLPATH
btrfs qgroup limit $LIMIT $SUBVOLPATH

# Try to write 1.5 times the limit.
TESTFILE=$SUBVOLPATH/test
dd if=/dev/zero of=$TESTFILE bs=1024 count=$(($LIMIT*3/2/1024))

# Allocated size in bytes (ls -s reports 1 KiB blocks).
SIZE=$(($(ls -s $TESTFILE | awk '{print $1}')*1024))
RET=0
if [ $SIZE -le $LIMIT ] ; then
    echo "[PASS] quota works correctly" >&2
else
    echo "[FAIL] quota doesn't work" >&2
    RET=1
fi
exit $RET
===

Thanks,
Satoru

>> Thanks,
>> Satoru
>>
>>> -Kevin Brandstatter
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote:
>> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
>>
>> What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem?
>
> If there is another copy of the block (RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one; it's all we can do about it.

Right. It's not quite my question though. I mean I don't know what device it's on, never mind what file is affected.
If I know which file is corrupted, I can simply delete it and restore from backup, no biggie.
Right now I don't even know which one of my 3 btrfs filesystems (over 10TB) has this problem.
That makes the message kind of problematic: "you have a problem, but I'm not giving you any fighting chance of finding out where" :)

>> Also, is it possible to have all these messages state which devid they occurred on?
>> I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway.
>
> Yes we can, but it'd need a bit more effort. For now, all device msg we've seen in panic info comes from sb->s_id, which points to @fs_info->latest_device.

Food for thought: as is, the message is unfortunately close to useless, except to an FS developer with a system that has only one btrfs filesystem.

On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote:
> I am afraid scrub may not be able to fix this kind of error; all scrub does is verify whether checksums match and, if possible, use good mirrors to rewrite the bad one.

I wouldn't be bothered if scrub can't fix it, but it would be good if it could tell me.

> Such errors seem to imply the content itself is corrupted; we may have passed the checksum check after ending io, but we fail the generation check afterwards.

So should I really replace scrub with
find / -type f -print0 | xargs -0 grep . >/dev/null
?
Basically we need something that will scan the filesystem and ensure that all files are reachable correctly without causing filesystem problems, and if one is bad, output the name of the bad file(s).
Scrub only does half of that job, it seems.

> To get the physical device name, we still need the mirror num to know which device we are locating.

Ok, so it's missing for now and therefore the code can't easily report it, I understand.
Well, I explained the problem: ext4 and others of course tell me which devid an error is on; hopefully btrfs will be able to do so in the near future.

Back to the original problem, would you agree that
find / -type f -print0 | xargs -0 grep . >/dev/null
may do a better job scanning the entire FS for problems than scrub would?

Thanks,
Marc
--
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
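A sketch of the kind of scan Marc describes, written so that it names the files it could not read instead of discarding everything. `scan_tree` is a hypothetical helper name. It assumes that a read hitting a damaged block fails with an I/O error, which is worth confirming on the kernel in question, and it can only find problems reachable through regular files.

```shell
#!/bin/bash
# scan_tree ROOT
# Read every regular file under ROOT, without crossing into other
# mounted filesystems (-xdev), and print the path of each file that
# cannot be read to the end.
scan_tree() {
    local root=$1 f
    find "$root" -xdev -type f -print0 |
    while IFS= read -r -d '' f; do
        # NUL-delimited names keep paths with spaces or newlines intact.
        cat -- "$f" > /dev/null 2>&1 || printf '%s\n' "$f"
    done
}
```

Running `scan_tree /mnt/point` once per btrfs mount point, rather than once from `/`, also answers the question of which filesystem the bad file lives on. Kernel messages triggered by the reads still need to be collected separately, for example from dmesg.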
Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?
On Thu, 3 Jul 2014 18:19:38 Marc MERLIN wrote:
> I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been running out of memory and deadlocking (panic= doesn't even work).
> I downgraded back to 3.14, but I already had the problem once since then.

Is there any correlation between such problems and BTRFS operations such as creating snapshots or running a scrub/balance?

Back in ~3.10 days I had serious problems with BTRFS memory use when removing multiple snapshots or balancing. But at about 3.13 they all seemed to get fixed.

I usually didn't have a kernel panic when I had such problems (although I sometimes had a system lock up solid such that I couldn't even determine what its problem was). Usually the OOM handler started killing big processes such as chromium when it shouldn't have needed to.

Note that I haven't verified that the BTRFS memory use is reasonable in all such situations. Merely that it doesn't use enough to kill my systems.

--
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/
Re: Quota Ignored On write
Hi Josef, Chris, Kevin,

(2014/07/04 12:25), Satoru Takeuchi wrote:
> Hi Chris and Kevin,
>
> On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
>> Hi Kevin,
>>
>> (2014/07/04 11:13), Kevin Brandstatter wrote:
>>> Basing off the latest for-linus branch, I found I can write way more than the quota:
>>>
>>> btrfs quota enable
>>> btrfs subvolume create test
>>> btrfs qgroup limit 1G test
>>> dd if=/dev/zero of=test/file bs=1024 count=1500000
>>> output:
>>> 1500000+0 records in
>>> 1500000+0 records out
>>> 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s
>>>
>>> That's a full half gig over the quota limit.
>>> I noticed some changes to the quota accounting in the logs; what changed that could cause this?
>>
>> Do you remember what kernel version quota worked correctly?
>
> (2014/07/04 11:32), Satoru Takeuchi wrote:
>> (2014/07/04 11:25), Kevin Brandstatter wrote:
>>> 3.15.3 via arch/ and from linux-git
>>
>> OK, I'll bisect it.
>
> I made the following reproducer based on your operation. It succeeded with 3.15 and failed with 3.16-rc3. So the problematic patch is not in the mason/for-linus branch, but somewhere between 3.15 and 3.16-rc3.
>
> Please wait a while for my bisect to finish...

I bisected and found that the bad commit is the following patch.

===
commit fcebe4562dec83b3f8d3088d77584727b09130b2
Author: Josef Bacik <jba...@fb.com>
Date:   Tue May 13 17:30:47 2014 -0700

    Btrfs: rework qgroup accounting
===

Josef, please take a look at this patch.

Thanks,
Satoru

> ===
> #!/bin/bash -x
>
> TEST_DEV=/dev/vdb
> TEST_MNT=/home/sat/mnt
>
> # Start from a freshly made filesystem with quotas enabled.
> umount $TEST_MNT
> mkfs.btrfs -f $TEST_DEV
> mount $TEST_DEV $TEST_MNT
> btrfs quota enable $TEST_MNT
>
> # Limit a new subvolume to 100 KiB.
> SUBVOLPATH=$TEST_MNT/quota_test
> LIMIT=$((1024*100))
> btrfs subvolume create $SUBVOLPATH
> btrfs qgroup limit $LIMIT $SUBVOLPATH
>
> # Try to write 1.5 times the limit.
> TESTFILE=$SUBVOLPATH/test
> dd if=/dev/zero of=$TESTFILE bs=1024 count=$(($LIMIT*3/2/1024))
>
> # Allocated size in bytes (ls -s reports 1 KiB blocks).
> SIZE=$(($(ls -s $TESTFILE | awk '{print $1}')*1024))
> RET=0
> if [ $SIZE -le $LIMIT ] ; then
>     echo "[PASS] quota works correctly" >&2
> else
>     echo "[FAIL] quota doesn't work" >&2
>     RET=1
> fi
> exit $RET
> ===
>
> Thanks,
> Satoru
>
>>> Thanks,
>>> Satoru
>>>
>>>> -Kevin Brandstatter
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/04/2014 12:11 PM, Marc MERLIN wrote:
> On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote:
>>> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
>>>
>>> What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem?
>>
>> If there is another copy of the block (RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one; it's all we can do about it.
>
> Right. It's not quite my question though. I mean I don't know what device it's on, never mind what file is affected.
> If I know which file is corrupted, I can simply delete it and restore from backup, no biggie.
> Right now I don't even know which one of my 3 btrfs filesystems (over 10TB) has this problem.
> That makes the message kind of problematic: "you have a problem, but I'm not giving you any fighting chance of finding out where" :)
>
>>> Also, is it possible to have all these messages state which devid they occurred on?
>>> I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway.
>>
>> Yes we can, but it'd need a bit more effort. For now, all device msg we've seen in panic info comes from sb->s_id, which points to @fs_info->latest_device.
>
> Food for thought: as is, the message is unfortunately close to useless, except to an FS developer with a system that has only one btrfs filesystem.
>
> On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote:
>> I am afraid scrub may not be able to fix this kind of error; all scrub does is verify whether checksums match and, if possible, use good mirrors to rewrite the bad one.
>
> I wouldn't be bothered if scrub can't fix it, but it would be good if it could tell me.
>
>> Such errors seem to imply the content itself is corrupted; we may have passed the checksum check after ending io, but we fail the generation check afterwards.
>
> So should I really replace scrub with
> find / -type f -print0 | xargs -0 grep . >/dev/null
> ?
> Basically we need something that will scan the filesystem and ensure that all files are reachable correctly without causing filesystem problems, and if one is bad, output the name of the bad file(s).
> Scrub only does half of that job, it seems.
>
>> To get the physical device name, we still need the mirror num to know which device we are locating.
>
> Ok, so it's missing for now and therefore the code can't easily report it, I understand.
> Well, I explained the problem: ext4 and others of course tell me which devid an error is on; hopefully btrfs will be able to do so in the near future.

So is it OK for you if we print one of the btrfs filesystem's devices (for example, the device name)? It may not really be the physical device the metadata is located on, but this is easier.

> Back to the original problem, would you agree that
> find / -type f -print0 | xargs -0 grep . >/dev/null
> may do a better job scanning the entire FS for problems than scrub would?
>
> Thanks,
> Marc
Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269
On 07/04/2014 11:07 AM, Liu Bo wrote:
> On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote:
>> Thanks for the patch. Hopefully this will make it to the next 3.15.x kernel.
>> I also went back to 3.14 anyway since the 'blocked for 120 seconds' look like another instance of deadlocks we've been discussing here.
>>
>> But just curious:
>> [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
>>
>> What should I be doing about this? Does it mean that I do have some kind of corruption/damage on my filesystem?
>
> If there is another copy of the block (RAID1, DUP, RAID5/6), it'd try to read the copy and repair the crc with the good one; it's all we can do about it.
>
>> Also, is it possible to have all these messages state which devid they occurred on?
>> I don't even know which device I should be worrying about right now, and although I'm running scrub now, my understanding is that scrub doesn't actually look at FS structures and is likely to miss this anyway.
>
> Yes we can, but it'd need a bit more effort. For now, all device msg we've seen in panic info comes from sb->s_id, which points to @fs_info->latest_device.

You mean something like this:

+	printk_ratelimited("BTRFS (device: %s) parent transid verify failed on %llu wanted %llu found %llu\n",
+			   eb->fs_info->sb->s_id, eb->start,
+			   parent_transid, btrfs_header_generation(eb));

> thanks,
> -liubo
>
>> Thanks,
>> Marc
>> --
>> A mouse is a device used to point at the xterm you want to type in - A.S.R.
>> Microsoft is to operating systems what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901