date:20140703

On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:
 This got triggered by an rsync I think. I'm not sure which of my btrfs FS
 has the issue yet since BUG_ON isn't very helpful as discussed earlier.
 
 [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
 [160562.950297] ------------[ cut here ]------------
 [160562.965904] kernel BUG at fs/btrfs/locking.c:269!
 
 But shouldn't messages like 'parent transid verify failed' print which device 
 this happened on to give the operator a hint on where the problem is?
 
 Could someone do a pass at those and make sure they all print the device
 ID/name?
 
 Bug below:
 
 Full log before the crash:
 INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 btrfs-transacti D  0  3358  2 0x
  8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
  8800c6914390 000141c0 7fff 8801433b8f10
  0002 8161c9b0 7fff 8800c50ebc60
 Call Trace:
  [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
  [8161d3c6] schedule+0x73/0x75
  [8161c9e9] schedule_timeout+0x39/0x129
  [8107653d] ? get_parent_ip+0xd/0x3c
  [8162338f] ? preempt_count_add+0x7a/0x8d
  [8161dbac] __wait_for_common+0x11a/0x159
  [8107810f] ? wake_up_state+0x12/0x12
  [8161dc0f] wait_for_completion+0x24/0x26
  [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
  [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
  [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
  [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
  [81227b41] transaction_kthread+0xf8/0x1ab
  [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
  [8106b4b4] kthread+0xae/0xb6
  [8106b406] ? __kthread_parkme+0x61/0x61
  [8162667c] ret_from_fork+0x7c/0xb0
  [8106b406] ? __kthread_parkme+0x61/0x61
 INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/u8:13   D  0 13157  2 0x0080
 Workqueue: btrfs-flush_delalloc normal_work_helper
  8800041cfc00 0046 8800041cfbd0 8800041cffd8
  8800034f40d0 000141c0 88021f2941c0 8800034f40d0
  8800041cfca0 810fdc2f 0002 8800041cfc10
 Call Trace:
  [810fdc2f] ? wait_on_page_read+0x3c/0x3c
  [8161d3c6] schedule+0x73/0x75
  [8161d56b] io_schedule+0x60/0x7a
  [810fdc3d] sleep_on_page+0xe/0x12
  [8161d7fd] __wait_on_bit+0x48/0x7a
  [810fdbdd] wait_on_page_bit+0x7a/0x7c
  [81084821] ? autoremove_wake_function+0x34/0x34
  [810fef03] filemap_fdatawait_range+0x7e/0x126
  [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
  [8122d7aa] ? btrfs_writepages+0x28/0x2a
  [811084c6] ? do_writepages+0x1e/0x2c
  [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
  [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
  [8122fe01] btrfs_run_delalloc_work+0x27/0x69
  [812508db] normal_work_helper+0xfe/0x240
  [81065d7e] process_one_work+0x195/0x2d2
  [81066020] worker_thread+0x136/0x205
  [81065eea] ? process_scheduled_works+0x2f/0x2f
  [8106b4b4] kthread+0xae/0xb6
  [8106b406] ? __kthread_parkme+0x61/0x61
  [8162667c] ret_from_fork+0x7c/0xb0
  [8106b406] ? __kthread_parkme+0x61/0x61
 INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 btrfs-transacti D  0  3358  2 0x
  8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
  8800c6914390 000141c0 7fff 8801433b8f10
  0002 8161c9b0 7fff 8800c50ebc60
 Call Trace:
  [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
  [8161d3c6] schedule+0x73/0x75
  [8161c9e9] schedule_timeout+0x39/0x129
  [8107653d] ? get_parent_ip+0xd/0x3c
  [8162338f] ? preempt_count_add+0x7a/0x8d
  [8161dbac] __wait_for_common+0x11a/0x159
  [8107810f] ? wake_up_state+0x12/0x12
  [8161dc0f] wait_for_completion+0x24/0x26
  [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
  [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
  [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
  [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
  [81227b41] transaction_kthread+0xf8/0x1ab
  [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c

Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-03 Thread Wang Shilong


On 07/03/2014 04:13 PM, Liu Bo wrote:

On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:

This got triggered by an rsync I think. I'm not sure which of my btrfs FS
has the issue yet since BUG_ON isn't very helpful as discussed earlier.

[160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
[160562.950297] ------------[ cut here ]------------
[160562.965904] kernel BUG at fs/btrfs/locking.c:269!

But shouldn't messages like 'parent transid verify failed' print which device
this happened on to give the operator a hint on where the problem is?

Could someone do a pass at those and make sure they all print the device
ID/name?

Bug below:

Full log before the crash:
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D  0  3358  2 0x
  8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
  8800c6914390 000141c0 7fff 8801433b8f10
  0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
  [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
  [8161d3c6] schedule+0x73/0x75
  [8161c9e9] schedule_timeout+0x39/0x129
  [8107653d] ? get_parent_ip+0xd/0x3c
  [8162338f] ? preempt_count_add+0x7a/0x8d
  [8161dbac] __wait_for_common+0x11a/0x159
  [8107810f] ? wake_up_state+0x12/0x12
  [8161dc0f] wait_for_completion+0x24/0x26
  [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
  [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
  [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
  [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
  [81227b41] transaction_kthread+0xf8/0x1ab
  [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
  [8106b4b4] kthread+0xae/0xb6
  [8106b406] ? __kthread_parkme+0x61/0x61
  [8162667c] ret_from_fork+0x7c/0xb0
  [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u8:13   D  0 13157  2 0x0080
Workqueue: btrfs-flush_delalloc normal_work_helper
  8800041cfc00 0046 8800041cfbd0 8800041cffd8
  8800034f40d0 000141c0 88021f2941c0 8800034f40d0
  8800041cfca0 810fdc2f 0002 8800041cfc10
Call Trace:
  [810fdc2f] ? wait_on_page_read+0x3c/0x3c
  [8161d3c6] schedule+0x73/0x75
  [8161d56b] io_schedule+0x60/0x7a
  [810fdc3d] sleep_on_page+0xe/0x12
  [8161d7fd] __wait_on_bit+0x48/0x7a
  [810fdbdd] wait_on_page_bit+0x7a/0x7c
  [81084821] ? autoremove_wake_function+0x34/0x34
  [810fef03] filemap_fdatawait_range+0x7e/0x126
  [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
  [8122d7aa] ? btrfs_writepages+0x28/0x2a
  [811084c6] ? do_writepages+0x1e/0x2c
  [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
  [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
  [8122fe01] btrfs_run_delalloc_work+0x27/0x69
  [812508db] normal_work_helper+0xfe/0x240
  [81065d7e] process_one_work+0x195/0x2d2
  [81066020] worker_thread+0x136/0x205
  [81065eea] ? process_scheduled_works+0x2f/0x2f
  [8106b4b4] kthread+0xae/0xb6
  [8106b406] ? __kthread_parkme+0x61/0x61
  [8162667c] ret_from_fork+0x7c/0xb0
  [8106b406] ? __kthread_parkme+0x61/0x61
INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
   Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
btrfs-transacti D  0  3358  2 0x
  8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
  8800c6914390 000141c0 7fff 8801433b8f10
  0002 8161c9b0 7fff 8800c50ebc60
Call Trace:
  [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
  [8161d3c6] schedule+0x73/0x75
  [8161c9e9] schedule_timeout+0x39/0x129
  [8107653d] ? get_parent_ip+0xd/0x3c
  [8162338f] ? preempt_count_add+0x7a/0x8d
  [8161dbac] __wait_for_common+0x11a/0x159
  [8107810f] ? wake_up_state+0x12/0x12
  [8161dc0f] wait_for_completion+0x24/0x26
  [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
  [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
  [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
  [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
  [81227b41] transaction_kthread+0xf8/0x1ab
  [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c

Re: [PATCH v2] Btrfs: fix crash when starting transaction

On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:
 Often when starting a transaction we commit the currently running transaction,
 which can end up writing block group caches when the current process has its
 journal_info set to NULL (and not to a transaction). This makes our assertion
 at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
 in a crash/hang. Therefore fix it by setting journal_info.
 
 Two different traces of this issue follow below.
 
 1)
 
 [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
 [51502.242213] ------------[ cut here ]------------
 [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
 [51502.242669] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
 (...)
 [51502.244010] Call Trace:
 [51502.244010]  [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
 [51502.244010]  [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
 [51502.244010]  [a0357a6a] commit_cowonly_roots+0x164/0x226 [btrfs]
 [51502.244010]  [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
 [51502.244010]  [8168ec7b] ? _raw_spin_unlock+0x2b/0x40
 [51502.244010]  [a02d6259] start_transaction+0x459/0x620 [btrfs]
 [51502.244010]  [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
 [51502.244010]  [a02d73e1] __unlink_start_trans+0x31/0xe0 [btrfs]
 [51502.244010]  [a02dea67] btrfs_unlink+0x37/0xc0 [btrfs]
 [51502.244010]  [811bb054] ? do_unlinkat+0x114/0x2a0
 [51502.244010]  [811baebc] vfs_unlink+0xcc/0x150
 [51502.244010]  [811bb1a0] do_unlinkat+0x260/0x2a0
 [51502.244010]  [811a9ef4] ? filp_close+0x64/0x90
 [51502.244010]  [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
 [51502.244010]  [81349cab] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [51502.244010]  [811be9eb] SyS_unlinkat+0x1b/0x40
 [51502.244010]  [81698452] system_call_fastpath+0x16/0x1b
 [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 
 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 
 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
 [51502.244010] RIP  [a03575da] assfail.constprop.88+0x1e/0x20 [btrfs]
 
 2)
 
 [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
 [25405.097488] ------------[ cut here ]------------
 [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
 [25405.097940] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
 (...)
 [25405.18] Call Trace:
 [25405.18]  [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
 [25405.18]  [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
 [25405.18]  [a035755a] commit_cowonly_roots+0x164/0x226 [btrfs]
 [25405.18]  [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
 [25405.18]  [8109c170] ? bit_waitqueue+0xc0/0xc0
 [25405.18]  [a02d6259] start_transaction+0x459/0x620 [btrfs]
 [25405.18]  [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
 [25405.18]  [a02e3407] btrfs_create+0x47/0x210 [btrfs]
 [25405.18]  [a02d74cc] ? btrfs_permission+0x3c/0x80 [btrfs]
 [25405.18]  [811bc63b] vfs_create+0x9b/0x130
 [25405.18]  [811bcf19] do_last+0x849/0xe20
 [25405.18]  [811b9409] ? link_path_walk+0x79/0x820
 [25405.18]  [811bd5b5] path_openat+0xc5/0x690
 [25405.18]  [810ab07d] ? trace_hardirqs_on+0xd/0x10
 [25405.18]  [811cdcd2] ? __alloc_fd+0x32/0x1d0
 [25405.18]  [811be2a3] do_filp_open+0x43/0xa0
 [25405.18]  [811cddf1] ? __alloc_fd+0x151/0x1d0
 [25405.18]  [811abcfc] do_sys_open+0x13c/0x230
 [25405.18]  [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
 [25405.18]  [811abe12] SyS_open+0x22/0x30
 [25405.18]  [81698452] system_call_fastpath+0x16/0x1b
 [25405.18] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 
 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 
 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
 [25405.18] RIP  [a03570ca] assfail.constprop.88+0x1e/0x20 [btrfs]
 
 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---
 
 V2: Removed test for current->journal_info == NULL. At this point it's
 always expected to be NULL.

Reviewed-by: Miao Xie mi...@cn.fujitsu.com

 
  fs/btrfs/transaction.c | 1 +
  1 file changed, 1 insertion(+)
 
 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index ac984a3..614eac3 100644
 --- a/fs/btrfs/transaction.c
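
[Editor's sketch] The diff above is cut off, so the exact placement of the
one-line change is not visible here. The idea in the commit message can still
be modelled in user space. Everything below is a hypothetical stand-in: the
thread-local journal_info models current->journal_info, and the function
names only mirror the roles described in the message, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's transaction handle. */
struct trans_handle { int id; };

/* Models current->journal_info (per-task state in the kernel). */
static _Thread_local struct trans_handle *journal_info;

/* Models the assertion in btrfs_check_data_free_space(). */
static int check_data_free_space(void)
{
	assert(journal_info != NULL);
	return 0;
}

/* Models a commit that ends up writing block group caches. */
static int commit_transaction(struct trans_handle *h)
{
	int ret = check_data_free_space();

	(void)h;
	journal_info = NULL; /* the commit ends the transaction for this task */
	return ret;
}

/* Models the "commit the running transaction first" path. */
static int start_transaction_with_commit(struct trans_handle *h)
{
	assert(journal_info == NULL); /* per the V2 note, expected to be NULL */
	journal_info = h;             /* assumed shape of the fix: set it first */
	return commit_transaction(h);
}
```

Without the assignment in start_transaction_with_commit(), the assertion in
check_data_free_space() fires, which is the failure both traces show.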

Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options

-------- Original Message --------
Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options

From: Tobias Geerinckx-Rice tobias.geerinckx.r...@gmail.com
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-03 16:06

[List CCd. I hate Gmail.]

Noob alert.

On 3 July 2014 02:28, Qu Wenruo quwen...@cn.fujitsu.com wrote:

Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
From: Goffredo Baroncelli kreij...@libero.it
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
Date: 2014-07-03 01:48

On 07/01/2014 11:30 AM, Qu Wenruo wrote:

This commit has the following problem:
1) Break the ro mount rule.
When users mount the whole btrfs ro, it is still possible to mount
subvol rw and change the contents. Which make the whole fs ro mount
non-sense.

Where is the problem ? I see an use case when I want a conservative
default: mount all ro except some subvolumes.

In any case it is not a security problem because if the user has the
capability to mount a subvolume, he also has the capability to remount,rw
the whole filesystem.

It is not a security problem, but the behavior is inconsistent.
If a user mounts the whole disk ro, he or she wants the fs read-only and
nothing in it to change.
If you mount a subvol rw, then the whole-disk ro expectation is broken.
Things will change even though the whole disk is read-only.

This assumption seems wrong and untenable if considered from a
different angle: one doesn't mount the whole disk ro, merely the
default subvolume.

# mount -o ro /dev/sda1 /mnt

is merely convenient short-hand for

# mount -o ro,subvol=@ [or whatever] /dev/sda1 /mnt

and anyone who expects this to magically protect the whole disk is,
frankly, confused.

Substituting partitions for subvolumes: mounting /dev/sda2 read-only
should have no effect on /dev/sda3.
Even if you went a bit batty and decided to make /dev/sda2 the
default partition:

# ln -sf /dev/sda2 /dev/sda
# mount -o ro /dev/sda /mnt/this/is/silly

syntactic sugar doesn't change anything.

Subvolumes are logically discrete entities, the fact that they share
trees on-disk is merely a (very nice) implementation detail. It is
impossible to mount a whole disk under btrfs.

Oh, sorry for my confusing words.
To make it clear, when mentioning 'the whole disk (or partition, whatever)'
I mean the FS_TREE (of course not the default subvolume).

The problem is that even if you mount a subvolume ro, you can still change
its contents through its rw parent subvolume.
And if a subvolume can still be modified, the ro mount loses its meaning.

So we need special rules to prevent such things.

Thanks,
Qu

Tobias

The problem also happens when a parent subvol is mounted rw but a child
subvol is mounted ro.
The user can still modify the child subvol through the parent subvol, which
again breaks the read-only rule.

This makes sense, though.

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

On Thu, Jul 03, 2014 at 04:20:47PM +0800, Wang Shilong wrote:
 On 07/03/2014 04:13 PM, Liu Bo wrote:
 On Wed, Jul 02, 2014 at 01:41:52PM -0700, Marc MERLIN wrote:
 This got triggered by an rsync I think. I'm not sure which of my btrfs FS
 has the issue yet since BUG_ON isn't very helpful as discussed earlier.
 
 [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
 [160562.950297] ------------[ cut here ]------------
 [160562.965904] kernel BUG at fs/btrfs/locking.c:269!
 
 But shouldn't messages like 'parent transid verify failed' print which device
 this happened on to give the operator a hint on where the problem is?
 
 Could someone do a pass at those and make sure they all print the device
 ID/name?
 
 Bug below:
 
 Full log before the crash:
 INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 btrfs-transacti D  0  3358  2 0x
   8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
   8800c6914390 000141c0 7fff 8801433b8f10
   0002 8161c9b0 7fff 8800c50ebc60
 Call Trace:
   [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
   [8161d3c6] schedule+0x73/0x75
   [8161c9e9] schedule_timeout+0x39/0x129
   [8107653d] ? get_parent_ip+0xd/0x3c
   [8162338f] ? preempt_count_add+0x7a/0x8d
   [8161dbac] __wait_for_common+0x11a/0x159
   [8107810f] ? wake_up_state+0x12/0x12
   [8161dc0f] wait_for_completion+0x24/0x26
   [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
   [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
   [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55
   [8122b25a] btrfs_commit_transaction+0x20d/0x8b0
   [81227b41] transaction_kthread+0xf8/0x1ab
   [81227a49] ? btrfs_cleanup_transaction+0x44c/0x44c
   [8106b4b4] kthread+0xae/0xb6
   [8106b406] ? __kthread_parkme+0x61/0x61
   [8162667c] ret_from_fork+0x7c/0xb0
   [8106b406] ? __kthread_parkme+0x61/0x61
 INFO: task kworker/u8:13:13157 blocked for more than 120 seconds.
Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 kworker/u8:13   D  0 13157  2 0x0080
 Workqueue: btrfs-flush_delalloc normal_work_helper
   8800041cfc00 0046 8800041cfbd0 8800041cffd8
   8800034f40d0 000141c0 88021f2941c0 8800034f40d0
   8800041cfca0 810fdc2f 0002 8800041cfc10
 Call Trace:
   [810fdc2f] ? wait_on_page_read+0x3c/0x3c
   [8161d3c6] schedule+0x73/0x75
   [8161d56b] io_schedule+0x60/0x7a
   [810fdc3d] sleep_on_page+0xe/0x12
   [8161d7fd] __wait_on_bit+0x48/0x7a
   [810fdbdd] wait_on_page_bit+0x7a/0x7c
   [81084821] ? autoremove_wake_function+0x34/0x34
   [810fef03] filemap_fdatawait_range+0x7e/0x126
   [8122f1cf] ? btrfs_submit_direct+0x3f4/0x3f4
   [8122d7aa] ? btrfs_writepages+0x28/0x2a
   [811084c6] ? do_writepages+0x1e/0x2c
   [810ff38e] ? __filemap_fdatawrite_range+0x55/0x57
   [8124006f] btrfs_wait_ordered_range+0x6a/0x11a
   [8122fe01] btrfs_run_delalloc_work+0x27/0x69
   [812508db] normal_work_helper+0xfe/0x240
   [81065d7e] process_one_work+0x195/0x2d2
   [81066020] worker_thread+0x136/0x205
   [81065eea] ? process_scheduled_works+0x2f/0x2f
   [8106b4b4] kthread+0xae/0xb6
   [8106b406] ? __kthread_parkme+0x61/0x61
   [8162667c] ret_from_fork+0x7c/0xb0
   [8106b406] ? __kthread_parkme+0x61/0x61
 INFO: task btrfs-transacti:3358 blocked for more than 120 seconds.
Not tainted 3.15.1-amd64-i915-preempt-20140216jbp #1
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 btrfs-transacti D  0  3358  2 0x
   8800c50ebc50 0046 8800c50ebc20 8800c50ebfd8
   8800c6914390 000141c0 7fff 8801433b8f10
   0002 8161c9b0 7fff 8800c50ebc60
 Call Trace:
   [8161c9b0] ? sock_rps_reset_flow+0x32/0x32
   [8161d3c6] schedule+0x73/0x75
   [8161c9e9] schedule_timeout+0x39/0x129
   [8107653d] ? get_parent_ip+0xd/0x3c
   [8162338f] ? preempt_count_add+0x7a/0x8d
   [8161dbac] __wait_for_common+0x11a/0x159
   [8107810f] ? wake_up_state+0x12/0x12
   [8161dc0f] wait_for_completion+0x24/0x26
   [81237ce6] btrfs_wait_and_free_delalloc_work+0x16/0x28
   [8123fd3a] btrfs_run_ordered_operations+0x1e7/0x21e
   [81229aa4] btrfs_flush_all_pending_stuffs+0x4e/0x55

Re: [PATCH] btrfs: only unlock block in verify_parent_transid if we locked it

On Wed, Jun 25, 2014 at 01:45:41PM -0700, Josef Bacik wrote:
 This is a regression from my patch a26e8c9f75b0bfd89e8f110737b136eb5994; we
 need to only unlock the block if we were the one who locked it.  Otherwise
 this will trip BUG_ON()s in locking.c.  Thanks,
 

Reviewed-by: Liu Bo bo.li@oracle.com

-liubo

 cc: sta...@vger.kernel.org
 Signed-off-by: Josef Bacik jba...@fb.com
 ---
  fs/btrfs/disk-io.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
 index 8bb4aa1..f00165d 100644
 --- a/fs/btrfs/disk-io.c
 +++ b/fs/btrfs/disk-io.c
 @@ -369,7 +369,8 @@ static int verify_parent_transid(struct extent_io_tree *io_tree,
  out:
   unlock_extent_cached(io_tree, eb->start, eb->start + eb->len - 1,
        &cached_state, GFP_NOFS);
 - btrfs_tree_read_unlock_blocking(eb);
 + if (need_lock)
 + btrfs_tree_read_unlock_blocking(eb);
   return ret;
  }
  
 -- 
 2.0.0
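
[Editor's sketch] The rule in this patch, "only unlock if we were the one who
locked", can be shown with a small user-space model. The struct and helpers
below are hypothetical stand-ins (a plain holder counter instead of the btrfs
tree lock); only the need_lock name comes from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent buffer's read lock: a holder count. */
struct buffer { int read_holders; };

static void buf_read_lock(struct buffer *b)
{
	b->read_holders++;
}

static void buf_read_unlock(struct buffer *b)
{
	assert(b->read_holders > 0); /* models the BUG_ON() in locking.c */
	b->read_holders--;
}

/*
 * Verify something about the buffer. When need_lock is false the caller
 * already holds (or does not need) the lock, so we must not release it.
 */
static int verify(struct buffer *b, bool need_lock, bool ok)
{
	int ret = ok ? 0 : 1;

	if (need_lock)
		buf_read_lock(b);

	/* ... checks that require the lock would go here ... */

	if (need_lock)
		buf_read_unlock(b);
	return ret;
}
```

An unconditional unlock on the exit path would either trip the assertion or
drop a lock that belongs to the caller, which matches the regression
described above.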
 

[PATCH 0/4] Add superblock checksum check for btrfs-progs

Before this patchset, btrfs-progs ignored the superblock checksum entirely
and carried on regardless.
This can cause disasters: for example, checking a btrfs filesystem with a
corrupted superblock can crash btrfs-progs.

This patchset adds a superblock checksum check to btrfs_read_dev_super(),
making btrfs-progs much stricter and more robust.
To allow super-recover to open devices, it adds an option to scan all 3
superblocks when using super-recover.
It also updates the related error strings and fixes a bug in chunk-recover
that is not triggered until the superblock csum is calculated.

Qu Wenruo (4):
  btrfs-progs: Check superblock's checsum when read dev super
  btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for
super_recover.
  btrfs-progs: Add more meaningful return value for
btrfs_read_dev_super() and corresponding error string.
  btrfs-progs: Fix size for malloc for superblock checksum.

 btrfs-find-root.c |  9 --
 chunk-recover.c   | 18 +++
 cmds-filesystem.c |  9 --
 disk-io.c | 91 +--
 disk-io.h |  5 +--
 super-recover.c   |  2 +-
 utils.c   | 16 ++
 volumes.c |  8 ++---
 volumes.h |  2 +-
 9 files changed, 104 insertions(+), 56 deletions(-)

-- 
2.0.1


[PATCH 2/4] btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.

The btrfs-progs superblock checksum check is too strict for
super-recover: current btrfs-progs only reads the 1st superblock, and if
you need super-recover, the 1st superblock is probably already damaged.

The fix introduces a super_recover parameter for btrfs_read_dev_super()
and its callers, allowing them to scan the backup superblocks if needed.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-find-root.c |  4 ++--
 chunk-recover.c   |  6 +++---
 cmds-filesystem.c |  2 +-
 disk-io.c | 17 ++---
 disk-io.h |  5 +++--
 super-recover.c   |  2 +-
 utils.c   | 11 ++-
 volumes.c |  4 ++--
 volumes.h |  2 +-
 9 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index 25d79f1..e31a9b5 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -82,7 +82,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
return NULL;
}
 
-   ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0, 1);
+   ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0, 1, 1);
if (ret)
goto out;
 
@@ -94,7 +94,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
 
disk_super = fs_info->super_copy;
ret = btrfs_read_dev_super(fs_devices->latest_bdev,
-  disk_super, fs_info->super_bytenr);
+  disk_super, fs_info->super_bytenr, 1);
if (ret) {
printk("No valid btrfs found\n");
goto out_devices;
diff --git a/chunk-recover.c b/chunk-recover.c
index 613d715..9baedd7 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1283,7 +1283,7 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 
disk_super = fs_info->super_copy;
ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
-  disk_super, fs_info->super_bytenr);
+  disk_super, fs_info->super_bytenr, 1);
if (ret) {
fprintf(stderr, "No valid btrfs found\n");
goto out_devices;
@@ -1349,7 +1349,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
goto fail_close_fd;
}
 
-   ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET);
+   ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
if (ret) {
fprintf(stderr, "read super block error\n");
goto fail_free_sb;
@@ -1368,7 +1368,7 @@ static int recover_prepare(struct recover_control *rc, char *path)
goto fail_free_sb;
}
 
-   ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0, 1);
+   ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0, 1, 1);
if (ret)
goto fail_free_sb;
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 306f715..d2e46dc 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -513,7 +513,7 @@ static int dev_to_fsid(char *dev, __u8 *fsid)
 
disk_super = (struct btrfs_super_block *)buf;
ret = btrfs_read_dev_super(fd, disk_super,
-   BTRFS_SUPER_INFO_OFFSET);
+  BTRFS_SUPER_INFO_OFFSET, 0);
if (ret)
goto out;
 
diff --git a/disk-io.c b/disk-io.c
index e447af8..1bd9fae 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -980,7 +980,7 @@ void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info)
 
 int btrfs_scan_fs_devices(int fd, const char *path,
  struct btrfs_fs_devices **fs_devices,
- u64 sb_bytenr, int run_ioctl)
+ u64 sb_bytenr, int run_ioctl, int super_recover)
 {
u64 total_devs;
int ret;
@@ -988,7 +988,7 @@ int btrfs_scan_fs_devices(int fd, const char *path,
sb_bytenr = BTRFS_SUPER_INFO_OFFSET;
 
ret = btrfs_scan_one_device(fd, path, fs_devices,
-   &total_devs, sb_bytenr);
+   &total_devs, sb_bytenr, super_recover);
if (ret) {
fprintf(stderr, "No valid Btrfs found on %s\n", path);
return ret;
@@ -1076,7 +1076,8 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
fs_info->on_restoring = 1;
 
ret = btrfs_scan_fs_devices(fp, path, &fs_devices, sb_bytenr,
-   !(flags & OPEN_CTREE_RECOVER_SUPER));
+   !(flags & OPEN_CTREE_RECOVER_SUPER),
+   (flags & OPEN_CTREE_RECOVER_SUPER));
if (ret)
goto out;
 
@@ -1096,9 +1097,9 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
disk_super = fs_info->super_copy;
if (!(flags & OPEN_CTREE_RECOVER_SUPER))
ret = btrfs_read_dev_super(fs_devices->latest_bdev,
-
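
[Editor's sketch] The disk-io.c part of this patch is cut off above, so the
following is only a model of what "scan backup superblocks if needed" could
look like. The three copy offsets (64 KiB, 64 MiB, 256 GiB) follow the btrfs
on-disk layout; struct sb_copy, pick_super() and the "newest generation
wins" selection are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_OFFSET  (64 * 1024ULL) /* primary superblock at 64 KiB */
#define SUPER_MIRROR_MAX   3
#define SUPER_MIRROR_SHIFT 12

/* Byte offset of superblock copy `mirror` (0 = primary). */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16 * 1024ULL;

	assert(mirror >= 0 && mirror < SUPER_MIRROR_MAX);
	if (mirror)
		return start << (SUPER_MIRROR_SHIFT * mirror);
	return SUPER_INFO_OFFSET;
}

/* Hypothetical summary of one copy after it was read and validated. */
struct sb_copy {
	int valid;           /* magic and checksum were fine */
	uint64_t generation; /* transaction id stored in the copy */
};

/*
 * Pick the copy to use. Without super_recover only the primary is
 * considered; with it, every valid copy competes and the newest
 * generation wins. Returns the mirror index, or -1 if none is usable.
 */
static int pick_super(const struct sb_copy copies[SUPER_MIRROR_MAX],
		      int super_recover)
{
	int max_mirror = super_recover ? SUPER_MIRROR_MAX : 1;
	int best = -1;

	for (int i = 0; i < max_mirror; i++) {
		if (!copies[i].valid)
			continue;
		if (best < 0 || copies[i].generation > copies[best].generation)
			best = i;
	}
	return best;
}
```

With a damaged primary, pick_super(copies, 0) finds nothing, while
pick_super(copies, 1) can fall back to a backup copy, which is the behavior
super-recover needs.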

[PATCH 3/4] btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.

Since btrfs_read_dev_super() can now distinguish a non-btrfs filesystem from
a corrupted superblock, thanks to the newly introduced super csum check,
the return value and corresponding error strings should also be updated
to print more meaningful errors for end users.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-find-root.c |  5 -
 chunk-recover.c   | 10 --
 cmds-filesystem.c |  7 ++-
 disk-io.c | 36 ++--
 utils.c   |  5 -
 volumes.c |  4 +---
 6 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index e31a9b5..f3bf452 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -96,7 +96,10 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
ret = btrfs_read_dev_super(fs_devices->latest_bdev,
   disk_super, fs_info->super_bytenr, 1);
if (ret) {
-   printk("No valid btrfs found\n");
+   if (ret == -ENOENT)
+   printk("No valid btrfs found\n");
+   if (ret == -EIO)
+   printk("Superblock is corrupted\n");
goto out_devices;
}
 
diff --git a/chunk-recover.c b/chunk-recover.c
index 9baedd7..c8badf9 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1285,7 +1285,10 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
   disk_super, fs_info->super_bytenr, 1);
if (ret) {
-   fprintf(stderr, "No valid btrfs found\n");
+   if (ret == -ENOENT)
+   printk("No valid btrfs found\n");
+   if (ret == -EIO)
+   printk("Superblock is corrupted\n");
goto out_devices;
}
 
@@ -1351,7 +1354,10 @@ static int recover_prepare(struct recover_control *rc, char *path)
 
ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
if (ret) {
-   fprintf(stderr, "read super block error\n");
+   if (ret == -ENOENT)
+   printk("No valid btrfs found\n");
+   if (ret == -EIO)
+   printk("Superblock is corrupted\n");
goto fail_free_sb;
}
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index d2e46dc..d58397d 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -604,9 +604,14 @@ static int cmd_show(int argc, char **argv)
} else {
ret = dev_to_fsid(search, fsid);
if (ret) {
-   fprintf(stderr,
+   if (ret == -ENOENT)
+   fprintf(stderr,
"ERROR: No btrfs on %s\n",
search);
+   if (ret == -EIO)
+   fprintf(stderr,
+   "Superblock is corrupted on %s\n",
+   search);
return 1;
}
uuid_unparse(fsid, uuid_buf);
diff --git a/disk-io.c b/disk-io.c
index 1bd9fae..4cc831b 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -990,7 +990,11 @@ int btrfs_scan_fs_devices(int fd, const char *path,
ret = btrfs_scan_one_device(fd, path, fs_devices,
&total_devs, sb_bytenr, super_recover);
if (ret) {
-   fprintf(stderr, "No valid Btrfs found on %s\n", path);
+   if (ret == -ENOENT)
+   fprintf(stderr, "No valid Btrfs found on %s\n", path);
+   if (ret == -EIO)
+   fprintf(stderr, "Superblock is corrupted on %s\n",
+   path);
return ret;
}
 
@@ -1101,7 +1105,10 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
else
ret = btrfs_read_dev_super(fp, disk_super, sb_bytenr, 0);
if (ret) {
-   printk("No valid btrfs found\n");
+   if (ret == -ENOENT)
+   printk("No valid btrfs found\n");
+   if (ret == -EIO)
+   printk("Superblock is corrupted\n");
goto out_devices;
}
 
@@ -1201,11 +1208,11 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
ret = pread64(fd, data, sizeof(data), sb_bytenr);
if (ret < sizeof(data))
-   return -1;
+
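
[Editor's sketch] The hunks above repeat the same two "if (ret == ...)"
checks at every call site. One possible way to centralize the mapping is
shown below; super_read_strerror() and report_super_error() are hypothetical
helpers, not part of the patch, and only the two message strings are taken
from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Map the return value of a superblock read to a user-facing message. */
static const char *super_read_strerror(int ret)
{
	switch (ret) {
	case 0:
		return "ok";
	case -ENOENT:
		return "No valid btrfs found";
	case -EIO:
		return "Superblock is corrupted";
	default:
		return "Unknown error reading superblock";
	}
}

/* Print one line for a failed read; prints nothing on success. */
static void report_super_error(FILE *out, int ret, const char *path)
{
	assert(ret <= 0); /* 0 or a negative errno value is expected */
	if (ret)
		fprintf(out, "%s on %s\n", super_read_strerror(ret), path);
}
```

A switch with a default branch also covers return values other than -ENOENT
and -EIO, for which the paired "if" checks in the diff print nothing.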

[PATCH 4/4] btrfs-progs: Fix malloc size for superblock.

recover_prepare() in chunk-recover.c allocates only
sizeof(struct btrfs_super_block) bytes for the superblock buffer. This
causes a glibc malloc error once the superblock csum is calculated.

Use BTRFS_SUPER_INFO_SIZE to fix the bug.
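
The size mismatch can be illustrated with a minimal user-space sketch. All names here (SB_INFO_SIZE, struct model_super, alloc_super_buf, load_super) are hypothetical stand-ins, and the 4096-byte block size is an assumption; this is not the btrfs-progs code itself. The point is only that a buffer which receives a whole on-disk superblock block must be allocated with the block size, not the struct size.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed on-disk superblock block size; stands in for BTRFS_SUPER_INFO_SIZE. */
#define SB_INFO_SIZE 4096

/* Hypothetical, much smaller in-memory view of the start of the block. */
struct model_super {
	uint8_t csum[32];
	uint8_t fsid[16];
	uint64_t bytenr;
};

/* Allocate room for the whole block, not just sizeof(struct model_super). */
static struct model_super *alloc_super_buf(void)
{
	return calloc(1, SB_INFO_SIZE);
}

/* Copy a full block; only safe because the buffer is SB_INFO_SIZE bytes. */
static void load_super(struct model_super *dst, const uint8_t *block)
{
	memcpy(dst, block, SB_INFO_SIZE);
}
```

With a buffer from malloc(sizeof(struct model_super)) instead, the memcpy() above would write past the end of the allocation, which is the kind of heap corruption glibc tends to report later.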

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 chunk-recover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chunk-recover.c b/chunk-recover.c
index c8badf9..7dfaf82 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1345,7 +1345,7 @@ static int recover_prepare(struct recover_control *rc, 
char *path)
return -1;
}
 
-   sb = malloc(sizeof(struct btrfs_super_block));
+   sb = malloc(BTRFS_SUPER_INFO_SIZE);
if (!sb) {
fprintf(stderr, "allocating memory for sb failed.\n");
ret = -ENOMEM;
-- 
2.0.1


[PATCH v2 1/4] btrfs-progs: Check superblock's checksum when read dev super

Btrfs-progs reads the superblock without checking its checksum.
When all superblocks are corrupted, continuing will cause disaster.

This patch adds a checksum check to btrfs-progs when reading
superblocks.

It also fixes a bug where btrfs_read_dev_super() reads only sizeof(struct
btrfs_super_block) bytes; the correct size is BTRFS_SUPER_INFO_SIZE.
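
As a rough illustration of the check described above, here is a self-contained sketch. It assumes a CRC-32C checksum seeded with ~0, inverted at the end, stored little-endian at the start of the block, and computed over everything after the checksum field. The names SB_INFO_SIZE, SB_CSUM_SIZE, crc32c_update, sb_compute_csum and sb_csum_ok are hypothetical, and the sizes are assumptions standing in for BTRFS_SUPER_INFO_SIZE and BTRFS_CSUM_SIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SB_INFO_SIZE 4096 /* assumed block size */
#define SB_CSUM_SIZE 32   /* assumed size of the checksum field */

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum everything after the csum field; emit 4 little-endian bytes. */
static void sb_compute_csum(const uint8_t *block, uint8_t out[4])
{
	uint32_t crc = crc32c_update(~(uint32_t)0, block + SB_CSUM_SIZE,
				     SB_INFO_SIZE - SB_CSUM_SIZE);

	crc = ~crc;
	out[0] = (uint8_t)(crc & 0xff);
	out[1] = (uint8_t)((crc >> 8) & 0xff);
	out[2] = (uint8_t)((crc >> 16) & 0xff);
	out[3] = (uint8_t)((crc >> 24) & 0xff);
}

/* Return 1 if the stored checksum matches the computed one, else 0. */
static int sb_csum_ok(const uint8_t *block)
{
	uint8_t want[4];

	sb_compute_csum(block, want);
	return memcmp(want, block, sizeof(want)) == 0;
}
```

A reader would call sb_csum_ok() on each candidate block and skip any copy that fails, in the same spirit as the `continue` in the loop of the patch.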

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
v2:
  Use correct memcmp src.
  Read the whole superblock size (sectorsize) rather than
  sizeof(btrfs_super_block).
---
 disk-io.c | 46 +-
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/disk-io.c b/disk-io.c
index 8db0335..e447af8 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -1186,22 +1186,25 @@ int btrfs_read_dev_super(int fd, struct 
btrfs_super_block *sb, u64 sb_bytenr)
 {
u8 fsid[BTRFS_FSID_SIZE];
int fsid_is_initialized = 0;
-   struct btrfs_super_block buf;
+   u8 data[BTRFS_SUPER_INFO_SIZE];
+   struct btrfs_super_block *buf = (struct btrfs_super_block *) data;
int i;
int ret;
u64 transid = 0;
u64 bytenr;
+   u32 crc;
+   char crc_result[BTRFS_CSUM_SIZE];
 
if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
-   ret = pread64(fd, &buf, sizeof(buf), sb_bytenr);
-   if (ret < sizeof(buf))
+   ret = pread64(fd, data, sizeof(data), sb_bytenr);
+   if (ret < sizeof(data))
return -1;
 
-   if (btrfs_super_bytenr(&buf) != sb_bytenr ||
-   btrfs_super_magic(&buf) != BTRFS_MAGIC)
+   if (btrfs_super_bytenr(buf) != sb_bytenr ||
+   btrfs_super_magic(buf) != BTRFS_MAGIC)
return -1;
 
-   memcpy(sb, &buf, sizeof(*sb));
+   memcpy(sb, data, sizeof(data));
return 0;
}
 
@@ -1214,22 +1217,31 @@ int btrfs_read_dev_super(int fd, struct 
btrfs_super_block *sb, u64 sb_bytenr)
 
for (i = 0; i < 1; i++) {
bytenr = btrfs_sb_offset(i);
-   ret = pread64(fd, &buf, sizeof(buf), bytenr);
-   if (ret < sizeof(buf))
+   ret = pread64(fd, data, sizeof(data), bytenr);
+   if (ret < sizeof(data))
break;
 
-   if (btrfs_super_bytenr(&buf) != bytenr )
+   if (btrfs_super_bytenr(buf) != bytenr)
continue;
-   /* if magic is NULL, the device was removed */
-   if (btrfs_super_magic(&buf) == 0 && i == 0)
+   /* if first super block is not btrfs, the device was removed */
+   if (btrfs_super_magic(buf) != BTRFS_MAGIC && i == 0)
return -1;
-   if (btrfs_super_magic(&buf) != BTRFS_MAGIC)
+   if (btrfs_super_magic(buf) != BTRFS_MAGIC)
+   continue;
+
+   /* check if the superblock is damaged */
+   crc = ~(u32)0;
+   crc = btrfs_csum_data(NULL, (char *)buf + BTRFS_CSUM_SIZE,
+ crc, BTRFS_SUPER_INFO_SIZE -
+ BTRFS_CSUM_SIZE);
+   btrfs_csum_final(crc, crc_result);
+   if (memcmp(crc_result, buf, btrfs_super_csum_size(buf)))
continue;
 
if (!fsid_is_initialized) {
-   memcpy(fsid, buf.fsid, sizeof(fsid));
+   memcpy(fsid, buf->fsid, sizeof(fsid));
fsid_is_initialized = 1;
-   } else if (memcmp(fsid, buf.fsid, sizeof(fsid))) {
+   } else if (memcmp(fsid, buf->fsid, sizeof(fsid))) {
/*
 * the superblocks (the original one and
 * its backups) contain data of different
@@ -1238,9 +1250,9 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block 
*sb, u64 sb_bytenr)
continue;
}
 
-   if (btrfs_super_generation(&buf) > transid) {
-   memcpy(sb, &buf, sizeof(*sb));
-   transid = btrfs_super_generation(&buf);
+   if (btrfs_super_generation(buf) > transid) {
+   memcpy(sb, data, sizeof(data));
+   transid = btrfs_super_generation(buf);
}
}
 
-- 
2.0.1


[PATCH 2/4] btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.

The superblock checksum check in btrfs-progs is too strict for
super-recover: btrfs-progs currently reads only the 1st superblock, and
if you need super-recover, the 1st superblock is probably already
damaged.

The fix is to add a super_recover parameter to btrfs_read_dev_super()
and its callers so that the backup superblocks can be scanned when
needed.
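
A minimal sketch of the scanning idea, under stated assumptions: the offsets follow the usual btrfs layout as I understand it (primary copy at 64KiB, mirrors at 64MiB and 256GiB), and the helper names sb_offset, sb_copies_to_scan and pick_newest as well as struct sb_copy are hypothetical, not the btrfs-progs API. The `valid` flag stands for "magic and checksum verified".

```c
#include <stdint.h>

#define SB_PRIMARY_OFFSET (64ULL * 1024) /* assumed primary offset */
#define SB_MIRROR_MAX 3                  /* primary plus two backups */
#define SB_MIRROR_SHIFT 12

/* Byte offset of superblock copy `mirror` (0 is the primary). */
static uint64_t sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	if (mirror)
		return start << (SB_MIRROR_SHIFT * mirror);
	return SB_PRIMARY_OFFSET;
}

/* Only the primary is read unless the caller asks for recovery. */
static int sb_copies_to_scan(int super_recover)
{
	return super_recover ? SB_MIRROR_MAX : 1;
}

/* Hypothetical summary of one copy after it has been read and checked. */
struct sb_copy {
	int valid;           /* magic and checksum verified */
	uint64_t generation; /* transaction id stored in the copy */
};

/* Index of the valid copy with the highest generation, or -1 if none. */
static int pick_newest(const struct sb_copy *copies, int super_recover)
{
	uint64_t transid = 0;
	int best = -1;

	for (int i = 0; i < sb_copies_to_scan(super_recover); i++) {
		if (!copies[i].valid)
			continue;
		if (copies[i].generation > transid) {
			transid = copies[i].generation;
			best = i;
		}
	}
	return best;
}
```

With a damaged primary, pick_newest(copies, 0) finds nothing, while pick_newest(copies, 1) can still fall back to a backup copy, which is the situation super-recover needs to handle.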

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 btrfs-find-root.c |  4 ++--
 chunk-recover.c   |  6 +++---
 cmds-filesystem.c |  2 +-
 disk-io.c | 17 ++---
 disk-io.h |  5 +++--
 super-recover.c   |  2 +-
 utils.c   | 11 ++-
 volumes.c |  4 ++--
 volumes.h |  2 +-
 9 files changed, 29 insertions(+), 24 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index 25d79f1..e31a9b5 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -82,7 +82,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const 
char *device)
return NULL;
}
 
-   ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0, 1);
+   ret = btrfs_scan_fs_devices(fd, device, &fs_devices, 0, 1, 1);
if (ret)
goto out;
 
@@ -94,7 +94,7 @@ static struct btrfs_root *open_ctree_broken(int fd, const 
char *device)
 
disk_super = fs_info->super_copy;
ret = btrfs_read_dev_super(fs_devices->latest_bdev,
-  disk_super, fs_info->super_bytenr);
+  disk_super, fs_info->super_bytenr, 1);
if (ret) {
printk("No valid btrfs found\n");
goto out_devices;
diff --git a/chunk-recover.c b/chunk-recover.c
index 613d715..9baedd7 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1283,7 +1283,7 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 
disk_super = fs_info->super_copy;
ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
-  disk_super, fs_info->super_bytenr);
+  disk_super, fs_info->super_bytenr, 1);
if (ret) {
fprintf(stderr, "No valid btrfs found\n");
goto out_devices;
@@ -1349,7 +1349,7 @@ static int recover_prepare(struct recover_control *rc, 
char *path)
goto fail_close_fd;
}
 
-   ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET);
+   ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
if (ret) {
fprintf(stderr, "read super block error\n");
goto fail_free_sb;
@@ -1368,7 +1368,7 @@ static int recover_prepare(struct recover_control *rc, 
char *path)
goto fail_free_sb;
}
 
-   ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0, 1);
+   ret = btrfs_scan_fs_devices(fd, path, &fs_devices, 0, 1, 1);
if (ret)
goto fail_free_sb;
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 306f715..d2e46dc 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -513,7 +513,7 @@ static int dev_to_fsid(char *dev, __u8 *fsid)
 
disk_super = (struct btrfs_super_block *)buf;
ret = btrfs_read_dev_super(fd, disk_super,
-   BTRFS_SUPER_INFO_OFFSET);
+  BTRFS_SUPER_INFO_OFFSET, 0);
if (ret)
goto out;
 
diff --git a/disk-io.c b/disk-io.c
index e447af8..1bd9fae 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -980,7 +980,7 @@ void btrfs_cleanup_all_caches(struct btrfs_fs_info *fs_info)
 
 int btrfs_scan_fs_devices(int fd, const char *path,
  struct btrfs_fs_devices **fs_devices,
- u64 sb_bytenr, int run_ioctl)
+ u64 sb_bytenr, int run_ioctl, int super_recover)
 {
u64 total_devs;
int ret;
@@ -988,7 +988,7 @@ int btrfs_scan_fs_devices(int fd, const char *path,
sb_bytenr = BTRFS_SUPER_INFO_OFFSET;
 
ret = btrfs_scan_one_device(fd, path, fs_devices,
-   &total_devs, sb_bytenr);
+   &total_devs, sb_bytenr, super_recover);
if (ret) {
fprintf(stderr, "No valid Btrfs found on %s\n", path);
return ret;
@@ -1076,7 +1076,8 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, 
const char *path,
fs_info->on_restoring = 1;
 
ret = btrfs_scan_fs_devices(fp, path, &fs_devices, sb_bytenr,
-   !(flags & OPEN_CTREE_RECOVER_SUPER));
+   !(flags & OPEN_CTREE_RECOVER_SUPER),
+   (flags & OPEN_CTREE_RECOVER_SUPER));
if (ret)
goto out;
 
@@ -1096,9 +1097,9 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, 
const char *path,
disk_super = fs_info->super_copy;
if (!(flags & OPEN_CTREE_RECOVER_SUPER))
ret = btrfs_read_dev_super(fs_devices->latest_bdev,
-

[PATCH V4 1/9] Btrfs: device_list_add() should not update list when mounted

From: Anand Jain anand.j...@oracle.com

device_list_add() is called when the user runs btrfs dev scan, which adds
any btrfs device to the btrfs_fs_devices list.

Now consider a mounted btrfs, and a new device that carries a SB
from one of the mounted btrfs devices.

In this situation, when the user runs btrfs dev scan, the current code
simply replaces the existing device with the new device.

Note that the old device is neither closed nor gracefully
removed from the btrfs.

The FS is still operational with the old bdev, but the device name
in the btrfs_device is the new one provided by btrfs dev scan.

reproducer:

devmgt[1] detach /dev/sdc

replace the missing disk /dev/sdc

btrfs rep start -f 1 /dev/sde /btrfs
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
Total devices 2 FS bytes used 32.00KiB
devid1 size 958.94MiB used 115.88MiB path /dev/sde
devid2 size 958.94MiB used 103.88MiB path /dev/sdd

make /dev/sdc reappear

devmgt attach host2

btrfs dev scan

btrfs fi show -m
Label: none  uuid: 5dc0aaf4-4683-4050-b2d6-5ebe5f5cd120
Total devices 2 FS bytes used 32.00KiB
devid1 size 958.94MiB used 115.88MiB path /dev/sdc <- Wrong.
devid2 size 958.94MiB used 103.88MiB path /dev/sdd

Since /dev/sdc has been replaced with /dev/sde, /dev/sdc shouldn't be
part of the btrfs-fsid when it reappears. If the user wants it to be part
of it, the sys admin should use btrfs device add instead.
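
The guard this patch adds can be modelled in user space roughly as follows. This is a simplified sketch, not the kernel code: struct model_fs_devices, struct model_device and scan_may_update_name are hypothetical names, and the model only captures the decision "refuse a name update from a scan once the FS is opened".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for the kernel structures. */
struct model_fs_devices {
	int opened; /* non-zero once the filesystem is mounted */
};

struct model_device {
	const char *name; /* NULL if the device was missing at mount time */
};

/*
 * Decide whether a scan that found `path` may update the device name.
 * Returns 0 when nothing needs to change or the update is allowed,
 * -EBUSY when the filesystem is already opened.
 */
static int scan_may_update_name(const struct model_fs_devices *fs,
				const struct model_device *dev,
				const char *path)
{
	if (dev->name && strcmp(dev->name, path) == 0)
		return 0; /* same name, no update needed */

	if (fs->opened)
		return -EBUSY; /* mounted: do not swap in another path */

	return 0;
}
```

In the reproducer, the mounted filesystem knows devid 1 as /dev/sde, so a scan that finds /dev/sdc would get -EBUSY in this model instead of silently renaming the device.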

[1] github.com/anajain/devmgt.git

Signed-off-by: Anand Jain anand.j...@oracle.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v3->v4:
- Fix the over-80-character problem
---
 fs/btrfs/volumes.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a9c11a0..16e71a1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -508,6 +508,33 @@ static noinline int device_list_add(const char *path,
ret = 1;
device->fs_devices = fs_devices;
} else if (!device->name || strcmp(device->name->str, path)) {
+   /*
+* When FS is already mounted.
+* 1. If you are here and if the device->name is NULL that
+*means this device was missing at time of FS mount.
+* 2. If you are here and if the device->name is different
+*from 'path' that means either
+*  a. The same device disappeared and reappeared with
+* different name. or
+*  b. The missing-disk-which-was-replaced, has
+* reappeared now.
+*
+* We must allow 1 and 2a above. But 2b would be a spurious
+* and unintentional.
+*
+* Further in case of 1 and 2a above, the disk at 'path'
+* would have missed some transaction when it was away and
+* in case of 2a the stale bdev has to be updated as well.
+* 2b must not be allowed at all time.
+*/
+
+   /*
+* As of now don't allow update to btrfs_fs_device through
+* the btrfs dev scan cli, after FS has been mounted.
+*/
+   if (fs_devices->opened)
+   return -EBUSY;
+
name = rcu_string_strdup(path, GFP_NOFS);
if (!name)
return -ENOMEM;
-- 
1.9.3


[PATCH RESEND 3/9] Btrfs: make defragment work with nodatacow option

From: Wang Shilong wangsl.f...@cn.fujitsu.com

Btrfs defragment relies on the COW feature, so it did not work with
the nodatacow option. This problem was detected by xfstests generic/018
with the nodatacow mount option.

Fix this problem by forcing COW for an extent that has the
EXTENT_DEFRAG state set.
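
The decision can be sketched in plain C as below. This is a user-space model under assumptions: the flag values, struct model_inode and the range_has_defrag field (standing in for the EXTENT_DEFRAG range test) are hypothetical; only the shape of the logic follows the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in the kernel headers. */
#define INODE_NODATACOW (1u << 0)
#define INODE_PREALLOC  (1u << 1)

struct model_inode {
	unsigned int flags;
	uint64_t defrag_bytes;  /* bytes currently marked for defrag */
	bool range_has_defrag;  /* stands in for the EXTENT_DEFRAG range test */
};

/* True if a write to this range must go through COW despite nocow flags. */
static bool need_force_cow(const struct model_inode *inode)
{
	if (!(inode->flags & (INODE_NODATACOW | INODE_PREALLOC)))
		return false; /* the inode is COW anyway */

	return inode->defrag_bytes != 0 && inode->range_has_defrag;
}

/* True if the write may take the nocow path. */
static bool takes_nocow_path(const struct model_inode *inode)
{
	return (inode->flags & (INODE_NODATACOW | INODE_PREALLOC)) &&
	       !need_force_cow(inode);
}
```

A nodatacow inode therefore keeps writing in place as usual, and only the ranges that defragment has marked are redirected through COW.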

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/btrfs_inode.h |  6 ++
 fs/btrfs/inode.c   | 39 ---
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index a0cf3e5..01cfcba 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -127,6 +127,12 @@ struct btrfs_inode {
u64 delalloc_bytes;
 
/*
+* total number of bytes pending defrag, used by stat to check whether
+* it needs COW.
+*/
+   u64 defrag_bytes;
+
+   /*
 * the size of the file stored in the metadata on disk.  data=ordered
 * means the in-memory i_size might be larger than the size on disk
 * because not all the blocks are written yet.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6b65fab..a616fa4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1425,6 +1425,26 @@ error:
return ret;
 }
 
+static inline int need_force_cow(struct inode *inode, u64 start, u64 end)
+{
+
+   if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) &&
+   !(BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC))
+   return 0;
+
+   /*
+* @defrag_bytes is a hint value, no spinlock held here,
+* if is not zero, it means the file is defragging.
+* Force cow if given extent needs to be defragged.
+*/
+   if (BTRFS_I(inode)->defrag_bytes &&
+   test_range_bit(&BTRFS_I(inode)->io_tree, start, end,
+  EXTENT_DEFRAG, 0, NULL))
+   return 1;
+
+   return 0;
+}
+
 /*
  * extent_io.c call back to do delayed allocation processing
  */
@@ -1434,11 +1454,12 @@ static int run_delalloc_range(struct inode *inode, 
struct page *locked_page,
 {
int ret;
struct btrfs_root *root = BTRFS_I(inode)->root;
+   int force_cow = need_force_cow(inode, start, end);
 
-   if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW) {
+   if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW && !force_cow) {
ret = run_delalloc_nocow(inode, locked_page, start, end,
 page_started, 1, nr_written);
-   } else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC) {
+   } else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
ret = run_delalloc_nocow(inode, locked_page, start, end,
 page_started, 0, nr_written);
} else if (!btrfs_test_opt(root, COMPRESS) &&
@@ -1535,6 +1556,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
   struct extent_state *state, unsigned long *bits)
 {
 
+   if ((*bits & EXTENT_DEFRAG) && !(*bits & EXTENT_DELALLOC))
+   WARN_ON(1);
/*
 * set_bit and clear bit hooks normally require _irqsave/restore
 * but in this case, we are only testing for the DELALLOC
@@ -1557,6 +1580,8 @@ static void btrfs_set_bit_hook(struct inode *inode,
 root->fs_info->delalloc_batch);
spin_lock(&BTRFS_I(inode)->lock);
BTRFS_I(inode)->delalloc_bytes += len;
+   if (*bits & EXTENT_DEFRAG)
+   BTRFS_I(inode)->defrag_bytes += len;
if (do_list && !test_bit(BTRFS_INODE_IN_DELALLOC_LIST,
 &BTRFS_I(inode)->runtime_flags))
btrfs_add_delalloc_inodes(root, inode);
@@ -1571,6 +1596,13 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 struct extent_state *state,
 unsigned long *bits)
 {
+   u64 len = state->end + 1 - state->start;
+
+   spin_lock(&BTRFS_I(inode)->lock);
+   if ((state->state & EXTENT_DEFRAG) && (*bits & EXTENT_DEFRAG))
+   BTRFS_I(inode)->defrag_bytes -= len;
+   spin_unlock(&BTRFS_I(inode)->lock);
+
/*
 * set_bit and clear bit hooks normally require _irqsave/restore
 * but in this case, we are only testing for the DELALLOC
@@ -1578,7 +1610,6 @@ static void btrfs_clear_bit_hook(struct inode *inode,
 */
if ((state->state & EXTENT_DELALLOC) && (*bits & EXTENT_DELALLOC)) {
struct btrfs_root *root = BTRFS_I(inode)->root;
-   u64 len = state->end + 1 - state->start;
bool do_list = !btrfs_is_free_space_inode(inode);
 
if (*bits & EXTENT_FIRST_DELALLOC) {
@@ -8089,6 +8120,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei->last_sub_trans = 0;
ei->logged_trans = 0;

[PATCH RESEND 6/9] Btrfs: cleanup the read failure record after write or when the inode is freeing

After the data is written successfully, we should clean up the read failure
record in that range because
- If we set data COW for the file, the range that the failure record pointed
  to is mapped to a new place, so it is invalid.
- If we set no data COW for the file, and there is no error during writing,
  the corrupted data is corrected, so the failure record can be removed. And if
  some errors happen on the mirrors, we also needn't worry about it because the
  failure record will be recreated if we read the same place again.

Sometimes we may fail to correct the data, so the failure records will be left
in the tree. We need to free them when we free the inode; otherwise the memory
is leaked.
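
The cleanup can be pictured with a small user-space model. This is only a sketch under assumptions: a sorted singly linked list stands in for the extent state tree, and struct fail_rec, push_rec and free_fail_recs are hypothetical names. It frees every record inside a byte range, as a write to that range would, and a full-range call frees everything, as inode eviction would.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical failure record covering the byte range [start, end]. */
struct fail_rec {
	uint64_t start;
	uint64_t end;
	struct fail_rec *next; /* list is kept sorted by start */
};

/* Prepend a record; callers push in descending order to keep it sorted. */
static struct fail_rec *push_rec(struct fail_rec *head, uint64_t start,
				 uint64_t end)
{
	struct fail_rec *rec = malloc(sizeof(*rec));

	if (!rec)
		return head;
	rec->start = start;
	rec->end = end;
	rec->next = head;
	return rec;
}

/* Unlink and free all records within [start, end]; return how many. */
static int free_fail_recs(struct fail_rec **head, uint64_t start,
			  uint64_t end)
{
	struct fail_rec **link = head;
	int freed = 0;

	while (*link) {
		struct fail_rec *rec = *link;

		if (rec->start > end)
			break; /* sorted list: nothing further can match */
		if (rec->end < start) {
			link = &rec->next; /* entirely before the range */
			continue;
		}
		*link = rec->next;
		free(rec);
		freed++;
	}
	return freed;
}
```

Calling free_fail_recs(&head, 0, UINT64_MAX) corresponds to the eviction case, where every remaining record must go to avoid the leak.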

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c | 34 ++
 fs/btrfs/extent_io.h |  1 +
 fs/btrfs/inode.c |  6 ++
 3 files changed, 41 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3a64354..d67bb4f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2003,6 +2003,40 @@ static int free_io_failure(struct inode *inode, struct 
io_failure_record *rec,
 }
 
 /*
+ * Can be called when
+ * - hold extent lock
+ * - under ordered extent
+ * - the inode is freeing
+ */
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end)
+{
+   struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
+   struct io_failure_record *failrec;
+   struct extent_state *state, *next;
+
+   if (RB_EMPTY_ROOT(&failure_tree->state))
+   return;
+
+   spin_lock(&failure_tree->lock);
+   state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
+   while (state) {
+   if (state->start > end)
+   break;
+
+   ASSERT(state->end <= end);
+
+   next = next_state(state);
+
+   failrec = (struct io_failure_record *)state->private;
+   free_extent_state(state);
+   kfree(failrec);
+
+   state = next;
+   }
+   spin_unlock(&failure_tree->lock);
+}
+
+/*
  * this bypasses the standard btrfs submit functions deliberately, as
  * the standard behavior is to write all copies in a raid setup. here we only
  * want to write the one bad copy. so we do the mapping for ourselves and issue
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index ccc264e..d06780b 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -347,6 +347,7 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 
start,
 int end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 int repair_eb_io_failure(struct btrfs_root *root, struct extent_buffer *eb,
 int mirror_num);
+void btrfs_free_io_failure_record(struct inode *inode, u64 start, u64 end);
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 noinline u64 find_lock_delalloc_range(struct inode *inode,
  struct extent_io_tree *tree,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 15902eb..b431c58 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -2670,6 +2670,10 @@ static int btrfs_finish_ordered_io(struct 
btrfs_ordered_extent *ordered_extent)
goto out;
}
 
+   btrfs_free_io_failure_record(inode, ordered_extent->file_offset,
+ordered_extent->file_offset +
+ordered_extent->len - 1);
+
if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) {
truncated = true;
logical_len = ordered_extent->truncated_len;
@@ -4745,6 +4749,8 @@ void btrfs_evict_inode(struct inode *inode)
/* do we really want it for ->i_nlink > 0 and zero btrfs_root_refs? */
btrfs_wait_ordered_range(inode, 0, (u64)-1);
 
+   btrfs_free_io_failure_record(inode, 0, (u64)-1);
+
if (root->fs_info->log_root_recovering) {
BUG_ON(test_bit(BTRFS_INODE_HAS_ORPHAN_ITEM,
 &BTRFS_I(inode)->runtime_flags));
-- 
1.9.3


[PATCH RESEND 5/9] Btrfs: fix missing error handler if submitting re-read bio fails

We forgot to free the failure record and the bio when submitting the re-read
bio fails. Fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/extent_io.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 23398ad..3a64354 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2345,6 +2345,11 @@ static int bio_readpage_error(struct bio *failed_bio, 
u64 phy_offset,
ret = tree->ops->submit_bio_hook(inode, read_mode, bio,
 failrec->this_mirror,
 failrec->bio_flags, 0);
+   if (ret) {
+   free_io_failure(inode, failrec, 0);
+   bio_put(bio);
+   }
+
return ret;
 }
 
-- 
1.9.3


[PATCH 9/9] Btrfs: fix writing data into the seed filesystem

If we mount a seed filesystem with the degraded option and then add a new
device to the seed filesystem, adding the device fails because of an
IO failure.

Steps to reproduce:
 # mkfs.btrfs -d raid1 -m raid1 dev0 dev1
 # btrfstune -S 1 dev0
 # mount dev0 -o degraded mnt
 # btrfs device add -f dev2 mnt

This is because the original code didn't set the chunk on the seed device
to be read-only if the degraded flag was set. That behavior was introduced by
patch f48b90756, which fixed the problem that a raid1 filesystem became
read-only after one of its devices went missing. But that fix was not right:
we should set the read-only flag according to the number of missing devices,
not the degraded mount option. If the number of missing devices does not
exceed the max error number that the profile of the chunk tolerates, we don't
set it to be read-only.
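
The rule described above can be sketched as a small user-space model. The names enum chunk_profile, struct stripe_dev, chunk_max_errors and chunk_readonly are hypothetical simplifications; the tolerated-error counts follow the helper added by this patch (1 for RAID1, RAID10, RAID5 and DUP, 2 for RAID6, 0 otherwise).

```c
/* Hypothetical profile enum standing in for the block group type bits. */
enum chunk_profile {
	PROFILE_SINGLE,
	PROFILE_RAID0,
	PROFILE_DUP,
	PROFILE_RAID1,
	PROFILE_RAID10,
	PROFILE_RAID5,
	PROFILE_RAID6
};

/* Reduced view of the device behind one stripe. */
struct stripe_dev {
	int missing;   /* device is absent */
	int writeable; /* device accepts writes (a seed device does not) */
};

/* Number of failed devices the profile can tolerate on write. */
static int chunk_max_errors(enum chunk_profile profile)
{
	switch (profile) {
	case PROFILE_RAID1:
	case PROFILE_RAID10:
	case PROFILE_RAID5:
	case PROFILE_DUP:
		return 1;
	case PROFILE_RAID6:
		return 2;
	default:
		return 0;
	}
}

/* Return 1 if the chunk must be treated as read-only, else 0. */
static int chunk_readonly(enum chunk_profile profile,
			  const struct stripe_dev *devs, int num_stripes)
{
	int miss_ndevs = 0;

	for (int i = 0; i < num_stripes; i++) {
		if (devs[i].missing) {
			miss_ndevs++;
			continue;
		}
		if (!devs[i].writeable)
			return 1; /* a present but read-only device */
	}
	return miss_ndevs > chunk_max_errors(profile);
}
```

In the reproducer, the raid1 chunk on the seed device has one missing stripe and one present but non-writeable seed stripe, so this model reports it read-only regardless of the degraded mount option.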

Cc: Josef Bacik jba...@fb.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/volumes.c | 52 
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 73a82e5..daecfa5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4584,12 +4584,31 @@ out:
return ret;
 }
 
+static inline int btrfs_chunk_max_errors(struct map_lookup *map)
+{
+   int max_errors;
+
+   if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
+BTRFS_BLOCK_GROUP_RAID10 |
+BTRFS_BLOCK_GROUP_RAID5 |
+BTRFS_BLOCK_GROUP_DUP)) {
+   max_errors = 1;
+   } else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
+   max_errors = 2;
+   } else {
+   max_errors = 0;
+   }
+
+   return max_errors;
+}
+
 int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
 {
struct extent_map *em;
struct map_lookup *map;
struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree;
int readonly = 0;
+   int miss_ndevs = 0;
int i;
 
read_lock(&map_tree->map_tree.lock);
@@ -4598,18 +4617,27 @@ int btrfs_chunk_readonly(struct btrfs_root *root, u64 
chunk_offset)
if (!em)
return 1;
 
-   if (btrfs_test_opt(root, DEGRADED)) {
-   free_extent_map(em);
-   return 0;
-   }
-
map = (struct map_lookup *)em->bdev;
for (i = 0; i < map->num_stripes; i++) {
+   if (map->stripes[i].dev->missing) {
+   miss_ndevs++;
+   continue;
+   }
+
if (!map->stripes[i].dev->writeable) {
readonly = 1;
-   break;
+   goto end;
}
}
+
+   /*
+* If the number of missing devices is larger than max errors,
+* we can not write the data into that chunk successfully, so
+* set it readonly.
+*/
+   if (miss_ndevs > btrfs_chunk_max_errors(map))
+   readonly = 1;
+end:
free_extent_map(em);
return readonly;
 }
@@ -5220,16 +5248,8 @@ static int __btrfs_map_block(struct btrfs_fs_info 
*fs_info, int rw,
}
}
 
-   if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
-   if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
-BTRFS_BLOCK_GROUP_RAID10 |
-BTRFS_BLOCK_GROUP_RAID5 |
-BTRFS_BLOCK_GROUP_DUP)) {
-   max_errors = 1;
-   } else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
-   max_errors = 2;
-   }
-   }
+   if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS))
+   max_errors = btrfs_chunk_max_errors(map);
 
if (dev_replace_is_ongoing && (rw & (REQ_WRITE | REQ_DISCARD)) &&
dev_replace->tgtdev != NULL) {
-- 
1.9.3


[PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail

The caller of btrfs_submit_direct_hook() puts the original dio bio
when btrfs_submit_direct_hook() returns an error, so we needn't
put the original bio in btrfs_submit_direct_hook() as well.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/inode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a616fa4..15902eb 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7325,10 +7325,8 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
  &map_length, NULL, 0);
-   if (ret) {
-   bio_put(orig_bio);
+   if (ret)
return -EIO;
-   }
 
if (map_length >= orig_bio->bi_iter.bi_size) {
bio = orig_bio;
@@ -7345,6 +7343,7 @@ static int btrfs_submit_direct_hook(int rw, struct 
btrfs_dio_private *dip,
bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
if (!bio)
return -ENOMEM;
+
bio->bi_private = dip;
bio->bi_end_io = btrfs_end_dio_bio;
atomic_inc(&dip->pending_bios);
-- 
1.9.3


[PATCH V2 7/9] btrfs: fix null pointer dereference in clone_fs_devices when name is null

From: Anand Jain anand.j...@oracle.com

When one of the device paths is missing, the btrfs_device name is NULL.
This patch checks for that.

stack:
BUG: unable to handle kernel NULL pointer dereference at 0010
IP: [812e18c0] strlen+0x0/0x30
[a01cd92a] ? clone_fs_devices+0xaa/0x160 [btrfs]
[a01cdcf7] btrfs_init_new_device+0x317/0xca0 [btrfs]
[81155bca] ? __kmalloc_track_caller+0x15a/0x1a0
[a01d6473] btrfs_ioctl+0xaa3/0x2860 [btrfs]
[81132a6c] ? handle_mm_fault+0x48c/0x9c0
[81192a61] ? __blkdev_put+0x171/0x180
[817a784c] ? __do_page_fault+0x4ac/0x590
[81193426] ? blkdev_put+0x106/0x110
[81179175] ? mntput+0x35/0x40
[8116d4b0] do_vfs_ioctl+0x460/0x4a0
[8115c72e] ? fput+0xe/0x10
[81068033] ? task_work_run+0xb3/0xd0
[8116d547] SyS_ioctl+0x57/0x90
[817a793e] ? do_page_fault+0xe/0x10
[817abe52] system_call_fastpath+0x16/0x1b

reproducer:
mkfs.btrfs -draid1 -mraid1 /dev/sdg1 /dev/sdg2
btrfstune -S 1 /dev/sdg1
modprobe -r btrfs && modprobe btrfs
mount -o degraded /dev/sdg1 /btrfs
btrfs dev add /dev/sdg3 /btrfs

Signed-off-by: Anand Jain anand.j...@oracle.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1->v2:
- Fix the problem that we forgot to set the missing flag for the cloned device
---
 fs/btrfs/volumes.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 1891541..4731bd6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -598,16 +598,23 @@ static struct btrfs_fs_devices *clone_fs_devices(struct 
btrfs_fs_devices *orig)
if (IS_ERR(device))
goto error;
 
-   /*
-* This is ok to do without rcu read locked because we hold the
-* uuid mutex so nothing we touch in here is going to disappear.
-*/
-   name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
-   if (!name) {
-   kfree(device);
-   goto error;
+   if (orig_dev->missing) {
+   device->missing = 1;
+   fs_devices->missing_devices++;
+   } else {
+   ASSERT(orig_dev->name);
+   /*
+* This is ok to do without rcu read locked because
+* we hold the uuid mutex so nothing we touch in here
+* is going to disappear.
+*/
+   name = rcu_string_strdup(orig_dev->name->str, GFP_NOFS);
+   if (!name) {
+   kfree(device);
+   goto error;
+   }
+   rcu_assign_pointer(device->name, name);
}
-   rcu_assign_pointer(device->name, name);
 
list_add(device-dev_list, fs_devices-devices);
device-fs_devices = fs_devices;
-- 
1.9.3

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 8/9] Btrfs: fix unzeroed members in fs_devices when creating a fs from seed fs

We forgot to zero some members in fs_devices when creating a new fs_devices
from that of the seed fs. As a result we could get the wrong chunk profile
when allocating chunks. Fix it.

Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
 fs/btrfs/volumes.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 4731bd6..73a82e5 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1993,6 +1993,9 @@ static int btrfs_prepare_sprout(struct btrfs_root *root)
 	fs_devices->seeding = 0;
 	fs_devices->num_devices = 0;
 	fs_devices->open_devices = 0;
+	fs_devices->missing_devices = 0;
+	fs_devices->num_can_discard = 0;
+	fs_devices->rotating = 0;
 	fs_devices->seed = seed_devices;
 
 	generate_random_uuid(fs_devices->fsid);
-- 
1.9.3
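
A minimal sketch of why the stale counters matter, assuming a trimmed-down
stand-in for struct btrfs_fs_devices. The struct and the helper name below are
hypothetical and are not the kernel's code; the point is only that counters
copied from the seed describe the seed's devices, so the sprout has to start
counting from zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct btrfs_fs_devices. */
struct fs_devices_sketch {
	uint64_t num_devices;
	uint64_t open_devices;
	uint64_t missing_devices;
	uint64_t num_can_discard;
	int seeding;
	int rotating;
};

/*
 * Reset every per-device counter that the new (sprout) fs_devices must
 * recompute for itself. Leaving missing_devices, num_can_discard or
 * rotating at the seed's values is the bug the patch describes.
 */
void sprout_reset_counters(struct fs_devices_sketch *fs)
{
	assert(fs != NULL);
	fs->seeding = 0;
	fs->num_devices = 0;
	fs->open_devices = 0;
	fs->missing_devices = 0;
	fs->num_can_discard = 0;
	fs->rotating = 0;
}
```

After the reset, each device added to the sprout is expected to bump the
counters again, so they end up describing only the sprout's own devices.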


[PATCH V2 2/9] btrfs: check generation as replace duplicates devid+uuid

From: Anand Jain anand.j...@oracle.com

When the FS is unmounted we need to check the generation number as well,
since the devid+uuid combination could match the missing replaced
disk when it reappears, and without this patch the FS might pair with
the replaced disk again.

The device_list_add() function is called from the following paths:
    mount device option
    mount argument
    ioctl BTRFS_IOC_SCAN_DEV (btrfs dev scan)
    ioctl BTRFS_IOC_DEVICES_READY (btrfs dev ready <dev>)
All of them have been unit tested to work fine with this patch.

If the user knows what he is doing and really wants to pair with the
replaced disk (which is not a standard operation), then he should
first clear the in-memory kernel btrfs device list by unloading and
reloading the module, and then mount with the -o device option.

Signed-off-by: Anand Jain anand.j...@oracle.com
Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
Signed-off-by: Miao Xie mi...@cn.fujitsu.com
---
Changelog v1 -> v2:
- Fix the over-80-character problem and unreasonable error number
---
 fs/btrfs/volumes.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 16e71a1..1891541 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -532,8 +532,19 @@ static noinline int device_list_add(const char *path,
 		 * As of now don't allow update to btrfs_fs_device through
 		 * the btrfs dev scan cli, after FS has been mounted.
 		 */
-		if (fs_devices->opened)
+		if (fs_devices->opened) {
 			return -EBUSY;
+		} else {
+			/*
+			 * That is if the FS is _not_ mounted and if you
+			 * are here, that means there is more than one
+			 * disk with same uuid and devid.We keep the one
+			 * with larger generation number or the last-in if
+			 * generation are equal.
+			 */
+			if (found_transid < device->generation)
+				return -EEXIST;
+		}
 
 		name = rcu_string_strdup(path, GFP_NOFS);
 		if (!name)
@@ -546,6 +557,15 @@ static noinline int device_list_add(const char *path,
 		}
 	}
 
+	/*
+	 * Unmount does not free the btrfs_device struct but would zero
+	 * generation along with most of the other members. So just update
+	 * it back. We need it to pick the disk with largest generation
+	 * (as above).
+	 */
+	if (!fs_devices->opened)
+		device->generation = found_transid;
+
 	if (found_transid > fs_devices->latest_trans) {
 		fs_devices->latest_devid = devid;
 		fs_devices->latest_trans = found_transid;
-- 
1.9.3
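
The selection rule described in the patch can be sketched in isolation as
follows. This is an illustration only: scan_may_replace() and its parameters
are hypothetical names, and the real check lives inside device_list_add().

```c
#include <errno.h>
#include <stdint.h>

/*
 * Decide whether a newly scanned device may replace the entry already
 * cached for the same devid+uuid.
 *
 * Returns 0 when the scanned device should be accepted, -EBUSY when the
 * filesystem is mounted (no updates allowed), and -EEXIST when the cached
 * device has a larger generation. On equal generations the last-in wins.
 */
int scan_may_replace(int fs_opened, uint64_t found_transid,
		     uint64_t cached_generation)
{
	if (fs_opened)
		return -EBUSY;
	if (found_transid < cached_generation)
		return -EEXIST;
	return 0;
}
```

For example, with the cached device at generation 5, a scanned device at
generation 4 is rejected with -EEXIST, while one at generation 5 or 6 is
accepted.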


Re: [PATCH v2] Btrfs: fix crash when starting transaction


(2014/07/03 17:30), Miao Xie wrote:

On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:

Often when starting a transaction we commit the currently running transaction,
which can end up writing block group caches when the current process has its
journal_info set to NULL (and not to a transaction). This makes our assertion
at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

 [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
 [51502.242213] ------------[ cut here ]------------
 [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
 [51502.242669] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
 (...)
 [51502.244010] Call Trace:
 [51502.244010]  [a02bc025] 
btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
 [51502.244010]  [a02c3bdc] 
btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
 [51502.244010]  [a0357a6a] commit_cowonly_roots+0x164/0x226 
[btrfs]
 [51502.244010]  [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 
[btrfs]
 [51502.244010]  [8168ec7b] ? _raw_spin_unlock+0x2b/0x40
 [51502.244010]  [a02d6259] start_transaction+0x459/0x620 [btrfs]
 [51502.244010]  [a02d67ab] btrfs_start_transaction+0x1b/0x20 
[btrfs]
 [51502.244010]  [a02d73e1] __unlink_start_trans+0x31/0xe0 [btrfs]
 [51502.244010]  [a02dea67] btrfs_unlink+0x37/0xc0 [btrfs]
 [51502.244010]  [811bb054] ? do_unlinkat+0x114/0x2a0
 [51502.244010]  [811baebc] vfs_unlink+0xcc/0x150
 [51502.244010]  [811bb1a0] do_unlinkat+0x260/0x2a0
 [51502.244010]  [811a9ef4] ? filp_close+0x64/0x90
 [51502.244010]  [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
 [51502.244010]  [81349cab] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [51502.244010]  [811be9eb] SyS_unlinkat+0x1b/0x40
 [51502.244010]  [81698452] system_call_fastpath+0x16/0x1b
 [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 
13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0f 0b 0f 
1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
 [51502.244010] RIP  [a03575da] assfail.constprop.88+0x1e/0x20 
[btrfs]

2)

 [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
 [25405.097488] ------------[ cut here ]------------
 [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
 [25405.097940] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
 (...)
 [25405.18] Call Trace:
 [25405.18]  [a02bc025] 
btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
 [25405.18]  [a02c3bdc] 
btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
 [25405.18]  [a035755a] commit_cowonly_roots+0x164/0x226 
[btrfs]
 [25405.18]  [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 
[btrfs]
 [25405.18]  [8109c170] ? bit_waitqueue+0xc0/0xc0
 [25405.18]  [a02d6259] start_transaction+0x459/0x620 [btrfs]
 [25405.18]  [a02d67ab] btrfs_start_transaction+0x1b/0x20 
[btrfs]
 [25405.18]  [a02e3407] btrfs_create+0x47/0x210 [btrfs]
 [25405.18]  [a02d74cc] ? btrfs_permission+0x3c/0x80 [btrfs]
 [25405.18]  [811bc63b] vfs_create+0x9b/0x130
 [25405.18]  [811bcf19] do_last+0x849/0xe20
 [25405.18]  [811b9409] ? link_path_walk+0x79/0x820
 [25405.18]  [811bd5b5] path_openat+0xc5/0x690
 [25405.18]  [810ab07d] ? trace_hardirqs_on+0xd/0x10
 [25405.18]  [811cdcd2] ? __alloc_fd+0x32/0x1d0
 [25405.18]  [811be2a3] do_filp_open+0x43/0xa0
 [25405.18]  [811cddf1] ? __alloc_fd+0x151/0x1d0
 [25405.18]  [811abcfc] do_sys_open+0x13c/0x230
 [25405.18]  [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
 [25405.18]  [811abe12] SyS_open+0x22/0x30
 [25405.18]  [81698452] system_call_fastpath+0x16/0x1b
 [25405.18] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 
13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0f 0b 0f 
1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
 [25405.18] RIP  [a03570ca] assfail.constprop.88+0x1e/0x20 
[btrfs]

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Removed test for current->journal_info == NULL. At this point it's
 always expected to be NULL.


Reviewed-by: Miao Xie mi...@cn.fujitsu.com


Let me clarify my understanding since I'm not good at the transaction code.

* What is the root cause?

  When start_transaction() is called with current->journal_transaction == NULL,
  we

Re: [PATCH v2] Btrfs: fix crash when starting transaction

On Thu, 3 Jul 2014 19:32:18 +0900, Satoru Takeuchi wrote:
 (2014/07/03 17:30), Miao Xie wrote:
 On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:
 Often when starting a transaction we commit the currently running transaction,
 which can end up writing block group caches when the current process has its
 journal_info set to NULL (and not to a transaction). This makes our assertion
 at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
 in a crash/hang. Therefore fix it by setting journal_info.

 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
 ---

 V2: Removed test for current->journal_info == NULL. At this point it's
  always expected to be NULL.

 Reviewed-by: Miao Xie mi...@cn.fujitsu.com
 
 Let me clarify my understanding

Re: [PATCH] Btrfs: fix wrong uevent target

CC Anand Jain

Sorry, please ignore this patch.
Anand wrote the same patch several days ago, so this bug fix belongs to Anand
though he NACKed his patch at that time.

Thanks
Miao

On Wed, 2 Jul 2014 17:03:54 +0800, Miao Xie wrote:
 block_device's bd_disk points to the disk, not to the object the block_device
 actually corresponds to (the whole disk or a partition), so we would send the
 uevent to the wrong target. Fix it.
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/volumes.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 95828b0..e8b9214 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -123,7 +123,7 @@ static void btrfs_kobject_uevent(struct block_device *bdev,
  {
  	int ret;
  
 -	ret = kobject_uevent(&disk_to_dev(bdev->bd_disk)->kobj, action);
 +	ret = kobject_uevent(&part_to_dev(bdev->bd_part)->kobj, action);
  	if (ret)
  		pr_warn("BTRFS: Sending event '%d' to kobject: '%s' (%p): failed\n",
  			action,
 


Re: [RFC PATCH] Revert "btrfs: allow mounting btrfs subvolumes with different ro/rw options"

2014-07-03 Thread Tobias Geerinckx-Rice

On 3 July 2014 10:33, Qu Wenruo quwen...@cn.fujitsu.com wrote:
 Oh, sorry for my confusing words.

And I probably should have waited for my frustration with my mail
client/device/public transport to subside before panicking^Creplying.

I use a combination of ro & rw (not insanely nested) subvolumes on a
few pseudo-embedded home/office servers and would like to keep that
arrangement working if possible. I'm also aware that it doesn't
protect against all possible bugs.

 To make it clear, when mentioning 'the whole disk (or partition, whatever)' I
 mean the FS_TREE.
 (Of course not the default subvolume.)

 The problem is that, even if you mount a subvolume ro, you can still change
 the contents of the subvolume through its rw parent subvolume.
 And if a subvolume can still be modified, the ro mount loses its meaning.

That makes so much more sense than my original reading, which was
weird and wrong and implied strange subvol-5-only magic. Sorry!

 So we need special rules to prevent such things.

Not that it matters, but: agreed.

   Tobias

Re: [PATCH] Btrfs: implement support for fallocate collapse range

2014-07-03 Thread Filipe David Manana

On Thu, Jul 3, 2014 at 5:46 AM, Chandan Rajendra
chan...@linux.vnet.ibm.com wrote:
On Monday 23 Jun 2014 11:25:47 Filipe David Borba Manana wrote:
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index aeab453..8f1a371 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2825,12 +2825,12 @@ cow_done:
 		 * It is safe to drop the lock on our parent before we
 		 * go through the expensive btree search on b.
 		 *
-		 * If we're inserting or deleting (ins_len != 0), then we might
-		 * be changing slot zero, which may require changing the parent.
-		 * So, we can't drop the lock until after we know which slot
-		 * we're operating on.
+		 * If we're inserting, deleting or updating a key (cow != 0),
+		 * then we might be changing slot zero, which may require
+		 * changing the parent. So, we can't drop the lock until after
+		 * we know which slot we're operating on.
 		 */
-		if (!ins_len && !p->keep_locks) {
+		if (!cow && !p->keep_locks) {
 			int u = level + 1;

For the key update case (i.e. (ins_len == 0) and (cow == 1)), maybe
we could optimize by having a variable to hold the return value of
should_cow_block(), i.e. it keeps track of whether the current
metadata block was COWed. If the variable indicates that the block was
*not* COWed, then we could release the lock on the parent block since
the update operation (even on slot 0) isn't going to change the
corresponding entry in the parent block.

Hi Chandan,

Just because a node wasn't cowed doesn't mean it isn't going to be
updated. Furthermore, updating the key works bottom-up: we don't know
during btrfs_search_slot() if our caller intends to update a key in
the leaf or just update the item, so we need to return a path here
with all nodes with slot == 0 having a write lock on them. For now
I'm basically just undoing this:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=eb653de15987612444b6cde3b0e67b1edd94625f

Anyway right now I have a functional issue to tackle first.

Thanks.

Thanks,
chandan

--
Filipe David Manana,

Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men.

Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-03 Thread Marc MERLIN

Thanks for the patch. Hopefully this will make it to the next 3.15.x
kernel.

I also went back to 3.14 anyway since the 'blocked for 120 seconds' messages
look like another instance of deadlocks we've been discussing here.

But just curious:

 [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120

What should I be doing about this?
Does it mean that I do have some kind of corruption/damage on my
filesystem?

Also, is it possible to have all these messages state which devid they
occurred on? I don't even know which device I should be worrying about
right now, and although I'm running scrub now, my understanding is that
scrub doesn't actually look at FS structures and is likely to miss this
anyway.

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901

Re: [PATCH] Btrfs: fix wrong uevent target

2014-07-03 Thread Chris Mason

On 07/03/2014 07:09 AM, Miao Xie wrote:
 CC Anand Jain
 
 Sorry, please ignore this patch.
 Anand wrote the same patch several days ago, so this bug fix belongs to Anand
 though he NACKed his patch at that time.

It certainly looks right, but Anand had mentioned that he had a few
questions on testing.  I've pulled it out for now, but I'll take Anand's
version when you're both happy.

-chris

Re: [PATCH 1/2] btrfs-progs: add ask_user confirmation for btrfstune clear seeding flag

On Thu, Jul 03, 2014 at 10:06:33AM +0800, Gui Hecheng wrote:
 Clearing the seeding flag may cause the original filesystem to be writable,
 which is dangerous.

Can you please describe the dangerous scenario a bit more? This would
also go to the documentation so it's not only to satisfy my curiosity.

Dropping the seeding flag could be dangerous if the filesystem starts in
seeding mode, a new device is added, some writes are done, then
filesystem is unmounted.

Now it's a 2 device filesystem, where the original holds some data and
without the seeding flag it would accept new writes. Still ok for me,
though this is probably the time where some user assumptions may break.

 In this case, add a user confirmation check when clearing the seeding flag.
 Also warn the user that the fs is in a dangerous condition when
 the seeding flag is cleared if it is forced to.

The -y option is tied only to the seeding option, but it should IMO be
more general and called --force.

 Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
 ---
  btrfstune.c | 24 +++++++++++++++++++++++-
  1 file changed, 23 insertions(+), 1 deletion(-)
 
 diff --git a/btrfstune.c b/btrfstune.c
 index 3f2f0cd..0e18088 100644
 --- a/btrfstune.c
 +++ b/btrfstune.c
 @@ -103,6 +104,7 @@ static void print_usage(void)
  	fprintf(stderr, "\t-S value\tpositive value will enable seeding, zero to disable, negative is not allowed\n");
  	fprintf(stderr, "\t-r \t\tenable extended inode refs\n");
  	fprintf(stderr, "\t-x \t\tenable skinny metadata extent refs\n");
 +	fprintf(stderr, "\t-y \t\tsay yes to clear the seeding flag, make sure that you are aware of the danger\n");

The help text could say something like

--force\tallow dangerous changes\n

btrfstune only allows setting the bit for extref and skinny-metadata,
unsetting would be dangerous as well.

Re: [PATCH] btrfs-progs: add mount status check for btrfs-image

On Thu, Jul 03, 2014 at 10:06:34AM +0800, Gui Hecheng wrote:
 The btrfs-image tool should not be run on a mounted filesystem.

Should not, but for some values of sometimes it makes sense, eg.
capturing image of an otherwise quiescent filesystem, a read-only mount
or after a crash.

This utility is used for debugging so I'd prefer to let the user do as
he likes, though printing the warning about the mount status is a good
improvement.

 Ongoing fs operations may change what you imaged a while ago,
 which makes the image meaningless.

I'm not familiar with the image format, but maybe we can set a bit in
the header when the filesystem was not captured cleanly.

Re: [PATCH] Btrfs: fix wrong uevent target

2014-07-03 Thread Anand Jain


Chris,

  This fix is theoretically correct, but my guess that it would
  solve the problem reported by Qu Wenruo was wrong [1].
  The patch is good to integrate.

Thanks, Anand

[1] Re: [PATCH RFC] btrfs: Add ctime/mtime update for btrfs device add/remove.



On 03/07/2014 22:00, Chris Mason wrote:

On 07/03/2014 07:09 AM, Miao Xie wrote:

CC Anand Jain

Sorry, please ignore this patch.
Anand wrote the same patch several days ago, so this bug fix belongs to Anand
though he NACKed his patch at that time.


It certainly looks right, but Anand had mentioned that he had a few
questions on testing.  I've pulled it out for now, but I'll take Anand's
version when you're both happy.

-chris


Re: [RFC PATCH] Revert "btrfs: allow mounting btrfs subvolumes with different ro/rw options"

2014-07-03 Thread Goffredo Baroncelli

On 07/03/2014 02:28 AM, Qu Wenruo wrote:

 -------- Original Message --------
 Subject: Re: [RFC PATCH] Revert "btrfs: allow mounting btrfs subvolumes with different ro/rw options"
 From: Goffredo Baroncelli kreij...@libero.it
 To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
 Date: 2014-07-03 01:48
 On 07/01/2014 11:30 AM, Qu Wenruo wrote:
 This commit has the following problem:
 1) Break the ro mount rule.
 When users mount the whole btrfs ro, it is still possible to mount a
 subvol rw and change the contents, which makes the whole-fs ro mount
 meaningless.

 Where is the problem? I see a use case where I want a conservative default:
 mount everything ro except some subvolumes.

 In any case it is not a security problem, because if the user has the
 capability to mount a subvolume, he also has the capability to remount,rw
 the whole filesystem.

 It is not a security problem, but the behavior is not consistent.
 If a user mounts the whole disk ro, he or she wants the fs read-only and
 expects that nothing in it will change.
 If you mount a subvol rw, then the whole-disk ro expectation is broken:
 things will change even though the whole disk is read-only.

Sorry to bother you again, but one thing is not clear to me:

If

# mount -o subvolid=5,ro /dev/sda1 /mnt/root
# mount -o subvol=subvolname,rw /dev/sda1 /mnt/subvolname

I suppose that

# touch /mnt/root/touch-test          # 1

fails, and

# touch /mnt/subvolname/touch-test    # 2

succeeds. Did I understand correctly? If so, this behaviour seems correct to me.
It would be different if, after mounting the subvolume subvolname, the whole
filesystem also became rw (e.g. #1 succeeded).

G.Baroncelli

 The problem also happens when a parent subvol is mounted rw but a child subvol
 is mounted ro.
 A user can still modify the child subvol through the parent subvol, which
 still breaks the read-only rule.

 Thanks,
 Qu

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5

Re: [PATCH 2/4] btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for super_recover.

On Thu, Jul 03, 2014 at 05:36:36PM +0800, Qu Wenruo wrote:
 @@ -1182,7 +1183,8 @@ struct btrfs_root *open_ctree_fd(int fp, const char *path, u64 sb_bytenr,
  	return info->fs_root;
  }
  
 -int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr)
 +int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 +			 int recover_super)

 +	int max_super = recover_super ? BTRFS_SUPER_MIRROR_MAX : 1;

Minor tweak, I've renamed it to super_recover as this is used everywhere
else. No need to resend the patch.
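
For reference, a sketch of how many superblock copies get scanned and where
they live, assuming the usual btrfs layout (primary copy at 64 KiB, mirrors at
64 MiB and 256 GiB). The macro and function names are shortened stand-ins for
the BTRFS_SUPER_* constants and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SUPER_INFO_OFFSET  (64 * 1024)	/* primary superblock at 64 KiB */
#define SUPER_MIRROR_MAX   3		/* primary plus two mirrors */
#define SUPER_MIRROR_SHIFT 12

/* Byte offset of superblock copy number `mirror` (0 is the primary). */
uint64_t sb_offset(int mirror)
{
	assert(mirror >= 0 && mirror < SUPER_MIRROR_MAX);
	if (mirror == 0)
		return SUPER_INFO_OFFSET;
	return (uint64_t)(16 * 1024) << (SUPER_MIRROR_SHIFT * mirror);
}

/* Normal callers read only the primary; super-recover scans all copies. */
int mirrors_to_scan(int super_recover)
{
	return super_recover ? SUPER_MIRROR_MAX : 1;
}
```

A caller would loop over mirror numbers from 0 up to mirrors_to_scan() and
read each copy at sb_offset(mirror), skipping offsets beyond the device size.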
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 3/4] btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.

On Thu, Jul 03, 2014 at 05:36:37PM +0800, Qu Wenruo wrote:
 --- a/btrfs-find-root.c
 +++ b/btrfs-find-root.c
 @@ -96,7 +96,10 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
  	ret = btrfs_read_dev_super(fs_devices->latest_bdev,
  				   disk_super, fs_info->super_bytenr, 1);
  	if (ret) {
 -		printk("No valid btrfs found\n");
 +		if (ret == -ENOENT)
 +			printk("No valid btrfs found\n");

Please use fprintf(stderr, ...) for error messages, until we have a
better message logging helpers.
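
A sketch of what the requested change could look like, assuming a small helper
that maps the return value to a message and prints it to a caller-supplied
stream. The helper names and the fallback message are hypothetical; only the
-ENOENT text comes from the quoted hunk.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Map a btrfs_read_dev_super()-style return value to a message. */
const char *super_read_strerror(int ret)
{
	switch (ret) {
	case 0:
		return "ok";
	case -ENOENT:
		return "No valid btrfs found";
	default:
		/* Hypothetical fallback for any other negative errno. */
		return "failed to read superblock";
	}
}

/*
 * Print the error to `out` (normally stderr) rather than stdout.
 * Returns 0 when there is nothing to report, otherwise the value
 * returned by fprintf().
 */
int report_super_read(FILE *out, int ret)
{
	if (ret == 0)
		return 0;
	return fprintf(out, "%s\n", super_read_strerror(ret));
}
```

Calling report_super_read(stderr, ret) keeps diagnostics out of stdout, so
scripts that parse the tool's normal output are not confused by error text.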
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] Add superblock checksum check for btrfs-progs

On Thu, Jul 03, 2014 at 05:36:34PM +0800, Qu Wenruo wrote:
 Before this patchset, btrfs-progs ignored the superblock checksum
 entirely and carried on.
 Sometimes this can cause disasters: for example, checking a btrfs with a
 corrupted superblock can crash btrfs-progs.
 
 This patchset introduces a superblock checksum check into
 btrfs_read_dev_super(), making btrfs-progs much stricter and more robust.
 To allow super-recover to open devices, add an option to scan all 3
 superblocks when using super-recover.
 It also updates the related error string and fixes a bug in chunk-recover
 that was not triggered until the superblock csum was calculated.
 
 Qu Wenruo (4):
   btrfs-progs: Check superblock's checsum when read dev super
   btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for
 super_recover.
   btrfs-progs: Add more meaningful return value for
 btrfs_read_dev_super() and corresponding error string.
   btrfs-progs: Fix size for malloc for superblock checksum.

Nice work. I've added 1, 2 and 4 to integration. Please update the
patch 3 (printf/fprintf).

Re: [PATCH] btrfs-progs: prevent select invalid dev super after dev replace

On Thu, Jul 03, 2014 at 10:06:35AM +0800, Gui Hecheng wrote:
 After dev replace, we should not select the superblock of the
 replaced dev. Otherwise, all the superblocks will be overwritten
 by this invalid superblock.
 
 To prevent this case, let btrfs-select-super check the first
 superblock on the selected dev. If the magic doesn't match,
 then the dev is a replaced dev and an error message will show up.

Is this patch needed if Qu's superblock checksum patches are applied?
Thanks.
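
For illustration, a minimal magic check on a superblock buffer, assuming the
standard on-disk layout in which the 8-byte magic "_BHRfS_M" sits at byte
offset 0x40 of the superblock (after the 32-byte csum, 16-byte fsid, bytenr
and flags). sb_has_btrfs_magic() is a hypothetical name, not a btrfs-progs
function.

```c
#include <stddef.h>
#include <string.h>

#define SB_MAGIC        "_BHRfS_M"
#define SB_MAGIC_OFFSET 0x40
#define SB_MAGIC_LEN    8

/*
 * Return 1 if the buffer holds the btrfs magic at the expected offset,
 * 0 otherwise (including when the buffer is NULL or too short).
 */
int sb_has_btrfs_magic(const unsigned char *sb, size_t len)
{
	if (sb == NULL || len < SB_MAGIC_OFFSET + SB_MAGIC_LEN)
		return 0;
	return memcmp(sb + SB_MAGIC_OFFSET, SB_MAGIC, SB_MAGIC_LEN) == 0;
}
```

A magic check alone does not prove the superblock is intact, which is why the
question about the checksum patches is relevant: a checksum check covers the
whole block, not just these 8 bytes.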

Re: [PATCH v2] Btrfs: fix crash when starting transaction


(2014/07/03 20:07), Miao Xie wrote:

On Thu, 3 Jul 2014 19:32:18 +0900, Satoru Takeuchi wrote:

(2014/07/03 17:30), Miao Xie wrote:

On Tue, 24 Jun 2014 17:46:58 +0100, Filipe David Borba Manana wrote:

Often when starting a transaction we commit the currently running transaction,
which can end up writing block group caches when the current process has its
journal_info set to NULL (and not to a transaction). This makes our assertion
at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
in a crash/hang. Therefore fix it by setting journal_info.

Two different traces of this issue follow below.

1)

  [51502.241936] BTRFS: assertion failed: current-journal_info, file: 
fs/btrfs/extent-tree.c, line: 3670
  [51502.242213] [ cut here ]
  [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
  [51502.242669] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
  (...)
  [51502.244010] Call Trace:
  [51502.244010]  [a02bc025] 
btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
  [51502.244010]  [a02c3bdc] 
btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
  [51502.244010]  [a0357a6a] commit_cowonly_roots+0x164/0x226 
[btrfs]
  [51502.244010]  [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 
[btrfs]
  [51502.244010]  [8168ec7b] ? _raw_spin_unlock+0x2b/0x40
  [51502.244010]  [a02d6259] start_transaction+0x459/0x620 [btrfs]
  [51502.244010]  [a02d67ab] btrfs_start_transaction+0x1b/0x20 
[btrfs]
  [51502.244010]  [a02d73e1] __unlink_start_trans+0x31/0xe0 
[btrfs]
  [51502.244010]  [a02dea67] btrfs_unlink+0x37/0xc0 [btrfs]
  [51502.244010]  [811bb054] ? do_unlinkat+0x114/0x2a0
  [51502.244010]  [811baebc] vfs_unlink+0xcc/0x150
  [51502.244010]  [811bb1a0] do_unlinkat+0x260/0x2a0
  [51502.244010]  [811a9ef4] ? filp_close+0x64/0x90
  [51502.244010]  [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
  [51502.244010]  [81349cab] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [51502.244010]  [811be9eb] SyS_unlinkat+0x1b/0x40
  [51502.244010]  [81698452] system_call_fastpath+0x16/0x1b
  [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
  [51502.244010] RIP  [a03575da] assfail.constprop.88+0x1e/0x20 [btrfs]

2)

  [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
  [25405.097488] ------------[ cut here ]------------
  [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
  [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
  (...)
  [25405.18] Call Trace:
  [25405.18]  [a02bc025] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
  [25405.18]  [a02c3bdc] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
  [25405.18]  [a035755a] commit_cowonly_roots+0x164/0x226 [btrfs]
  [25405.18]  [a02d53cd] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
  [25405.18]  [8109c170] ? bit_waitqueue+0xc0/0xc0
  [25405.18]  [a02d6259] start_transaction+0x459/0x620 [btrfs]
  [25405.18]  [a02d67ab] btrfs_start_transaction+0x1b/0x20 [btrfs]
  [25405.18]  [a02e3407] btrfs_create+0x47/0x210 [btrfs]
  [25405.18]  [a02d74cc] ? btrfs_permission+0x3c/0x80 [btrfs]
  [25405.18]  [811bc63b] vfs_create+0x9b/0x130
  [25405.18]  [811bcf19] do_last+0x849/0xe20
  [25405.18]  [811b9409] ? link_path_walk+0x79/0x820
  [25405.18]  [811bd5b5] path_openat+0xc5/0x690
  [25405.18]  [810ab07d] ? trace_hardirqs_on+0xd/0x10
  [25405.18]  [811cdcd2] ? __alloc_fd+0x32/0x1d0
  [25405.18]  [811be2a3] do_filp_open+0x43/0xa0
  [25405.18]  [811cddf1] ? __alloc_fd+0x151/0x1d0
  [25405.18]  [811abcfc] do_sys_open+0x13c/0x230
  [25405.18]  [810aaea6] ? trace_hardirqs_on_caller+0x16/0x1e0
  [25405.18]  [811abe12] SyS_open+0x22/0x30
  [25405.18]  [81698452] system_call_fastpath+0x16/0x1b
  [25405.18] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0f 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
  [25405.18] RIP  [a03570ca] assfail.constprop.88+0x1e/0x20 [btrfs]

Signed-off-by: Filipe David Borba Manana fdman...@gmail.com
---

V2: Removed test for current->journal_info == NULL. At this point it's
  always expected to be NULL.


Reviewed-by: Miao Xie mi...@cn.fujitsu.com


Let me clarify my understanding since

Re: [PATCH] Btrfs: assert send doesn't attempt to start transactions

(2014/06/25 1:48), Filipe David Borba Manana wrote:
 When starting a transaction just assert that current->journal_info
 doesn't contain a send transaction stub, since send isn't supposed
 to start transactions and when it finishes (either successfully or
 not) it's supposed to set current->journal_info to NULL.
 
 This is motivated by the change titled:
 
  Btrfs: fix crash when starting transaction
 
 Signed-off-by: Filipe David Borba Manana fdman...@gmail.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Thanks,
Satoru

 ---
   fs/btrfs/transaction.c | 6 ++++--
   1 file changed, 4 insertions(+), 2 deletions(-)
 
 diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
 index 614eac3..47870ca 100644
 --- a/fs/btrfs/transaction.c
 +++ b/fs/btrfs/transaction.c
 @@ -386,11 +386,13 @@ start_transaction(struct btrfs_root *root, u64 num_items, unsigned int type,
  	bool reloc_reserved = false;
  	int ret;
  
 +	/* Send isn't supposed to start transactions. */
 +	ASSERT(current->journal_info != (void *)BTRFS_SEND_TRANS_STUB);
 +
  	if (test_bit(BTRFS_FS_STATE_ERROR, &root->fs_info->fs_state))
  		return ERR_PTR(-EROFS);
  
 -	if (current->journal_info &&
 -	    current->journal_info != (void *)BTRFS_SEND_TRANS_STUB) {
 +	if (current->journal_info) {
  		WARN_ON(type & TRANS_EXTWRITERS);
  		h = current->journal_info;
  		h->use_count++;
 

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/4] Add superblock checksum check for btrfs-progs

 Original Message 
Subject: Re: [PATCH 0/4] Add superblock checksum check for btrfs-progs
From: David Sterba dste...@suse.cz
To: Qu Wenruo quwen...@cn.fujitsu.com
Date: 2014-07-04 01:57

On Thu, Jul 03, 2014 at 05:36:34PM +0800, Qu Wenruo wrote:

Before this patchset, btrfs-progs ignores the superblock checksum entirely
and carries on.
This can be disastrous: for example, checking a btrfs with a corrupted
superblock can crash btrfs-progs.

This patchset adds a superblock checksum check to btrfs_read_dev_super(),
making btrfs-progs stricter and more robust.
To let super-recover open devices, it adds options to scan all 3
superblocks when using super-recover.
It also updates the related error strings and fixes a bug in chunk-recover
that is not triggered until the superblock csum is calculated.

Qu Wenruo (4):
   btrfs-progs: Check superblock's checksum when read dev super
   btrfs-progs: Allow btrfs_read_dev_super() to read all 3 super for
 super_recover.
   btrfs-progs: Add more meaningful return value for
 btrfs_read_dev_super() and corresponding error string.
   btrfs-progs: Fix size for malloc for superblock checksum.

Nice work. I've added 1, 2 and 4 to integration. Please update
patch 3 (printf/fprintf).

Thanks for the review and minor tweak for patch 2.

I'll send v2 version of patch 3 soon.

Thanks,
Qu

Re: [PATCH] btrfs-progs: prevent select invalid dev super after dev replace

2014-07-03 Thread Gui Hecheng

On Thu, 2014-07-03 at 20:10 +0200, David Sterba wrote:
 On Thu, Jul 03, 2014 at 10:06:35AM +0800, Gui Hecheng wrote:
  After dev replace, we should not select the superblock of the
  replaced dev. Otherwise, all the superblocks will be overwritten
  by this invalid superblock.
  
  To prevent this case, let btrfs-select-super check the first
  superblock on the selected dev. If the magic doesn't match,
  then the dev is a replaced dev and error message will show up.
 
 Is this patch needed if Qu's superblock checksum patches are applied?
 Thanks.
Hmm...I think it is not necessary then, please *ignore* this one.
Thanks, David.


Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?

2014-07-03 Thread Marc MERLIN

I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
running out of memory and deadlocking (panic= doesn't even work).
I downgraded back to 3.14, but I already had the problem once since then.

OOM comes in, even though I have 0 swap used and AFAIK not all my RAM is
gone; it then fails to kill enough stuff and eventually it dies like this:
[80943.542209] Swap cache stats: add 814596, delete 814595, find 2567491/2808869
[80943.565106] Free swap  = 15612448kB
[80943.577607] Total swap = 15616764kB
[80943.589766] 2021665 pages RAM
[80943.600281] 0 pages HighMem/MovableOnly
[80943.613284] 28468 pages reserved
[80943.624330] 0 pages hwpoisoned
[80943.634824] [ pid ]   uid  tgid total_vm  rss nr_ptes swapents oom_score_adj name
[80943.659669] [  918] 0   918  8550   5  236 -1000 udevd
[80943.684789] [ 8022] 0  8022 30740   5   89 -1000 auditd
[80943.710154] [ 8253] 0  8253 18130   6  123 -1000 sshd
[80943.735024] [12001] 0 12001  8540   5  241 -1000 udevd
[80943.760152] [18969] 0 18969  8540   5  223 -1000 udevd
[80943.785293] Kernel panic - not syncing: Out of memory and no killable processes...


Here is my more recent capture on 3.14 when I was able to catch it
before the panic and dump a bunch of sysrq data.
http://marc.merlins.org/tmp/btrfs-oom.txt

Things to note in that log:
[90621.895715] 2962 total pagecache pages
[90621.895716] 5 pages in swap cache
[90621.895717] Swap cache stats: add 145004, delete 144999, find 3314901/3316382
[90621.895718] Free swap  = 15230536kB
[90621.895718] Total swap = 15616764kB
[90621.895719] 2021665 pages RAM
[90621.895720] 0 pages HighMem/MovableOnly
[90621.895720] 28468 pages reserved
I'm not a VM person so I don't know how to read this, but am I out of RAM
but not out of swap (since clearly none was used), or am I out of a specific
memory region that is causing me problems?
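
The page counts in that excerpt can be turned into sizes with a little arithmetic. A small sketch in C, assuming 4 KiB pages (the usual x86-64 page size; the log itself does not state it) and using hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a page count to KiB for a given page size in bytes. */
uint64_t pages_to_kib(uint64_t pages, uint64_t page_size)
{
	return pages * page_size / 1024;
}

/* Swap in use, in kB, from the "Total swap" and "Free swap" lines. */
uint64_t swap_used_kb(uint64_t total_kb, uint64_t free_kb)
{
	assert(free_kb <= total_kb);
	return total_kb - free_kb;
}
```

Under that assumption, 2021665 pages is 8086660 kB of RAM (about 7.7 GiB), the 28468 reserved pages are 113872 kB, and the excerpt above shows 15616764 - 15230536 = 386228 kB of swap in use at that moment.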

I'm not 100% certain btrfs is to blame, but it is suspicious that upgrading
to 3.15 and getting btrfs problems then caused my 3.14.0 kernel, which had run
fine for 3 months, to die with the same OOM problems as well.
Then again, I understand it could be a red herring. Suggestions either way are
appreciated :)

I tried raising this:
gargamel:~# echo 100 > /proc/sys/vm/swappiness

But so far I have too much unused RAM for any swap to be touched.
gargamel:~# free
             total       used       free     shared    buffers     cached
Mem:       7894792    4938596    2956196          0       2396    2909204
-/+ buffers/cache:    2026996    5867796
Swap:     15616764          0   15616764


The log is too big to paste here, but you can grep it for:
[90817.715833] SysRq : Show Memory
[90893.571151] SysRq : Show backtrace of all active CPUs
[90921.781599] SysRq : Show Blocked State
[91075.976611] SysRq : Show State
[91406.972046] SysRq : Terminate All Tasks
[91410.771584] SysRq : Emergency Remount R/O
[91413.222483] SysRq : Emergency Sync
[91430.316955] SysRq : Power Off
^^^
note the kernel was wedged enough that Power Off didn't work, apparently
because it failed to swap:
[91447.490142] CPU: 3 PID: 48 Comm: kswapd0 Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[91447.490143] Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3806 08/20/2012
[91447.490145] task: 8802126b6490 ti: 8802126e4000 task.ti: 8802126e4000
[91447.490146] RIP: 0010:[810898f5]  [810898f5] do_raw_spin_lock+0x23/0x27

Right after OOM started kicking in, console showed apparent deadlocks in btrfs.
But is it possible that btrfs is then also eating all my memory somehow?

You can find the long details here:
http://marc.merlins.org/tmp/btrfs-oom.txt

[90801.680821] INFO: task btrfs-transacti:3433 blocked for more than 120 seconds.
[90801.712345]   Not tainted 3.14.0-amd64-i915-preempt-20140216 #2
[90801.734394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[90801.882691] btrfs-transacti D 88021387e800 0  3433  2 0x
[90801.904863]  88020b20de10 0046 88020b20dfd8 88021387e2d0
[90801.928448]  000141c0 88021387e2d0 880211e94800 880029c00dc0
[90801.952015]  8802009d1f28 8802009d1ed0  88020b20de20
[90801.975701] Call Trace:
[90801.984443]  [8160d2a1] schedule+0x73/0x75
[90802.000438]  [8122b575] btrfs_commit_transaction+0x330/0x849
[90802.021140]  [81085116] ? finish_wait+0x65/0x65
[90802.038438]  [81227c48] transaction_kthread+0xf8/0x1ab
[90802.057571]  [81227b50] ? btrfs_cleanup_transaction+0x43f/0x43f
[90802.079092]  [8106bc62] kthread+0xae/0xb6
[90802.094838]  [8106bbb4] ? __kthread_parkme+0x61/0x61

Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options

 Original Message 
Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
From: Goffredo Baroncelli kreij...@inwind.it
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
Date: 2014-07-04 01:37

On 07/03/2014 02:28 AM, Qu Wenruo wrote:

 Original Message 
Subject: Re: [RFC PATCH] Revert btrfs: allow mounting btrfs subvolumes with different ro/rw options
From: Goffredo Baroncelli kreij...@libero.it
To: Qu Wenruo quwen...@cn.fujitsu.com, linux-btrfs@vger.kernel.org
Date: 2014-07-03 01:48

On 07/01/2014 11:30 AM, Qu Wenruo wrote:

This commit has the following problem:
1) It breaks the ro mount rule.
When users mount the whole btrfs ro, it is still possible to mount a
subvolume rw and change its contents, which makes the ro mount of the
whole fs meaningless.

Where is the problem? I see a use case where I want a conservative default:
mount everything ro except some subvolumes.

In any case it is not a security problem, because a user who has the capability
to mount a subvolume also has the capability to remount the whole filesystem
rw.

It is not a security problem, but the behavior is inconsistent.
If a user mounts the whole disk ro, he or she wants the fs to be read-only,
with nothing in it changing.
If you mount a subvolume rw, that expectation is broken: things will change
even though the whole disk is read-only.

Sorry to bother you again, but one thing is not clear to me:

If

 # mount -o subvolid=5,ro /dev/sda1 /mnt/root
 # mount -o subvol=subvolname,rw /dev/sda1 /mnt/subvolname

I suppose that

 # touch /mnt/root/touch-test   # 1

fails, and

 # touch /mnt/subvolname/touch-test # 2

succeeds. Did I understand correctly?

Your understanding is right, and that is the current behavior.

But it should not be considered the correct behavior.

If you mount fs_tree ro, btrfs should ensure that the whole fs_tree (including
all the subvolumes) is ro.
Otherwise the fs_tree is not strictly read-only, since you can modify
contents inside the rw subvolume, which is part of the fs_tree (a partly ro,
partly rw state).

IMO the ideal logic would be as follows:
1) A ro-mounted subvolume forces all of its child subvolumes to be
mountable ro only.

subvol 5 (mounted ro /)
├── subvol 257 (mounted rw /mnt/btrfrs)
So the mount above should not be allowed.

But the following mount should be OK:
subvol 5 (mounted rw /)
├── subvol 257 (mounted ro /mnt/btrfrs)

2) A ro-mounted subvolume will not be modified, even through its rw-mounted
parent subvolume.

Only this ensures a strict ro mount option.
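
The two proposed rules can be sketched as a small policy check in C. This is only an illustration of the proposal, not existing btrfs code; `struct subvol_mount`, `may_mount_rw()` and `may_modify()` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mount record for one subvolume. */
struct subvol_mount {
	const struct subvol_mount *parent; /* NULL for the top-level subvolume */
	bool mounted;
	bool read_only;                    /* meaningful only when mounted */
};

/* Rule 1: a rw mount is refused if any mounted ancestor is ro. */
bool may_mount_rw(const struct subvol_mount *sv)
{
	const struct subvol_mount *p;

	assert(sv != NULL);
	for (p = sv->parent; p != NULL; p = p->parent)
		if (p->mounted && p->read_only)
			return false;
	return true;
}

/* Rule 2: a write is refused if the subvolume itself, or any of its
 * ancestors, is mounted ro, even when reached through a rw parent. */
bool may_modify(const struct subvol_mount *sv)
{
	const struct subvol_mount *p;

	assert(sv != NULL);
	for (p = sv; p != NULL; p = p->parent)
		if (p->mounted && p->read_only)
			return false;
	return true;
}
```

With subvol 5 mounted ro, `may_mount_rw()` refuses a rw mount of subvol 257; with subvol 5 mounted rw and subvol 257 mounted ro, `may_modify()` still refuses writes into subvol 257.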

If anyone has any other ideas about it, I'm happy to listen.

Thanks,
Qu

  If so, this behaviour seems correct to me.
It would be different if, after mounting the subvolume subvolname, the whole
filesystem also became rw (e.g. #1 succeeded).

G.Baroncelli

The problem also happens when a parent subvolume is mounted rw but a child
subvolume is mounted ro.
The user can still modify the child subvolume through the parent, which again
breaks the read-only rule.

Thanks,
Qu


[PATCH v2 3/4] btrfs-progs: Add more meaningful return value for btrfs_read_dev_super() and corresponding error string.

Since btrfs_read_dev_super() can now distinguish a non-btrfs fs from a
corrupted superblock, thanks to the newly introduced super csum check,
the return value and corresponding error string should also be updated
to print more meaningful errors for end users.
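
The return-code-to-message mapping that the hunks below repeat at each call site can be summarised in one place. A sketch in C, where `super_read_errstr()` is a hypothetical helper that is not part of this patch or of btrfs-progs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: map a btrfs_read_dev_super()-style return value
 * (0 or a negative errno) to the message text used in this patch. */
const char *super_read_errstr(int ret)
{
	assert(ret <= 0);
	switch (ret) {
	case -ENOENT:
		return "No valid btrfs found";
	case -EIO:
		return "Superblock is corrupted";
	default:
		return NULL; /* the call sites print nothing for other codes */
	}
}
```

A caller would print the returned string to stderr when it is non-NULL, which mirrors the paired `if (ret == -ENOENT)` / `if (ret == -EIO)` checks.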

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
v2:
  Use fprintf(stderr,) to replace the old printk.
---
 btrfs-find-root.c |  5 -
 chunk-recover.c   | 10 --
 cmds-filesystem.c |  7 ++-
 disk-io.c | 36 ++--
 utils.c   |  5 -
 volumes.c |  4 +---
 6 files changed, 49 insertions(+), 18 deletions(-)

diff --git a/btrfs-find-root.c b/btrfs-find-root.c
index e31a9b5..7932308 100644
--- a/btrfs-find-root.c
+++ b/btrfs-find-root.c
@@ -96,7 +96,10 @@ static struct btrfs_root *open_ctree_broken(int fd, const char *device)
 	ret = btrfs_read_dev_super(fs_devices->latest_bdev,
 				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
-		printk("No valid btrfs found\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto out_devices;
 	}
 
diff --git a/chunk-recover.c b/chunk-recover.c
index 9baedd7..4a16110 100644
--- a/chunk-recover.c
+++ b/chunk-recover.c
@@ -1285,7 +1285,10 @@ open_ctree_with_broken_chunk(struct recover_control *rc)
 	ret = btrfs_read_dev_super(fs_info->fs_devices->latest_bdev,
 				   disk_super, fs_info->super_bytenr, 1);
 	if (ret) {
-		fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto out_devices;
 	}
 
@@ -1351,7 +1354,10 @@ static int recover_prepare(struct recover_control *rc, char *path)
 
 	ret = btrfs_read_dev_super(fd, sb, BTRFS_SUPER_INFO_OFFSET, 1);
 	if (ret) {
-		fprintf(stderr, "read super block error\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto fail_free_sb;
 	}
 
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index d2e46dc..d58397d 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -604,9 +604,14 @@ static int cmd_show(int argc, char **argv)
 		} else {
 			ret = dev_to_fsid(search, fsid);
 			if (ret) {
-				fprintf(stderr,
+				if (ret == -ENOENT)
+					fprintf(stderr,
 					"ERROR: No btrfs on %s\n",
 					search);
+				if (ret == -EIO)
+					fprintf(stderr,
+						"Superblock is corrupted on %s\n",
+						search);
 				return 1;
 			}
 			uuid_unparse(fsid, uuid_buf);
diff --git a/disk-io.c b/disk-io.c
index 1bd9fae..5ee7edd 100644
--- a/disk-io.c
+++ b/disk-io.c
@@ -990,7 +990,11 @@ int btrfs_scan_fs_devices(int fd, const char *path,
 	ret = btrfs_scan_one_device(fd, path, fs_devices,
 				    &total_devs, sb_bytenr, super_recover);
 	if (ret) {
-		fprintf(stderr, "No valid Btrfs found on %s\n", path);
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid Btrfs found on %s\n", path);
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted on %s\n",
+				path);
 		return ret;
 	}
 
@@ -1101,7 +1105,10 @@ static struct btrfs_fs_info *__open_ctree_fd(int fp, const char *path,
 	else
 		ret = btrfs_read_dev_super(fp, disk_super, sb_bytenr, 0);
 	if (ret) {
-		printk("No valid btrfs found\n");
+		if (ret == -ENOENT)
+			fprintf(stderr, "No valid btrfs found\n");
+		if (ret == -EIO)
+			fprintf(stderr, "Superblock is corrupted\n");
 		goto out_devices;
 	}
 
@@ -1201,11 +1208,11 @@ int btrfs_read_dev_super(int fd, struct btrfs_super_block *sb, u64 sb_bytenr,
 	if (sb_bytenr != BTRFS_SUPER_INFO_OFFSET) {
 		ret =

Re: [PATCH] Btrfs: fix wrong uevent target

 Original Message 
Subject: Re: [PATCH] Btrfs: fix wrong uevent target
From: Anand Jain anand.j...@oracle.com
To: Chris Mason c...@fb.com
Date: 2014-07-04 01:32

Chris,

  This fix is theoretically correct but my guess that this would
  solve problem as reported by Qu Wenruo was wrong [1].
  Patch is good to integrate.

Thanks, Anand

[1] Re: [PATCH RFC] btrfs: Add ctime/mtime update for btrfs device add/remove.
Yes, no uevent improvement will solve the problem of 'btrfs dev scan;
btrfs dev del; btrfs dev scan'.
A uevent is sent from the kernel and received by ueventd, and ueventd then
updates the device file's ctime/mtime.
The uevent procedure is always asynchronous, so it will not fix the
'btrfs dev scan' libblkid cache problem.

Although my RFC patch is ugly, it provides a synchronous method to
update ctime/mtime from the kernel.

Thanks,
Qu

On 03/07/2014 22:00, Chris Mason wrote:

On 07/03/2014 07:09 AM, Miao Xie wrote:

CC Anand Jain

Sorry, please ignore this patch.
Anand wrote the same patch several days ago, so this bug fix belongs to Anand,
though he NACKed his patch at that time.

It certainly looks right, but Anand had mentioned that he had a few
questions on testing.  I've pulled it out for now, but I'll take Anand's
version when you're both happy.

-chris


Re: [PATCH RESEND 4/9] Btrfs: fix put dio bio twice when we submit dio bio fail

Hi Miao,

(2014/07/03 19:22), Miao Xie wrote:
 The caller of btrfs_submit_direct_hook() will put the original dio bio
 when btrfs_submit_direct_hook() returns an error number, so we needn't
 put the original bio in btrfs_submit_direct_hook().
 
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Here is the review result, CMIIW.

call trace:
  btrfs_submit_direct
    -> btrfs_submit_direct_hook

fs/btrfs/inode.c:
===
static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
				    int skip_sum)
{
	...
	struct bio *orig_bio = dip->orig_bio;
	...
	map_length = orig_bio->bi_iter.bi_size;
	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
			      &map_length, NULL, 0);
	if (ret) {
		bio_put(orig_bio);	/* (1) */
		return -EIO;
	}
	...
}
...
static void btrfs_submit_direct(int rw, struct bio *dio_bio,
				struct inode *inode, loff_t file_offset)
{
	dip = kmalloc(sizeof(*dip) + sum_len, GFP_NOFS);
	if (!dip) {
		ret = -ENOMEM;
		goto free_io_bio;	/* (2) */
	}
	...
	ret = btrfs_submit_direct_hook(rw, dip, skip_sum);
	if (!ret)
		return;

free_io_bio:
	bio_put(io_bio);		/* (3) */
	...
}
===

If btrfs_map_block() fails in btrfs_submit_direct_hook(), it puts
orig_bio at (1) and returns -EIO. The caller, btrfs_submit_direct(), then
frees the same bio at (3). Since (3) is also used for other error
handling (2), I consider your way, removing (1), to be better.
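
The double put can be reproduced with a tiny userspace model. This is a sketch in C with hypothetical names (`struct fake_bio`, `fake_bio_put()`, `fake_submit_hook()`, `fake_submit_direct()`); it only counts references and does not use the real bio API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted bio. */
struct fake_bio { int refcount; };

void fake_bio_put(struct fake_bio *bio)
{
	assert(bio != NULL);
	bio->refcount--;
}

/* Models btrfs_submit_direct_hook() when btrfs_map_block() fails. */
int fake_submit_hook(struct fake_bio *orig_bio, bool put_on_error)
{
	if (put_on_error)
		fake_bio_put(orig_bio);	/* (1), removed by the patch */
	return -EIO;
}

/* Models the caller's error path; returns the final reference count. */
int fake_submit_direct(bool hook_puts_on_error)
{
	struct fake_bio bio = { 1 };

	if (fake_submit_hook(&bio, hook_puts_on_error))
		fake_bio_put(&bio);	/* (3) */
	return bio.refcount;
}
```

With the put at (1) still in place the count ends at -1, one put too many; with (1) removed it ends at 0, which is the balanced case.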

Thanks,
Satoru

 ---
   fs/btrfs/inode.c | 5 ++---
   1 file changed, 2 insertions(+), 3 deletions(-)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index a616fa4..15902eb 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -7325,10 +7325,8 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
  	map_length = orig_bio->bi_iter.bi_size;
  	ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
  			      &map_length, NULL, 0);
 -	if (ret) {
 -		bio_put(orig_bio);
 +	if (ret)
  		return -EIO;
 -	}
  
  	if (map_length >= orig_bio->bi_iter.bi_size) {
  		bio = orig_bio;
 @@ -7345,6 +7343,7 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
  	bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev, start_sector, GFP_NOFS);
  	if (!bio)
  		return -ENOMEM;
 +
  	bio->bi_private = dip;
  	bio->bi_end_io = btrfs_end_dio_bio;
  	atomic_inc(&dip->pending_bios);
 


Re: [PATCH] btrfs-progs: add mount status check for btrfs-image

2014-07-03 Thread Gui Hecheng

On Thu, 2014-07-03 at 18:58 +0200, David Sterba wrote:
 On Thu, Jul 03, 2014 at 10:06:34AM +0800, Gui Hecheng wrote:
  The btrfs-image tool should not be run on a mounted filesystem.
 
 Should not, but for some values of sometimes it makes sense, eg.
 capturing image of an otherwise quiescent filesystem, a read-only mount
 or after a crash.

Ah, read-only mount is really a case.

 This utility is used for debugging so I'd prefer to let the user do as
 he likes, though printing the warning about the mount status is a good
 improvement.

I agree, then I'll keep the check_mounted and just give a prompt and let
it continue.

  The ongoing fs operations may change what you imaged a while ago,
  which makes the image meaningless.
 
 I'm not familiar with the image format, but maybe we can set a bit in
 the header when the filesystem was not captured cleanly.

Hmm...This is a point, I think I'll give it a try in another patch.



Re: [PATCH 1/2] btrfs-progs: add ask_user confirmation for btrfstune clear seeding flag

2014-07-03 Thread Gui Hecheng

On Thu, 2014-07-03 at 18:51 +0200, David Sterba wrote:
 On Thu, Jul 03, 2014 at 10:06:33AM +0800, Gui Hecheng wrote:
  Clearing the seeding flag may cause the original filesystem to be writable,
  which is dangerous.
 
 Can you please describe the dangerous scenario a bit more? This would
 also go to the documentation so it's not only to satisfy my curiosity.

Yes, I'll include a certain scenario in the changelog of a v2 patch.

 Dropping the seeding flag could be dangerous if the filesystem starts in
 seeding mode, a new device is added, some writes are done, then
 filesystem is unmounted.
 
 Now it's a 2 device filesystem, where the original holds some data and
 without the seeding flag it would accept new writes. Still ok for me,
 though this is probably the time where some user assumptions may break.
 
  In this case, add user confirmation check when clearing seeding flag.
  Also warn the user that the fs is in a dangerous condition when
  the seeding flag is cleared if it is forced to.
 
 The -y option is tied only to the seeding option, but it should IMO be
 more general and called --force.

I agree.

  Signed-off-by: Gui Hecheng guihc.f...@cn.fujitsu.com
  ---
   btrfstune.c | 24 +++++++++++++++++++++++-
   1 file changed, 23 insertions(+), 1 deletion(-)
  
  diff --git a/btrfstune.c b/btrfstune.c
  index 3f2f0cd..0e18088 100644
  --- a/btrfstune.c
  +++ b/btrfstune.c
  @@ -103,6 +104,7 @@ static void print_usage(void)
   	fprintf(stderr, "\t-S value\tpositive value will enable seeding, zero to disable, negative is not allowed\n");
   	fprintf(stderr, "\t-r \t\tenable extended inode refs\n");
   	fprintf(stderr, "\t-x \t\tenable skinny metadata extent refs\n");
  +	fprintf(stderr, "\t-y \t\tsay yes to clear the seeding flag, make sure that you are aware of the danger\n");
 
 The help text could say something like
 
   --force\tallow dangerous changes\n
 
 btrfstune only allows setting the bit for extref and skinny-metadata,
 unsetting would be dangerous as well.
On my part, I can't think of any scenarios for these two; could you please
tell me more?


Quota Ignored On write

2014-07-03 Thread Kevin Brandstatter

basing of the latest for-linus branch i found i can write way more than
the quota

btrfs quota enable
btrfs subvolume create test
btrfs qgroup limit 1G test
dd if=/dev/zero of=test/file bs=1024 count=1500000
output:
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

that's a full half gig over the quota limit. I noticed some changes to
the quota
accounting in the logs, what changed that could cause this?

-Kevin Brandstatter

Re: Quota Ignored On write


Hi Kevin,

(2014/07/04 11:13), Kevin Brandstatter wrote:

basing of the latest for-linus branch i found i can write way more than
the quota

btrfs quota enable
btrfs subvolume create test
btrfs qgroup limit 1G test
dd if=/dev/zero of=test/file bs=1024 count=1500000
output:
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

that's a full half gig over the quota limit. I noticed some changes to
the quota
accounting in the logs, what changed that could cause this?


Do you remember what kernel version quota worked correctly?

Thanks,
Satoru



-Kevin Brandstatter

Re: Quota Ignored On write

2014-07-03 Thread Kevin Brandstatter

3.15.3 via arch/ and from linux-git

-Kevin

On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
 Hi Kevin,

 (2014/07/04 11:13), Kevin Brandstatter wrote:
 basing of the latest for-linus branch i found i can write way more than
 the quota

 btrfs quota enable
 btrfs subvolume create test
 btrfs qgroup limit 1G test
 dd if=/dev/zero of=test/file bs=1024 count=1500000
 output:
 1500000+0 records in
 1500000+0 records out
 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

 that's a full half gig over the quota limit. I noticed some changes to
 the quota
 accounting in the logs, what changed that could cause this?

 Do you remember what kernel version quota worked correctly?

 Thanks,
 Satoru


 -Kevin Brandstatter

Re: Quota Ignored On write


(2014/07/04 11:25), Kevin Brandstatter wrote:

3.15.3 via arch/ and from linux-git


OK, I'll bisect it.

Satoru



-Kevin

On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:

Hi Kevin,

(2014/07/04 11:13), Kevin Brandstatter wrote:

basing of the latest for-linus branch i found i can write way more than
the quota

btrfs quota enable
btrfs subvolume create test
btrfs qgroup limit 1G test
dd if=/dev/zero of=test/file bs=1024 count=1500000
output:
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

that's a full half gig over the quota limit. I noticed some changes to
the quota
accounting in the logs, what changed that could cause this?


Do you remember what kernel version quota worked correctly?

Thanks,
Satoru



-Kevin Brandstatter

Re: Quota Ignored On write

2014-07-03 Thread Kevin Brandstatter

Hmm, is it possible that btrfs is doing some deduplication or compression?
The expected behavior works fine with small quotas like 10/20MB,
but at 1GB I can overwrite quite a bit from /dev/zero.
I also tried to dd from /dev/urandom (to get some variety other than zeros):
dd if=/dev/urandom of=meow bs=1024 count=1500000
output:
dd: error writing ‘meow’: Disk quota exceeded
1163330+0 records in
1163329+0 records out
1191248896 bytes (1.2 GB) copied, 110.25 s, 10.8 MB/s

So it looks like it's stopping the write, but with a 1GB quota, that's
20% over quota.
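
How far over the limit that write went depends on how the `1G` is read. A small worked sketch in C, under the assumption that the limit is either 2^30 bytes (GiB) or 10^9 bytes (GB); which of the two `btrfs qgroup limit` applies is not stated in this thread, and the helper name is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Overshoot of `written` beyond `limit`, in tenths of a percent
 * (integer division, rounded down); 0 if the limit was not exceeded. */
uint64_t over_quota_permille(uint64_t written, uint64_t limit)
{
	assert(limit > 0);
	if (written <= limit)
		return 0;
	return (written - limit) * 1000 / limit;
}
```

For the 1191248896 bytes reported by dd, this gives 109 (about 10.9% over) against a 2^30-byte limit and 191 (about 19.1% over) against a 10^9-byte limit.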

-Kevin

On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:
 Hi Kevin,

 (2014/07/04 11:13), Kevin Brandstatter wrote:
 basing of the latest for-linus branch i found i can write way more than
 the quota

 btrfs quota enable
 btrfs subvolume create test
 btrfs qgroup limit 1G test
 dd if=/dev/zero of=test/file bs=1024 count=1500000
 output:
 1500000+0 records in
 1500000+0 records out
 1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

 that's a full half gig over the quota limit. I noticed some changes to
 the quota
 accounting in the logs, what changed that could cause this?

 Do you remember what kernel version quota worked correctly?

 Thanks,
 Satoru


 -Kevin Brandstatter

Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

On Thu, Jul 03, 2014 at 06:44:21AM -0700, Marc MERLIN wrote:
 Thanks for the patch. Hopefully this will make it to the next 3.15.x
 kernel.
 
 I also went back to 3.14 anyway since the 'blocked for 120 seconds' look
 like another instance of deadlocks we've been discussing here.
 
 But just curious:
 
  [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
 
 What should I be doing about this?
 Does it mean that I do have some kind of corruption/damage on my
 filesystem?
 

If there is another copy of the block (RAID1, DUP, RAID5/6), it'd try to read
the copy and repair the crc with the good one; that's all we can do about it.

 Also, is it possible to have all these messages state which devid they
 occurred on? I don't even know which device I should be worrying about
 right now, and although I'm running scrub now, my understanding is that
 scrub doesn't actually look at FS structures and is likely to miss this
 anyway.

Yes, we can, but it'd need a bit more effort. For now, every device message we've
seen in panic info comes from sb->s_id, which points to @fs_info->latest_device.

thanks,
-liubo

 
 Thanks,
 Marc

Re: [PATCH 9/9] Btrfs: fix writing data into the seed filesystem

On Thu, Jul 03, 2014 at 06:22:13PM +0800, Miao Xie wrote:
 If we mounted a seed filesystem with degraded option, and then added a new
 device into the seed filesystem, then we found adding device failed because
 of the IO failure.
 
 Steps to reproduce:
  # mkfs.btrfs -d raid1 -m raid1 dev0 dev1
  # btrfstune -S 1 dev0
  # mount dev0 -o degraded mnt
  # btrfs device add -f dev2 mnt
 
 It is because the original code didn't set the chunk on the seed device to be
 read-only if the degraded flag was set. It was introduced by patch f48b90756,
 which fixed the problem that the raid1 filesystem became read-only after one
 device of it was missing. But this fix method was not right: we should set the
 read-only flag according to the number of missing devices, not the degraded
 mount option. If the number of missing devices does not exceed the max error
 number that the profile of the chunk tolerates, we don't set it to be read-only.
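
The rule in that description can be sketched as a small decision function. This is only an illustration of the logic, not kernel code: the lowercase profile names and both function names are labels chosen for the sketch, and the separate check for a present but non-writeable device is left out.

```shell
# Most missing devices a chunk of the given profile can tolerate,
# following the profile grouping the patch uses.
chunk_max_errors() {
  case "$1" in
    raid1|raid10|raid5|dup) echo 1 ;;
    raid6)                  echo 2 ;;
    *)                      echo 0 ;;
  esac
}

# Print 1 if the chunk should be read-only, 0 otherwise.
# $1 = profile, $2 = number of missing devices in the chunk.
chunk_readonly() {
  if [ "$2" -gt "$(chunk_max_errors "$1")" ]; then
    echo 1
  else
    echo 0
  fi
}

chunk_readonly raid1 1     # degraded raid1 with one device missing
chunk_readonly raid1 2     # both copies gone
chunk_readonly single 1    # no redundancy at all
```

The three calls print 0, 1 and 1: a raid1 chunk with one missing device stays writable, which is what lets the device add in the reproduction steps succeed.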

Reviewed-by: Liu Bo bo.li@oracle.com

-liubo

 
 Cc: Josef Bacik jba...@fb.com
 Signed-off-by: Miao Xie mi...@cn.fujitsu.com
 ---
  fs/btrfs/volumes.c | 52 
  1 file changed, 36 insertions(+), 16 deletions(-)
 
 diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
 index 73a82e5..daecfa5 100644
 --- a/fs/btrfs/volumes.c
 +++ b/fs/btrfs/volumes.c
 @@ -4584,12 +4584,31 @@ out:
  	return ret;
  }
  
 +static inline int btrfs_chunk_max_errors(struct map_lookup *map)
 +{
 +	int max_errors;
 +
 +	if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
 +			 BTRFS_BLOCK_GROUP_RAID10 |
 +			 BTRFS_BLOCK_GROUP_RAID5 |
 +			 BTRFS_BLOCK_GROUP_DUP)) {
 +		max_errors = 1;
 +	} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
 +		max_errors = 2;
 +	} else {
 +		max_errors = 0;
 +	}
 +
 +	return max_errors;
 +}
 +
  int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
  {
  	struct extent_map *em;
  	struct map_lookup *map;
  	struct btrfs_mapping_tree *map_tree = &root->fs_info->mapping_tree;
  	int readonly = 0;
 +	int miss_ndevs = 0;
  	int i;
  
  	read_lock(&map_tree->map_tree.lock);
 @@ -4598,18 +4617,27 @@ int btrfs_chunk_readonly(struct btrfs_root *root, u64 chunk_offset)
  	if (!em)
  		return 1;
  
 -	if (btrfs_test_opt(root, DEGRADED)) {
 -		free_extent_map(em);
 -		return 0;
 -	}
 -
  	map = (struct map_lookup *)em->bdev;
  	for (i = 0; i < map->num_stripes; i++) {
 +		if (map->stripes[i].dev->missing) {
 +			miss_ndevs++;
 +			continue;
 +		}
 +
  		if (!map->stripes[i].dev->writeable) {
  			readonly = 1;
 -			break;
 +			goto end;
  		}
  	}
 +
 +	/*
 +	 * If the number of missing devices is larger than max errors,
 +	 * we can not write the data into that chunk successfully, so
 +	 * set it readonly.
 +	 */
 +	if (miss_ndevs > btrfs_chunk_max_errors(map))
 +		readonly = 1;
 +end:
  	free_extent_map(em);
  	return readonly;
  }
 @@ -5220,16 +5248,8 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int rw,
  		}
  	}
  
 -	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS)) {
 -		if (map->type & (BTRFS_BLOCK_GROUP_RAID1 |
 -				 BTRFS_BLOCK_GROUP_RAID10 |
 -				 BTRFS_BLOCK_GROUP_RAID5 |
 -				 BTRFS_BLOCK_GROUP_DUP)) {
 -			max_errors = 1;
 -		} else if (map->type & BTRFS_BLOCK_GROUP_RAID6) {
 -			max_errors = 2;
 -		}
 -	}
 +	if (rw & (REQ_WRITE | REQ_GET_READ_MIRRORS))
 +		max_errors = btrfs_chunk_max_errors(map);
  
  	if (dev_replace_is_ongoing && (rw & (REQ_WRITE | REQ_DISCARD)) &&
  	    dev_replace->tgtdev != NULL) {
 -- 
 1.9.3
 

Re: Quota Ignored On write


Hi Chris and Kevin,


On 07/03/2014 09:21 PM, Satoru Takeuchi wrote:

Hi Kevin,

(2014/07/04 11:13), Kevin Brandstatter wrote:

Basing off the latest for-linus branch, I found I can write way more than
the quota:

btrfs quota enable
btrfs subvolume create test
btrfs qgroup limit 1G test
dd if=/dev/zero of=test/file bs=1024 count=1500000
output:
1500000+0 records in
1500000+0 records out
1536000000 bytes (1.5 GB) copied, 5.91909 s, 259 MB/s

That's a full half gig over the quota limit. I noticed some changes to
the quota accounting in the logs; what changed that could cause this?


Do you remember what kernel version quota worked correctly?

(2014/07/04 11:32), Satoru Takeuchi wrote:

(2014/07/04 11:25), Kevin Brandstatter wrote:

3.15.3 via arch/ and from linux-git


OK, I'll bisect it.


I made the following reproducer based on your steps.
It succeeded with 3.15 and failed with 3.16-rc3. So the problematic
patch is not specific to the mason/for-linus branch, but landed somewhere
between 3.15 and 3.16-rc3. Please wait a while for me to finish my bisect...

===
#!/bin/bash -x

TEST_DEV=/dev/vdb
TEST_MNT=/home/sat/mnt

umount $TEST_MNT
mkfs.btrfs -f $TEST_DEV
mount $TEST_DEV $TEST_MNT
btrfs quota enable $TEST_MNT

SUBVOLPATH=$TEST_MNT/quota_test
LIMIT=$((1024*100))
btrfs subvolume create $SUBVOLPATH
btrfs qgroup limit $LIMIT $SUBVOLPATH
TESTFILE=$SUBVOLPATH/test
dd if=/dev/zero of=$TESTFILE bs=1024 count=$(($LIMIT*3/2/1024))
SIZE=$(($(ls -s $TESTFILE | awk '{print $1}')*1024))

RET=0
if [ $SIZE -le $LIMIT ] ; then
  echo "[PASS] quota works correctly" >&2
else
  echo "[FAIL] quota doesn't work" >&2
  RET=1
fi

exit $RET
===
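
The sizes used by the script can be checked by hand. A minimal sketch, assuming GNU stat and a scratch file from mktemp instead of a btrfs subvolume; quota_check is a hypothetical helper, and it compares the apparent file size where the script above compares the allocated blocks reported by ls -s.

```shell
LIMIT=$((1024*100))            # same limit as the script: 102400 bytes
COUNT=$((LIMIT*3/2/1024))      # dd block count the script derives from it
echo "dd writes $COUNT blocks, $((COUNT*1024)) bytes"

# Print PASS if the apparent size of file $1 is within limit $2, else FAIL.
quota_check() {
  size=$(stat -c %s -- "$1") || return 2
  if [ "$size" -le "$2" ]; then
    echo "[PASS] $size <= $2"
  else
    echo "[FAIL] $size > $2"
  fi
}

# No quota applies to the scratch file, so all 1.5x of the limit is written.
scratch=$(mktemp)
dd if=/dev/zero of="$scratch" bs=1024 count="$COUNT" 2>/dev/null
quota_check "$scratch" "$LIMIT"
rm -f "$scratch"
```

The dd request is 150 blocks of 1024 bytes, one and a half times the limit, so a working quota has to cut the write short for the script to pass.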

Thanks,
Satoru



Thanks,
Satoru



-Kevin Brandstatter

Re: 3.15.1: kernel BUG at fs/btrfs/locking.c:269

2014-07-03 Thread Marc MERLIN

On Fri, Jul 04, 2014 at 11:07:22AM +0800, Liu Bo wrote:
   [160562.925463] parent transid verify failed on 2776298520576 wanted 41015 found 18120
  
  What should I be doing about this?
  Does it mean that I do have some kind of corruption/damage on my
  filesystem?
  
 If there is another copy of the block (RAID1, DUP, RAID5/6), it'd try to read
 the copy and repair the crc with the good one; that's all we can do about it.

Right. It's not quite my question though.
I mean I don't know what device it's on, never mind what file is affected.
If I know which file is corrupted, I can simply delete it and restore from
backup, no biggie.
Right now I don't even know which one of my 3 btrfs filesystems (over 10TB)
has this problem. That makes the message kind of problematic: you have a
problem, but I'm not giving you any fighting chance of finding out
where :)
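
Until the kernel messages name the device, one way to narrow this down is to ask every mounted btrfs filesystem for its per-device error counters. A minimal sketch, assuming btrfs-progs is installed; btrfs_mounts and show_btrfs_dev_stats are hypothetical helper names, and whether a transid failure hit on a normal read shows up in these counters is an assumption, not something this thread confirms.

```shell
# Print the mount point of every btrfs filesystem listed in a mounts table
# in /proc/mounts format ($1, default /proc/mounts), one per line.
btrfs_mounts() {
  awk '$3 == "btrfs" { print $2 }' "${1:-/proc/mounts}" | sort -u
}

# Print the per-device error counters of each mounted btrfs filesystem.
# Defined but not called here: it needs btrfs-progs and usually root.
show_btrfs_dev_stats() {
  btrfs_mounts | while IFS= read -r mnt; do
    echo "== $mnt"
    btrfs device stats "$mnt"
  done
}
```

Running show_btrfs_dev_stats as root prints one block of counters per mount point, so a filesystem with non-zero error counts stands out from the others.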
 
  Also, is it possible to have all these messages state which devid they
  occurred on? I don't even know which device I should be worrying about
  right now, and although I'm running scrub now, my understanding is that
  scrub doesn't actually look at FS structures and is likely to miss this
  anyway.
 
 Yes, we can, but it'd need a bit more effort. For now, every device message we've
 seen in panic info comes from sb->s_id, which points to @fs_info->latest_device.

Food for thought: as is, the message is unfortunately close to useless, except
to an FS developer with a system that has only one btrfs filesystem.

On Fri, Jul 04, 2014 at 11:50:25AM +0800, Wang Shilong wrote:
 I am afraid scrub may not be able to fix this kind of error; all scrub
 does is verify whether checksums match and, if possible, use good
 mirrors to rewrite bad ones.

I wouldn't be bothered if scrub can't fix it, but it would be good if it
could tell me.
 
 Such errors seem to imply the content itself is corrupted: we may have passed
 the checksum check after ending io, but we fail the generation check afterwards.
 
So should I really replace scrub with
find / -type f -print0 | xargs -0 grep . >/dev/null ?

Basically we need something that will scan the filesystem and ensure that
all files are reachable correctly without causing filesystem problems, and
if one is bad, output the name of the bad file(s).
Scrub only does a half job of that it seems.
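
One way to sketch such a scan: read every regular file once and print the names of those that fail to read. This is an illustration, not a btrfs tool. scan_unreadable is a hypothetical name, and the function only exercises file data, not every metadata structure.

```shell
# Read every regular file under $1 (default: current directory) and print
# the path of each file that cannot be read to the end.
# -xdev keeps find on the filesystem it started on.
scan_unreadable() {
  find "${1:-.}" -xdev -type f -exec sh -c '
    for f in "$@"; do
      cat -- "$f" >/dev/null 2>&1 || printf "%s\n" "$f"
    done
  ' sh {} +
}
```

Called as scan_unreadable on a mount point, it prints nothing when every file reads cleanly. A read that ends in an I/O error makes cat fail, so the affected path is printed and can be restored from backup.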

 To get physical device name, we still need mirror num to know which device
 we are locating.

Ok, so it's missing for now and therefore the code can't easily report it,
I understand.

Well, I explained the problem. ext4 and others of course tell me which devid
an error is on; hopefully btrfs will be able to do so in the near future.

Back to the original problem, would you agree that
find / -type f -print0 | xargs -0 grep . >/dev/null
may do a better job scanning the entire FS for problems than scrub would?

Thanks,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

Re: Is btrfs related to OOM death problems on my 8GB server with both 3.15.1 and 3.14?

2014-07-03 Thread Russell Coker

On Thu, 3 Jul 2014 18:19:38 Marc MERLIN wrote:
 I upgraded my server from 3.14 to 3.15.1 last week, and since then it's been
 running out of memory and deadlocking (panic= doesn't even work).
 I downgraded back to 3.14, but I already had the problem once since then.

Is there any correlation between such problems and BTRFS operations such as 
creating snapshots or running a scrub/balance?

Back in ~3.10 days I had serious problems with BTRFS memory use when removing 
multiple snapshots or balancing.  But at about 3.13 they all seemed to get 
fixed.

I usually didn't have a kernel panic when I had such problems (although I 
sometimes had a system lock up solid such that I couldn't even determine what 
its problem was).  Usually the OOM handler started killing big processes such 
as chromium when it shouldn't have needed to.

Note that I haven't verified that the BTRFS memory use is reasonable in all 
such situations.  Merely that it doesn't use enough to kill my systems.

-- 
My Main Blog         http://etbe.coker.com.au/
My Documents Blog    http://doc.coker.com.au/


Re: Quota Ignored On write