Re: [PATCH] btrfs-progs: show-super: Add option to print superblock at given bytenr

2015-11-02 Thread David Sterba
On Mon, Nov 02, 2015 at 04:34:19PM +0800, Qu Wenruo wrote:
> Add '-s ' option to show superblock at given bytenr.
> 
> This is very useful for debugging non-standard btrfs, such as the
> 1st stage btrfs of btrfs-convert.
> 
> Signed-off-by: Qu Wenruo 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2] Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state

2015-11-02 Thread Josef Bacik

On 11/02/2015 03:29 AM, Chandan Rajendra wrote:

When executing generic/001 in a loop on a ppc64 machine (with both sectorsize
and nodesize set to 64k), the following call trace is observed,

WARNING: at /root/repos/linux/fs/btrfs/locking.c:253
Modules linked in:
CPU: 2 PID: 8353 Comm: umount Not tainted 4.3.0-rc5-13676-ga5e681d #54
task: c000f2b1f560 ti: c000f6008000 task.ti: c000f6008000
NIP: c0520c88 LR: c04a3b34 CTR: 
REGS: c000f600a820 TRAP: 0700   Not tainted  (4.3.0-rc5-13676-ga5e681d)
MSR: 800102029032   CR: 2884  XER: 
CFAR: c04a3b30 SOFTE: 1
GPR00: c04a3b34 c000f600aaa0 c108ac00 c000f5a808c0
GPR04:  c000f600ae60  0005
GPR08: 20a1 0001 c000f2b1f560 0030
GPR12: 84842882 cfdc0900 c000f600ae60 c000f070b800
GPR16:  c000f3c8a000  0049
GPR20: 0001 0001 c000f5aa01f8 
GPR24: 0f83e0f83e0f83e1 c000f5a808c0 c000f3c8d000 c000
GPR28: c000f600ae74 0001 c000f3c8d000 c000f5a808c0
NIP [c0520c88] .btrfs_tree_lock+0x48/0x2a0
LR [c04a3b34] .btrfs_lock_root_node+0x44/0x80
Call Trace:
[c000f600aaa0] [c000f600ab80] 0xc000f600ab80 (unreliable)
[c000f600ab80] [c04a3b34] .btrfs_lock_root_node+0x44/0x80
[c000f600ac00] [c04a99dc] .btrfs_search_slot+0xa8c/0xc00
[c000f600ad40] [c04ab878] .btrfs_insert_empty_items+0x98/0x120
[c000f600adf0] [c050da44] .btrfs_finish_chunk_alloc+0x1d4/0x620
[c000f600af20] [c04be854] .btrfs_create_pending_block_groups+0x1d4/0x2c0
[c000f600b020] [c04bf188] .do_chunk_alloc+0x3c8/0x420
[c000f600b100] [c04c27cc] .find_free_extent+0xbfc/0x1030
[c000f600b260] [c04c2ce8] .btrfs_reserve_extent+0xe8/0x250
[c000f600b330] [c04c2f90] .btrfs_alloc_tree_block+0x140/0x590
[c000f600b440] [c04a47b4] .__btrfs_cow_block+0x124/0x780
[c000f600b530] [c04a4fc0] .btrfs_cow_block+0xf0/0x250
[c000f600b5e0] [c04a917c] .btrfs_search_slot+0x22c/0xc00
[c000f600b720] [c050aa40] .btrfs_remove_chunk+0x1b0/0x9f0
[c000f600b850] [c04c4e04] .btrfs_delete_unused_bgs+0x434/0x570
[c000f600b950] [c04d3cb8] .close_ctree+0x2e8/0x3b0
[c000f600ba20] [c049d178] .btrfs_put_super+0x18/0x30
[c000f600ba90] [c0243cd4] .generic_shutdown_super+0xa4/0x1a0
[c000f600bb10] [c02441d8] .kill_anon_super+0x18/0x30
[c000f600bb90] [c049c898] .btrfs_kill_super+0x18/0xc0
[c000f600bc10] [c02444f8] .deactivate_locked_super+0x98/0xe0
[c000f600bc90] [c0269f94] .cleanup_mnt+0x54/0xa0
[c000f600bd10] [c00bd744] .task_work_run+0xc4/0x100
[c000f600bdb0] [c0016334] .do_notify_resume+0x74/0x80
[c000f600be30] [c00098b8] .ret_from_except_lite+0x64/0x68
Instruction dump:
fba1ffe8 fbc1fff0 fbe1fff8 7c791b78 f8010010 f821ff21 e94d0290 81030040
812a04e8 7d094a78 7d290034 5529d97e <0b09> 3b40 3be30050 3bc3004c

The above call trace is also seen on x86_64, albeit very rarely, and only
with nodesize set to 64k and the nospace_cache mount option in use.

The reason for the above call trace is as follows:

btrfs_remove_chunk
  check_system_chunk
    Allocate chunk if required
  For each physical stripe on underlying device,
    btrfs_free_dev_extent
      ...
      Take lock on Device tree's root node
      btrfs_cow_block("dev tree's root node");
        btrfs_reserve_extent
          find_free_extent
            index = BTRFS_RAID_DUP;
            have_caching_bg = false;

            When in LOOP_CACHING_NOWAIT state, assume we find a block group
            which is being cached; hence have_caching_bg is set to true.

            When repeating the search for the next RAID index, we set
            have_caching_bg to false.

Hence, right after completing the LOOP_CACHING_NOWAIT state, we incorrectly
skip the LOOP_CACHING_WAIT state and move to the LOOP_ALLOC_CHUNK state, where
we allocate a chunk and try to add entries corresponding to the chunk's
physical stripes into the device tree. When doing so, the task deadlocks on
itself, waiting for the blocking lock on the root node of the device tree
that it already holds.

This commit fixes the issue by introducing a new local variable that
indicates whether a block group of any RAID type is being cached.

Signed-off-by: Chandan Rajendra 
---
Changelog:
v1->v2: Honor 80 column restriction.

  fs/btrfs/extent-tree.c | 7 ++++++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f50c7c2..99a8e57 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7029,6 +7029,7 @@ static 

Re: random i/o error without error in dmesg

2015-11-02 Thread Szalma László

On 2015-10-28 09:44, Szalma László wrote:

Ok, I had a chance to try some things.

1.: the error

md5sum xyz
md5sum: xyz: Input/output error

(no errors in dmesg)

2.: mount -o remount,ro /mnt/x

(could not do it, the filesystem was in use)
mysql stop && mount -o remount,ro /mnt/x
problem persists: io error.
mount -o remount,rw /mnt/x
still io error
umount /mnt/x
mount /mnt/x
NO io error, md5sum works!

The umount/mount ALWAYS solved the problem for me. This was the first
time I tried mount -o remount,ro, but it was not enough. A reboot was
not needed.

(kernel 4.2.4)

László Szalma



Unfortunately, the problem still exists with kernel 4.3.0.

László Szalma



[PATCH] btrfs-progs: print-tree: Output stripe dev uuid

2015-11-02 Thread Qu Wenruo
Add dev uuid output to print_chunk().

This is quite useful for debugging the temporary btrfs in btrfs-convert.

Signed-off-by: Qu Wenruo 
---
 print-tree.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/print-tree.c b/print-tree.c
index 7ddf400..4d4c3a2 100644
--- a/print-tree.c
+++ b/print-tree.c
@@ -231,9 +231,17 @@ void print_chunk(struct extent_buffer *eb, struct btrfs_chunk *chunk)
printf("\t\ttype %s num_stripes %d\n",
   chunk_flags_str, num_stripes);
for (i = 0 ; i < num_stripes ; i++) {
+   unsigned char dev_uuid[BTRFS_UUID_SIZE];
+   char str_dev_uuid[BTRFS_UUID_UNPARSED_SIZE];
+
+   read_extent_buffer(eb, dev_uuid,
+   (unsigned long)btrfs_stripe_dev_uuid_nr(chunk, i),
+   BTRFS_UUID_SIZE);
+   uuid_unparse(dev_uuid, str_dev_uuid);
printf("\t\t\tstripe %d devid %llu offset %llu\n", i,
  (unsigned long long)btrfs_stripe_devid_nr(eb, chunk, i),
  (unsigned long long)btrfs_stripe_offset_nr(eb, chunk, i));
+   printf("\t\t\tdev uuid: %s\n", str_dev_uuid);
}
 }
 
-- 
2.6.2
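print_chunk() relies on uuid_unparse() to turn the 16 raw dev uuid bytes into text. To illustrate the output format only (this is not libuuid's code), here is a self-contained sketch that produces the same canonical 8-4-4-4-12 hex layout. UUID_BYTES and UUID_STRING_SIZE are our own names. They are assumed to correspond to BTRFS_UUID_SIZE (16) and to the 36 characters plus NUL that a buffer such as str_dev_uuid needs. uuid_unparse() may print upper or lower case depending on the system default; this sketch always uses lower case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UUID_BYTES       16	/* raw size, same as BTRFS_UUID_SIZE */
#define UUID_STRING_SIZE 37	/* 36 characters plus the terminating NUL */

/* Format 16 raw bytes in the canonical 8-4-4-4-12 layout (lower-case hex),
 * the textual form uuid_unparse() produces. Illustrative only. */
static void format_uuid(const unsigned char uuid[UUID_BYTES],
			char out[UUID_STRING_SIZE])
{
	int i, pos = 0;

	for (i = 0; i < UUID_BYTES; i++) {
		/* a dash separates the 4-2-2-2-6 byte groups */
		if (i == 4 || i == 6 || i == 8 || i == 10)
			out[pos++] = '-';
		/* two hex digits per byte; snprintf also writes the NUL */
		pos += snprintf(out + pos, UUID_STRING_SIZE - pos, "%02x",
				uuid[i]);
	}
	assert(strlen(out) == UUID_STRING_SIZE - 1);
}
```

For example, the bytes 12 34 56 78 9a bc de f0 01 23 45 67 89 ab cd ef come out as 12345678-9abc-def0-0123-456789abcdef.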



Re: [PATCH/RFC] make btrfs subvol mounts appear in /proc/mounts

2015-11-02 Thread J. Bruce Fields
On Wed, Oct 28, 2015 at 07:25:10AM +0900, Neil Brown wrote:
> 
> If you create a subvolume in btrfs and access it (by name) without
> mounting it, then the subvolume looks like a separate mount to some
> extent, returning a different st_dev to stat(), but it doesn't look like
> a separate mount in that it isn't listed in /proc/mounts. This
> inconsistency can confuse tools.
> 
> This patch causes these subvolumes to become separate mounts by using
> the VFS' automount functionality, much like NFS uses automount when it
> discovered mountpoints on the server.
> 
> The VFS currently makes it impossible to auto-mount a directory on to itself
> (i.e. a bind mount).  For NFS this isn't a problem as a new superblock
> is created for the child filesystem so there are two separate dentries
> (and inodes) for the one directory: one in the parent filesystem, one in
> the child (note that the two superblocks share a common connection to
> the server so there is still a lot of commonality).
> 
> BTRFS has chosen instead to use a single superblock for all subvolumes.

Naive question: was there a reason for that choice?

--b.

> This results in a single dentry for the subvol-root.  A dentry which
> must be auto-mounted on itself.
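The inconsistency Neil describes can be observed from userspace with two checks. The first is whether a directory's st_dev differs from its parent's. The second is whether the directory is listed as a mount point in /proc/mounts. For a btrfs subvolume that was accessed by name but never mounted, the first check would be expected to report a boundary while the second finds nothing. This is a minimal sketch using stat() and the getmntent() family. The helper names are hypothetical. The /proc/mounts lookup is a plain string comparison, so it assumes an absolute, canonical path.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* 1 if path's st_dev differs from path/..'s, 0 if equal, -1 on error. */
static int st_dev_differs_from_parent(const char *path)
{
	char parent[4096];	/* fixed size keeps the sketch simple */
	struct stat self, up;
	int n;

	assert(path != NULL);
	n = snprintf(parent, sizeof(parent), "%s/..", path);
	if (n < 0 || (size_t)n >= sizeof(parent))
		return -1;
	if (stat(path, &self) != 0 || stat(parent, &up) != 0)
		return -1;
	return self.st_dev != up.st_dev;
}

/* 1 if path is a mount point listed in /proc/mounts, 0 if not, -1 if the
 * file cannot be opened. Plain string match on mnt_dir. */
static int listed_in_proc_mounts(const char *path)
{
	FILE *f = setmntent("/proc/mounts", "r");
	struct mntent *m;
	int found = 0;

	if (f == NULL)
		return -1;
	while ((m = getmntent(f)) != NULL) {
		if (strcmp(m->mnt_dir, path) == 0) {
			found = 1;
			break;
		}
	}
	endmntent(f);
	return found;
}
```

A result of 1 from the first helper together with 0 from the second is the mismatch the patch aims to remove. A real mount point would normally give 1 from both.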


[PATCH 2/2] btrfs-progs: Rename variables in btrfs_add_to_fsid

2015-11-02 Thread Zhao Lei
There are two total_bytes in btrfs_add_to_fsid(): the local variable
total_bytes means the filesystem's total bytes, while device->total_bytes
means the device's total bytes.
In addition, the device's total bytes is passed in as an argument that
is named block_count in the current code.

This patch renames:
 total_bytes -> fs_total_bytes
 block_count -> device_total_bytes

to make the code more readable.

Signed-off-by: Zhao Lei 
---
 utils.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/utils.c b/utils.c
index b7752df..999af43 100644
--- a/utils.c
+++ b/utils.c
@@ -724,7 +724,7 @@ static int zero_dev_clamped(int fd, off_t start, ssize_t len, u64 dev_size)
 
 int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
  struct btrfs_root *root, int fd, char *path,
- u64 block_count, u32 io_width, u32 io_align,
+ u64 device_total_bytes, u32 io_width, u32 io_align,
  u32 sectorsize)
 {
struct btrfs_super_block *disk_super;
@@ -732,7 +732,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
struct btrfs_device *device;
struct btrfs_dev_item *dev_item;
char *buf = NULL;
-   u64 total_bytes;
+   u64 fs_total_bytes;
u64 num_devs;
int ret;
 
@@ -757,7 +757,7 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
device->sector_size = sectorsize;
device->fd = fd;
device->writeable = 1;
-   device->total_bytes = block_count;
+   device->total_bytes = device_total_bytes;
device->bytes_used = 0;
device->total_ios = 0;
device->dev_root = root->fs_info->dev_root;
@@ -768,8 +768,8 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
ret = btrfs_add_device(trans, root, device);
BUG_ON(ret);
 
-   total_bytes = btrfs_super_total_bytes(super) + block_count;
-   btrfs_set_super_total_bytes(super, total_bytes);
+   fs_total_bytes = btrfs_super_total_bytes(super) + device_total_bytes;
+   btrfs_set_super_total_bytes(super, fs_total_bytes);
 
num_devs = btrfs_super_num_devices(super) + 1;
btrfs_set_super_num_devices(super, num_devs);
-- 
1.8.5.1



[PATCH 1/2] btrfs-progs: mkfs: Round device size down to sectorsize

2015-11-02 Thread Zhao Lei
When running the following commands in a VM whose disks were created by
qemu-img create -f raw 11 2.6G:
 # mkfs.btrfs -f /dev/vdd /dev/vde /dev/vdf
 # btrfs-show-super /dev/vdd /dev/vde /dev/vdf | grep dev_item.total_bytes
 dev_item.total_bytes	2791727104
 dev_item.total_bytes	2791729152
 dev_item.total_bytes	2791729152
We can see that the first device's size is a little smaller.

This also makes xfstests btrfs/011 fail.

Reason:
 The first device's size is rounded down to sectorsize in make_btrfs(),
 but the other devices' sizes are not.

Fix:
 Round down the remaining devices' sizes in btrfs_add_to_fsid().

Reported-by: Qu Wenruo 
Signed-off-by: Zhao Lei 
---
 utils.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/utils.c b/utils.c
index d17291a..b7752df 100644
--- a/utils.c
+++ b/utils.c
@@ -736,6 +736,8 @@ int btrfs_add_to_fsid(struct btrfs_trans_handle *trans,
u64 num_devs;
int ret;
 
+   device_total_bytes = (device_total_bytes / sectorsize) * sectorsize;
+
device = kzalloc(sizeof(*device), GFP_NOFS);
if (!device)
goto err_nomem;
-- 
1.8.5.1
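The fix is an integer round-down. The patch does not state which sectorsize was in use, so this minimal sketch assumes the common 4096-byte sectorsize, which reproduces the numbers above. 2791729152 is 681574 sectors plus 2048 bytes, so it rounds down to 2791727104, which matches the first device. round_down_to_sector() is our own name.

```c
#include <assert.h>
#include <stdint.h>

/* Round a device size down to a whole number of sectors, the same integer
 * arithmetic as the one-line fix in btrfs_add_to_fsid(). */
static uint64_t round_down_to_sector(uint64_t bytes, uint32_t sectorsize)
{
	assert(sectorsize != 0);
	return (bytes / sectorsize) * sectorsize;
}
```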



[PATCH] btrfs-progs: show-super: Add option to print superblock at given bytenr

2015-11-02 Thread Qu Wenruo
Add '-s ' option to show the superblock at a given bytenr.

This is very useful for debugging non-standard btrfs, such as the
1st stage btrfs of btrfs-convert.

Signed-off-by: Qu Wenruo 
---
 Documentation/btrfs-show-super.asciidoc | 5 +++++
 btrfs-show-super.c                      | 7 ++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/btrfs-show-super.asciidoc b/Documentation/btrfs-show-super.asciidoc
index 1646be3..3480a3d 100644
--- a/Documentation/btrfs-show-super.asciidoc
+++ b/Documentation/btrfs-show-super.asciidoc
@@ -40,6 +40,11 @@ If several '-i ' are given, only the last one is valid.
 Attempt to print the superblock even if no superblock magic is found.  May end
 badly.
 
+-s ::
+Specify the superblock bytenr.
++
+Used for debugging purposes. Disables the '-f' option.
+
 EXIT STATUS
 ---
 *btrfs-show-super* will return 0 if no error happened.
diff --git a/btrfs-show-super.c b/btrfs-show-super.c
index 27414c8..7b499e4 100644
--- a/btrfs-show-super.c
+++ b/btrfs-show-super.c
@@ -48,6 +48,7 @@ static void print_usage(void)
fprintf(stderr, "\t-a : print information of all superblocks\n");
fprintf(stderr, "\t-i  : specify which mirror to print out\n");
fprintf(stderr, "\t-F : attempt to dump superblocks with bad magic\n");
+   fprintf(stderr, "\t-s  : specify the superblock bytenr\n");
fprintf(stderr, "%s\n", PACKAGE_STRING);
 }
 
@@ -63,7 +64,7 @@ int main(int argc, char **argv)
u64 arg;
u64 sb_bytenr = btrfs_sb_offset(0);
 
-   while ((opt = getopt(argc, argv, "fFai:")) != -1) {
+   while ((opt = getopt(argc, argv, "fFai:s:")) != -1) {
switch (opt) {
case 'i':
arg = arg_strtou64(optarg);
@@ -86,6 +87,10 @@ int main(int argc, char **argv)
case 'F':
force = 1;
break;
+   case 's':
+   sb_bytenr = arg_strtou64(optarg);
+   all = 0;
+   break;
default:
print_usage();
exit(1);
-- 
2.6.2
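For context on what '-s' adds: '-i' can only pick one of the fixed superblock copies, while '-s' accepts any bytenr. This minimal sketch shows where those fixed copies sit. It assumes the usual btrfs layout: the primary copy at 64KiB, and the mirrors at 16KiB shifted left by 12 bits per mirror index. sb_offset() and the SB_* constants are our own illustrative names. They stand in for the btrfs_sb_offset() call used in the patch and are not its actual source.

```c
#include <assert.h>
#include <stdint.h>

#define SB_PRIMARY_OFFSET (64ULL * 1024)	/* primary copy at 64KiB */
#define SB_MIRROR_BASE    (16ULL * 1024)	/* mirrors start from 16KiB... */
#define SB_MIRROR_SHIFT   12			/* ...shifted 12 bits per index */
#define SB_MIRROR_MAX     3			/* primary plus two mirrors */

/* Byte offset of superblock copy 'mirror' (0 = primary), i.e. the values
 * '-i' can select. Illustrative re-implementation, not btrfs-progs code. */
static uint64_t sb_offset(int mirror)
{
	assert(mirror >= 0 && mirror < SB_MIRROR_MAX);
	if (mirror == 0)
		return SB_PRIMARY_OFFSET;
	return SB_MIRROR_BASE << (SB_MIRROR_SHIFT * mirror);
}
```

Under these assumptions the three copies land at 64KiB, 64MiB and 256GiB. A superblock anywhere else can only be reached with '-s'.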



[PATCH V2] Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state

2015-11-02 Thread Chandan Rajendra
When executing generic/001 in a loop on a ppc64 machine (with both sectorsize
and nodesize set to 64k), the following call trace is observed,

WARNING: at /root/repos/linux/fs/btrfs/locking.c:253
Modules linked in:
CPU: 2 PID: 8353 Comm: umount Not tainted 4.3.0-rc5-13676-ga5e681d #54
task: c000f2b1f560 ti: c000f6008000 task.ti: c000f6008000
NIP: c0520c88 LR: c04a3b34 CTR: 
REGS: c000f600a820 TRAP: 0700   Not tainted  (4.3.0-rc5-13676-ga5e681d)
MSR: 800102029032   CR: 2884  XER: 
CFAR: c04a3b30 SOFTE: 1
GPR00: c04a3b34 c000f600aaa0 c108ac00 c000f5a808c0
GPR04:  c000f600ae60  0005
GPR08: 20a1 0001 c000f2b1f560 0030
GPR12: 84842882 cfdc0900 c000f600ae60 c000f070b800
GPR16:  c000f3c8a000  0049
GPR20: 0001 0001 c000f5aa01f8 
GPR24: 0f83e0f83e0f83e1 c000f5a808c0 c000f3c8d000 c000
GPR28: c000f600ae74 0001 c000f3c8d000 c000f5a808c0
NIP [c0520c88] .btrfs_tree_lock+0x48/0x2a0
LR [c04a3b34] .btrfs_lock_root_node+0x44/0x80
Call Trace:
[c000f600aaa0] [c000f600ab80] 0xc000f600ab80 (unreliable)
[c000f600ab80] [c04a3b34] .btrfs_lock_root_node+0x44/0x80
[c000f600ac00] [c04a99dc] .btrfs_search_slot+0xa8c/0xc00
[c000f600ad40] [c04ab878] .btrfs_insert_empty_items+0x98/0x120
[c000f600adf0] [c050da44] .btrfs_finish_chunk_alloc+0x1d4/0x620
[c000f600af20] [c04be854] .btrfs_create_pending_block_groups+0x1d4/0x2c0
[c000f600b020] [c04bf188] .do_chunk_alloc+0x3c8/0x420
[c000f600b100] [c04c27cc] .find_free_extent+0xbfc/0x1030
[c000f600b260] [c04c2ce8] .btrfs_reserve_extent+0xe8/0x250
[c000f600b330] [c04c2f90] .btrfs_alloc_tree_block+0x140/0x590
[c000f600b440] [c04a47b4] .__btrfs_cow_block+0x124/0x780
[c000f600b530] [c04a4fc0] .btrfs_cow_block+0xf0/0x250
[c000f600b5e0] [c04a917c] .btrfs_search_slot+0x22c/0xc00
[c000f600b720] [c050aa40] .btrfs_remove_chunk+0x1b0/0x9f0
[c000f600b850] [c04c4e04] .btrfs_delete_unused_bgs+0x434/0x570
[c000f600b950] [c04d3cb8] .close_ctree+0x2e8/0x3b0
[c000f600ba20] [c049d178] .btrfs_put_super+0x18/0x30
[c000f600ba90] [c0243cd4] .generic_shutdown_super+0xa4/0x1a0
[c000f600bb10] [c02441d8] .kill_anon_super+0x18/0x30
[c000f600bb90] [c049c898] .btrfs_kill_super+0x18/0xc0
[c000f600bc10] [c02444f8] .deactivate_locked_super+0x98/0xe0
[c000f600bc90] [c0269f94] .cleanup_mnt+0x54/0xa0
[c000f600bd10] [c00bd744] .task_work_run+0xc4/0x100
[c000f600bdb0] [c0016334] .do_notify_resume+0x74/0x80
[c000f600be30] [c00098b8] .ret_from_except_lite+0x64/0x68
Instruction dump:
fba1ffe8 fbc1fff0 fbe1fff8 7c791b78 f8010010 f821ff21 e94d0290 81030040
812a04e8 7d094a78 7d290034 5529d97e <0b09> 3b40 3be30050 3bc3004c

The above call trace is also seen on x86_64, albeit very rarely, and only
with nodesize set to 64k and the nospace_cache mount option in use.

The reason for the above call trace is as follows:

btrfs_remove_chunk
  check_system_chunk
    Allocate chunk if required
  For each physical stripe on underlying device,
    btrfs_free_dev_extent
      ...
      Take lock on Device tree's root node
      btrfs_cow_block("dev tree's root node");
        btrfs_reserve_extent
          find_free_extent
            index = BTRFS_RAID_DUP;
            have_caching_bg = false;

            When in LOOP_CACHING_NOWAIT state, assume we find a block group
            which is being cached; hence have_caching_bg is set to true.

            When repeating the search for the next RAID index, we set
            have_caching_bg to false.

Hence, right after completing the LOOP_CACHING_NOWAIT state, we incorrectly
skip the LOOP_CACHING_WAIT state and move to the LOOP_ALLOC_CHUNK state, where
we allocate a chunk and try to add entries corresponding to the chunk's
physical stripes into the device tree. When doing so, the task deadlocks on
itself, waiting for the blocking lock on the root node of the device tree
that it already holds.

This commit fixes the issue by introducing a new local variable that
indicates whether a block group of any RAID type is being cached.

Signed-off-by: Chandan Rajendra 
---
Changelog:
v1->v2: Honor 80 column restriction.

 fs/btrfs/extent-tree.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index f50c7c2..99a8e57 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -7029,6 +7029,7 @@ static noinline int find_free_extent(struct btrfs_root *orig_root,
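The diff is cut off in this archive, so the actual change is not visible here. The following toy model only illustrates the failure described above. If a flag is reset for every RAID index, it loses the fact that an earlier index had a block group still caching. A separate flag that accumulates across indexes keeps that information. The enum is a subset modeled on the stage names in the report. next_stage() and any_caching_bg are hypothetical names. The real find_free_extent() chooses its next stage based on more than this one flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Subset of the allocator search stages named in the report. */
enum loop_stage { LOOP_CACHING_NOWAIT, LOOP_CACHING_WAIT, LOOP_ALLOC_CHUNK };

/*
 * Toy model of one LOOP_CACHING_NOWAIT pass over all RAID indexes.
 * caching[i] says whether a still-caching block group was seen at index i.
 * With fixed == false only the last index's flag survives (the bug); with
 * fixed == true a second flag accumulates across all indexes.
 */
static enum loop_stage next_stage(const bool *caching, size_t nr_raid,
				  bool fixed)
{
	bool have_caching_bg = false;
	bool any_caching_bg = false;	/* hypothetical name */
	size_t index;

	assert(caching != NULL && nr_raid > 0);
	for (index = 0; index < nr_raid; index++) {
		have_caching_bg = false;	/* reset when repeating the search */
		if (caching[index])
			have_caching_bg = true;
		if (have_caching_bg)
			any_caching_bg = true;
	}

	if (fixed ? any_caching_bg : have_caching_bg)
		return LOOP_CACHING_WAIT;	/* wait for caching to finish */
	return LOOP_ALLOC_CHUNK;		/* nothing to wait for */
}
```

With caching seen only at the first index, the buggy variant jumps straight to LOOP_ALLOC_CHUNK. That is the path that ends in the device-tree lock recursion shown in the trace.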
  

Re: [PATCH V2] Btrfs: find_free_extent: Do not erroneously skip LOOP_CACHING_WAIT state

2015-11-02 Thread Chris Mason
On Mon, Nov 02, 2015 at 01:59:46PM +0530, Chandan Rajendra wrote:
> When executing generic/001 in a loop on a ppc64 machine (with both sectorsize
> and nodesize set to 64k), the following call trace is observed,

Thanks Chandan, I hit this same trace on x86-64 with 16K nodes.

-chris


Re: [PATCH/RFC] make btrfs subvol mounts appear in /proc/mounts

2015-11-02 Thread Chris Mason
On Mon, Nov 02, 2015 at 03:50:12PM -0500, J. Bruce Fields wrote:
> On Wed, Oct 28, 2015 at 07:25:10AM +0900, Neil Brown wrote:
> > 
> > If you create a subvolume in btrfs and access it (by name) without
> > mounting it, then the subvolume looks like a separate mount to some
> > extent, returning a different st_dev to stat(), but it doesn't look like
> > a separate mount in that it isn't listed in /proc/mounts. This
> > inconsistency can confuse tools.
> > 
> > This patch causes these subvolumes to become separate mounts by using
> > the VFS' automount functionality, much like NFS uses automount when it
> > discovered mountpoints on the server.
> > 
> > The VFS currently makes it impossible to auto-mount a directory on to itself
> > (i.e. a bind mount).  For NFS this isn't a problem as a new superblock
> > is created for the child filesystem so there are two separate dentries
> > (and inodes) for the one directory: one in the parent filesystem, one in
> > the child (note that the two superblocks share a common connection to
> > the server so there is still a lot of commonality).
> > 
> > BTRFS has chosen instead to use a single superblock for all subvolumes.
> 
> Naive question: was there a reason for that choice?

They are really all part of the same FS, the single super better fits.
Or said another way, it felt like there would be dramatically more duct
tape around supers-per-subvolume than there was abusing st_dev.

Neil's patch came up after I told him a few of us had tried to do the
same thing and failed to find clean vfs changes to make it possible...he
took it as a challenge.  Now I have to remember what it was about our
past attempts that I didn't like.

I'll test this and queue for 4.5 if it all works out, thanks Neil!

-chris



[PATCH V2] btrfs: Print Warning only if ENOSPC_DEBUG is enabled

2015-11-02 Thread Ashish Samant
Don't call WARN_ON for an ENOSPC error unless ENOSPC_DEBUG is enabled.

Signed-off-by: Ashish Samant
---
 fs/btrfs/delayed-inode.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index a2ae427..b86cfd9 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -652,9 +652,13 @@ static int btrfs_delayed_inode_reserve_metadata(
goto out;
 
ret = btrfs_block_rsv_migrate(src_rsv, dst_rsv, num_bytes);
-   if (!WARN_ON(ret))
+   if (!ret)
goto out;
 
+   if (btrfs_test_opt(root, ENOSPC_DEBUG))
+   WARN(1, KERN_DEBUG
+"btrfs: block rsv migrate returned %d\n", ret);
+
/*
 * Ok this is a problem, let's just steal from the global rsv
 * since this really shouldn't happen that often.
-- 
1.8.3.2
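The kernel's WARN_ON(cond) both prints a warning and evaluates to the condition. That is why the old code could write if (!WARN_ON(ret)) goto out;. The patch separates the test from the warning. Below is a userspace sketch of the before and after behaviour. model_warn_on(), old_path() and new_path() are hypothetical names, and the ENOSPC_DEBUG mount option is modeled as a plain bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted;	/* counts simulated kernel warnings */

/* Userspace stand-in for WARN_ON(): reports when cond is true and, like the
 * kernel macro, evaluates to the condition so it can be used inside an if. */
static int model_warn_on(int cond)
{
	if (cond) {
		warnings_emitted++;
		fprintf(stderr, "WARNING (simulated)\n");
	}
	return cond;
}

/* Old code: if (!WARN_ON(ret)) goto out;
 * Returns true when execution falls through to the global-rsv fallback. */
static bool old_path(int ret)
{
	if (!model_warn_on(ret != 0))
		return false;	/* goto out */
	return true;
}

/* New code: plain test, then warn only if ENOSPC_DEBUG is set. */
static bool new_path(int ret, bool enospc_debug)
{
	if (!ret)
		return false;	/* goto out */
	if (enospc_debug)
		model_warn_on(1);
	return true;
}
```

Both variants take the same branch. The only difference is whether a warning is emitted when the mount option is off.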



Re: [PATCH] fstests: generic test for fsync after hole punching

2015-11-02 Thread Dave Chinner
On Mon, Nov 02, 2015 at 12:32:57PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Test that a file fsync works after punching a hole for the same file
> range multiple times, and that after log/journal replay the file's
> content and layout are correct.
> 
> This test is motivated by a bug found in btrfs, which is fixed by
> the following linux kernel patch:
> 
>   "Btrfs: fix hole punching when using the no-holes feature"

> +# This test was motivated by an issue found in btrfs when the btrfs no-holes
> +# feature is enabled (introduced in kernel 3.14). So enable the feature if the
> +# fs being tested is btrfs.
> +if [ $FSTYP == "btrfs" ]; then
> + _require_btrfs_fs_feature "no_holes"
> + _require_btrfs_mkfs_feature "no-holes"
> + MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
> +fi

This sort of transparent filesystem option should be tested by
executing the entire test suite with it enabled:

# MKFS_OPTIONS="-O no-holes" ./check -g auto

rather than only enabling for just this test.

> +# Silently drop all writes and unmount to simulate a crash/power failure.
> +_load_flakey_table $FLAKEY_DROP_WRITES
> +_unmount_flakey
> +
> +# Allow writes again, mount to trigger log replay and validate file contents.
> +_load_flakey_table $FLAKEY_ALLOW_WRITES
> +_mount_flakey

This is repeated often enough across many tests that a helper like:

# Silently drop all writes and unmount/remount to simulate a
# crash/power failure.
_flakey_drop_and_remount()
{
	_load_flakey_table $FLAKEY_DROP_WRITES
	_unmount_flakey

	_load_flakey_table $FLAKEY_ALLOW_WRITES
	_mount_flakey
}

is appropriate. Doesn't need to be in this patch, though.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com
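The core operation that Filipe's test exercises is punching the same hole several times and then calling fsync. It can be sketched with the Linux fallocate() call. This sketch does not simulate the crash and log replay that the test performs via dm-flakey. It only checks that the punched range reads back as zeros and the rest of the file is untouched. The helper names are hypothetical. On a filesystem without hole-punching support, fallocate() fails with EOPNOTSUPP.

```c
#define _GNU_SOURCE		/* fallocate() and FALLOC_FL_* via <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>		/* tmpfile(), used in the usage example */
#include <sys/types.h>
#include <unistd.h>

/* Punch the same hole [off, off + len) 'times' times, then fsync the file.
 * PUNCH_HOLE must be combined with KEEP_SIZE. Returns 0 or -errno. */
static int punch_repeatedly_and_fsync(int fd, off_t off, off_t len, int times)
{
	int i;

	assert(times > 0);
	for (i = 0; i < times; i++) {
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      off, len) != 0)
			return -errno;
	}
	return fsync(fd) == 0 ? 0 : -errno;
}

/* 1 if every byte in [off, off + len) reads back as zero, 0 if not,
 * -1 on a read error or a file shorter than the range. */
static int range_is_zero(int fd, off_t off, size_t len)
{
	unsigned char buf[4096];

	while (len > 0) {
		size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t n = pread(fd, buf, chunk, off);
		ssize_t i;

		if (n <= 0)
			return -1;
		for (i = 0; i < n; i++)
			if (buf[i] != 0)
				return 0;
		off += n;
		len -= (size_t)n;
	}
	return 1;
}
```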


Re: Btrfs progs pre-release 4.3-rc1

2015-11-02 Thread Duncan
David Sterba posted on Mon, 02 Nov 2015 16:14:53 +0100 as excerpted:

> the kernel 4.3 was released yesterday, the btrfs-progs will follow at
> the end of this week. I've tagged an rc1 from current devel branch.
> There are a lots of small invisible changes and one change in the
> defaults:
> 
> * mkfs: mixed mode is not forced anymore for devices smaller than 1 GiB

It says one change in the /defaults/, but then it says mixed mode isn't 
/forced/ anymore under a GiB.

Which is it, a change in the /defaults/, under a gig now defaults to 
separate data/metadata, or same /defaults/, but now there's a way to 
overrule them and do separate data/metadata under a gig, so while mixed 
remains the default, it's no longer /forced/?

If the /defaults/ changed, is mixed mode still /recommended/ for small 
filesystems?

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: [PATCH/RFC] make btrfs subvol mounts appear in /proc/mounts

2015-11-02 Thread Neil Brown
On Tue, Nov 03 2015, Chris Mason wrote:

> On Mon, Nov 02, 2015 at 03:50:12PM -0500, J. Bruce Fields wrote:
>> On Wed, Oct 28, 2015 at 07:25:10AM +0900, Neil Brown wrote:
>> > 
>> > If you create a subvolume in btrfs and access it (by name) without
>> > mounting it, then the subvolume looks like a separate mount to some
>> > extent, returning a different st_dev to stat(), but it doesn't look like
>> > a separate mount in that it isn't listed in /proc/mounts. This
>> > inconsistency can confuse tools.
>> > 
>> > This patch causes these subvolumes to become separate mounts by using
>> > the VFS' automount functionality, much like NFS uses automount when it
>> > discovered mountpoints on the server.
>> > 
>> > The VFS currently makes it impossible to auto-mount a directory on to itself
>> > (i.e. a bind mount).  For NFS this isn't a problem as a new superblock
>> > is created for the child filesystem so there are two separate dentries
>> > (and inodes) for the one directory: one in the parent filesystem, one in
>> > the child (note that the two superblocks share a common connection to
>> > the server so there is still a lot of commonality).
>> > 
>> > BTRFS has chosen instead to use a single superblock for all subvolumes.
>> 
>> Naive question: was there a reason for that choice?
>
> They are really all part of the same FS, the single super better fits.
> Or said another way, it felt like there would be dramatically more duct
> tape around supers-per-subvolume than there was abusing st_dev.
>
> Neil's patch came up after I told him a few of us had tried to do the
> same thing and failed to find clean vfs changes to make it possible...he
> took it as a challenge.  Now I have to remember what it was about our
> past attempts that I didn't like.
>
> I'll test this and queue for 4.5 if it all works out, thanks Neil!

I'd rather resend with proper documentation updates and s-o-b before it
gets queued if that is OK.  So once you are happy, please let me know
and I'll do it "properly".

Thanks,
NeilBrown




4.2.5 forced read-only -ENOSPC w/ free space

2015-11-02 Thread E V
This happened during an rsync, with 20TB of unallocated space. There are
currently no snapshots. Should I try 4.1.12 or 4.3?
dmesg:
[122014.436612] BTRFS: error (device sde) in btrfs_run_delayed_refs:2781: errno=-28 No space left
[122014.436615] BTRFS info (device sde): forced readonly
[122014.436624] BTRFS: error (device sde) in btrfs_run_delayed_refs:2781: errno=-28 No space left
[122014.436725] WARNING: CPU: 13 PID: 8025 at fs/btrfs/extent-tree.c:2781 btrfs_run_delayed_refs+0x97/0x195 [btrfs]()
[122014.436741] BTRFS: error (device sde) in __btrfs_prealloc_file_range:9636: errno=-28 No space left
[122014.436772] BTRFS: error (device sde) in btrfs_start_dirty_block_groups:3461: errno=-28 No space left
[122014.436777] BTRFS warning (device sde): Skipping commit of aborted transaction.
[122014.436780] BTRFS: error (device sde) in cleanup_transaction:1710: errno=-5 IO failure
[122014.436959] BTRFS: Transaction aborted (error -28)
[122014.436961] Modules linked in: ipmi_si mpt2sas raid_class scsi_transport_sas dell_rbu nfsv3 nfsv4 nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc ext4 crc16 jbd2 ext2 coretemp joydev crct10dif_pclmul sha256_generic psmouse serio_raw hmac drbg aesni_intel iTCO_wdt ipmi_devintf iTCO_vendor_support dcdbas evdev aes_x86_64 glue_helper lrw gf128mul ablk_helper pcspkr cryptd lpc_ich mfd_core i7core_edac edac_core ipmi_msghandler acpi_power_meter button processor thermal_sys loop ext3 mbcache jbd btrfs xor raid6_pq hid_generic usbhid hid sg sd_mod crc32c_intel uhci_hcd ehci_pci ehci_hcd megaraid_sas ixgbe mdio ptp usbcore pps_core usb_common scsi_mod bnx2 [last unloaded: ipmi_si]
[122014.437405] CPU: 13 PID: 8025 Comm: kworker/u66:13 Tainted: G I 4.2.5 #1
[122014.437519] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
[122014.437552]   0009 813af77a 880006ab7d08
[122014.437606]  810421bb 3dac a01ee526 880342782f30
[122014.437660]  ffe4 880100ea33b0 8803218ae800 880342782e08
[122014.437714] Call Trace:
[122014.437743]  [] ? dump_stack+0x40/0x50
[122014.437773]  [] ? warn_slowpath_common+0x98/0xb0
[122014.437817]  [] ? btrfs_run_delayed_refs+0x97/0x195 [btrfs]
[122014.437863]  [] ? warn_slowpath_fmt+0x45/0x4a
[122014.437906]  [] ? btrfs_run_delayed_refs+0x97/0x195 [btrfs]
[122014.437965]  [] ? delayed_ref_async_start+0x33/0x71 [btrfs]
[122014.438029]  [] ? normal_work_helper+0xc3/0x1fa [btrfs]
[122014.438063]  [] ? process_one_work+0x159/0x286
[122014.438093]  [] ? worker_thread+0x1d9/0x280
[122014.438123]  [] ? rescuer_thread+0x27a/0x27a
[122014.438152]  [] ? kthread+0xab/0xb3
[122014.438180]  [] ? kthread_parkme+0x16/0x16
[122014.438211]  [] ? ret_from_fork+0x3f/0x70
[122014.438240]  [] ? kthread_parkme+0x16/0x16
[122014.438268] ---[ end trace 1c8deab18b734f90 ]---
[122014.438296] BTRFS: error (device sde) in btrfs_run_delayed_refs:2781: errno=-28 No space left

btrfs file usage /mirror
Overall:
    Device size:                 140.07TiB
    Device allocated:            119.96TiB
    Device unallocated:           20.11TiB
    Device missing:                  0.00B
    Used:                        117.54TiB
    Free (estimated):             22.53TiB      (min: 12.47TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:119.66TiB, Used:117.24TiB
   /dev/sdb      24.91TiB
   /dev/sdc      24.91TiB
   /dev/sdd      34.92TiB
   /dev/sde      34.92TiB

Metadata,RAID10: Size:151.00GiB, Used:149.88GiB
   /dev/sdb      37.75GiB
   /dev/sdc      37.75GiB
   /dev/sdd      37.75GiB
   /dev/sde      37.75GiB

System,RAID10: Size:64.00MiB, Used:15.75MiB
   /dev/sdb      16.00MiB
   /dev/sdc      16.00MiB
   /dev/sdd      16.00MiB
   /dev/sde      16.00MiB

Unallocated:
   /dev/sdb       5.06TiB
   /dev/sdc       5.06TiB
   /dev/sdd       5.06TiB
   /dev/sde       5.06TiB
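The "Free (estimated)" figures can be reproduced from the other numbers in this output. The following sketch of the arithmetic is based on our reading of the output, not on the btrfs-progs source. The unused space inside data chunks is 119.66 - 117.24 = 2.42TiB. Adding the unallocated space (20.11TiB) divided by the data ratio gives 22.53TiB. Dividing the unallocated space by the largest ratio in use (the metadata ratio of 2.00) instead gives about 12.47TiB, the reported minimum. estimate_free() is a hypothetical helper.

```c
#include <assert.h>

/* Reproduce the "Free (estimated)" line from the per-profile numbers, in
 * TiB. Assumed formula (our reading of the output, not verified against the
 * btrfs-progs source): unused space in data chunks, plus unallocated space
 * divided by the data ratio; the minimum divides by the largest ratio. */
static void estimate_free(double data_size, double data_used,
			  double unallocated, double data_ratio,
			  double max_ratio, double *free_est, double *free_min)
{
	double data_unused;

	assert(data_ratio > 0 && max_ratio > 0);
	data_unused = (data_size - data_used) / data_ratio;
	*free_est = data_unused + unallocated / data_ratio;
	*free_min = data_unused + unallocated / max_ratio;
}
```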


Btrfs progs pre-release 4.3-rc1

2015-11-02 Thread David Sterba
Hi,

Kernel 4.3 was released yesterday; btrfs-progs will follow at the end of this
week. I've tagged an rc1 from the current devel branch. There are lots of small
invisible changes and one change in the defaults:

* mkfs: mixed mode is not forced anymore for devices smaller than 1 GiB

I've updated the manual pages for mkfs, balance, btrfstune, convert and inspect;
I'd be glad if somebody could proofread them.

Otherwise bugfixes and small-sized patches are still welcome.


Tarballs: https://www.kernel.org/pub/linux/kernel/people/kdave/btrfs-progs/
Git: git://git.kernel.org/pub/scm/linux/kernel/git/kdave/btrfs-progs.git


[PATCH] fstests: generic test for fsync after hole punching

2015-11-02 Thread fdmanana
From: Filipe Manana 

Test that a file fsync works after punching a hole for the same file
range multiple times, and that after log/journal replay the file's
content and layout are correct.

This test is motivated by a bug found in btrfs, which is fixed by
the following linux kernel patch:

  "Btrfs: fix hole punching when using the no-holes feature"
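To make the expected end state of the three overlapping punches concrete,
here is a small C model of the test file. It is only a sketch under stated
assumptions, not part of fstests or btrfs: the names (struct file_model,
model_pwrite(), model_fpunch(), model_layout(), model_generic_110()) are
hypothetical, the file is tracked in 4 KiB blocks, and every offset is
assumed block aligned. The one external fact it relies on is that
fallocate(2) only accepts FALLOC_FL_PUNCH_HOLE together with
FALLOC_FL_KEEP_SIZE, so a punch never changes i_size; since nothing is
written past EOF here, only the part of each range below EOF matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of the test file, tracked in 4 KiB blocks.
 * All offsets and lengths are assumed to be block aligned. */
#define MODEL_BLOCK 4096ULL
#define MODEL_MAX_BLOCKS 64ULL          /* 256 KiB, enough for a 128K file */

struct file_model {
    unsigned long long size;            /* i_size in bytes */
    bool data[MODEL_MAX_BLOCKS];        /* true: data block, false: hole */
};

struct range {
    unsigned long long start, end;      /* byte range [start, end[ */
    bool data;                          /* true: data, false: hole */
};

/* "pwrite off len": marks the blocks as data and may grow i_size. */
static void model_pwrite(struct file_model *f, unsigned long long off,
                         unsigned long long len)
{
    assert(off % MODEL_BLOCK == 0 && len % MODEL_BLOCK == 0);
    assert((off + len) / MODEL_BLOCK <= MODEL_MAX_BLOCKS);
    for (unsigned long long b = off / MODEL_BLOCK; b < (off + len) / MODEL_BLOCK; b++)
        f->data[b] = true;
    if (off + len > f->size)
        f->size = off + len;
}

/* "fpunch off len": punching keeps i_size (FALLOC_FL_KEEP_SIZE), so only
 * the part of the range below i_size turns into a hole. */
static void model_fpunch(struct file_model *f, unsigned long long off,
                         unsigned long long len)
{
    unsigned long long end = off + len;

    assert(off % MODEL_BLOCK == 0 && len % MODEL_BLOCK == 0);
    if (end > f->size)
        end = f->size;
    for (unsigned long long b = off / MODEL_BLOCK; b < end / MODEL_BLOCK; b++)
        f->data[b] = false;
}

/* Collapse the block map into alternating data/hole ranges, in the spirit
 * of a fiemap listing. Returns the number of ranges stored in out. */
static size_t model_layout(const struct file_model *f, struct range *out,
                           size_t max)
{
    unsigned long long nblocks = f->size / MODEL_BLOCK;
    unsigned long long start = 0;
    size_t n = 0;

    for (unsigned long long b = 1; b <= nblocks; b++) {
        /* b == nblocks is checked first, so data[nblocks] is never read */
        if (b == nblocks || f->data[b] != f->data[start]) {
            assert(n < max);
            out[n].start = start * MODEL_BLOCK;
            out[n].end = b * MODEL_BLOCK;
            out[n].data = f->data[start];
            n++;
            start = b;
        }
    }
    return n;
}

/* The operations from the test: one 128K write, then three punches. */
static size_t model_generic_110(struct range *out, size_t max)
{
    struct file_model f = { 0 };

    model_pwrite(&f, 0, 128 * 1024);
    model_fpunch(&f, 96 * 1024, 32 * 1024);   /* [96K, 128K[ */
    model_fpunch(&f, 64 * 1024, 128 * 1024);  /* [64K, 192K[, beyond eof */
    model_fpunch(&f, 32 * 1024, 96 * 1024);   /* [32K, 128K[, ends at eof */
    return model_layout(&f, out, max);
}
```

In this model, replaying the test's operations leaves two ranges, data
[0K, 32K[ followed by a hole [32K, 128K[, with i_size still at 128K. Both
the md5sum and the fiemap output after log replay should be consistent with
that layout.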

Signed-off-by: Filipe Manana 
---
 tests/generic/110 | 123 ++
 tests/generic/110.out |  13 ++
 tests/generic/group   |   1 +
 3 files changed, 137 insertions(+)
 create mode 100755 tests/generic/110
 create mode 100644 tests/generic/110.out

diff --git a/tests/generic/110 b/tests/generic/110
new file mode 100755
index 000..1e3daac
--- /dev/null
+++ b/tests/generic/110
@@ -0,0 +1,123 @@
+#! /bin/bash
+# FSQA Test No. 110
+#
+# Test that a file fsync works after punching a hole for the same file range
+# multiple times and that after log/journal replay the file's content is
+# correct.
+#
+# This test is motivated by a bug found in btrfs.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana 
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1   # failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+   _cleanup_flakey
+   cd /
+   rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/punch
+. ./common/dmflakey
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs generic
+_supported_os Linux
+_require_scratch
+_require_xfs_io_command "fpunch"
+_require_xfs_io_command "fiemap"
+_require_dm_target flakey
+_require_metadata_journaling $SCRATCH_DEV
+
+# This test was motivated by an issue found in btrfs when the btrfs no-holes
+# feature is enabled (introduced in kernel 3.14). So enable the feature if the
+# fs being tested is btrfs.
+if [ $FSTYP == "btrfs" ]; then
+   _require_btrfs_fs_feature "no_holes"
+   _require_btrfs_mkfs_feature "no-holes"
+   MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
+fi
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_init_flakey
+_mount_flakey
+
+# Create our test file with some data and then fsync it.
+# We do the fsync only to make sure the last fsync we do in this test triggers
+# the fast code path of btrfs' fsync implementation, a condition necessary to
+# trigger the bug btrfs had.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 128K" \
+   -c "fsync"  \
+   $SCRATCH_MNT/foobar | _filter_xfs_io
+
+# Now punch a hole against the range [96K, 128K[.
+$XFS_IO_PROG -c "fpunch 96K 32K" $SCRATCH_MNT/foobar
+
+# Punch another hole against a range that overlaps the previous range and ends
+# beyond eof.
+$XFS_IO_PROG -c "fpunch 64K 128K" $SCRATCH_MNT/foobar
+
+# Punch another hole against a range that overlaps the first range ([96K, 128K[)
+# and ends at eof.
+$XFS_IO_PROG -c "fpunch 32K 96K" $SCRATCH_MNT/foobar
+
+# Fsync our file. We want to verify that, after a power failure and mounting the
+# filesystem again, the file content reflects all the hole punch operations.
+$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
+
+echo "File digest before power failure:"
+md5sum $SCRATCH_MNT/foobar | _filter_scratch
+
+echo "Fiemap before power failure:"
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foobar | _filter_fiemap
+
+# Silently drop all writes and unmount to simulate a crash/power failure.
+_load_flakey_table $FLAKEY_DROP_WRITES
+_unmount_flakey
+
+# Allow writes again, mount to trigger log replay and validate file contents.
+_load_flakey_table $FLAKEY_ALLOW_WRITES
+_mount_flakey
+
+echo "File digest after log replay:"
+# Must match the same digest we got before the power failure.
+md5sum $SCRATCH_MNT/foobar | _filter_scratch
+
+echo "Fiemap after log replay:"
+# Must match the same extent listing we got before the power failure.
+$XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foobar | _filter_fiemap
+
+_unmount_flakey
+
+status=0
+exit
diff --git a/tests/generic/110.out b/tests/generic/110.out
new 

[PATCH] Btrfs: fix hole punching when using the no-holes feature

2015-11-02 Thread fdmanana
From: Filipe Manana 

When we are using the no-holes feature, if we punch a hole into a file
range that already contains a hole which overlaps the range we are passing
to fallocate(), we end up removing the extent map that represents the
existing hole without adding a new one. This happens because, with the
no-holes feature, we do not have explicit extent items to represent holes,
and therefore the call to __btrfs_drop_extents(), made from
btrfs_punch_hole(), returns in the variable drop_end an end offset that
is smaller than the end of the range passed to fallocate(), even though it
drops all existing extent maps in that range.

Normally, a missing extent map is not a problem. For example, for a
readpages() operation we just end up building the extent map by looking
at the fs/subvol tree for a matching extent item (or the lack of one, for
implicit holes). However, for an fsync that uses the fast path, which
needs to look at the list of modified extent maps, this means the fsync
will not record in the log tree the complete hole we had before the
fallocate() call, resulting in a file whose content/layout matches neither
what we had before nor what we had after the hole punch operation.
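The bookkeeping problem can be reproduced in a toy model of an inode's
in-memory extent maps. This is a hypothetical sketch, not btrfs code:
struct emap, emap_add(), drop_maps(), punch_model() and uncovered() are
invented for illustration, ranges are half-open byte ranges, maps are
assumed not to overlap, and drop_end is supplied by the caller instead of
being computed the way __btrfs_drop_extents() does it. Following the
description above, the punch drops every map in the range but adds a new
hole map only up to drop_end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MAPS 16

/* Hypothetical in-memory extent map: byte range [start, end[. */
struct emap {
    unsigned long long start, end;
    bool hole;                        /* true for a hole map */
};

struct emap_tree {
    struct emap maps[MAX_MAPS];       /* unsorted, non-overlapping */
    size_t n;
};

static void emap_add(struct emap_tree *t, struct emap m)
{
    assert(t->n < MAX_MAPS);
    t->maps[t->n++] = m;
}

/* Drop every map overlapping [start, end[, keeping the parts outside it. */
static void drop_maps(struct emap_tree *t, unsigned long long start,
                      unsigned long long end)
{
    struct emap_tree kept = { .n = 0 };

    for (size_t i = 0; i < t->n; i++) {
        struct emap m = t->maps[i];

        if (m.end <= start || m.start >= end) {
            emap_add(&kept, m);       /* entirely outside the range */
            continue;
        }
        if (m.start < start)          /* keep the part before the range */
            emap_add(&kept, (struct emap){ m.start, start, m.hole });
        if (m.end > end)              /* keep the part after the range */
            emap_add(&kept, (struct emap){ end, m.end, m.hole });
    }
    *t = kept;
}

/* Model of the punch: all maps in [start, end[ are dropped, but a new hole
 * map is added only for [start, drop_end[. */
static void punch_model(struct emap_tree *t, unsigned long long start,
                        unsigned long long end, unsigned long long drop_end)
{
    assert(drop_end <= end);
    drop_maps(t, start, end);
    if (drop_end > start)
        emap_add(t, (struct emap){ start, drop_end, true });
}

/* Bytes of [start, end[ with no extent map at all. In this model they are
 * invisible to a fast fsync that walks the modified extent maps. */
static unsigned long long uncovered(const struct emap_tree *t,
                                    unsigned long long start,
                                    unsigned long long end)
{
    unsigned long long covered = 0;

    for (size_t i = 0; i < t->n; i++) {
        unsigned long long s = t->maps[i].start > start ? t->maps[i].start : start;
        unsigned long long e = t->maps[i].end < end ? t->maps[i].end : end;

        if (e > s)
            covered += e - s;
    }
    return (end - start) - covered;
}
```

For example, starting from a data map [0K, 96K[ and a hole map [96K, 128K[
and punching [64K, 128K[ with drop_end set to 96K (as if no item
represented the existing hole), the model leaves [96K, 128K[ without any
extent map, so uncovered() reports 32K for that range. With drop_end equal
to the end of the range, nothing is left uncovered.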

The following test case for fstests reproduces the issue. It fails without
this change because we get a file with a different digest after the fsync
log replay and also with a different extent/hole layout.

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1  # failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/punch
  . ./common/dmflakey

  # real QA test starts here
  _need_to_be_root
  _supported_fs generic
  _supported_os Linux
  _require_scratch
  _require_xfs_io_command "fpunch"
  _require_xfs_io_command "fiemap"
  _require_dm_target flakey
  _require_metadata_journaling $SCRATCH_DEV

  # This test was motivated by an issue found in btrfs when the btrfs
  # no-holes feature is enabled (introduced in kernel 3.14). So enable
  # the feature if the fs being tested is btrfs.
  if [ $FSTYP == "btrfs" ]; then
      _require_btrfs_fs_feature "no_holes"
      _require_btrfs_mkfs_feature "no-holes"
      MKFS_OPTIONS="$MKFS_OPTIONS -O no-holes"
  fi

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test file with some data and then fsync it.
  # We do the fsync only to make sure the last fsync we do in this test
  # triggers the fast code path of btrfs' fsync implementation, a
  # condition necessary to trigger the bug btrfs had.
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 128K" \
      -c "fsync"  \
      $SCRATCH_MNT/foobar | _filter_xfs_io

  # Now punch a hole against the range [96K, 128K[.
  $XFS_IO_PROG -c "fpunch 96K 32K" $SCRATCH_MNT/foobar

  # Punch another hole against a range that overlaps the previous range
  # and ends beyond eof.
  $XFS_IO_PROG -c "fpunch 64K 128K" $SCRATCH_MNT/foobar

  # Punch another hole against a range that overlaps the first range
  # ([96K, 128K[) and ends at eof.
  $XFS_IO_PROG -c "fpunch 32K 96K" $SCRATCH_MNT/foobar

  # Fsync our file. We want to verify that, after a power failure and
  # mounting the filesystem again, the file content reflects all the hole
  # punch operations.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

  echo "File digest before power failure:"
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  echo "Fiemap before power failure:"
  $XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foobar | _filter_fiemap

  # Silently drop all writes and umount to simulate a crash/power failure.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  # Allow writes again, mount to trigger log replay and validate file
  # contents.
  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  echo "File digest after log replay:"
  # Must match the same digest we got before the power failure.
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  echo "Fiemap after log replay:"
  # Must match the same extent listing we got before the power failure.
  $XFS_IO_PROG -c "fiemap -v" $SCRATCH_MNT/foobar | _filter_fiemap

  _unmount_flakey

  status=0
  exit

Signed-off-by: Filipe Manana 
---
 fs/btrfs/file.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 381be79..0c48d94 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2489,6 +2489,19 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
trans->block_rsv = &root->fs_info->trans_block_rsv;
/*
+* If we are using the NO_HOLES feature we might have already had a
+* hole that overlaps a part of the region [lockstart, lockend] and
+* ends at (or beyond) 

Process is blocked for more than 120 seconds

2015-11-02 Thread Dmitry Katsubo
Hi everyone,

I have noticed the following in the log. The system continues to run,
but I am not sure how long it will stay stable.

# uname -a
Linux Debian 4.2.3-2~bpo8+1 (2015-10-20) i686 GNU/Linux

# mount | grep /var
/dev/sdd2 on /var type btrfs (rw,noatime,compress=lzo,space_cache,subvolid=258,subvol=/var)

> [Mon Nov  2 06:35:57 2015] INFO: task nscd:859 blocked for more than 120 seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Nov  2 06:35:57 2015] nscd D f1c7dd20 0   859  1 0x
> [Mon Nov  2 06:35:57 2015]  f1c7dd40 00200082 f79de900 f1c7dd20 c10bc119 ffe0 f3aec740 00200246
> [Mon Nov  2 06:35:57 2015]  f74ea800 f79e3f40 f77fb800 f1c7e000 f6b381dc f6b38000 f1c7dd4c c14f1fdb
> [Mon Nov  2 06:35:57 2015]  d5553960 f1c7dd70 f867672f  f77fb800 c1099250 d0a4be08 d9755e68
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? del_timer_sync+0x49/0x50
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? wait_current_trans.isra.21+0x8f/0xf0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? wait_woken+0x80/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? start_transaction+0x3d0/0x5d0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_delalloc_reserve_metadata+0x32d/0x580 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_join_transaction+0x23/0x30 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_dirty_inode+0x39/0xb0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? file_update_time+0x7e/0xc0
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_page_mkwrite+0x80/0x3c0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? hrtimer_cancel+0x19/0x20
> [Mon Nov  2 06:35:57 2015]  [] ? futex_wait+0x1e1/0x270
> [Mon Nov  2 06:35:57 2015]  [] ? do_page_mkwrite+0x38/0x90
> [Mon Nov  2 06:35:57 2015]  [] ? do_wp_page+0x2e2/0x6d0
> [Mon Nov  2 06:35:57 2015]  [] ? futex_wake+0x71/0x140
> [Mon Nov  2 06:35:57 2015]  [] ? kmap_atomic_prot+0xe7/0x110
> [Mon Nov  2 06:35:57 2015]  [] ? handle_mm_fault+0xd59/0x14d0
> [Mon Nov  2 06:35:57 2015]  [] ? __do_page_fault+0x18c/0x480
> [Mon Nov  2 06:35:57 2015]  [] ? __do_page_fault+0x480/0x480
> [Mon Nov  2 06:35:57 2015]  [] ? error_code+0x67/0x6c
> [Mon Nov  2 06:35:57 2015] INFO: task nscd:864 blocked for more than 120 seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Nov  2 06:35:57 2015] nscd D f1c87f5c 0   864  1 0x
> [Mon Nov  2 06:35:57 2015]  f1c87ef4 00200082 f1c87f80 f1c87f5c 03e7 f1c87ee4 f3aec740 ac76c560
> [Mon Nov  2 06:35:57 2015]  f74ea800 f79e3f40 f3c7b040 f1c88000 f3c7b040 0001 f1c87f00 c14f1fdb
> [Mon Nov  2 06:35:57 2015]  f3aec77c f1c87f38 c14f4265 f1c87f1c f3aec780 f3aec788  0125
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? rwsem_down_write_failed+0x185/0x280
> [Mon Nov  2 06:35:57 2015]  [] ? call_rwsem_down_write_failed+0x6/0x8
> [Mon Nov  2 06:35:57 2015]  [] ? down_write+0x25/0x40
> [Mon Nov  2 06:35:57 2015]  [] ? vm_mmap_pgoff+0x4a/0xa0
> [Mon Nov  2 06:35:57 2015]  [] ? SyS_fstat64+0x28/0x30
> [Mon Nov  2 06:35:57 2015]  [] ? SyS_mmap_pgoff+0x110/0x210
> [Mon Nov  2 06:35:57 2015]  [] ? sysenter_do_call+0x12/0x12
> [Mon Nov  2 06:35:57 2015] INFO: task nmbd:1330 blocked for more than 120 seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Mon Nov  2 06:35:57 2015] nmbd D  0  1330  1 0x
> [Mon Nov  2 06:35:57 2015]  ef44bd74 00200086     f3984900
> [Mon Nov  2 06:35:57 2015]  f69e1800 f79e3f40 f3a7a800 ef44c000 d17255a0 d17255a0 ef44bd80 c14f1fdb
> [Mon Nov  2 06:35:57 2015]  d1725600 ef44bdc8 f86961b5 000d3fff  1000  000d3000
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_start_ordered_extent+0xd5/0x100 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? wait_woken+0x80/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? lock_and_cleanup_extent_if_need+0x134/0x260 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? prepare_pages+0xc6/0x150 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? __btrfs_buffered_write+0x17a/0x5e0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? __alloc_pages_nodemask+0x133/0x880
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_file_write_iter+0x1e5/0x550 [btrfs]
> [Mon Nov  2 

Re: trying to balance, filesystem keeps going read-only.

2015-11-02 Thread Austin S Hemmelgarn
On 2015-11-01 09:33, Ken Long wrote:
> I get a similar read-only status when I try to remove the drive from the array..
> 
> Too bad the utility's function can not be slowed down.. to avoid
> triggering this error... ?
> 
Actually, there are a couple of ways you could do this.  The most reliable way
to do it (and arguably the only correct way) is to use the blkio cgroup to put
bandwidth or IOPS limits on the process.  For authoritative info about how to do
this, check Documentation/cgroups/blkio-controller.txt in the Linux source tree.
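As a small illustration of the limit format those files expect: in cgroup v1,
the blkio.throttle.read_bps_device, write_bps_device, read_iops_device and
write_iops_device files take lines of the form "MAJOR:MINOR VALUE", where the
numbers identify the block device. The helpers below have hypothetical names
and are not from any library; they just build such a line, optionally from a
device node like /dev/sdb. Two assumptions worth stating: on a multi-device
btrfs like the one in this thread, each member device needs its own line, and
the device node must be used rather than a file on the mounted filesystem,
because btrfs reports an anonymous device number in st_dev. cgroup v2's
io.max uses a different, key=value syntax.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

/* Format "MAJOR:MINOR VALUE" into buf, the line format of the cgroup v1
 * blkio.throttle.* files. Returns the length written, or -1 on truncation. */
static int format_throttle_line(unsigned int dev_major, unsigned int dev_minor,
                                unsigned long long value, char *buf,
                                size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    int n = snprintf(buf, buflen, "%u:%u %llu", dev_major, dev_minor, value);

    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Same, but takes the major:minor numbers from a block device node such as
 * /dev/sdb (st_rdev). Returns -1 if the path is not a block device. */
static int throttle_line_for_dev(const char *dev_node, unsigned long long value,
                                 char *buf, size_t buflen)
{
    struct stat st;

    if (stat(dev_node, &st) != 0 || !S_ISBLK(st.st_mode))
        return -1;
    return format_throttle_line(major(st.st_rdev), minor(st.st_rdev), value,
                                buf, buflen);
}
```

The resulting string would then be written, as root, to the matching throttle
file of the cgroup that contains the balance or device-removal process; for a
1 MiB/s write limit on device 8:16 the line is "8:16 1048576".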

If the issue really is the device not responding soon enough, you may also try
increasing the device timeout the kernel uses.  A udev rule like the following
will increase the timeout for all ATA, SCSI, and USB disks to 150 seconds (2.5
minutes, which is reasonable for most non-enterprise devices).  The rule only
matches SCSI devices, but all ATA and USB disks get routed through the SCSI
subsystem anyway unless you're using really old and deprecated drivers:

DRIVER=="sd", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", ATTR{timeout}="150"
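A udev rule is applied when the device's event is processed, typically at boot
or hotplug (or on an explicit udevadm trigger). For a disk that is already
present, the same attribute can be written directly: /sys/block/<disk>/device
links to the SCSI device whose timeout attribute the rule sets. A minimal
sketch, with hypothetical helper names, assuming a SCSI-attached disk named
like sdb:

```c
#include <assert.h>
#include <stdio.h>

/* Build the sysfs path of the SCSI command timeout (in seconds) for a disk
 * such as "sdb". Returns the path length, or -1 if buf is too small. */
static int scsi_timeout_path(const char *disk, char *buf, size_t buflen)
{
    assert(disk != NULL && disk[0] != '\0');
    int n = snprintf(buf, buflen, "/sys/block/%s/device/timeout", disk);

    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}

/* Write a timeout in seconds to an attribute file. Under /sys this needs
 * root, and the value is lost when the device goes away or on reboot. */
static int write_scsi_timeout(const char *path, unsigned int seconds)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%u\n", seconds) > 0;
    if (fclose(f) != 0)    /* buffered data, and any sysfs error, lands here */
        ok = 0;
    return ok ? 0 : -1;
}
```

This only changes the running system; the udev rule is what makes the
setting survive reboots and device re-plugs.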





Re: "free_raid_bio" crash on RAID6

2015-11-02 Thread Tobias Holst
Hi

No, I never figured this out... After a while of waiting for answers I
just started over and took the data from my backup.

> Did you try removing the bad drive and did the system keep crashing anyway?

As you can see in my first mail the drive was already removed when
this error started to happen ("some devices missing"). ;)

Regards,
Tobias


2015-10-18 16:14 GMT+02:00 Philip Seeger :
> Hi Tobias
>
> On 07/20/2015 06:20 PM, Tobias Holst wrote:
>>
>> My btrfs-RAID6 seems to be broken again :(
>>
>> When reading from it I get several of these:
>> [  176.349943] BTRFS info (device dm-4): csum failed ino 1287707 extent 21274957705216 csum 2830458701 wanted 426660650 mirror 2
>>
>> then followed by a "free_raid_bio"-crash:
>>
>> [  176.349961] [ cut here ]
>> [  176.349981] WARNING: CPU: 6 PID: 110 at /home/kernel/COD/linux/fs/btrfs/raid56.c:831 __free_raid_bio+0xfc/0x130 [btrfs]()
>> ...
>
>
> It's been 3 months now, have you ever figured this out? Do you know if the
> bug has been identified and fixed or have you filed a bugzilla report?
>
>> One drive is broken, so at the moment it is mounted with "-O
>> defaults,ro,degraded,recovery,compress=lzo,space_cache,subvol=raid".
>
>
> Did you try removing the bad drive and did the system keep crashing anyway?
>
>
>
> Philip