3.15 btrfs free space cache oops

2014-08-11 Thread Daniel J Blueman
When running MonetDB against a BTRFS RAID-0 set over 4 SSDs [1] on
3.15.5, we see io_ctl with a bad address of 0x20, causing a fatal
page fault in memcpy() [2]:

(gdb) list *(__btrfs_write_out_cache+0x3e4)
0x81365984 is in __btrfs_write_out_cache (fs/btrfs/free-space-cache.c:521).
516			if (io_ctl->index >= io_ctl->num_pages)
517				return -ENOSPC;
518			io_ctl_map_page(io_ctl, 0);
519		}
520
521		memcpy(io_ctl->cur, bitmap, PAGE_CACHE_SIZE);
522		io_ctl_set_crc(io_ctl, io_ctl->index - 1);
523		if (io_ctl->index < io_ctl->num_pages)
524			io_ctl_map_page(io_ctl, 0);
525		return 0;

I can try to reproduce it if more data would be useful.

Thanks,
  Daniel

-- [1]

mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata \
  /dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2
mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread

-- [2]

BUG: unable to handle kernel paging request at 0020
IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
PGD 3bca02c067 PUD 3bcf5fb067 PMD 0
Oops:  [#1] SMP
Modules linked in:
CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7
Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013
task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000
RIP: 0010:[8135a374] [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
RSP: 0018:8809aefcfc40 EFLAGS: 00010246
RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200
RDX: 1000 RSI: 0020 RDI: 884fb9321000
RBP: 8809aefcfd48 R08: 0200 R09: 
R10:  R11: 884fb9320ffc R12: 8831e3303740
R13: 880100579970 R14: 880bb38061c0 R15: 0020
FS: 7fb9447ed700() GS:884bbfc8() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 0020 CR3: 00329b71c000 CR4: 000407e0
Stack:
 8809aefcfc90 0011 000e 884fbbc2c870
 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec
 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000
Call Trace:
 [81a75b4b] ? _raw_spin_lock+0xb/0x20
 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0
 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670
 [813199c5] commit_cowonly_roots+0x195/0x250
 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0
 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70
 [8132b6b2] btrfs_sync_file+0x182/0x2a0
 [8114a450] do_fsync+0x50/0x80
 [8114a6de] SyS_fdatasync+0xe/0x20
 [81a766e6] system_call_fastpath+0x1a/0x1f
Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85
fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3
48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
 RSP 8809aefcfc40
CR2: 0020
-- 
Daniel J Blueman
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread Filipe David Manana
On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:


 Can you try the following patch and confirm if it helps?
 https://patchwork.kernel.org/patch/4705171/

 This one applies without problems, I didn't build it because I saw v4. The v4 
 patch I get:

 + patch -p1 -F1 -s
 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej

 Should v4 alone be applied over 3.16.0? Or each version in succession?

Alone. How did you try to apply it to 3.16? Try "cd source_dir && git
am patchfile" if you haven't (e.g. if you used the patch command directly).

thanks



 Chris Murphy



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


[PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-11 Thread Satoru Takeuchi
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 - Simplify and unify the description of both man and usage.
 - Fix to show that -m and -d are not exclusive
   with path|uuid|device|label.
 - Add the description about short options for --mounted and
   --all-devices, -m and -d respectively.
 - Move the descriptions of options to Options section.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

---
 Documentation/btrfs-filesystem.txt | 22 ++
 cmds-filesystem.c  | 15 ++-
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
index c9c0b00..fe68496 100644
--- a/Documentation/btrfs-filesystem.txt
+++ b/Documentation/btrfs-filesystem.txt
@@ -20,15 +20,21 @@ SUBCOMMAND
 *df* path [path...]::
 Show space usage information for a mount point.
 
-*show* [--mounted|--all-devices|path|uuid|device|label]::
-Show the btrfs filesystem with some additional info.
+*show* [-d|-m] [path|uuid|device|label]::
+Show the structure of btrfs filesystem(s).
 +
-If no option nor path|uuid|device|label is passed, btrfs shows
-information of all the btrfs filesystem both mounted and unmounted.
-If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
-filesystem(s);
-If '--all-devices' is passed, all the devices under /dev are scanned;
-otherwise the devices list is extracted from the /proc/partitions file.
+If none of 'path|uuid|device|label' is passed, btrfs shows
+information of all the btrfs filesystems both mounted and unmounted.
++
+The show command finds btrfs filesystems by scanning all the devices
+in /proc/partitions by default.
++
+`Options`
++
+-d|--all-devices
+scan all the devices under /dev
+-m|--mounted 
+scan only mounted filesystems
 
 *sync* path::
 Force a sync for the filesystem identified by path.
diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 38011e5..5a80a98 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -578,11 +578,16 @@ out:
 }
 
 static const char * const cmd_show_usage[] = {
-	"btrfs filesystem show [options] [path|uuid|device|label]",
-	"Show the structure of a filesystem",
-	"-d|--all-devices   show only disks under /dev containing btrfs filesystem",
-	"-m|--mounted       show only mounted btrfs",
-	"If no argument is given, structure of all present filesystems is shown.",
+	"btrfs filesystem show [-d|-m] [path|uuid|device|label]",
+	"Show the structure of btrfs filesystem(s).",
+	"If none of 'path|uuid|device|label' is passed, btrfs shows",
+	"information of all the btrfs filesystems both mounted and unmounted.",
+	"",
+	"The show command finds btrfs filesystems by scanning all the devices",
+	"in /proc/partitions by default.",
+	"",
+	"-d|--all-devices   scan all the devices under /dev",
+	"-m|--mounted       scan only mounted filesystems",
 	NULL
 };
 
-- 
1.9.3



[PATCH 2/3] btrfs-progs: avoid to use numeric literal for the size of uuid buffer

2014-08-11 Thread Satoru Takeuchi
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Replace a numeric literal with a more descriptive macro for
the size of the uuid buffer.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

---
 cmds-filesystem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 5a80a98..7633f1f 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -603,7 +603,7 @@ static int cmd_show(int argc, char **argv)
char mp[BTRFS_PATH_NAME_MAX + 1];
char path[PATH_MAX];
__u8 fsid[BTRFS_FSID_SIZE];
-   char uuid_buf[37];
+   char uuid_buf[BTRFS_UUID_UNPARSED_SIZE];
int found = 0;
 
while (1) {
-- 
1.9.3
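
Editor's sketch for the patch above: the literal 37 is the length of a UUID in its textual 8-4-4-4-12 form (32 hex digits plus 4 hyphens) plus the terminating NUL. The names below (SKETCH_FSID_SIZE, SKETCH_UUID_UNPARSED_SIZE, sketch_uuid_unparse) are hypothetical stand-ins for illustration and are not the btrfs-progs definitions.

```c
#include <stdio.h>

/* Hypothetical stand-ins for this sketch, not the btrfs-progs macros. */
#define SKETCH_FSID_SIZE 16
/* 32 hex digits + 4 hyphens + 1 terminating NUL = 37 bytes. */
#define SKETCH_UUID_UNPARSED_SIZE 37

/*
 * Format a 16-byte UUID as lowercase 8-4-4-4-12 text.
 * Returns the number of characters written, not counting the NUL.
 */
static int sketch_uuid_unparse(const unsigned char fsid[SKETCH_FSID_SIZE],
			       char out[SKETCH_UUID_UNPARSED_SIZE])
{
	return snprintf(out, SKETCH_UUID_UNPARSED_SIZE,
			"%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
			"%02x%02x%02x%02x%02x%02x",
			fsid[0], fsid[1], fsid[2], fsid[3],
			fsid[4], fsid[5], fsid[6], fsid[7],
			fsid[8], fsid[9], fsid[10], fsid[11],
			fsid[12], fsid[13], fsid[14], fsid[15]);
}
```

Sizing the buffer from a named macro keeps the declaration and the formatter in agreement if either one ever changes.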



[PATCH 3/3] btrfs-progs: Show error message if btrfs filesystem show failed to find any btrfs filesystem

2014-08-11 Thread Satoru Takeuchi
From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

Currently btrfs doesn't display any error message if this command
fails to find a btrfs filesystem corresponding to the
path|uuid|device|label that the user specified.

Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

---
 cmds-filesystem.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/cmds-filesystem.c b/cmds-filesystem.c
index 7633f1f..2f78e24 100644
--- a/cmds-filesystem.c
+++ b/cmds-filesystem.c
@@ -695,6 +695,7 @@ static int cmd_show(int argc, char **argv)
 	ret = btrfs_scan_kernel(search);
 	if (search && !ret) {
 		/* since search is found we are done */
+		found = 1;
 		goto out;
 	}
 
@@ -729,6 +730,15 @@ devs_only:
 		btrfs_close_devices(fs_devices);
 	}
 out:
+	if (search && !found) {
+		fprintf(stderr,
+			"ERROR: Couldn't find any btrfs filesystem "
+			"matches with '%s'.\n", search);
+		fprintf(stderr,
+			"Please check if both '%s' and the range of scanning "
+			"are correct.\n", search);
+	}
+
 	printf("%s\n", BTRFS_BUILD_VERSION);
 	free_seen_fsid();
 	return ret;
-- 
1.9.3
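
Editor's sketch for the patch above: the fix boils down to tracking a found flag during the scan and reporting a miss when a search term was given. The helper below is a hypothetical, simplified model (a flat array of labels instead of device scanning) and is not the btrfs-progs code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of "filesystem show": look for `search` in a list
 * of known labels. With search == NULL everything is listed and no
 * error is possible. Returns 0 on success, 1 if `search` matched nothing.
 */
static int sketch_show(const char *const *labels, size_t n,
		       const char *search, FILE *err)
{
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (search && strcmp(labels[i], search) == 0) {
			found = 1;	/* remember the hit, as the patch does */
			break;
		}
	}

	/* Only complain when the user asked for something specific. */
	if (search && !found) {
		fprintf(err, "ERROR: no filesystem matches '%s'.\n", search);
		return 1;
	}
	return 0;
}
```

Whether the real command should also return a non-zero status in this case is a separate design question that the patch does not address.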



[PATCH 4/4] btrfs: update sprout seed pointer when seed fs is relinquished

2014-08-11 Thread Anand Jain
We are not updating the sprout fs seed pointer when all seed devices
are replaced. This patch checks whether all seed devices have been
replaced and then updates the sprout pointer accordingly.

The same reproducer as in the previous patch applies here.
Note that btrfs_close_devices() checks whether a seed fs is
present and, without this patch, spits out the error.

int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
{
::
	seed_devices = fs_devices->seed;
::
	while (seed_devices) {
		fs_devices = seed_devices;
		seed_devices = fs_devices->seed;
		__btrfs_close_devices(fs_devices);
		free_fs_devices(fs_devices);
	}

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f098ae7..bfdc11f 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1992,6 +1992,25 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 		btrfs_scratch_superblock(srcdev);
 	}
 
+	/* unless fs_devices is seed fs, num_devices shouldn't go
+	 * zero
+	 */
+	BUG_ON(!fs_devices->num_devices && !fs_devices->seeding);
+
+	/* if this is no devs we rather delete the fs_devices */
+	if (!fs_devices->num_devices) {
+		struct btrfs_fs_devices *tmp_fs_devices;
+
+		tmp_fs_devices = fs_info->fs_devices;
+		while (tmp_fs_devices) {
+			if (tmp_fs_devices->seed == fs_devices) {
+				tmp_fs_devices->seed = fs_devices->seed;
+				break;
+			}
+			tmp_fs_devices = tmp_fs_devices->seed;
+		}
+		fs_devices->seed = NULL;
+	}
 	call_rcu(&srcdev->rcu, free_device);
 }
 
-- 
2.0.0.153.g79d
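
Editor's sketch for the patch above: the seed filesystems hang off the sprout as a singly linked chain through the seed pointer, so dropping an emptied seed means pointing its predecessor at its successor. The struct and function below are hypothetical userspace stand-ins that mirror only the fields the patch touches.

```c
#include <stddef.h>

/* Hypothetical stand-in holding only the fields used by the patch. */
struct sketch_fs_devices {
	int num_devices;
	int seeding;
	struct sketch_fs_devices *seed;	/* next (older) seed, or NULL */
};

/*
 * If `victim` has no devices left, unlink it from the chain that starts
 * at `head` (sprout -> seed -> older seed -> ...).
 * Returns 1 if a predecessor was found and re-pointed, 0 otherwise.
 */
static int sketch_unlink_empty_seed(struct sketch_fs_devices *head,
				    struct sketch_fs_devices *victim)
{
	struct sketch_fs_devices *cur;
	int unlinked = 0;

	if (victim->num_devices)
		return 0;	/* still has devices, keep it in the chain */

	for (cur = head; cur; cur = cur->seed) {
		if (cur->seed == victim) {
			cur->seed = victim->seed;	/* bypass the victim */
			unlinked = 1;
			break;
		}
	}
	victim->seed = NULL;
	return unlinked;
}
```

Because the predecessor inherits the victim's successor, older seeds further down the chain stay reachable, which is the concern raised in review of the earlier single-patch version.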



[PATCH 3/4] btrfs: fix rw_devices miss match after seed replace

2014-08-11 Thread Anand Jain
reproducer:
mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 3882 at fs/btrfs/volumes.c:892 __btrfs_close_devices+0x1c8/0x200 [btrfs]()

which is

WARN_ON(fs_devices->rw_devices);

   The problem here is that we did not increment rw_devices when
   we replaced the seed device with a writable device.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/dev-replace.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..fb0a7fa 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -562,6 +562,8 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,
 	if (fs_info->fs_devices->latest_bdev == src_device->bdev)
 		fs_info->fs_devices->latest_bdev = tgt_device->bdev;
 	list_add(&tgt_device->dev_alloc_list, &fs_info->fs_devices->alloc_list);
+	if (src_device->fs_devices->seeding)
+		fs_info->fs_devices->rw_devices++;
 
 	/* replace the sysfs entry */
 	btrfs_kobj_rm_device(fs_info, src_device);
-- 
2.0.0.153.g79d
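
Editor's sketch for the patch above, under the assumption that the close path decrements rw_devices once per writable device and then expects the counter to be zero. The types and helpers are hypothetical and only model the counter arithmetic, not the kernel's device lists.

```c
#include <stddef.h>

/* Hypothetical stand-in: only the property the counter depends on. */
struct sketch_dev {
	int writeable;
};

/*
 * Model of the close path: one decrement per writable device.
 * A non-zero result corresponds to WARN_ON(fs_devices->rw_devices) firing.
 */
static int sketch_rw_left_after_close(int rw_devices,
				      const struct sketch_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].writeable)
			rw_devices--;
	return rw_devices;
}

/*
 * Model of finishing a replace: the writable target is a new writable
 * device only when the source it replaces was a read-only seed device.
 */
static int sketch_rw_after_replace(int rw_devices, int src_is_seed)
{
	return src_is_seed ? rw_devices + 1 : rw_devices;
}
```

With the reproducer's layout (one writable device added to the sprout, then the seed replaced by a second writable device), the uncorrected counter ends at -1 after close and the corrected one ends at 0.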



[PATCH 1/4] btrfs: preparatory to make btrfs_rm_dev_replace_srcdev() seed aware

2014-08-11 Thread Anand Jain
There is no logical change in this patch; it is just a preparatory patch
so that the later changes are easier to reason about.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5f634b6..5fd0132 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1960,19 +1960,23 @@ error_undo:
 void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 				 struct btrfs_device *srcdev)
 {
+	struct btrfs_fs_devices *fs_devices;
+
 	WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
 
+	fs_devices = fs_info->fs_devices;
+
 	list_del_rcu(&srcdev->dev_list);
 	list_del_rcu(&srcdev->dev_alloc_list);
-	fs_info->fs_devices->num_devices--;
+	fs_devices->num_devices--;
 	if (srcdev->missing) {
-		fs_info->fs_devices->missing_devices--;
-		fs_info->fs_devices->rw_devices++;
+		fs_devices->missing_devices--;
+		fs_devices->rw_devices++;
 	}
 	if (srcdev->can_discard)
-		fs_info->fs_devices->num_can_discard--;
+		fs_devices->num_can_discard--;
 	if (srcdev->bdev) {
-		fs_info->fs_devices->open_devices--;
+		fs_devices->open_devices--;
 
 		/*
 		 * zero out the old super if it is not writable
-- 
2.0.0.153.g79d



[PATCH 2/4] btrfs: replace seed device followed by unmount causes kernel WARNING

2014-08-11 Thread Anand Jain
reproducer:
mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::

__btrfs_close_devices()
::
WARN_ON(fs_devices->open_devices);

After the seed device has been replaced, the new target device
is no longer a seed device. So we need to update the device
counts in the fs_devices pointed to by the fs_info.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
 fs/btrfs/volumes.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5fd0132..f098ae7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1964,7 +1964,13 @@ void btrfs_rm_dev_replace_srcdev(struct btrfs_fs_info *fs_info,
 
 	WARN_ON(!mutex_is_locked(&fs_info->fs_devices->device_list_mutex));
 
-	fs_devices = fs_info->fs_devices;
+	/*
+	 * in case of fs with no seed, srcdev->fs_devices will point
+	 * to fs_devices of fs_info. However when the dev being replaced is
+	 * a seed dev it will point to the seed's local fs_devices. In short
+	 * srcdev will have its correct fs_devices in both the cases.
+	 */
+	fs_devices = srcdev->fs_devices;
 
 	list_del_rcu(&srcdev->dev_list);
 	list_del_rcu(&srcdev->dev_alloc_list);
-- 
2.0.0.153.g79d
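
Editor's sketch for the patch above: each device carries a pointer to the fs_devices that owns it, so counters should be adjusted through that pointer rather than through the mounted filesystem's fs_devices. The structs below are hypothetical stand-ins that keep only two counters.

```c
/* Hypothetical stand-ins with just the fields this sketch needs. */
struct sketch_fs_devices {
	int num_devices;
	int open_devices;
};

struct sketch_device {
	struct sketch_fs_devices *fs_devices;	/* owning fs_devices */
};

/*
 * Remove the replaced source device from the counters of its owner.
 * For a seed device the owner is the seed's fs_devices, so the
 * sprout's counters are left untouched.
 */
static void sketch_rm_srcdev(struct sketch_device *srcdev)
{
	struct sketch_fs_devices *owner = srcdev->fs_devices;

	owner->num_devices--;
	owner->open_devices--;
}
```

In the reproducer, /dev/sdb belongs to the seed, so only the seed's counters drop to zero and the sprout's remain consistent for the later close.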



Re: [PATCH] btrfs: replace seed device followed by unmount causes kernel WARNING

2014-08-11 Thread Anand Jain



I have sent out the patch-set

 [PATCH 1/4] btrfs: preparatory to make btrfs_rm_dev_replace_srcdev() seed aware


in replacement for this patch. Kindly use/review the above patch set.

Thanks. Anand



On 31/07/2014 16:45, Anand Jain wrote:



On 30/07/2014 15:42, Miao Xie wrote:

On Fri, 25 Jul 2014 20:33:34 +0800, Anand Jain wrote:

After the seed device has been replaced, the new target device
is no longer a seed device. So we need to bring that state into
the fs_devices.

reproducer:
mount /dev/sdb /btrfs
btrfs dev add /dev/sdc /btrfs
btrfs rep start -B /dev/sdb /dev/sdd /btrfs
umount /btrfs

WARNING: CPU: 0 PID: 12661 at fs/btrfs/volumes.c:891 __btrfs_close_devices+0x1b0/0x200 [btrfs]()
::

__btrfs_close_devices()
::
 WARN_ON(fs_devices->open_devices);
 WARN_ON(fs_devices->rw_devices);

per the btrfs-devlist tool (to dump fs_devices and
btrfs_device from the kernel) the num_device, open_devices,
rw_devices are still at 1 but the total_device is at 2,
even after the seed device has been replaced in the above example.

Signed-off-by: Anand Jain anand.j...@oracle.com
---
  fs/btrfs/dev-replace.c | 13 +
  1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index eea26e1..a144bb1 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -569,6 +569,19 @@ static int btrfs_dev_replace_finishing(struct btrfs_fs_info *fs_info,

  btrfs_rm_dev_replace_blocked(fs_info);

+/*
+ * if we are replacing a seed device with a writable device
+ * then FS won't be a seeding FS any more.
+ */
+if (src_device->fs_devices->seeding && !src_device->writeable) {


First, why not move this code into btrfs_rm_dev_replace_srcdev()?

Then, if the first condition is true, the second one
(!src_device->writeable) must also be true, because all the devices
in the seed fs_devices must be read-only. So the first check alone
is enough.


+fs_info->fs_devices->rw_devices++;


If src is a missing device, we would increment it twice.


+fs_info->fs_devices->num_devices++;
+fs_info->fs_devices->open_devices++;
+
+fs_info->fs_devices->seeding = 0;
+fs_info->fs_devices->seed = NULL;


In fact, we may have several seed fs_devices in one fs, and the seed
fs_devices that includes src might not be the first one, so setting
seed to NULL would break the seed fs_devices list.


  Yep, I had this question when writing the patch but later decided
  to reset seed and seeding. If I am not wrong, not resetting
  seeding and seed will work as well.

Thanks for reviewing.
Anand


Thanks
Miao


+}
+
  btrfs_rm_dev_replace_srcdev(fs_info, src_device);

  btrfs_rm_dev_replace_unblocked(fs_info);





Re: [PATCH] Btrfs: don't monopolize a core when evicting inode

2014-08-11 Thread Satoru Takeuchi
(2014/08/08 10:47), Filipe Manana wrote:
 If an inode has a very large number of extent maps, we can spend
 a lot of time freeing them, which triggers a soft lockup warning.
 Therefore reschedule if we need to when freeing the extent maps
 while evicting the inode.
 
 I could trigger this all the time by running xfstests/generic/299 on
 a file system with the no-holes feature enabled. That test creates
 an inode with 11386677 extent maps.
 
  $ mkfs.btrfs -f -O no-holes $TEST_DEV
  $ MKFS_OPTIONS=-O no-holes ./check generic/299
  generic/299 382s ...
  Message from syslogd@debian-vm3 at Aug  7 10:44:29 ...
   kernel:[85304.208017] BUG: soft lockup - CPU#0 stuck for 22s! [umount:25330]
   384s
  Ran: generic/299
  Passed all 1 tests
 
  $ dmesg
  (...)
  [86304.300017] BUG: soft lockup - CPU#0 stuck for 23s! [umount:25330]
  (...)
  [86304.300036] Call Trace:
  [86304.300036]  [81698ba9] __slab_free+0x54/0x295
  [86304.300036]  [a02ee9cc] ? free_extent_map+0x5c/0xb0 [btrfs]
  [86304.300036]  [811a6cd2] kmem_cache_free+0x282/0x2a0
  [86304.300036]  [a02ee9cc] free_extent_map+0x5c/0xb0 [btrfs]
  [86304.300036]  [a02e3775] btrfs_evict_inode+0xd5/0x660 [btrfs]
  [86304.300036]  [811e7c8d] ? __inode_wait_for_writeback+0x6d/0xc0
  [86304.300036]  [816a389b] ? _raw_spin_unlock+0x2b/0x40
  [86304.300036]  [811d8cbb] evict+0xab/0x180
  [86304.300036]  [811d8dce] dispose_list+0x3e/0x60
  [86304.300036]  [811d9b04] evict_inodes+0xf4/0x110
  [86304.300036]  [811bd953] generic_shutdown_super+0x53/0x110
  [86304.300036]  [811bdaa6] kill_anon_super+0x16/0x30
  [86304.300036]  [a02a78ba] btrfs_kill_super+0x1a/0xa0 [btrfs]
  [86304.300036]  [811bd3a9] deactivate_locked_super+0x59/0x80
  [86304.300036]  [811be44e] deactivate_super+0x4e/0x70
  [86304.300036]  [811dec14] mntput_no_expire+0x174/0x1f0
  [86304.300036]  [811deab7] ? mntput_no_expire+0x17/0x1f0
  [86304.300036]  [811e0517] SyS_umount+0x97/0x100
  (...)
 
 Signed-off-by: Filipe Manana fdman...@suse.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
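
Editor's sketch of the pattern described in the quoted commit message: a long teardown loop that periodically drops its lock and yields so it does not monopolize a CPU. This is a userspace analogue using a pthread mutex and sched_yield(); the kernel patch instead checks need_resched() and calls cond_resched(). The fixed batch size and all names here are assumptions made for illustration.

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical list node standing in for an extent map. */
struct sketch_node {
	struct sketch_node *next;
};

/*
 * Free every node on *head while holding `lock`, releasing the lock and
 * yielding the CPU after every `batch` nodes (batch == 0 never yields).
 * Returns the number of nodes freed.
 */
static size_t sketch_drain(struct sketch_node **head, pthread_mutex_t *lock,
			   size_t batch)
{
	size_t freed = 0;

	pthread_mutex_lock(lock);
	while (*head) {
		struct sketch_node *n = *head;

		*head = n->next;
		free(n);
		freed++;

		if (batch && freed % batch == 0) {
			/* Never yield with the lock held. */
			pthread_mutex_unlock(lock);
			sched_yield();
			pthread_mutex_lock(lock);
		}
	}
	pthread_mutex_unlock(lock);
	return freed;
}
```

The loop re-reads *head after retaking the lock, so it stays correct even if another thread changed the list while the lock was dropped.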

 ---
   fs/btrfs/inode.c | 6 ++
   1 file changed, 6 insertions(+)
 
 diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
 index 8ad3ea9..00b4bd3 100644
 --- a/fs/btrfs/inode.c
 +++ b/fs/btrfs/inode.c
 @@ -4718,6 +4718,11 @@ static void evict_inode_truncate_pages(struct inode *inode)
   clear_bit(EXTENT_FLAG_LOGGING, &em->flags);
   remove_extent_mapping(map_tree, em);
   free_extent_map(em);
 + if (need_resched()) {
 + write_unlock(&map_tree->lock);
 + cond_resched();
 + write_lock(&map_tree->lock);
 + }
   }
   write_unlock(&map_tree->lock);
   
 @@ -4740,6 +4745,7 @@ static void evict_inode_truncate_pages(struct inode *inode)
 &cached_state, GFP_NOFS);
   free_extent_state(state);
   
 + cond_resched();
   spin_lock(&io_tree->lock);
   }
   spin_unlock(&io_tree->lock);
 



Re: [PATCH] Btrfs: clone, don't create invalid hole extent map

2014-08-11 Thread Satoru Takeuchi
(2014/08/08 10:47), Filipe Manana wrote:
 When cloning a file that consists of an inline extent, we were creating
 an extent map that represents a non-existing trailing hole starting at a
 file offset that isn't a multiple of the sector size. This happened because
 when processing an inline extent we weren't aligning the extent's length to
 the sector size, and therefore incorrectly treating the range
 [inline_extent_length; sector_size[ as a hole.
 
 Signed-off-by: Filipe Manana fdman...@suse.com

Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
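
Editor's sketch of the rounding the quoted fix relies on: aligning an end offset up to the next multiple of the sector size. The function name is hypothetical; it uses the usual mask trick, which assumes the alignment is a power of two.

```c
#include <stdint.h>

/*
 * Round x up to the next multiple of a.
 * a must be a non-zero power of two (e.g. a 4096-byte sector size).
 */
static uint64_t sketch_align_up(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

For example, an inline extent of 100 bytes at offset 0 with 4096-byte sectors ends at 100 unaligned but at 4096 after rounding, so no hole is recorded for the rest of the sector.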

 ---
   fs/btrfs/ioctl.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
 index d490abd..6e3a0d1 100644
 --- a/fs/btrfs/ioctl.c
 +++ b/fs/btrfs/ioctl.c
 @@ -3494,7 +3494,8 @@ process_slot:
   btrfs_mark_buffer_dirty(leaf);
   btrfs_release_path(path);
   
 - last_dest_end = new_key.offset + datal;
 + last_dest_end = ALIGN(new_key.offset + datal,
 +   root->sectorsize);
   ret = clone_finish_inode_update(trans, inode,
   last_dest_end,
   destoff, olen);
 



File system stuck in scrub

2014-08-11 Thread Nikolaus Rath
Hello,

I started a scrub of one of my btrfs filesystems and then had to restart
the system. `systemctl restart` seemed to terminate all processes, but
then got stuck at the end. The disk activity LED was still flashing
rapidly at that point, so I assume that the active scrub was preventing
the reboot (is that a bug or a feature?).

In any case, I could not wait for that so I power cycled. But now my
file system seems to be stuck in a scrub that can neither be completed
nor cancelled:

$ sudo btrfs scrub status /home/nikratio/
scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
total bytes scrubbed: 209.97GiB with 0 errors

$ date
Sun Aug 10 22:00:44 PDT 2014

$ sudo btrfs scrub cancel /home/nikratio/
ERROR: scrub cancel failed on /home/nikratio/: not running

$ sudo btrfs scrub start /home/nikratio/
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel /home/nikratio/'.
To see the status use 'btrfs scrub status [-d] /home/nikratio/'.

Note that the scrub was started more than 3 hours ago, but claims to
have been running for only 1562 seconds.
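
Editor's sketch of the arithmetic behind that observation, assuming the scrub start time and the `date` output are in the same time zone and on the same day. The helper name is hypothetical.

```c
/* Convert a same-day wall-clock time to seconds since midnight. */
static long sketch_hms_to_seconds(int h, int m, int s)
{
	return h * 3600L + m * 60L + s;
}
```

From 18:36:43 to 22:00:44 is sketch_hms_to_seconds(22, 0, 44) - sketch_hms_to_seconds(18, 36, 43) = 12241 seconds, roughly 3 hours 24 minutes, against the 1562 seconds the status output reports.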

I then figured that maybe I need to run btrfsck. This gave the following
output:

checking extents
checking free space cache
checking fs roots
root 5 inode 3149791 errors 400, nbytes wrong
root 5 inode 3150233 errors 400, nbytes wrong
root 5 inode 3150238 errors 400, nbytes wrong
[102 similar lines]
Checking filesystem on /dev/mapper/vg0-nikratio_crypt
UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
free space inode generation (0) did not match free space cache generation (161262)
free space inode generation (0) did not match free space cache generation (75485)
free space inode generation (0) did not match free space cache generation (79599)
free space inode generation (0) did not match free space cache generation (72280)
free space inode generation (0) did not match free space cache generation (79599)
free space inode generation (0) did not match free space cache generation (25866)
free space inode generation (0) did not match free space cache generation (12255)
free space inode generation (0) did not match free space cache generation (72521)
free space inode generation (0) did not match free space cache generation (161286)
free space inode generation (0) did not match free space cache generation (28716)
free space inode generation (0) did not match free space cache generation (161481)
found 216444746042 bytes used err is 1
total csum bytes: 383160676
total tree bytes: 875753472
total fs tree bytes: 284246016
total extent tree bytes: 69320704
btree space waste bytes: 205021777
file data blocks allocated: 3701556121600
 referenced 388107321344
Btrfs v3.14.1

So nothing about the scrub, but apparently some other errors.

Can someone tell me:

 * Should I be able to restart while a scrub is in progress, or is that
   deliberately prevented by btrfs?

 * How can I resume or cancel the scrub?

 * Is it more risky to leave the above errors uncorrected, or to run
   btrfsck with --repair?


I'm using kernel 3.14.

Thanks!
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: File system stuck in scrub

2014-08-11 Thread Hugo Mills
On Mon, Aug 11, 2014 at 08:12:46AM -0700, Nikolaus Rath wrote:
 I started a scrub of one of my btrfs filesystems and then had to restart
 the system. `systemctl restart` seemed to terminate all processes, but
 then got stuck at the end. The disk activity LED was still flashing
 rapidly at that point, so I assume that the active scrub was preventing
 the reboot (is that a bug or a feature?).

   Shouldn't have stopped it.

 In any case, I could not wait for that so I power cycled. But now my
 file system seems to be stuck in a scrub that can neither be completed
 nor cancelled:
 
 $ sudo btrfs scrub status /home/nikratio/
 scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
 scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
 total bytes scrubbed: 209.97GiB with 0 errors
 
 $ date
 Sun Aug 10 22:00:44 PDT 2014
 
 $ sudo btrfs scrub cancel /home/nikratio/
 ERROR: scrub cancel failed on /home/nikratio/: not running
 
 $ sudo btrfs scrub start /home/nikratio/
 ERROR: scrub is already running.
 To cancel use 'btrfs scrub cancel /home/nikratio/'.
 To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
 
 Note that the scrub was started more than 3 hours ago, but claims to
 have been running for only 1562 seconds.

   This is a regrettably common problem -- fortunately with a simple
solution. The userspace scrub monitor died in the reboot, leaving the
status file present. If you delete the status file, which is in
/var/lib/btrfs/, that should allow you to start a new scrub.

 I then figured that maybe I need to run btrfsck. This gave the following
 output:
 
 checking extents
 checking free space cache
 checking fs roots
 root 5 inode 3149791 errors 400, nbytes wrong
 root 5 inode 3150233 errors 400, nbytes wrong
 root 5 inode 3150238 errors 400, nbytes wrong
 [102 similar lines]
 Checking filesystem on /dev/mapper/vg0-nikratio_crypt
 UUID: 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
 free space inode generation (0) did not match free space cache generation (161262)
[snip]
 found 216444746042 bytes used err is 1
 total csum bytes: 383160676
 total tree bytes: 875753472
 total fs tree bytes: 284246016
 total extent tree bytes: 69320704
 btree space waste bytes: 205021777
 file data blocks allocated: 3701556121600
  referenced 388107321344
 Btrfs v3.14.1
 
 So nothing about the scrub, but apparently some other errors.

   The free space inode generation errors are harmless. The wrong
nbytes is probably not horrifically damaging, but I don't know so much
about that one.

 Can someone tell me:
 
  * Should I be able to restart while a scrub is in progress, or is that
deliberately prevented by btrfs?

   Restart the machine? Yes.

  * How can I resume or cancel the scrub?

   It's probably simply not running -- see above.

  * Is it more risky to leave the above errors uncorrected, or to run
btrfsck with --repair?

   I would, I think, leave them.

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- We are all lying in the gutter,  but some of us are looking ---   
  at the stars.  




Re: File system stuck in scrub

2014-08-11 Thread Calvin Walton
Hi,

On Mon, 2014-08-11 at 08:12 -0700, Nikolaus Rath wrote:
 Hello,
 
 I started a scrub of one of my btrfs filesystems and then had to restart
 the system. `systemctl restart` seemed to terminate all processes, but
 then got stuck at the end. The disk activity LED was still flashing
 rapidly at that point, so I assume that the active scrub was preventing
 the reboot (is that a bug or a feature?).
This sounds like a bug - I know that e.g. the rebalance operation is 
designed so that you can shutdown/reboot during the operation, and it 
will complete following a reboot. But I'm not familiar with the code 
in question.

 In any case, I could not wait for that so I power cycled. But now my
 file system seems to be stuck in a scrub that can neither be completed
 nor cancelled:
 
 $ sudo btrfs scrub status /home/nikratio/
 scrub status for 8742472d-a9b0-4ab6-b67a-5d21f14f7a38
 scrub started at Sun Aug 10 18:36:43 2014, running for 1562 seconds
 total bytes scrubbed: 209.97GiB with 0 errors
 
 $ date
 Sun Aug 10 22:00:44 PDT 2014
 
 $ sudo btrfs scrub cancel /home/nikratio/
 ERROR: scrub cancel failed on /home/nikratio/: not running
 
 $ sudo btrfs scrub start /home/nikratio/
 ERROR: scrub is already running.
 To cancel use 'btrfs scrub cancel /home/nikratio/'.
 To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
My guess is that this is a mismatch between some state stored by the 
userspace tools and the state in the kernel. One of the things you can 
try is to delete the files /var/lib/btrfs/scrub.status.* - that will 
force the btrfs tools to get the current status from the kernel (you 
will lose some statistics and scrub history.)

Running 'btrfs scrub status /home/nikratio/' after this should simply 
say 'no stats available', and you can start a new scrub later if you 
like.
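
The cleanup suggested above can be sketched as a small shell helper. This is
only a sketch under assumptions: the function name stash_scrub_status and the
"stale" backup directory are made up for illustration, and it moves the status
files aside instead of deleting them, so the old statistics can still be
inspected. The default directory is the /var/lib/btrfs path named above.

```shell
# Hypothetical helper: move stale userspace scrub status files out of the way.
# $1: status directory (defaults to /var/lib/btrfs, as named in this thread)
# $2: backup directory (defaults to a "stale" subdirectory; an assumption)
# Prints the number of files that were moved.
stash_scrub_status() {
    status_dir="${1:-/var/lib/btrfs}"
    backup_dir="${2:-$status_dir/stale}"
    mkdir -p "$backup_dir" || return 1
    moved=0
    for f in "$status_dir"/scrub.status.*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        mv -- "$f" "$backup_dir"/ && moved=$((moved + 1))
    done
    echo "$moved"
}
```

Run as root with no arguments, it would act on /var/lib/btrfs; afterwards
'btrfs scrub status /home/nikratio/' can be re-run to check the result.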

 I then figured that maybe I need to run btrfsck. This gave the following
 output:
As long as you didn't use --repair, this shouldn't break anything.
Note that btrfsck has to be run on an *unmounted* filesystem to give
useful results.

  * Is it more risky to leave the above errors uncorrected, or to run
btrfsck with --repair?
There probably aren't any issues on the filesystem that the runtime 
btrfs code can't handle. Don't run with --repair, at least not yet.

 
 
 I'm using kernel 3.14.
 
 Thanks!
 -Nikolaus
 
 

-- 
Calvin Walton calvin.wal...@kepstin.ca
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: File system stuck in scrub

2014-08-11 Thread Marc MERLIN
On Mon, Aug 11, 2014 at 11:45:45AM -0400, Calvin Walton wrote:
  $ sudo btrfs scrub start /home/nikratio/
  ERROR: scrub is already running.
  To cancel use 'btrfs scrub cancel /home/nikratio/'.
  To see the status use 'btrfs scrub status [-d] /home/nikratio/'.
 My guess is that this is a mismatch between some state stored by the 
 userspace tools and the state in the kernel. One of the things you can 
 try is to delete the files /var/lib/btrfs/scrub.status.* - that will 
 force the btrfs tools to get the current status from the kernel (you 
 will lose some statistics and scrub history.)

No need to really delete it, just changing one character will do :)

http://marc.merlins.org/perso/btrfs/post_2014-04-26_Btrfs-Tips_-Cancel-A-Btrfs-Scrub-That-Is-Already-Stopped.html

Cheers,
Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread Chris Murphy

On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:

 On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com wrote:
 
 
 Can you try the following patch and confirm if it helps?
 https://patchwork.kernel.org/patch/4705171/
 
 This one applies without problems, I didn't build it because I saw v4. The 
 v4 patch I get:
 
 + patch -p1 -F1 -s
 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
 
 Should v4 alone be applied over 3.16.0? Or each version in succession?
 
 Alone. How did you try to apply it to 3.16?

rpmbuild
https://fedoraproject.org/wiki/Building_a_custom_kernel

Gist is, save the patch, create patch filename entry into kernel.spec, then run
rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 patch
applies, as does Liu Bo's patch from July 29, "Btrfs: fix regression of btrfs
device replace", which I was also going to test. But the v4 patch isn't applying.

 Try "cd source_dir && git am patchfile" if you didn't (e.g. you used the
 patch command directly).

I'd kinda prefer to build an rpm since I need to test it on baremetal for this 
bug, and a VM for the device replace bug.


Chris Murphy


Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread Filipe David Manana
On Mon, Aug 11, 2014 at 4:55 PM, Chris Murphy li...@colorremedies.com wrote:

 On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:

 On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com 
 wrote:


 Can you try the following patch and confirm if it helps?
 https://patchwork.kernel.org/patch/4705171/

 This one applies without problems, I didn't build it because I saw v4. The 
 v4 patch I get:

 + patch -p1 -F1 -s
 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej

 Should v4 alone be applied over 3.16.0? Or each version in succession?

 Alone. How did you try to apply it to 3.16?

 rpmbuild
 https://fedoraproject.org/wiki/Building_a_custom_kernel

 Gist is, save the patch, create patch filename entry into kernel.spec, then 
 run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 
 patch applies, as does Liu Bo's patch from July 29 Btrfs: fix regression of 
 btrfs device replace which I was also going to test. But the v4 patch isn't 
 applying.

 Try "cd source_dir && git am patchfile" if you didn't (e.g. you used the
 patch command directly).

 I'd kinda prefer to build an rpm since I need to test it on baremetal for 
 this bug, and a VM for the device replace bug.

Sorry, I don't know anything about fedora's way of kernel patching.
Either way, it seems the problem is simple to solve:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git checkout v3.16
git am /path/to/my_patch_file
git diff HEAD^..HEAD > /tmp/diff

The resulting patch file [1] /tmp/diff then applies cleanly with
patch -p1 -F1 -s

https://friendpaste.com/Bgwdjk31P3pZHtArr341G
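
The steps above can be wrapped in a small function. This is a sketch, not a
tested recipe for the Fedora workflow: regen_patch is a hypothetical name, it
assumes git is installed and the repository has already been cloned, and it
wants absolute paths because it changes into the work tree first (the cd that
the command list leaves implicit after git clone).

```shell
# Hypothetical helper: rebase a mailed patch onto a tag and emit a plain diff.
# $1: path to the cloned work tree     $2: tag or commit to start from
# $3: patch/mbox file (absolute path)  $4: output diff file (absolute path)
regen_patch() {
    repo=$1 base=$2 mbox=$3 out=$4
    (
        cd "$repo" || exit 1
        git checkout -q "$base" || exit 1   # e.g. v3.16
        git am -q "$mbox" || exit 1         # apply the mail as a commit
        git diff HEAD^..HEAD > "$out"       # diff of just that commit
    )
}

# Usage sketch (not run here):
#   regen_patch "$PWD/linux" v3.16 /path/to/my_patch_file /tmp/diff
```

The resulting file is what one would then feed to 'patch -p1 -F1 -s'.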



 Chris Murphy



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread Chris Murphy

On Aug 11, 2014, at 10:04 AM, Filipe David Manana fdman...@gmail.com wrote:

 On Mon, Aug 11, 2014 at 4:55 PM, Chris Murphy li...@colorremedies.com wrote:
 
 On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com wrote:
 
 On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy li...@colorremedies.com 
 wrote:
 
 
 Can you try the following patch and confirm if it helps?
 https://patchwork.kernel.org/patch/4705171/
 
 This one applies without problems, I didn't build it because I saw v4. The 
 v4 patch I get:
 
 + patch -p1 -F1 -s
 4 out of 4 hunks FAILED -- saving rejects to file fs/btrfs/send.c.rej
 
 Should v4 alone be applied over 3.16.0? Or each version in succession?
 
 Alone. How did you try to apply it to 3.16?
 
 rpmbuild
 https://fedoraproject.org/wiki/Building_a_custom_kernel
 
 Gist is, save the patch, create patch filename entry into kernel.spec, then 
 run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to apply the patch. The v1 
 patch applies, as does Liu Bo's patch from July 29 Btrfs: fix regression of 
 btrfs device replace which I was also going to test. But the v4 patch isn't 
 applying.
 
 Try "cd source_dir && git am patchfile" if you didn't (e.g. you used the
 patch command directly).
 
 I'd kinda prefer to build an rpm since I need to test it on baremetal for 
 this bug, and a VM for the device replace bug.
 
 Sorry, I don't know anything about fedora's way of kernel patching.
 Either way, it seems the problem is simple to solve:
 
 git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
 git checkout v3.16
 git am /path/to/my_patch_file
 git diff HEAD^..HEAD > /tmp/diff
 
 The resulting patch file [1] /tmp/diff then applies cleanly with
 patch -p1 -F1 -s
 
 https://friendpaste.com/Bgwdjk31P3pZHtArr341G

OK that friendpaste is completely different than the [PATCH v4] email.

# from above URL
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6528aa6..95891c0 100644

# from [PATCH v4] email
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3c63b29..b29fc5c 100644

The line numbers are all completely different also. I'll try the patch from
the above URL.
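
The difference can be read off the index lines: the abbreviated hash to the
left of ".." names the version of send.c the diff was generated against, so
6528aa6 versus 3c63b29 means the two diffs were made against different base
trees, which would fit the failed hunks. A small sketch for pulling those ids
out of a saved patch (patch_blobs is a hypothetical helper name; it assumes a
git-style diff and a POSIX awk):

```shell
# Hypothetical helper: print "path old-blob new-blob" for each file in a
# git-style patch, taken from its "diff --git" and "index" header lines.
patch_blobs() {
    awk '
        /^diff --git / { path = $3; sub(/^a\//, "", path) }
        /^index /      { split($2, ids, /\.\./); print path, ids[1], ids[2] }
    ' "$1"
}

# Usage sketch (file names are placeholders):
#   patch_blobs friendpaste.diff
#   patch_blobs patch-v4.mbox
```

If the first id differs between two patches for the same file, they were not
generated against the same version of that file.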


Chris Murphy


Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread lists

On 2014-08-11 10:25, Chris Murphy wrote:
On Aug 11, 2014, at 10:04 AM, Filipe David Manana fdman...@gmail.com 
wrote:


On Mon, Aug 11, 2014 at 4:55 PM, Chris Murphy 
li...@colorremedies.com wrote:


On Aug 11, 2014, at 2:46 AM, Filipe David Manana fdman...@gmail.com 
wrote:


On Mon, Aug 11, 2014 at 4:07 AM, Chris Murphy 
li...@colorremedies.com wrote:



Can you try the following patch and confirm if it helps?
https://patchwork.kernel.org/patch/4705171/


This one applies without problems, I didn't build it because I saw 
v4. The v4 patch I get:


+ patch -p1 -F1 -s
4 out of 4 hunks FAILED -- saving rejects to file 
fs/btrfs/send.c.rej


Should v4 alone be applied over 3.16.0? Or each version in 
succession?


Alone. How did you try to apply it to 3.16?


rpmbuild
https://fedoraproject.org/wiki/Building_a_custom_kernel

Gist is, save the patch, create patch filename entry into 
kernel.spec, then run rpmbuild. rpmbuild uses 'patch -p1 -F1 -s' to 
apply the patch. The v1 patch applies, as does Liu Bo's patch from 
July 29 Btrfs: fix regression of btrfs device replace which I was 
also going to test. But the v4 patch isn't applying.



Try "cd source_dir && git am patchfile" if you didn't (e.g. you used the
patch command directly).


I'd kinda prefer to build an rpm since I need to test it on baremetal 
for this bug, and a VM for the device replace bug.


Sorry, I don't know anything about fedora's way of kernel patching.
Either way, it seems the problem is simple to solve:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git checkout v3.16
git am /path/to/my_patch_file
git diff HEAD^..HEAD > /tmp/diff

The resulting patch file [1] /tmp/diff then applies cleanly with
patch -p1 -F1 -s

https://friendpaste.com/Bgwdjk31P3pZHtArr341G


OK that friendpaste is completely different than the [PATCH v4] email.

# from above URL
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6528aa6..95891c0 100644

# from [PATCH v4] email
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 3c63b29..b29fc5c 100644

The line numbers are all completely different also. I'll try the
patch from the above URL.


The above friendpaste URL patch has applied, and I'm now building.


Chris Murphy


Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-11 Thread Eric Sandeen
On 8/11/14, 2:11 AM, Satoru Takeuchi wrote:
 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 
  - Simplify and unify the description of both man and usage.
  - Fix to show -m and -d is not exclusive
with path|uuid|device|label.
  - Add the description about short options for --mounted and
--all-devices, -m and -d respectively.
  - Move the descriptions of options to Options section.
 
 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 
 ---
  Documentation/btrfs-filesystem.txt | 22 ++
  cmds-filesystem.c  | 15 ++-
  2 files changed, 24 insertions(+), 13 deletions(-)
 
 diff --git a/Documentation/btrfs-filesystem.txt 
 b/Documentation/btrfs-filesystem.txt
 index c9c0b00..fe68496 100644
 --- a/Documentation/btrfs-filesystem.txt
 +++ b/Documentation/btrfs-filesystem.txt
 @@ -20,15 +20,21 @@ SUBCOMMAND
  *df* path [path...]::
  Show space usage information for a mount point.
  
 -*show* [--mounted|--all-devices|path|uuid|device|label]::
 -Show the btrfs filesystem with some additional info.
 +*show* [-d|-m] [path|uuid|device|label]::
 +Show the structure of btrfs filesystem(s).
  +
 -If no option nor path|uuid|device|label is passed, btrfs shows
 -information of all the btrfs filesystem both mounted and unmounted.
 -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
 -filesystem(s);
 -If '--all-devices' is passed, all the devices under /dev are scanned;
 -otherwise the devices list is extracted from the /proc/partitions file.

 +If none of 'path|uuid|device|label' is passed, btrfs shows
 +information of all the btrfs filesystems both mounted and unmounted.

that doesn't seem quite correct; 

# btrfs filesystem show -m

does not specify 'path|uuid|device|label' but it only shows
mounted filesystems, not all filesystems.

As I understand it, the -d and -m options control how the command
finds devices; the 'path|uuid|device|label' argument is
used as a filter for what is found.

 ++
 +The show command finds btrfs filesystems by scanning all the devices
 +in /proc/partitions by default.

I think I would document it something like this:

show [-m|-d] [path|uuid|device|label]
        Show the structure of btrfs filesystem(s).

        By default, the show command scans all devices found in
        /proc/partitions.
        If [-d|--all-devices] is specified, all devices found under /dev are
        scanned.
        If [-m|--mounted] is specified, only mounted (btrfs?) devices are
        scanned.

        By default, the structure of all discovered filesystems is shown.
        If any one of [path|uuid|device|label] is specified, only filesystems
        matching that identifier are shown.

(What seems to be missing, though, is why the user would ever choose to use
'-d'.)

-Eric



Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-11 Thread Eric Sandeen
On 8/11/14, 10:05 AM, Eric Sandeen wrote:
 On 8/11/14, 2:11 AM, Satoru Takeuchi wrote:
 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

  - Simplify and unify the description of both man and usage.
  - Fix to show -m and -d is not exclusive
with path|uuid|device|label.
  - Add the description about short options for --mounted and
--all-devices, -m and -d respectively.
  - Move the descriptions of options to Options section.

 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 ---
  Documentation/btrfs-filesystem.txt | 22 ++
  cmds-filesystem.c  | 15 ++-
  2 files changed, 24 insertions(+), 13 deletions(-)

 diff --git a/Documentation/btrfs-filesystem.txt 
 b/Documentation/btrfs-filesystem.txt
 index c9c0b00..fe68496 100644
 --- a/Documentation/btrfs-filesystem.txt
 +++ b/Documentation/btrfs-filesystem.txt
 @@ -20,15 +20,21 @@ SUBCOMMAND
  *df* path [path...]::
  Show space usage information for a mount point.
  
 -*show* [--mounted|--all-devices|path|uuid|device|label]::
 -Show the btrfs filesystem with some additional info.
 +*show* [-d|-m] [path|uuid|device|label]::
 +Show the structure of btrfs filesystem(s).
  +
 -If no option nor path|uuid|device|label is passed, btrfs shows
 -information of all the btrfs filesystem both mounted and unmounted.
 -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
 -filesystem(s);
 -If '--all-devices' is passed, all the devices under /dev are scanned;
 -otherwise the devices list is extracted from the /proc/partitions file.
 
 +If none of 'path|uuid|device|label' is passed, btrfs shows
 +information of all the btrfs filesystems both mounted and unmounted.
 
 that doesn't seem quite correct; 
 
 # btrfs filesystem show -m
 
 does not specify 'path|uuid|device|label' but it only shows
 mounted filesystems, not all filesystems.
 
 As I understand it, the -d and -m options control how the command
 finds devices; the 'path|uuid|device|label' argument is
 used as a filter for what is found.
 
 ++
 +The show command finds btrfs filesystems by scanning all the devices
 +in /proc/partitions by default.
 
 I think I would document it something like this:
 
 show [-m|-d] [path|uuid|device|label]
   Show the structure of btrfs filesystem(s).
 
   By default, the show command scans all devices found in 
 /proc/partitions.   
   If [-d|--all-devices] is specified, all devices found under /dev are 
 scanned.
   If [-m|--mounted] is specified, only mounted (btrfs?) devices are 
 scanned.
 
   By default, the structure of all discovered filesystems is shown.
   If any one of [path|uuid|device|label] is specified, only 
 filesystems
   matching that identifier are shown.
 
 (What seems to be missing, though, is why would the user ever choose to use 
 '-d?')

Incidentally, there is some strange behavior here when looking for multiple 
filesystems which match.

Make 2 filesystems w/ the same label:

[root@bp-05 tmp]# btrfs filesystem label /dev/sdc1 testlabel2
[root@bp-05 tmp]# btrfs filesystem label /dev/sdc5 testlabel2

Show matching filesystems:

[root@bp-05 tmp]# btrfs filesystem show testlabel2
Label: 'testlabel2'  uuid: 8c6ec835-5628-439b-9749-d92f62573ce8
	Total devices 1 FS bytes used 112.00KiB
	devid    1 size 30.00GiB used 2.04GiB path /dev/sdc5

Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
	Total devices 1 FS bytes used 384.00KiB
	devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1

Btrfs v3.14.2

That works fine, but if one is mounted:

[root@bp-05 tmp]# mount /dev/sdc1 /mnt/test

only the mounted filesystem is shown:

[root@bp-05 tmp]# btrfs filesystem show testlabel2
Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
	Total devices 1 FS bytes used 384.00KiB
	devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1

Btrfs v3.14.2

That's unexpected.

Mount the other fs, and both are shown again:

[root@bp-05 tmp]# mount /dev/sdc5 /mnt/scratch
[root@bp-05 tmp]# btrfs filesystem show testlabel2
Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
	Total devices 1 FS bytes used 384.00KiB
	devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1

Label: 'testlabel2'  uuid: 8c6ec835-5628-439b-9749-d92f62573ce8
	Total devices 1 FS bytes used 384.00KiB
	devid    1 size 30.00GiB used 2.04GiB path /dev/sdc5

Btrfs v3.14.2


-Eric


Announcement: buttersink - like rsync for btrfs snapshots

2014-08-11 Thread Ames Cornish
I've written a utility to help me with using btrfs send and receive for 
backups or other synchronization, and I'd love to get feedback on it.  

As of this release, buttersink will synchronize a set of read-only snapshots 
in a btrfs filesystem to an Amazon S3 bucket, and vice-versa.  It 
intelligently picks parent snapshots to diff from, so that a minimal amount 
of data needs to be sent over the wire and stored in the backend.  

The utility is on PyPI as buttersink, and the GitHub page is here:
  https://github.com/AmesCornish/buttersink.

Thanks in advance for any feedback!

- Ames




Re: Announcement: buttersink - like rsync for btrfs snapshots

2014-08-11 Thread Jim Salter

How has it been for reliability?

I wrote a btrsync app a while back, and the app /itself/ worked fine, 
but the btrfs send / btrfs receive itself proved problematic.  Since 
btrfs would keep a partial receive - with no easy way to tell whether a 
receive WAS partial or full - I would inevitably end up with interrupted 
sends causing a problem that couldn't be resolved without manually 
deleting snapshots on the target end haphazardly until I nailed the 
incomplete one.


On 08/11/2014 01:49 PM, Ames Cornish wrote:

I've written a utility to help me with using btrfs send and receive for
backups or other synchronization, and I'd love to get feedback on it.

As of this release, buttersink will synchronize a set of read-only snapshots
in a btrfs filesystem to an Amazon S3 bucket, and vice-versa.  It
intelligently picks parent snapshots to diff from, so that a minimal amount
of data needs to be sent over the wire and stored in the backend.

The utility is on PyPI as buttersink, and the GitHub page is here:
   https://github.com/AmesCornish/buttersink.

Thanks in advance for any feedback!

- Ames




Re: Announcement: buttersink - like rsync for btrfs snapshots

2014-08-11 Thread Jim Salter
To any core btrfs devs who are listening and care - the unreliability of 
btrfs send/receive is IMO the single biggest roadblock to adoption of 
btrfs as a serious next-gen FS.


I can live with occasional corner-case performance issues, I can even 
live with (very) occasional filesystem corruption... IF I can rely on 
replication to keep my data safe on another box.  Without the 
replication, there's just no reasonable case to be made to replace ZFS.


On 08/11/2014 02:05 PM, Ames Cornish wrote:

Jim,

btrfs send reliability has been an issue, though I've been able to 
successfully use it for my backups.  buttersink usually detects the 
errors and will either move the destination snapshot to mark it as 
partial/failed (for btrfs), or cancel and delete the partial upload 
(for S3).  I've also found that it helps to wait a while (e.g. 30 
seconds) after any volume deletes before trying the send/sync.  I hope 
btrfs-progs will get more reliable, too.


- Ames




Re: Announcement: buttersink - like rsync for btrfs snapshots

2014-08-11 Thread Ames Cornish
Jim,

btrfs send reliability has been an issue, though I've been able to
successfully use it for my backups.  buttersink usually detects the
errors and will either move the destination snapshot to mark it as
partial/failed (for btrfs), or cancel and delete the partial upload
(for S3).  I've also found that it helps to wait a while (e.g. 30
seconds) after any volume deletes before trying the send/sync.  I hope
btrfs-progs will get more reliable, too.
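
A rough sketch of the "mark it as partial" idea, in plain shell rather than
buttersink's actual code (which is not shown in this thread): mark_if_failed
and the .partial suffix are hypothetical, and the function only renames the
destination when the exit status it is given is non-zero.

```shell
# Hypothetical helper: rename a received snapshot if the transfer failed.
# $1: exit status of the receive step
# $2: path of the destination snapshot
# Prints the path the snapshot ends up under.
mark_if_failed() {
    status=$1 dest=$2
    if [ "$status" -ne 0 ] && [ -e "$dest" ]; then
        mv -- "$dest" "$dest.partial" || return 1
        echo "$dest.partial"
    else
        echo "$dest"
    fi
}

# Usage sketch (not run here; paths are placeholders). In plain sh, $? after
# a pipeline is the status of the last command only, i.e. btrfs receive:
#   btrfs send /snaps/home-2014-08-11 | btrfs receive /backup/snaps
#   mark_if_failed $? /backup/snaps/home-2014-08-11
```

A failure on the sending side that the receiver does not notice would not be
caught by this alone.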

- Ames


Large files, nodatacow and fragmentation

2014-08-11 Thread G. Richard Bellamy
I've been playing with btrfs as a backing store for my KVM images.

I've used 'chattr +C' on the directory where those images are stored.

You can see my recipe below [1]. I've read the gotchas found here [2]

I'm having continuing performance issues inside the Guest VM that is
created inside the btrfs subvolume, using a qcow2 format. I'm having a
hard time determining whether the issues are related to KVM or btrfs,
or if this is even a reasonable topic of discussion.

I've seen the comments on this list saying that if I want a COW
filesystem with sparse files, that I'd be better off with ZFS. I'd
like to use an in-tree COW filesystem, but if it's just not gonna
happen yet with btrfs, I guess that's just the way it is.

That being said, how would I determine what the root issue is?
Specifically, the qcow2 file in question seems to have increasing
fragmentation, even with the No_COW attr.

[1]
$ mkfs.btrfs -m raid10 -d raid10 /dev/sda /dev/sdb /dev/sdc /dev/sdd
$ mount /dev/sda /mnt
$ cd /mnt
$ btrfs subvolume create __data
$ btrfs subvolume create __data/libvirt
$ cd /
$ umount /mnt
$ mount /dev/sda /var/lib/libvirt
$ chattr +C /var/lib/libvirt/images
$ cp \
/run/media/rbellamy/433acf1d-a1a4-4596-a6a7-005e643b24e0/libvirt/images/atlas.qcow2 \
/var/lib/libvirt/images/
$ filefrag /var/lib/libvirt/images/atlas.qcow2
/var/lib/libvirt/images/atlas.qcow2: 0 extents found
[START UP THE VM - DO SOME THINGS]
$ filefrag /var/lib/libvirt/images/atlas.qcow2
/var/lib/libvirt/images/atlas.qcow2: 12236 extents found
[START UP THE VM - DO SOME THINGS]
$ filefrag /var/lib/libvirt/images/atlas.qcow2
/var/lib/libvirt/images/atlas.qcow2: 34988 extents found

[2]
https://btrfs.wiki.kernel.org/index.php/Gotchas
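
To watch the fragmentation over time rather than comparing single filefrag
runs by eye, the count can be pulled out of filefrag's summary line and
logged. A minimal sketch, assuming the "N extents found" summary format shown
in [1]; extent_count is a hypothetical helper that reads filefrag output on
stdin, and the log path is a placeholder.

```shell
# Hypothetical helper: extract the extent count from filefrag's summary line.
# Reads "<file>: N extents found" on stdin and prints N.
extent_count() {
    sed -n 's/^.*: \([0-9][0-9]*\) extents\{0,1\} found.*$/\1/p'
}

# Usage sketch (not run here): append a timestamped sample to a log file.
#   printf '%s %s\n' "$(date +%s)" \
#       "$(filefrag /var/lib/libvirt/images/atlas.qcow2 | extent_count)" \
#       >> /tmp/atlas-extents.log
```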


Re: Large files, nodatacow and fragmentation

2014-08-11 Thread Roman Mamedov
On Mon, 11 Aug 2014 11:36:46 -0700
G. Richard Bellamy rbell...@pteradigm.com wrote:

 I've been playing with btrfs as a backing store for my KVM images.
 
 I've used 'chattr +C' on the directory where those images are stored.
 
 You can see my recipe below [1]. I've read the gotchas found here [2]
 
 I'm having continuing performance issues inside the Guest VM that is
 created inside the btrfs subvolume, using a qcow2 format. I'm having a
 hard time determining whether the issues are related to KVM or btrfs,
 or if this is even a reasonable topic of discussion.
 
 I've seen the comments on this list saying that if I want a COW
 filesystem with sparse files, that I'd be better off with ZFS. I'd
 like to use an in-tree COW filesystem, but if it's just not gonna
 happen yet with btrfs, I guess that's just the way it is.
 
 That being said, how would I determine what the root issue is?
 Specifically, the qcow2 file in question seems to have increasing
 fragmentation, even with the No_COW attr.

First of all, why do you require a COW filesystem in the first place... if all
you do is just use it in a NoCOW mode?

Second, why qcow2? It can also have internal fragmentation which is unlikely to
do anything good for performance.

Try RAW format images; to reduce the space requirements, with the latest
Qemu/KVM you can pass the TRIM command through from inside the VM guest (at
least in the IDE controller mode) so that the backing filesystem will unmap
areas that are no longer in use inside the VM, in effect re-sparsifying the
image. This is VERY nifty. But yeah, this can cause some fragmentation even
with NoCOW.

In my personal use case NoCOW is only utilized partly, because all subvolumes
with running VMs are being snapshotted about every 30 minutes, and those
snapshots are kept for two weeks. The performance is passable, at least when
using KVM's cache=writeback mode (or less safe ones).

-- 
With respect,
Roman




Re: Ideas for a feature implementation

2014-08-11 Thread Chris Murphy

On Aug 10, 2014, at 8:53 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote:

 
 Another thing that isn't listed there, that I would personally love to
 see is support for secure file deletion.  To be truly secure though,
 this would need to hook into the COW logic so that files marked for
 secure deletion can't be reflinked (maybe make them automatically NOCOW
 instead, and don't allow snapshots?), and when they get written to, the
 blocks that get COW'ed have the old block overwritten.

If the file is reflinked or snapshotted, then can it be securely deleted? Because
what does it mean to securely delete a file when there's a completely independent
file pointing to the same physical blocks? What if someone else owns that
independent file? Does the reflink copy get rm'd as well? Or does the file
remain, but its blocks are zero'd/corrupted?

For SSDs, whether it's an overwrite or an FITRIM ioctl, it's an open question
when the data is actually irretrievable. It may be seconds, but could be much
longer (hours?) so I'm not sure if it's useful. On HDDs using SMR it's not
necessarily a given that an overwrite will work either.


Chris Murphy


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-11 Thread Chris Mason
On 08/10/2014 10:55 AM, Liu Bo wrote:
 On Thu, Aug 07, 2014 at 10:02:15AM -0400, Chris Mason wrote:


 On 08/07/2014 04:20 AM, Miao Xie wrote:
 On Thu, 7 Aug 2014 15:50:30 +0800, Liu Bo wrote:
 [90496.156016] kworker/u8:14   D 880044e38540 0 21050  2 
 0x
 [90496.157683] Workqueue: btrfs-delalloc normal_work_helper [btrfs]
 [90496.159320]  88022880f990 0002 880407f649b0 
 88022880ffd8
 [90496.160997]  880044e38000 00013040 880044e38000 
 7fff
 [90496.162686]  880301383aa0 0002 814705d0 
 880301383a98
 [90496.164360] Call Trace:
 [90496.166028]  [814705d0] ? michael_mic.part.6+0x21/0x21
 [90496.167854]  [81470fd0] schedule+0x64/0x66
 [90496.169574]  [814705ff] schedule_timeout+0x2f/0x114
 [90496.171221]  [8106479a] ? wake_up_process+0x2f/0x32
 [90496.172867]  [81062c3b] ? get_parent_ip+0xd/0x3c
 [90496.174472]  [81062ce5] ? preempt_count_add+0x7b/0x8e
 [90496.176053]  [814717f3] __wait_for_common+0x11e/0x163
 [90496.177619]  [814717f3] ? __wait_for_common+0x11e/0x163
 [90496.179173]  [810647aa] ? wake_up_state+0xd/0xd
 [90496.180728]  [81471857] wait_for_completion+0x1f/0x21
 [90496.182285]  [c044e3b3] 
 btrfs_async_run_delayed_refs+0xbf/0xd9 [btrfs]
 [90496.183833]  [c04624e1] __btrfs_end_transaction+0x2b6/0x2ec 
 [btrfs]
 [90496.185380]  [c0462522] btrfs_end_transaction+0xb/0xd 
 [btrfs]
 [90496.186940]  [c0451742] find_free_extent+0x8a9/0x976 [btrfs]
 [90496.189464]  [c0451990] btrfs_reserve_extent+0x6f/0x119 
 [btrfs]
 [90496.191326]  [c0466b45] cow_file_range+0x1a6/0x377 [btrfs]
 [90496.193080]  [c047adc4] ? 
 extent_write_locked_range+0x10c/0x11e [btrfs]
 [90496.194659]  [c04677e4] 
 submit_compressed_extents+0x100/0x412 [btrfs]
 [90496.196225]  [8120e344] ? debug_smp_processor_id+0x17/0x19
 [90496.197776]  [c0467b78] async_cow_submit+0x82/0x87 [btrfs]
 [90496.199383]  [c048644b] normal_work_helper+0x153/0x224 
 [btrfs]
 [90496.200944]  [81052d8c] process_one_work+0x16f/0x2b8
 [90496.202483]  [81053636] worker_thread+0x27b/0x32e
 [90496.204000]  [810533bb] ? cancel_delayed_work_sync+0x10/0x10
 [90496.205514]  [81058012] kthread+0xb2/0xba
 [90496.207040]  [8147] ? ap_handle_dropped_data+0xf/0xc8
 [90496.208565]  [81057f60] ? __kthread_parkme+0x62/0x62
 [90496.210096]  [81473f6c] ret_from_fork+0x7c/0xb0
 [90496.211618]  [81057f60] ? __kthread_parkme+0x62/0x62


 Ok, this should explain the hang.  submit_compressed_extents is calling
 cow_file_range with a locked page.

 cow_file_range is trying to find a free extent and in the process is
 calling btrfs_end_transaction, which is running the async delayed refs,
 which is trying to write dirty pages, which is waiting for your locked
 page.

 I should be able to reproduce this ;)

 This part of the trace is relatively new because Liu Bo's patch made us
 redirty the pages, making it more likely that we'd try to write them
 during commit.

 But, at the end of the day we have a fundamental deadlock with
 committing a transaction while holding a locked page from an ordered file.

 For now, I'm ripping out the strict ordered file and going back to a
 best-effort filemap_flush like ext4 is using.

 I think I've figured the deadlock out, this is obviously a race case,
 really hard to reproduce :-(

 So it turns out to be related to workqueues -- now a kthread can process
 work_struct queued in different workqueues, so we can explain the deadlock
 as such,

 (1)
 btrfs-delalloc workqueue gets a compressed extent to process with all its
 pages locked during this, and it runs into read free space cache inode, and
 then wait on lock_page().

 (2)
 Reading that free space cache inode comes to submit part, and we have a
 indirect twice endio way for it, with the first endio we come to 
 end_workqueue_bio()
 and queue a work in btrfs-endio-meta workqueue, and it will run the real
 endio() for us, but ONLY when it's processed.

 So the problem is a kthread can serve several workqueues, which means
 works in btrfs-endio-meta workqueues and works in btrfs-flush_delalloc
 workqueues can be in the same processing list of a kthread.  When
 btrfs-flush_delalloc waits for the compressed page and btrfs-endio-meta
 comes after it, it hangs.

 I don't think it is right. All the btrfs workqueue has RECLAIM flag, which
 means each btrfs workqueue has its own rescue worker. So the problem you
 said should not happen.

 Right, I traded some emails with Tejun about this and spent a few days
 trying to prove the workqueues were doing the wrong thing.   It will end
 up spawning another worker thread for the new work, and it won't get
 queued up behind the existing thread.

 If both work items went to the same workqueue, you'd definitely be right.

Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread Chris Murphy

On Aug 11, 2014, at 10:41 AM, li...@colorremedies.com wrote:
 
 https://friendpaste.com/Bgwdjk31P3pZHtArr341G
 OK that friendpaste is completely different than the [PATCH v4] email.
 # from above URL
 diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
 index 6528aa6..95891c0 100644


I can confirm that the above patch fixes the reported bug. 

I couldn't get this one to apply so it's not tested:
http://www.spinics.net/lists/linux-btrfs/msg36556.html


Chris Murphy
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] btrfs send/receive, page allocation failure

2014-08-11 Thread Filipe David Manana
On Mon, Aug 11, 2014 at 9:30 PM, Chris Murphy li...@colorremedies.com wrote:

 On Aug 11, 2014, at 10:41 AM, li...@colorremedies.com wrote:

 https://friendpaste.com/Bgwdjk31P3pZHtArr341G
 OK that friendpaste is completely different than the [PATCH v4] email.
 # from above URL
 diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
 index 6528aa6..95891c0 100644


 I can confirm that the above patch fixes the reported bug.

 I couldn't get this one to apply so it's not tested:
 http://www.spinics.net/lists/linux-btrfs/msg36556.html

It's exactly the same as the diff you tried. The git am command is
able to deal with any fuzz while the patch command can't (or not
always at least). That patch is based on the integration branch, while
the diff I pasted (and showed you how to generate) is for the v3.16 tag
from Linus' repository.

Thanks for testing and reporting back.



 Chris Murphy



-- 
Filipe David Manana,

Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men.


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-11 Thread Jose Ildefonso Camargo Tolosa
As I hate when a thread is left hanging, you deserve to know what
happened in the end. You likely already guessed, but anyway: I nuked
the filesystem and started over.

After some internal discussion in the company, we decided to move to
ZFS for now.  However, we will keep an eye on btrfs, and will likely
deploy it to some smaller system for further testing.

Thank you all for your help!

Sincerely,

Ildefonso


Re: Large files, nodatacow and fragmentation

2014-08-11 Thread G. Richard Bellamy
On Mon, Aug 11, 2014 at 12:14 PM, Roman Mamedov r...@romanrm.net wrote:

 First of all, why do you require a COW filesystem in the first place... if all
 you do is just use it in a NoCOW mode?

 Second, why qcow2? It can also have internal fragmentation which is
 unlikely to do anything good for performance.

Both great questions. I'm experimenting with btrfs, and the various
permutations of btrfs with KVM.

So, why btrfs vs lvm or ext4:
1. Because nocow isn't all I'm doing with that filesystem.
2. I like the way btrfs subvolumes work, vs lvm. I can have the nocow
files in one subvolume, and still get great snapshot performance out of
others.
3. I get the performance of a raid10 without the lvm management overhead.
Online rebalancing. Easy online resizing.
4. And frankly, I just kinda want to make it work.


 Try RAW format images; to reduce the space requirements, with the latest
 Qemu/KVM you can pass-through TRIM command from inside the VM guest (at least
 in the IDE controller mode) so that the backing filesystem will unmap areas
 that are no longer in use inside the VM, in effect re-sparsifying the image.
 This is VERY nifty. But yeah this can cause some fragmentation even with
 NoCOW.

 In my personal use case NoCOW is only utilized partly, because all subvolumes
 with running VMs are being snapshotted about every 30 minutes, and those
 snapshots are kept for two weeks. The performance is passable; at least when
 using KVM's cache=writeback mode (or less safe ones).

I've done my reading of qcow2 vs raw and that indicated that while
there is better performance using raw, it's not significant enough to
bypass the ability to take a qemu snapshot. I've not done the analysis
myself, so I could be reading things wrong.

There's a great thread, "Are nocow files snapshot-aware?" [1]. My take
from that reading is that doing a btrfs snapshot of a nocow file seems
like it's reasonable on a semi-regular basis, but DON'T do it every 30
seconds.

Also that whole thread is predicated on the idea that your nocow files
are themselves managed by a process/system that can read and write to
them atomically, thus I decided against using the raw format.


 --
 With respect,
 Roman

Thanks Roman. But really we haven't addressed my original question,
which is - how would I determine the root cause of the fragmentation in
this nocow file on top of a btrfs subvolume?

[1] http://www.spinics.net/lists/linux-btrfs/msg31341.html

Kind Regards,
Richard


Re: [PATCH] Btrfs: fix regression of btrfs device replace

2014-08-11 Thread Chris Murphy

On Jul 29, 2014, at 5:09 AM, Liu Bo bo.li@oracle.com wrote:

 Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac(
 btrfs: dev replace should replace the sysfs entry) added the missing sysfs
 entry in the process of device replace, but didn't take missing devices into
 account, so now we have
 
 BUG: unable to handle kernel NULL pointer dereference at 0088
 IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
 ...
 
 To reproduce it,
 1. mkfs.btrfs -f disk1 disk2
 2. mkfs.ext4 disk1
 3. mount disk2 /mnt -odegraded
 4. btrfs replace start -B 1 disk3 /mnt
 --
 
 This fixes the problem.
 
 Reported-by: Chris Murphy li...@colorremedies.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 fs/btrfs/sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
 index 7869936..12e5355 100644
 --- a/fs/btrfs/sysfs.c
 +++ b/fs/btrfs/sysfs.c
 @@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
  	if (!fs_info->device_dir_kobj)
  		return -EINVAL;
 
 -	if (one_device) {
 +	if (one_device && one_device->bdev) {
  		disk = one_device->bdev->bd_part;
  		disk_kobj = &part_to_dev(disk)->kobj;
 


Applied to 3.16.0 and tested, problem is fixed.
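
The guard added by the patch can be modelled in user space as follows. This
is a minimal sketch, assuming that a missing device is represented by a NULL
bdev as the commit message implies; the model_* types and
model_kobj_rm_device() are hypothetical stand-ins, not the kernel's
definitions, and the return value after the guarded block is an assumption
because the hunk does not show it.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct model_bdev {
    int part_id;
};

struct model_device {
    struct model_bdev *bdev; /* NULL models a missing device */
};

/*
 * Model of the guarded block. *removed_part is set to the partition whose
 * sysfs link would be removed, or left at -1 when nothing is removed.
 * Returning 0 after the block is an assumption; the hunk does not show it.
 */
static int model_kobj_rm_device(int have_dir_kobj,
                                const struct model_device *one_device,
                                int *removed_part)
{
    *removed_part = -1;

    if (!have_dir_kobj)
        return -EINVAL;

    /* The pre-patch code tested only one_device, so a missing device
     * dereferenced a NULL bdev at this point. */
    if (one_device && one_device->bdev)
        *removed_part = one_device->bdev->part_id;

    return 0;
}
```

With the extra test, a device whose bdev is NULL is skipped instead of being
dereferenced, which matches the degraded-mount reproducer in the commit
message.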


Chris Murphy



Re: Large files, nodatacow and fragmentation

2014-08-11 Thread Chris Murphy

On Aug 11, 2014, at 1:14 PM, Roman Mamedov r...@romanrm.net wrote:


 
 Second, why qcow2? It can also have internal fragmentation which is
 unlikely to do anything good for performance.

It really depends on what version of libvirt and qemu-image you've got. I did 
some testing during Fedora 20 prior to release, and the best results for my 
configuration (a laptop with an HDD at that time): btrfs host, +C qcow2, btrfs 
guest, both 16KB leaf size, and the drive pointing to the qcow2 file with cache 
policy set to unsafe. And even when obliterating the VM while writing data, I 
never lost the guest Btrfs file system. Not that I recommend it, the cache 
policy is unsafe after all. I did lose some data but it was limited to commit 
time. We're not talking huge differences, the metric I was using was installing 
Fedora 20 based on installer log start/stop time for doing the unattended 
portion of the install. It also matters somewhat to pre-allocate metadata when 
creating the qcow2 file.

I also tested XFS on XFS, ext4 on ext4, also in qcow2. And also on raw images. 
And also on LV's. I'd think the LV would have been faster since it completely 
eliminates one of the file systems (there is no host fs).

Anyway, what I determined was the only way to know is to actually test your 
workload, or a good approximation of it, with various configurations.

And another test is LVM thinp LV's once libvirt has support for using them 
(which may already have happened, I haven't revisited this since Oct 2013
testing), because those snapshots should be as usable as Btrfs snapshots, 
unlike conventional LVM snapshots which are slow and need explicit 
preallocation.


Chris Murphy



Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-11 Thread Satoru Takeuchi
Hi Eric,

(2014/08/12 2:05), Eric Sandeen wrote:
 On 8/11/14, 2:11 AM, Satoru Takeuchi wrote:
 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

   - Simplify and unify the description of both man and usage.
   - Fix to show -m and -d is not exclusive
 with path|uuid|device|label.
   - Add the description about short options for --mounted and
 --all-devices, -m and -d respectively.
   - Move the descriptions of options to Options section.

 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 ---
   Documentation/btrfs-filesystem.txt | 22 ++
   cmds-filesystem.c  | 15 ++-
   2 files changed, 24 insertions(+), 13 deletions(-)

 diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
 index c9c0b00..fe68496 100644
 --- a/Documentation/btrfs-filesystem.txt
 +++ b/Documentation/btrfs-filesystem.txt
 @@ -20,15 +20,21 @@ SUBCOMMAND
   *df* path [path...]::
   Show space usage information for a mount point.
   
 -*show* [--mounted|--all-devices|path|uuid|device|label]::
 -Show the btrfs filesystem with some additional info.
 +*show* [-d|-m] [path|uuid|device|label]::
 +Show the structure of btrfs filesystem(s).
   +
 -If no option nor path|uuid|device|label is passed, btrfs shows
 -information of all the btrfs filesystem both mounted and unmounted.
 -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
 -filesystem(s);
 -If '--all-devices' is passed, all the devices under /dev are scanned;
 -otherwise the devices list is extracted from the /proc/partitions file.
 
 +If none of 'path|uuid|device|label' is passed, btrfs shows
 +information of all the btrfs filesystems both mounted and unmounted.
 
 that doesn't seem quite correct;
 
 # btrfs filesystem show -m
 
 does not specify 'path|uuid|device|label' but it only shows
 mounted filesystems, not all filesystems.

Oh, I forgot to add [ and ]. 

 
 As I understand it, the -d and -m options control how the command
 finds devices; the 'path|uuid|device|label' argument is
 used as a filter for what is found.

Yes, that is my understanding too.

 
 ++
 +The show command finds btrfs filesystems by scanning all the devices
 +in /proc/partitions by default.
 
 I think I would document it something like this:
 
 show [-m|-d] [path|uuid|device|label]
   Show the structure of btrfs filesystem(s).
 
   By default, the show command scans all devices found in /proc/partitions.
   If [-d|--all-devices] is specified, all devices found under /dev are scanned.
   If [-m|--mounted] is specified, only mounted (btrfs?) devices are scanned.
 
   By default, the structure of all discovered filesystems is shown.
   If any one of [path|uuid|device|label] is specified, only filesystems
   matching that identifier are shown.
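
The two-stage behaviour proposed above (options choose how devices are found,
the identifier filters what is shown) can be sketched as below. This is an
illustration only, not btrfs-progs code: struct fs_entry, pick_scan_mode(),
matches_filter() and count_shown() are hypothetical names, and letting -m take
precedence over -d is an assumption of the sketch.

```c
#include <stddef.h>
#include <string.h>

enum scan_mode { SCAN_PROC_PARTITIONS, SCAN_ALL_DEVICES, SCAN_MOUNTED };

/* Hypothetical record for one discovered filesystem. */
struct fs_entry {
    const char *label;
    const char *uuid;
    const char *path;
    int mounted;
};

/* Stage 1: -d and -m choose how devices are found. Letting -m win when
 * both are given is an assumption of this sketch. */
static enum scan_mode pick_scan_mode(int opt_d, int opt_m)
{
    if (opt_m)
        return SCAN_MOUNTED;
    if (opt_d)
        return SCAN_ALL_DEVICES;
    return SCAN_PROC_PARTITIONS;
}

/* Stage 2: an optional identifier filters what was found. */
static int matches_filter(const struct fs_entry *fs, const char *ident)
{
    if (ident == NULL)
        return 1; /* no filter: show everything discovered */
    return strcmp(fs->label, ident) == 0 ||
           strcmp(fs->uuid, ident) == 0 ||
           strcmp(fs->path, ident) == 0;
}

/* Count the entries that would be shown for a given mode and filter. */
static size_t count_shown(const struct fs_entry *found, size_t n,
                          enum scan_mode mode, const char *ident)
{
    size_t shown = 0;

    for (size_t i = 0; i < n; i++) {
        if (mode == SCAN_MOUNTED && !found[i].mounted)
            continue; /* only mounted filesystems are scanned */
        if (matches_filter(&found[i], ident))
            shown++;
    }
    return shown;
}
```

Keeping discovery and filtering as separate steps means a label shared by
several filesystems yields every match, regardless of which ones are mounted,
unless -m is given.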

OK, I'll fix my patch based on your comment.
# Of course, I'll replace (btrfs?) with proper wording.

Can I add your Signed-off-by to my v2 patch?

 
 (What seems to be missing, though, is why would the user ever choose to use 
 '-d?')

I'm not sure. I guess, for example, that on large systems -d takes too
much time scanning all devices under /dev, or something like that?

Thank you for your comments!

Satoru

 
 -Eric
 



Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-11 Thread Satoru Takeuchi
Hi Eric,

(2014/08/12 2:14), Eric Sandeen wrote:
 On 8/11/14, 10:05 AM, Eric Sandeen wrote:
 On 8/11/14, 2:11 AM, Satoru Takeuchi wrote:
 From: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

   - Simplify and unify the description of both man and usage.
   - Fix to show -m and -d is not exclusive
 with path|uuid|device|label.
   - Add the description about short options for --mounted and
 --all-devices, -m and -d respectively.
   - Move the descriptions of options to Options section.

 Signed-off-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

 ---
   Documentation/btrfs-filesystem.txt | 22 ++
   cmds-filesystem.c  | 15 ++-
   2 files changed, 24 insertions(+), 13 deletions(-)

 diff --git a/Documentation/btrfs-filesystem.txt b/Documentation/btrfs-filesystem.txt
 index c9c0b00..fe68496 100644
 --- a/Documentation/btrfs-filesystem.txt
 +++ b/Documentation/btrfs-filesystem.txt
 @@ -20,15 +20,21 @@ SUBCOMMAND
   *df* path [path...]::
   Show space usage information for a mount point.
   
 -*show* [--mounted|--all-devices|path|uuid|device|label]::
 -Show the btrfs filesystem with some additional info.
 +*show* [-d|-m] [path|uuid|device|label]::
 +Show the structure of btrfs filesystem(s).
   +
 -If no option nor path|uuid|device|label is passed, btrfs shows
 -information of all the btrfs filesystem both mounted and unmounted.
 -If '--mounted' is passed, it would probe btrfs kernel to list mounted btrfs
 -filesystem(s);
 -If '--all-devices' is passed, all the devices under /dev are scanned;
 -otherwise the devices list is extracted from the /proc/partitions file.

 +If none of 'path|uuid|device|label' is passed, btrfs shows
 +information of all the btrfs filesystems both mounted and unmounted.

 that doesn't seem quite correct;

 # btrfs filesystem show -m

 does not specify 'path|uuid|device|label' but it only shows
 mounted filesystems, not all filesystems.

 As I understand it, the -d and -m options control how the command
 finds devices; the 'path|uuid|device|label' argument is
 used as a filter for what is found.

 ++
 +The show command finds btrfs filesystems by scanning all the devices
 +in /proc/partitions by default.

 I think I would document it something like this:

 show [-m|-d] [path|uuid|device|label]
  Show the structure of btrfs filesystem(s).

  By default, the show command scans all devices found in /proc/partitions.
  If [-d|--all-devices] is specified, all devices found under /dev are scanned.
  If [-m|--mounted] is specified, only mounted (btrfs?) devices are scanned.

  By default, the structure of all discovered filesystems is shown.
  If any one of [path|uuid|device|label] is specified, only filesystems
  matching that identifier are shown.

 (What seems to be missing, though, is why would the user ever choose to use 
 '-d?')
 
 Incidentally, there is some strange behavior here when looking for multiple 
 filesystems which match.
 
 Make 2 filesystems w/ the same label:
 
 [root@bp-05 tmp]# btrfs filesystem label /dev/sdc1 testlabel2
 [root@bp-05 tmp]# btrfs filesystem label /dev/sdc5 testlabel2
 
 Show matching filesystems:
 
 [root@bp-05 tmp]# btrfs filesystem show testlabel2
 Label: 'testlabel2'  uuid: 8c6ec835-5628-439b-9749-d92f62573ce8
   Total devices 1 FS bytes used 112.00KiB
   devid    1 size 30.00GiB used 2.04GiB path /dev/sdc5
 
 Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
   Total devices 1 FS bytes used 384.00KiB
   devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1
 
 Btrfs v3.14.2
 
 That works fine, but if one is mounted:
 
 [root@bp-05 tmp]# mount /dev/sdc1 /mnt/test
 
 only the mounted filesystem is shown:
 
 [root@bp-05 tmp]# btrfs filesystem show testlabel2
 Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
   Total devices 1 FS bytes used 384.00KiB
   devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1
 
 Btrfs v3.14.2
 
 That's unexpected.
 
 Mount the other fs, and both are shown again:
 
 [root@bp-05 tmp]# mount /dev/sdc5 /mnt/scratch
 [root@bp-05 tmp]# btrfs filesystem show testlabel2
 Label: 'testlabel2'  uuid: a43cd507-02a2-46d2-a754-322cb7bdc346
   Total devices 1 FS bytes used 384.00KiB
   devid    1 size 30.00GiB used 2.04GiB path /dev/sdc1
 
 Label: 'testlabel2'  uuid: 8c6ec835-5628-439b-9749-d92f62573ce8
   Total devices 1 FS bytes used 384.00KiB
   devid    1 size 30.00GiB used 2.04GiB path /dev/sdc5
 
 Btrfs v3.14.2

I'll dig into it. Thank you for letting me know.

Satoru

 
 
 -Eric
 



Re: [PATCH 1/3] btrfs-progs: random fixes of btrfs-filesystem documentation

2014-08-11 Thread Eric Sandeen


 On Aug 11, 2014, at 4:51 PM, Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com 
 wrote:
 
 Hi Eric,
 
...
 
 
 OK, I'll fix my patch based on your comment.
 # Of course, I'll replace (btrfs?) with proper wording.

I assumed it only scans btrfs but didn't know for sure :)

 Can I add your Signed-off-by to my v2 patch?

Oh, sure, if you use my text, that makes sense.

thanks,
-Eric


 
 (What seems to be missing, though, is why would the user ever choose to use 
 '-d?')
 
 I'm not sure. I guess, for example, that on large systems -d takes too
 much time scanning all devices under /dev, or something like that?
 
 Thank you for your comments!
 
 Satoru
 
 
 -Eric
 


Re: [PATCH] Btrfs: fix regression of btrfs device replace

2014-08-11 Thread Satoru Takeuchi

Hi Liu,

(2014/08/12 6:41), Chris Murphy wrote:


On Jul 29, 2014, at 5:09 AM, Liu Bo bo.li@oracle.com wrote:


Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac(
btrfs: dev replace should replace the sysfs entry) added the missing sysfs entry
in the process of device replace, but didn't take missing devices into account,
so now we have

BUG: unable to handle kernel NULL pointer dereference at 0088
IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
...

To reproduce it,
1. mkfs.btrfs -f disk1 disk2
2. mkfs.ext4 disk1
3. mount disk2 /mnt -odegraded
4. btrfs replace start -B 1 disk3 /mnt
--

This fixes the problem.

Reported-by: Chris Murphy li...@colorremedies.com
Signed-off-by: Liu Bo bo.li@oracle.com
---
fs/btrfs/sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index 7869936..12e5355 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
 	if (!fs_info->device_dir_kobj)
 		return -EINVAL;

-	if (one_device) {
+	if (one_device && one_device->bdev) {
 		disk = one_device->bdev->bd_part;
 		disk_kobj = &part_to_dev(disk)->kobj;




Applied to 3.16.0 and tested, problem is fixed.


Chris Murphy


Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com

I confirmed both

 - This problem happens with 3.16, and
 - This problem doesn't happen with 3.16 + your patch.

Thanks,
Satoru





Re: Ideas for a feature implementation

2014-08-11 Thread Austin S Hemmelgarn
On 08/11/2014 04:27 PM, Chris Murphy wrote:
 
 On Aug 10, 2014, at 8:53 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 
 
 Another thing that isn't listed there, that I would personally
 love to see is support for secure file deletion.  To be truly
 secure though, this would need to hook into the COW logic so that
 files marked for secure deletion can't be reflinked (maybe make
 them automatically NOCOW instead, and don't allow snapshots?), and
 when they get written to, the blocks that get COW'ed have the old
 block overwritten.
 
 If the file is reflinked or snapshotted, then can it be secure
 deleted? Because what does it mean to secure delete a file when
 there's a completely independent file pointing to the same physical
 blocks? What if someone else owns that independent file? Does the
 reflink copy get rm'd as well? Or does the file remain, but its
 blocks are zero'd/corrupted?
The semantics that I would expect would be that the extents can't be
reflinked, and when snapshotted the whole file gets COW'ed, and then
inherits the secure deletion flag, possibly with another flag saying
that the user can't disable the secure deletion flag.
 
 For SSDs, whether it's an overwrite or an FITRIM ioctl it's an open
 question when the data is actually irretrievable. It may be
 seconds, but could be much longer (hours?) so I'm not sure if it's
 useful. On HDD's using SMR it's not necessarily a given an
 overwrite will work there either.
By secure deletion, I don't mean make the data absolutely
unrecoverable by any means, I mean make it functionally impractical
for someone without low-level access to and/or extensive knowledge of
the hardware to recover the data; that is, more secure than simply
unlinking the file, but obviously less than (for example) the
application of thermite to the disk platters.  I'm talking the rough
equivalent of wiping the data from RAM.

Anyone who is truly security minded should be using whole disk
encryption anyway, but even then you have the data accessible from the
running OS.
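
For reference, the plain user-space notion of "overwrite, then unlink" looks
roughly like the sketch below. It is a minimal illustration using only POSIX
calls; overwrite_and_unlink() is a hypothetical helper name. Note the caveat
that motivates this thread: on a COW filesystem such as btrfs an in-place
overwrite is normally written to new blocks, so this approach alone does not
scrub the old ones, which is why hooking into the COW logic is being
discussed.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Overwrite the file at path with zeros, flush it, then unlink it.
 * Returns 0 on success and -1 on any error. */
static int overwrite_and_unlink(const char *path)
{
    char zeros[4096];
    struct stat st;
    off_t left;
    int fd;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    memset(zeros, 0, sizeof(zeros));
    left = st.st_size;
    while (left > 0) {
        size_t chunk = left < (off_t)sizeof(zeros) ? (size_t)left
                                                   : sizeof(zeros);
        ssize_t n = write(fd, zeros, chunk);

        if (n <= 0) { /* treat short-circuit and errors alike in this sketch */
            close(fd);
            return -1;
        }
        left -= n;
    }

    /* Push the zeros to stable storage before the name disappears. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return unlink(path);
}
```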


Re: Blocked tasks on 3.15.1

2014-08-11 Thread Charles Cazabon
The blocked tasks issue that got significantly worse in 3.15 -- did anything
go into 3.16 related to this?  I didn't see a single btrfs in Linus' 3.16
announcement, so I don't know whether it should be better, the same, or worse
in this respect...

I haven't seen a definite statement about this on this list, either.

Can someone more familiar with the state of development comment on this?

Charles
-- 
---
Charles Cazabon
GPL'ed software available at:   http://pyropus.ca/software/
---


Re: [PATCH] Btrfs: fix compressed write corruption on enospc

2014-08-11 Thread Miao Xie
On Sun, 10 Aug 2014 22:55:44 +0800, Liu Bo wrote:
 This part of the trace is relatively new because Liu Bo's patch made us
 redirty the pages, making it more likely that we'd try to write them
 during commit.

 But, at the end of the day we have a fundamental deadlock with
 committing a transaction while holding a locked page from an ordered file.

 For now, I'm ripping out the strict ordered file and going back to a
 best-effort filemap_flush like ext4 is using.

 I think I've figured the deadlock out, this is obviously a race case,
 really hard to reproduce :-(

 So it turns out to be related to workqueues -- now a kthread can process
 work_struct queued in different workqueues, so we can explain the deadlock
 as such,

 (1)
 btrfs-delalloc workqueue gets a compressed extent to process with all
 its pages locked during this, and it runs into read free space cache
 inode, and then wait on lock_page().

 (2)
 Reading that free space cache inode comes to submit part, and we have a
 indirect twice endio way for it, with the first endio we come to
 end_workqueue_bio() and queue a work in btrfs-endio-meta workqueue, and
 it will run the real endio() for us, but ONLY when it's processed.

 So the problem is a kthread can serve several workqueues, which means
 works in btrfs-endio-meta workqueues and works in btrfs-flush_delalloc
 workqueues can be in the same processing list of a kthread.  When
 btrfs-flush_delalloc waits for the compressed page and btrfs-endio-meta
 comes after it, it hangs.

 I don't think it is right. All the btrfs workqueue has RECLAIM flag, which
 means each btrfs workqueue has its own rescue worker. So the problem you
 said should not happen.

 Right, I traded some emails with Tejun about this and spent a few days
 trying to prove the workqueues were doing the wrong thing.   It will end
 up spawning another worker thread for the new work, and it won't get
 queued up behind the existing thread.

 If both work items went to the same workqueue, you'd definitely be right.

 I've got a patch to change the flush-delalloc code so we don't do the
 file writes during commit.  It seems like the only choice right now.
 
 Not the only choice any more ;)
 
 It turns out to be related to async_cow's ordered list, say we have two
 async_cow works on the wq->ordered_list, and the first work(named A) finishes
 its ->ordered_func() and ->ordered_free(), and the second work(B) starts B's
 ->ordered_func() which gets to read free space cache inode, where it queues a
 work on @endio_meta_workers, but this work happens to be the same address with
 A's work.
 
 So now the situation is,
 
 (1) in kthread's looping worker_thread(), work A is actually running its job,
 (2) however, work A has freed its memory but kthread still want to use this
 address of memory, which means worker->current_work is still A's address.
 (3) B's readahead for free space cache inode happens to queue a work whose
 address of memory is just the previous address of A's work, which means
 another worker's ->current_work is also A's address.
 (4) as in btrfs we all use function normal_work_helper(), so
 worker->current_func is fixed here.
 (5) worker_thread()
       -> process_one_work()
          -> find_worker_executing_work()
             (find a collision, another work returns)
 
 Then we saw the hang.
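
The collision in steps (1) to (5) can be illustrated with a small user-space
model. It is a sketch under stated assumptions: the model_* types,
find_executing() and the two helper functions are hypothetical, and the only
behaviour borrowed from the description is that a worker is treated as
already running a work item when both the item's address and its function
match.

```c
#include <stddef.h>

struct model_work {
    int payload;
};

typedef void (*work_fn)(struct model_work *work);

struct model_worker {
    struct model_work *current_work; /* address of the work being run */
    work_fn current_func;            /* function of the work being run */
};

/* Both helpers are placeholders; in the description every btrfs work
 * runs through the same normal_work_helper(). */
static void shared_helper(struct model_work *work) { (void)work; }
static void other_helper(struct model_work *work) { (void)work; }

/* A worker counts as "already executing" the work when both the address
 * and the function match. */
static struct model_worker *find_executing(struct model_worker *pool, size_t n,
                                           const struct model_work *work,
                                           work_fn fn)
{
    for (size_t i = 0; i < n; i++) {
        if (pool[i].current_work == work && pool[i].current_func == fn)
            return &pool[i];
    }
    return NULL;
}

/* One static slot stands in for memory that A released and the allocator
 * handed out again; nothing is really freed, so no dangling pointer is
 * ever dereferenced. */
static struct model_work reused_slot;
```

If a worker still records &reused_slot with shared_helper as its current
work, a new item at the same address with the same function is matched to
that worker and queued behind it. With a different function the lookup finds
nothing, which is shown here only for contrast.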

Here is my understanding of what you said:
The same worker dealt with work A and work B, and a 3rd work, which was
introduced by work B and has the same virtual memory address as work A, was
also inserted into the work list of that worker. But work B was waiting for
the 3rd work at that time, so the deadlock happened.
Am I right?

If I'm right, I think what you said is impossible. Before we dealt with work B,
we should already have invoked spin_unlock_irq(&pool->lock), which implies a
memory barrier: all changes made before the unlock should complete before the
unlock. That is, the address in current_work should be the address of work B.
When we inserted the 3rd work, which was introduced by work B, we should not
find the address of work A in the current_work of work B's worker.

I cannot reproduce the problem on my machine, so I have not verified whether
what I said is right or not. Please correct me if I am wrong.

Thanks
Miao


Re: Blocked tasks on 3.15.1

2014-08-11 Thread Liu Bo
On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
 The blocked tasks issue that got significantly worse in 3.15 -- did anything
 go into 3.16 related to this?  I didn't see a single btrfs in Linus' 3.16
 announcement, so I don't know whether it should be better, the same, or worse
 in this respect...
 
 I haven't seen a definite statement about this on this list, either.
 
 Can someone more familiar with the state of development comment on this?

Good news is that we've figured out the bug and the patch is already under
testing :-) 

thanks,
-liubo


Re: [PATCH] Btrfs: fix regression of btrfs device replace

2014-08-11 Thread Liu Bo
On Tue, Aug 12, 2014 at 11:25:00AM +0900, Satoru Takeuchi wrote:
 Hi Liu,
 
 (2014/08/12 6:41), Chris Murphy wrote:
 
 On Jul 29, 2014, at 5:09 AM, Liu Bo bo.li@oracle.com wrote:
 
 Commit 49c6f736f34f901117c20960ebd7d5e60f12fcac(
 btrfs: dev replace should replace the sysfs entry) added the missing sysfs
 entry in the process of device replace, but didn't take missing devices into
 account, so now we have
 
 BUG: unable to handle kernel NULL pointer dereference at 0088
 IP: [a0268551] btrfs_kobj_rm_device+0x21/0x40 [btrfs]
 ...
 
 To reproduce it,
 1. mkfs.btrfs -f disk1 disk2
 2. mkfs.ext4 disk1
 3. mount disk2 /mnt -odegraded
 4. btrfs replace start -B 1 disk3 /mnt
 --
 
 This fixes the problem.
 
 Reported-by: Chris Murphy li...@colorremedies.com
 Signed-off-by: Liu Bo bo.li@oracle.com
 ---
 fs/btrfs/sysfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
 index 7869936..12e5355 100644
 --- a/fs/btrfs/sysfs.c
 +++ b/fs/btrfs/sysfs.c
 @@ -614,7 +614,7 @@ int btrfs_kobj_rm_device(struct btrfs_fs_info *fs_info,
  	if (!fs_info->device_dir_kobj)
  		return -EINVAL;
 
 -	if (one_device) {
 +	if (one_device && one_device->bdev) {
  		disk = one_device->bdev->bd_part;
  		disk_kobj = &part_to_dev(disk)->kobj;
 
 
 
 Applied to 3.16.0 and tested, problem is fixed.
 
 
 Chris Murphy
 
 Reviewed-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 Tested-by: Satoru Takeuchi takeuchi_sat...@jp.fujitsu.com
 
 I confirmed both
 
  - This problem happens with 3.16, and
  - This problem doesn't happen with 3.16 + your patch.

Thanks for your testing!
 
-liubo


Re: Ideas for a feature implementation

2014-08-11 Thread Chris Murphy

On Aug 11, 2014, at 8:27 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote:

 On 08/11/2014 04:27 PM, Chris Murphy wrote:
 
 On Aug 10, 2014, at 8:53 PM, Austin S Hemmelgarn
 ahferro...@gmail.com wrote:
 
 
 Another thing that isn't listed there, that I would personally
 love to see is support for secure file deletion.  To be truly
 secure though, this would need to hook into the COW logic so that
 files marked for secure deletion can't be reflinked (maybe make
 them automatically NOCOW instead, and don't allow snapshots?), and
 when they get written to, the blocks that get COW'ed have the old
 block overwritten.
 
 If the file is reflinked or snapshotted, then can it be secure
 deleted? Because what does it mean to secure delete a file when
 there's a completely independent file pointing to the same physical
 blocks? What if someone else owns that independent file? Does the
 reflink copy get rm'd as well? Or does the file remain, but its
 blocks are zero'd/corrupted?
 The semantics that I would expect would be that the extents can't be
 reflinked, and when snapshotted the whole file gets COW'ed, and then
 inherits the secure deletion flag, possibly with another flag saying
 that the user can't disable the secure deletion flag.

Ahh OK I was thinking of a secure delete command (or an option to rm that 
indicates secure delete). You're suggesting one or more flags that make for 
secure file handling, not just delete, affecting files when they are: a) copied, 
b) moved, c) snapshotted/reflinked, d) deleted. So if deleted, a regular rm would 
see the xattr and do a secure delete; and the xattr would inhibit or limit the others.

While a reflink or normal copy could be inhibited, the snapshot case seems more 
difficult because it just creates a new tree. It's not scanning the tree for 
files/folders with xattr, which would have to be done to go retroactively 
remove the file set with the secure delete flag - could be really slow. And 
what if the snapshot is made read-only?

Strictly secure delete, e.g. rm -s, would be more straightforward than a flag 
affecting other filesystem operations.

 For SSDs, whether it's an overwrite or an FITRIM ioctl it's an open
 question when the data is actually irretrievable. It may be
 seconds, but could be much longer (hours?) so I'm not sure if it's
 useful. On HDD's using SMR it's not necessarily a given an
 overwrite will work there either.
 By secure deletion, I don't mean make the data absolutely
 unrecoverable by any means, I mean make it functionally impractical
 for someone without low-level access to and/or extensive knowledge of
 the hardware to recover the data; that is, more secure than simply
 unlinking the file, but obviously less than (for example) the
 application of thermite to the disk platters.  I'm talking the rough
 equivalent of wiping the data from RAM.
 
 Anyone who is truly security minded should be using whole disk
 encryption anyway, but even then you have the data accessible from the
 running OS.

Seems straightforward for any file system already supporting discard. This even 
has a useful application for thinly provisioned storage and large files where 
you'd want the underlying logical layer to free up extents sooner than later - 
even if you didn't care about the security aspect.

But for that matter, on SSDs right now you can rm the file and then fstrim the 
file system to get the same effect.

Chris Murphy
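
Editorial sketch of the "overwrite, then delete" idea discussed in this thread, done from user space with POSIX calls. The helper name overwrite_and_unlink() is hypothetical. As the thread points out, this gives no guarantee on a copy-on-write filesystem such as btrfs (the zeros may be written to new blocks while the old ones survive until reused or trimmed), on reflinked or snapshotted files, or on SSDs and SMR drives that remap writes. It only shows what a naive rm -s style tool would do.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: zero a regular file in place, flush it to the
 * device, then unlink it. Returns 0 on success and -1 on failure.
 */
int overwrite_and_unlink(const char *path)
{
	char zeros[4096];
	struct stat st;
	off_t off = 0;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
		close(fd);
		return -1;
	}

	memset(zeros, 0, sizeof(zeros));
	while (off < st.st_size) {
		size_t want = sizeof(zeros);
		ssize_t n;

		/* Do not write past the original end of the file. */
		if ((off_t)want > st.st_size - off)
			want = (size_t)(st.st_size - off);
		n = pwrite(fd, zeros, want, off);
		if (n < 0 && errno == EINTR)
			continue;
		if (n <= 0) {
			close(fd);
			return -1;
		}
		off += n;
	}

	/* Push the zeros out of the page cache before removing the name. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	if (close(fd) < 0)
		return -1;
	return unlink(path);
}
```

The rm-then-fstrim route mentioned above takes the opposite approach: it leaves the data untouched and asks the device to discard the freed blocks instead.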


Re: 40TB volume taking over 16 hours to mount, any ideas?

2014-08-11 Thread Duncan
Jose Ildefonso Camargo Tolosa posted on Mon, 11 Aug 2014 16:33:36 -0500 as
excerpted:

 As I hate when a thread is left hanging, you deserve to know what
 happened in the end, you likely already guessed, but anyway: I nuked the
 filesystem, and started over.
 
 After some internal discussion in the company, we decided to move to ZFS
 for now.  However, we will keep an eye on btrfs, and will likely deploy
 it to some smaller system for further testing.
 
 Thank you all for your help!

Thank you too. =:^)

Sounds like a sane decision for the time being.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Blocked tasks on 3.15.1

2014-08-11 Thread Duncan
Liu Bo posted on Tue, 12 Aug 2014 10:56:42 +0800 as excerpted:

 On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
 The blocked tasks issue that got significantly worse in 3.15 -- did
 anything go into 3.16 related to this?  I didn't see a single btrfs
 in Linus' 3.16 announcement, so I don't know whether it should be
 better, the same, or worse in this respect...
 
 I haven't seen a definite statement about this on this list, either.
 
 Can someone more familiar with the state of development comment on
 this?
 
 Good news is that we've figured out the bug and the patch is already
 under testing :-)

IOW, it's not in 3.16.0, but will hopefully make it into 3.16.2 (it'll 
likely be too late for 3.16.1).

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: Blocked tasks on 3.15.1

2014-08-11 Thread Marc MERLIN
On Mon, Aug 11, 2014 at 08:55:21PM -0600, Charles Cazabon wrote:
 The blocked tasks issue that got significantly worse in 3.15 -- did anything
 go into 3.16 related to this?  I didn't see a single btrfs in Linus' 3.16
 announcement, so I don't know whether it should be better, the same, or worse
 in this respect...
 
 I haven't seen a definite statement about this on this list, either.

Yes, 3.15 is unusable for some workloads, mine included.
Go back to 3.14 until there is a patch in 3.16, which there isn't quite
as of right now, but very soon hopefully.

Note 3.16.0 is actually worse than 3.15 for me.

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901


3.15 btrfs free space cache oops

2014-08-11 Thread Daniel J Blueman
When running MonetDB against a BTRFS RAID-0 set over 4 SSDs [1] on
3.15.5, we see io_ctl have a bad address of 0x20, causing a fatal
pagefault in memcpy():

(gdb) list *(__btrfs_write_out_cache+0x3e4)
0x81365984 is in __btrfs_write_out_cache
(fs/btrfs/free-space-cache.c:521).
516             if (io_ctl->index >= io_ctl->num_pages)
517                     return -ENOSPC;
518             io_ctl_map_page(io_ctl, 0);
519     }
520
521     memcpy(io_ctl->cur, bitmap, PAGE_CACHE_SIZE);
522     io_ctl_set_crc(io_ctl, io_ctl->index - 1);
523     if (io_ctl->index < io_ctl->num_pages)
524             io_ctl_map_page(io_ctl, 0);
525     return 0;

I can try to reproduce it if more data is useful?

Thanks,
  Daniel

-- [1]

mkfs.btrfs -f -m raid0 -d raid0 -n 16k -l 16k -O skinny-metadata
/dev/sda2 /dev/sdc2 /dev/sdb2 /dev/sdd2
mount /dev/sda2 /scratch -o noatime,discard,nodatasum,nobarrier,ssd_spread

-- [2]

BUG: unable to handle kernel paging request at 0020
IP: [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
PGD 3bca02c067 PUD 3bcf5fb067 PMD 0
Oops:  [#1] SMP
Modules linked in:
CPU: 34 PID: 46645 Comm: mserver5 Not tainted 3.15.5-server #7
Hardware name: Dell Inc. PowerEdge R815/0W13NR, BIOS 3.1.1 [1.1.54] 10/16/2013
task: 880a8c7234f0 ti: 8809aefcc000 task.ti: 8809aefcc000
RIP: 0010:[8135a374] [8135a374]
__btrfs_write_out_cache+0x3e4/0x8e0
RSP: 0018:8809aefcfc40 EFLAGS: 00010246
RAX: 004fb9321000 RBX: 8809aefcfca8 RCX: 0200
RDX: 1000 RSI: 0020 RDI: 884fb9321000
RBP: 8809aefcfd48 R08: 0200 R09: 
R10:  R11: 884fb9320ffc R12: 8831e3303740
R13: 880100579970 R14: 880bb38061c0 R15: 0020
FS: 7fb9447ed700() GS:884bbfc8() knlGS:
CS: 0010 DS:  ES:  CR0: 80050033
CR2: 0020 CR3: 00329b71c000 CR4: 000407e0
Stack:
 8809aefcfc90 0011 000e 884fbbc2c870
 880bb38061c0 8809aefcfc90 880bb3806058 880b02ec
 883bcd523800 8833d338f2c0 88476b1eb4e0 00b890cde000
Call Trace:
 [81a75b4b] ? _raw_spin_lock+0xb/0x20
 [8135c0e1] btrfs_write_out_cache+0xb1/0xf0
 [8130be0b] btrfs_write_dirty_block_groups+0x58b/0x670
 [813199c5] commit_cowonly_roots+0x195/0x250
 [8131b92f] btrfs_commit_transaction+0x41f/0x9b0
 [81358e85] ? btrfs_log_dentry_safe+0x55/0x70
 [8132b6b2] btrfs_sync_file+0x182/0x2a0
 [8114a450] do_fsync+0x50/0x80
 [8114a6de] SyS_fdatasync+0xe/0x20
 [81a766e6] system_call_fastpath+0x1a/0x1f
Code: ff 4d 89 fc 49 89 c7 e9 ab 00 00 00 0f 1f 00 40 f6 c7 02 0f 85
fe 00 00 00 40 f6 c7 04 0f 85 14 01 00 00 89 d1 c1 e9 03 f6 c2 04 f3
48 a5 74 09 8b 0e 89 0f b9 04 00 00 00 f6 c2 02 74 0e 44 0f
RIP [8135a374] __btrfs_write_out_cache+0x3e4/0x8e0
 RSP 8809aefcfc40
CR2: 0020
-- 
Daniel J Blueman
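
Editorial sketch of the page-cursor pattern visible in the gdb listing above: copy one page-sized bitmap to the current cursor, then map the next page only if one is left. This is a simplified user-space model, not the kernel's io_ctl. Every model_* name is hypothetical, the CRC step is omitted, and it makes no claim about the actual cause of the reported oops. It only shows why both pointers passed to memcpy() need to be valid before the copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for a cursor walking a fixed set of pages. */
struct model_io_ctl {
	unsigned char (*pages)[MODEL_PAGE_SIZE];	/* backing pages */
	int num_pages;					/* pages available */
	int index;					/* next page to map */
	unsigned char *cur;				/* NULL when unmapped */
};

/* Point the cursor at the next page and advance the index. */
static void model_map_page(struct model_io_ctl *c)
{
	assert(c->index < c->num_pages);
	c->cur = c->pages[c->index++];
}

void model_io_ctl_init(struct model_io_ctl *c,
		       unsigned char (*pages)[MODEL_PAGE_SIZE], int num_pages)
{
	c->pages = pages;
	c->num_pages = num_pages;
	c->index = 0;
	c->cur = NULL;
	if (num_pages > 0)
		model_map_page(c);
}

/*
 * Copy one page-sized bitmap to the cursor. Returns 0 on success,
 * -EINVAL for a NULL source and -ENOSPC when no page is mapped.
 */
int model_add_bitmap(struct model_io_ctl *c, const void *bitmap)
{
	if (!bitmap)
		return -EINVAL;
	if (!c->cur)
		return -ENOSPC;

	memcpy(c->cur, bitmap, MODEL_PAGE_SIZE);
	if (c->index < c->num_pages)
		model_map_page(c);
	else
		c->cur = NULL;	/* out of pages: later calls must fail */
	return 0;
}
```

In this model the cursor is reset to NULL once the last page is consumed, so a caller that keeps adding bitmaps gets -ENOSPC instead of a copy through a stale pointer.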