Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, Jul 09, 2015 at 01:43:53PM +, Duncan wrote:
> I could have sworn btrfs property -t subvolume can get/set that snapshot
> bit. I know I saw the discussion and I think a patch for it go by, but
> again, as I don't use them, I haven't tracked closely enough to see if
> it ever got in.

Are you thinking of the read-only flag? That's not the same thing as the various UUID properties (e.g. parent) which can be used to determine if a subvolume was made using a snapshot.

   Hugo.

-- 
Hugo Mills | Someone's been throwing dead sheep down my Fun Well
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | Nick Gibbins
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Hugo Mills posted on Thu, 09 Jul 2015 13:54:48 + as excerpted:
> On Thu, Jul 09, 2015 at 01:43:53PM +, Duncan wrote:
>> I could have sworn btrfs property -t subvolume can get/set that
>> snapshot bit. I know I saw the discussion and I think a patch for it
>> go by, but again, as I don't use them, I haven't tracked closely
>> enough to see if it ever got in.
> Are you thinking of the read-only flag? That's not the same thing as
> the various UUID properties (e.g. parent) which can be used to
> determine if a subvolume was made using a snapshot.

Perhaps, but I was sure there was a snapshot property too, because I remember discussion of being able to unset it in order to remove it from the snapshot (only) list. But maybe that's all it was, discussion; it wasn't implemented, and I ended up conflating it with the read-only bit, which /can/ be set/unset that way. Like I said, I can't check, as I don't have any subvolumes/snapshots available to do a listing on and see, and the property manpage doesn't have a properties list to check; it wants you to use the list option to get the list.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, 09 Jul 2015 08:48:00 -0400, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> On 2015-07-09 08:41, Sander wrote:
>> Austin S Hemmelgarn wrote (ao):
>>>> What's wrong with btrfs subvolume snapshot?
>>> Well, personally I would say the fact that once something is tagged
>>> as a snapshot, you can't change it to a regular subvolume without
>>> doing a non-incremental send/receive.
>> A snapshot is a subvolume. There is no such thing as tagged as a
>> snapshot.
>> 	Sander
> No, there is a bit in the subvolume metadata that says whether it's
> considered a snapshot or not. Internally, they are handled identically,
> but it does come into play when you consider things like btrfs
> subvolume show -s (which only lists snapshots), which in turn means
> that certain tasks are more difficult to script robustly.

This sounds like a vestigial leftover from back when snapshots were conceptualized to be somehow functionally different from subvolumes... But as you said, now there is effectively no difference, so that bit is used for what, only to track how a subvolume was created? And to output in the subvolume list if the user passes -s? I'd say that's a pretty oddball feature to even have, since in any case, if you want to distinguish and list only your snapshots, you would typically just name them in a certain way, e.g. /snaps/originalname/datetime.

-- 
With respect,
Roman
Re: Anyone tried out btrbk yet?
On Thu, Jul 09, 2015 at 02:26:55PM +0200, Martin Steigerwald wrote:
> Hi! I see Alex, the developer of btrbk, posted here once about btrfs
> send and receive, but well, any other users of btrbk¹? What are your
> experiences? I consider switching to it from my home grown rsync based
> backup script. Well, I may try it for one of my BTRFS volumes in
> addition to the rsync backup for now. I would like to give all options
> on the command line, but maybe it can completely replace my current
> script if I put everything in its configuration.
> Any other handy BTRFS backup solutions?

I use my own which I wrote :)
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
http://marc.merlins.org/linux/scripts/btrfs-subvolume-backup

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Anyone tried out btrbk yet?
On Thursday 09 July 2015 14:26:55 you wrote:
> Well, I may try it for one of my BTRFS volumes in addition to the rsync
> backup for now. I would like to give all options on the command line,
> but maybe it can completely replace my current script if I put
> everything in its configuration.
> Any other handy BTRFS backup solutions?

Hi,

I've been using btrfs-sxbackup for a couple of weeks, and it has been working great. Everything is configured on the command line, so that's a plus.

https://pypi.python.org/pypi/btrfs-sxbackup
https://github.com/masc3d/btrfs-sxbackup

-Henri
Re: [PATCH v2] Btrfs: fix list transaction-pending_ordered corruption
On Fri, Jul 03, 2015 at 10:22:08PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana fdman...@suse.com
> Cc: sta...@vger.kernel.org
> Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3")
> Signed-off-by: Filipe Manana fdman...@suse.com

... now for the right patch,

Reviewed-by: David Sterba dste...@suse.com
Re: [PATCH] Btrfs: fix list transaction-pending_ordered corruption
On Fri, Jul 03, 2015 at 08:46:40PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana fdman...@suse.com
> ...
> Cc: sta...@vger.kernel.org
> Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3")
> Signed-off-by: Filipe Manana fdman...@suse.com

Good catch, and thanks for looking up the offending commit.

Reviewed-by: David Sterba dste...@suse.com
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, Jul 09, 2015 at 08:48:00AM -0400, Austin S Hemmelgarn wrote:
> On 2015-07-09 08:41, Sander wrote:
>> Austin S Hemmelgarn wrote (ao):
>>>> What's wrong with btrfs subvolume snapshot?
>>> Well, personally I would say the fact that once something is tagged
>>> as a snapshot, you can't change it to a regular subvolume without
>>> doing a non-incremental send/receive.
>> A snapshot is a subvolume. There is no such thing as tagged as a
>> snapshot.
> No, there is a bit in the subvolume metadata that says whether it's
> considered a snapshot or not.

Technically it's not really a bit. The snapshot relation is determined by the parent uuid value of a subvolume.

> Internally, they are handled identically, but it does come into play
> when you consider things like btrfs subvolume show -s (which only
> lists snapshots),

That was probably 'btrfs subvol list -s', though the 'subvol show' command prints all snapshots of a given subvolume.

> which in turn means that certain tasks are more difficult to script
> robustly.

I don't deny the interface/output is imperfect for scripting purposes; maybe we can provide filters that would satisfy your use case.
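Since the snapshot relation lives in the parent uuid, scripts can recover it from `btrfs subvolume list -q` output rather than relying on any flag. A rough sketch of such a filter; the sample listing, field layout, and paths below are illustrative assumptions, not guaranteed btrfs-progs output:

```python
# Sketch: treat a subvolume as "not a snapshot" when its parent_uuid
# field is "-" in `btrfs subvolume list -q` output.
# SAMPLE is made-up output; real scripts would read it via subprocess.
SAMPLE = """\
ID 257 gen 10 top level 5 parent_uuid - path home
ID 258 gen 12 top level 5 parent_uuid 7a2b9c1e-0000-0000-0000-000000000000 path snaps/home.20150709
"""

def non_snapshots(listing: str) -> list:
    """Return paths of subvolumes whose parent_uuid field is unset."""
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        if "parent_uuid" in fields and "path" in fields:
            if fields[fields.index("parent_uuid") + 1] == "-":
                paths.append(fields[fields.index("path") + 1])
    return paths

print(non_snapshots(SAMPLE))  # → ['home']
```

In a real script the listing would come from something like `subprocess.run(["btrfs", "subvolume", "list", "-q", mountpoint], capture_output=True, text=True).stdout`.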
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
Chris Murphy wrote on 2015/07/09 18:45 -0600:
> On Thu, Jul 9, 2015 at 6:34 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
>> One of my patches addressed a problem that a converted btrfs can't
>> pass btrfsck. Not sure if that is the cause, but if you can, try
>> btrfs-progs v3.19.1, the one without my btrfs-progs patches and some
>> other newer convert related patches, and see the result? I think this
>> would at least provide the base for bisecting btrfs-progs if the bug
>> is in btrfs-progs.
> I'm happy to regression test with 3.19.1 but I'm confused. After
> conversion, btrfs check (4.1) finds no problems. After the ext2_saved
> snapshot is deleted, btrfsck finds no problems. After defrag, again
> btrfsck finds no problems. After the failed balance, btrfsck finds no
> problems but crashes with Aborted (core dump).

Even if btrfsck reports no error, some btrfs-convert behavior change may lead the kernel to misbehave. But we are not sure whether btrfs-progs or the kernel itself has the bug. Maybe btrfs-convert did something wrong/different triggering the bug, or it's just a kernel regression?

So what I'd like to check is, with 3.19.1 progs (kernel version doesn't change), whether the kernel still fails to do the balance. If the problem still happens, then we can focus on the kernel part, or at least put less effort into btrfs-progs.

> Should I still test 3.19.1?

Yes, please.

Thanks,
Qu
Re: Anyone tried out btrbk yet?
Marc,

I thought I'd give yours a try, and I'm probably embarrassing myself here but I'm running into this issue. Centos 7.

[root@san01 tank]# ./btrfs-subvolume-backup store /mnt2/backups
./btrfs-subvolume-backup: line 177: shlock: command not found
/var/run/btrfs-subvolume-backup held for btrfs-subvolume-backup, quitting
[root@san01 tank]# yum whatprovides shlock
Loaded plugins: changelog, fastestmirror
Loading mirror speeds from cached hostfile
 * base: dist1.800hosting.com
 * elrepo: repos.dfw.lax-noc.com
 * epel: mirror.umd.edu
 * extras: mirrors.usc.edu
 * updates: mirror.keystealth.org
No matches found
[root@san01 tank]# shlock
-bash: shlock: command not found
[root@san01 tank]# yum search all shlock
Loaded plugins: changelog, fastestmirror
Loading mirror speeds from cached hostfile
 * base: dist1.800hosting.com
 * elrepo: repos.dfw.lax-noc.com
 * epel: mirror.utexas.edu
 * extras: mirror.thelinuxfix.com
 * updates: dallas.tx.mirror.xygenhosting.com
Warning: No matches found for: shlock
No matches found

On Thu, Jul 9, 2015 at 12:17 PM, Marc MERLIN m...@merlins.org wrote:
> On Thu, Jul 09, 2015 at 02:26:55PM +0200, Martin Steigerwald wrote:
>> Hi! I see Alex, the developer of btrbk, posted here once about btrfs
>> send and receive, but well, any other users of btrbk¹? What are your
>> experiences? I consider switching to it from my home grown rsync
>> based backup script. Well, I may try it for one of my BTRFS volumes
>> in addition to the rsync backup for now. I would like to give all
>> options on the command line, but maybe it can completely replace my
>> current script if I put everything in its configuration.
>> Any other handy BTRFS backup solutions?
> I use my own which I wrote :)
> http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
> http://marc.merlins.org/linux/scripts/btrfs-subvolume-backup
> Marc
> --
> A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
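The failure above is just the locking helper: shlock ships with inn, not with CentOS. The same single-instance guarantee can be had from flock(2); a hypothetical stand-in sketch (the lock path is illustrative, not the script's actual path):

```python
# Sketch of a shlock-style guard: take an exclusive, non-blocking flock
# on a lock file and bail out if another instance already holds it.
import fcntl
import os
import sys

def single_instance(path="/tmp/btrfs-subvolume-backup.lock"):
    """Exit if another process already holds the lock file."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("%s held by another instance, quitting" % path)
    # Keep fd open for the lifetime of the process; the kernel releases
    # the lock automatically when the process exits.
    return fd
```

(util-linux also ships a `flock` command that gives the same behaviour directly from a shell script.)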
[PATCH] btrfs: Remove unused chunk_tree and chunk_objectid from scrub_enumerate_chunks() and scrub_chunk()
From: Zhao Lei zhao...@cn.fujitsu.com

These variables have not been used since the version that introduced them; remove them.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/scrub.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index eb35176..f552937 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3321,7 +3321,6 @@ out:
 static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx,
 					  struct btrfs_device *scrub_dev,
-					  u64 chunk_tree, u64 chunk_objectid,
 					  u64 chunk_offset, u64 length,
 					  u64 dev_offset, int is_dev_replace)
 {
@@ -3372,8 +3371,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	struct btrfs_root *root = sctx->dev_root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 length;
-	u64 chunk_tree;
-	u64 chunk_objectid;
 	u64 chunk_offset;
 	int ret;
 	int slot;
@@ -3431,8 +3428,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (found_key.offset + length <= start)
 			goto skip;

-		chunk_tree = btrfs_dev_extent_chunk_tree(l, dev_extent);
-		chunk_objectid = btrfs_dev_extent_chunk_objectid(l, dev_extent);
 		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);

 		/*
@@ -3449,8 +3444,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
-		ret = scrub_chunk(sctx, scrub_dev, chunk_tree, chunk_objectid,
-				  chunk_offset, length, found_key.offset,
+		ret = scrub_chunk(sctx, scrub_dev, chunk_offset, length,
+				  found_key.offset,
 				  is_dev_replace);

 		/*
-- 
1.8.5.1
Can't remove missing device
One of my 3TB drives failed (not recognized anymore) recently, so I got two new 4TB drives. I mounted the fs with -o degraded and used btrfs dev add to add the new drives, then I did btrfs dev del missing.

Now delete missing always returns an error:

ERROR: error removing the device 'missing' - Input/output error

According to dmesg, sda returns bad data, but the SMART values for it seem fine. How do I get the FS working again?

Debian/SID, kernel v4.1

# btrfs fi df /srv/
Data, RAID5: total=18.96TiB, used=18.52TiB
System, RAID1: total=32.00MiB, used=2.30MiB
Metadata, RAID1: total=24.06GiB, used=22.09GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs fi show
Label: none  uuid: ----
	Total devices 11 FS bytes used 18.54TiB
	devid    1 size 2.73TiB used 2.56TiB path /dev/sdh
	devid    2 size 2.73TiB used 2.63TiB path /dev/sdg
	devid    3 size 2.73TiB used 2.64TiB path /dev/sdj
	devid    4 size 2.73TiB used 2.60TiB path /dev/sdk
	devid    5 size 2.73TiB used 2.63TiB path /dev/sdb
	devid    6 size 2.73TiB used 2.73TiB path /dev/sda
	devid    9 size 2.73TiB used 2.73TiB path /dev/sdd
	devid   10 size 2.73TiB used 2.73TiB path /dev/sdl
	devid   11 size 3.64TiB used 2.66GiB path /dev/sdc
	devid   12 size 3.64TiB used 2.66GiB path /dev/sde
	*** Some devices missing

btrfs-progs v4.0

# dmesg | tail -n 40
[ 9474.630480] BTRFS warning (device sda): csum failed ino 384 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9474.630487] BTRFS warning (device sda): csum failed ino 384 off 2927919104 csum 729502971 expected csum 57406087
[ 9474.630493] BTRFS warning (device sda): csum failed ino 384 off 2927923200 csum 1688454633 expected csum 4263548653
[ 9474.630495] BTRFS warning (device sda): csum failed ino 384 off 2927927296 csum 3679588162 expected csum 4283532667
[ 9484.066796] BTRFS info (device sda): relocating block group 66338809643008 flags 129
[ 9505.492349] __readpage_endio_check: 6 callbacks suppressed
[ 9505.492356] BTRFS warning (device sda): csum failed ino 385 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9505.492366] BTRFS warning (device sda): csum failed ino 385 off 2927890432 csum 645393967 expected csum 1519548271
[ 9505.492372] BTRFS warning (device sda): csum failed ino 385 off 2927894528 csum 3254966910 expected csum 2168664573
[ 9505.492377] BTRFS warning (device sda): csum failed ino 385 off 2927898624 csum 3464250141 expected csum 1621289634
[ 9505.492382] BTRFS warning (device sda): csum failed ino 385 off 2927902720 csum 2214000308 expected csum 2797028572
[ 9505.492387] BTRFS warning (device sda): csum failed ino 385 off 2927906816 csum 3719155761 expected csum 561200354
[ 9505.492392] BTRFS warning (device sda): csum failed ino 385 off 2927910912 csum 98768328 expected csum 1311354303
[ 9505.492397] BTRFS warning (device sda): csum failed ino 385 off 2927915008 csum 996429330 expected csum 1552366519
[ 9505.492402] BTRFS warning (device sda): csum failed ino 385 off 2927919104 csum 729502971 expected csum 57406087
[ 9505.492407] BTRFS warning (device sda): csum failed ino 385 off 2927923200 csum 1688454633 expected csum 4263548653
[ 9515.428150] BTRFS info (device sda): relocating block group 66338809643008 flags 129
[ 9534.605158] __readpage_endio_check: 7 callbacks suppressed
[ 9534.605165] BTRFS warning (device sda): csum failed ino 386 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9534.605174] BTRFS warning (device sda): csum failed ino 386 off 2927890432 csum 645393967 expected csum 1519548271
[ 9534.605184] BTRFS warning (device sda): csum failed ino 386 off 2927894528 csum 3254966910 expected csum 2168664573
[ 9534.605192] BTRFS warning (device sda): csum failed ino 386 off 2927898624 csum 3464250141 expected csum 1621289634
[ 9534.605194] BTRFS warning (device sda): csum failed ino 386 off 2927902720 csum 2214000308 expected csum 2797028572
[ 9534.605198] BTRFS warning (device sda): csum failed ino 386 off 2927906816 csum 3719155761 expected csum 561200354
[ 9534.605204] BTRFS warning (device sda): csum failed ino 386 off 2927910912 csum 98768328 expected csum 1311354303
[ 9534.605206] BTRFS warning (device sda): csum failed ino 386 off 2927915008 csum 996429330 expected csum 1552366519
[ 9534.605212] BTRFS warning (device sda): csum failed ino 386 off 2927919104 csum 729502971 expected csum 57406087
[ 9534.605215] BTRFS warning (device sda): csum failed ino 386 off 2927923200 csum 1688454633 expected csum 4263548653
[ 9543.317995] BTRFS info (device sda): relocating block group 66338809643008 flags 129
[ 9564.879155] __readpage_endio_check: 7 callbacks suppressed
[ 9564.879161] BTRFS warning (device sda): csum failed ino 387 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9564.879171] BTRFS warning (device sda): csum failed ino 387 off 2927890432 csum 645393967 expected csum
[RFC PATCH 2/2] btrfs: scrub: Add support for partial csum
From: Zhao Lei zhao...@cn.fujitsu.com

Add scrub support for partial csum.

The only challenge is that scrub is done in units of a bio (or page size yet), but partial csum is done in units of 1/8 of nodesize. So a new function, scrub_check_node_checksum, and a new tree block csum check loop are introduced to do the partial csum check while reading the tree block.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/scrub.c | 207 ++-
 1 file changed, 206 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..0610474 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -307,6 +307,7 @@ static void copy_nocow_pages_worker(struct btrfs_work *work);
 static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_put_ctx(struct scrub_ctx *sctx);
+static int scrub_check_fsid(u8 fsid[], struct scrub_page *spage);

 static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
@@ -878,6 +879,91 @@ static inline void scrub_put_recover(struct scrub_recover *recover)
 }

 /*
+ * Page_bad arg should be a page including the leaf header
+ *
+ * Return 0 if this header seems correct,
+ * return 1 in other cases
+ */
+static int scrub_check_head(struct scrub_page *spage, u8 *csum)
+{
+	void *mapped_buffer;
+	struct btrfs_header *h;
+
+	mapped_buffer = kmap_atomic(spage->page);
+	h = (struct btrfs_header *)mapped_buffer;
+
+	if (spage->logical != btrfs_stack_header_bytenr(h))
+		goto header_err;
+	if (!scrub_check_fsid(h->fsid, spage))
+		goto header_err;
+	if (memcmp(h->chunk_tree_uuid,
+		   spage->dev->dev_root->fs_info->chunk_tree_uuid,
+		   BTRFS_UUID_SIZE))
+		goto header_err;
+	if (spage->generation != btrfs_stack_header_generation(h))
+		goto header_err;
+
+	if (csum)
+		memcpy(csum, h->csum, sizeof(h->csum));
+
+	kunmap_atomic(mapped_buffer);
+	return 0;
+
+header_err:
+	kunmap_atomic(mapped_buffer);
+	return 1;
+}
+
+/*
+ * return 1 if checksum ok, 0 in other cases
+ */
+static int scrub_check_node_checksum(struct scrub_block *sblock,
+				     int part,
+				     u8 *csum)
+{
+	int offset;
+	int len;
+	u32 crc = ~(u32)0;
+
+	if (part == 0) {
+		offset = BTRFS_CSUM_SIZE;
+		len = sblock->sctx->nodesize - BTRFS_CSUM_SIZE;
+	} else if (part == 1) {
+		offset = BTRFS_CSUM_SIZE;
+		len = sblock->sctx->nodesize * 2 / 8 - BTRFS_CSUM_SIZE;
+	} else {
+		offset = part * sblock->sctx->nodesize / 8;
+		len = sblock->sctx->nodesize / 8;
+	}
+
+	while (len > 0) {
+		int page_num = offset / PAGE_SIZE;
+		int page_data_offset = offset - page_num * PAGE_SIZE;
+		int page_data_len = min(len,
+					(int)(PAGE_SIZE - page_data_offset));
+		u8 *mapped_buffer;
+
+		WARN_ON(page_num >= sblock->page_count);
+
+		if (sblock->pagev[page_num]->io_error)
+			return 0;
+
+		mapped_buffer = kmap_atomic(
+			sblock->pagev[page_num]->page);
+
+		crc = btrfs_csum_data(mapped_buffer + page_data_offset, crc,
+				      page_data_len);
+
+		offset += page_data_len;
+		len -= page_data_len;
+
+		kunmap_atomic(mapped_buffer);
+	}
+	btrfs_csum_final(crc, (char *)&crc);
+	return (crc == ((u32 *)csum)[part]);
+}
+
+/*
  * scrub_handle_errored_block gets called when either verification of the
  * pages failed or the bio failed to read, e.g. with EIO. In the latter
  * case, this function handles all pages in the bio, even though only one
@@ -905,6 +991,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	int success;
 	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
+	u8 node_csum[BTRFS_CSUM_SIZE];
+	int get_right_sum = 0;
+	int per_page_recover_start = 0;

 	BUG_ON(sblock_to_check->page_count < 1);
 	fs_info = sctx->dev_root->fs_info;
@@ -1151,11 +1240,125 @@ nodatasum_case:
 	 * area are unreadable.
 	 */
 	success = 1;
+
+	/*
+	 * maybe some mirror's head is broken,
+	 * we select to use a right head for checksum
+	 */
+	for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS &&
+	     sblocks_for_recheck[mirror_index].page_count > 0;
+	     mirror_index++) {
+		if
[RFC PATCH 1/2] btrfs: csum: Introduce partial csum for tree block.
Introduce the new partial csum mechanism for tree block.

[Old tree block csum]
0     4     8     12    16    20    24    28    32
---------------------------------------------------
|csum |            unused, all 0                  |
---------------------------------------------------
Csum is the crc32 of the whole tree block data.

[New tree block csum]
---------------------------------------------------
|csum0|csum1|csum2|csum3|csum4|csum5|csum6|csum7|
---------------------------------------------------
Where csum0 is the same as the old one, crc32 of the whole tree block data. But csum1~csum7 will store the crc32 of each eighth part.

Take the example of 16K leafsize, then:
csum1: crc32 of BTRFS_CSUM_SIZE~4K
csum2: crc32 of 4K~6K
...
csum7: crc32 of 14K~16K

This gives btrfs the ability not only to detect corruption but also to know where the corruption is, further improving the robustness of btrfs.

Although the best practice would be to introduce a new csum type and put every eighth's crc32 into its corresponding place, the benefit is not worth breaking backward compatibility. So keep csum0 and modify the csum1 range to keep backward compatibility.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c | 74 --
 1 file changed, 49 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ef9a4b..b2d8526 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -271,47 +271,75 @@ void btrfs_csum_final(u32 crc, char *result)
 }

 /*
- * compute the csum for a btree block, and either verify it or write it
- * into the csum field of the block.
+ * Calculate partial crc32 for each part.
+ *
+ * Part should be in [0, 7].
+ * Part 0 is the old crc32 of the whole leaf/node.
+ * Part 1 is the crc32 of 32~ 2/8 of leaf/node.
+ * Part 2 is the crc32 of 3/8 of leaf/node.
+ * Part 3 is the crc32 of 4/8 of leaf/node and so on.
  */
-static int csum_tree_block(struct btrfs_fs_info *fs_info,
-			   struct extent_buffer *buf,
-			   int verify)
+static int csum_tree_block_part(struct extent_buffer *buf,
+				char *result, int part)
 {
-	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
-	char *result = NULL;
+	int offset;
+	int err;
 	unsigned long len;
 	unsigned long cur_len;
-	unsigned long offset = BTRFS_CSUM_SIZE;
-	char *kaddr;
 	unsigned long map_start;
 	unsigned long map_len;
-	int err;
+	char *kaddr;
 	u32 crc = ~(u32)0;
-	unsigned long inline_result;

-	len = buf->len - offset;
+	BUG_ON(part >= 8 || part < 0);
+	BUG_ON(ALIGN(buf->len, 8) != buf->len);
+
+	if (part == 0) {
+		offset = BTRFS_CSUM_SIZE;
+		len = buf->len - offset;
+	} else if (part == 1) {
+		offset = BTRFS_CSUM_SIZE;
+		len = buf->len * 2 / 8 - offset;
+	} else {
+		offset = part * buf->len / 8;
+		len = buf->len / 8;
+	}
+
 	while (len > 0) {
 		err = map_private_extent_buffer(buf, offset, 32,
					&kaddr, &map_start, &map_len);
 		if (err)
-			return 1;
+			return err;
 		cur_len = min(len, map_len - (offset - map_start));
 		crc = btrfs_csum_data(kaddr + offset - map_start, crc,
				      cur_len);
 		len -= cur_len;
 		offset += cur_len;
 	}
-	if (csum_size > sizeof(inline_result)) {
-		result = kzalloc(csum_size, GFP_NOFS);
-		if (!result)
+	btrfs_csum_final(crc, result + BTRFS_CSUM_SIZE * part / 8);
+	return 0;
+}
+
+/*
+ * compute the csum for a btree block, and either verify it or write it
+ * into the csum field of the block.
+ */
+static int csum_tree_block(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *buf,
+			   int verify)
+{
+	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+	char result[BTRFS_CSUM_SIZE] = {0};
+	int err;
+	int index = 0;
+
+	/* get every part csum */
+	for (index = 0; index < 8; index++) {
+		err = csum_tree_block_part(buf, result, index);
+		if (err)
 			return 1;
-	} else {
-		result = (char *)inline_result;
 	}

-	btrfs_csum_final(crc, result);
-
 	if (verify) {
 		if (memcmp_extent_buffer(buf, result, 0, csum_size)) {
 			u32 val;
@@ -324,15 +352,11 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			       level %d\n,
 			       fs_info->sb->s_id, buf->start,
 			       val, found, btrfs_header_level(buf));
-			if (result !=
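The range arithmetic in the patch (csum0 over the whole block after the csum area, csum1 from BTRFS_CSUM_SIZE to 2/8, csum2..7 over one eighth each) can be modelled quickly in Python. Note zlib.crc32 here is only a stand-in for the kernel's checksum function, so the values are not what btrfs would store; what the sketch shows is the localisation property: corrupting one eighth changes only csum0 and that part's csum.

```python
# Model of the partial csum layout from the patch; only the byte ranges
# match the patch, the crc function itself is a stand-in.
import zlib

BTRFS_CSUM_SIZE = 32

def partial_csums(block: bytes) -> list:
    """Return the 8 partial crc32s over a tree-block-sized buffer."""
    n = len(block)
    assert n % 8 == 0
    sums = []
    for part in range(8):
        if part == 0:            # whole block after the csum area
            off, ln = BTRFS_CSUM_SIZE, n - BTRFS_CSUM_SIZE
        elif part == 1:          # end of csum area .. 2/8 of the block
            off, ln = BTRFS_CSUM_SIZE, n * 2 // 8 - BTRFS_CSUM_SIZE
        else:                    # one eighth each
            off, ln = part * n // 8, n // 8
        sums.append(zlib.crc32(block[off:off + ln]))
    return sums

block = bytes(range(256)) * 64            # a 16 KiB "node"
good = partial_csums(block)
bad = bytearray(block)
bad[5 * len(block) // 8 + 10] ^= 0xFF     # corrupt a byte in the sixth eighth
changed = [i for i, (a, b) in enumerate(zip(good, partial_csums(bytes(bad))))
           if a != b]
print(changed)  # → [0, 5]
```

So a scrub that stores all eight values can narrow a csum failure down to one eighth of the node instead of just "this node is bad".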
Re: [BUG] Fails to duplicate metadata/system
On Thu, Jul 9, 2015 at 5:34 PM, conc...@web.de wrote:
> Hi, I've noticed that a single device partition was using
> metadata.single and system.single instead of metadata.dup and
> system.dup. All tests to force conversion to dup failed.

Try only -mconvert=dup and without the -f flag and see if it works. I'm pretty sure system chunks are treated in parity with metadata chunks now, so system doesn't need to be separately listed. And -f isn't needed except to reduce redundancy.

If that's not it, I'm going to speculate: maybe try kernel 4.0.6 or higher, as there was a bug in 4.0 that prevented chunk conversions, but I thought that only applied to raid profiles, not single vs dup. The fix for that was commit 153c35b60c72de9fae06c8e2c8b2c47d79d4.

-- 
Chris Murphy
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
On Thu, Jul 9, 2015 at 6:34 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
> One of my patches addressed a problem that a converted btrfs can't
> pass btrfsck. Not sure if that is the cause, but if you can, try
> btrfs-progs v3.19.1, the one without my btrfs-progs patches and some
> other newer convert related patches, and see the result? I think this
> would at least provide the base for bisecting btrfs-progs if the bug
> is in btrfs-progs.

I'm happy to regression test with 3.19.1 but I'm confused. After conversion, btrfs check (4.1) finds no problems. After the ext2_saved snapshot is deleted, btrfsck finds no problems. After defrag, again btrfsck finds no problems. After the failed balance, btrfsck finds no problems but crashes with Aborted (core dump).

Should I still test 3.19.1?

-- 
Chris Murphy
Re: Anyone tried out btrbk yet?
... and I just found your other blog post about stealing shlock out of inn. Officially embarrassed!

On Thu, Jul 9, 2015 at 8:35 PM, Donald Pearson donaldwhpear...@gmail.com wrote:
> Marc,
> I thought I'd give yours a try, and I'm probably embarrassing myself
> here but I'm running into this issue. Centos 7.
> [root@san01 tank]# ./btrfs-subvolume-backup store /mnt2/backups
> ./btrfs-subvolume-backup: line 177: shlock: command not found
> /var/run/btrfs-subvolume-backup held for btrfs-subvolume-backup, quitting
> [root@san01 tank]# yum whatprovides shlock
> Loaded plugins: changelog, fastestmirror
> Loading mirror speeds from cached hostfile
>  * base: dist1.800hosting.com
>  * elrepo: repos.dfw.lax-noc.com
>  * epel: mirror.umd.edu
>  * extras: mirrors.usc.edu
>  * updates: mirror.keystealth.org
> No matches found
> [root@san01 tank]# shlock
> -bash: shlock: command not found
> [root@san01 tank]# yum search all shlock
> Loaded plugins: changelog, fastestmirror
> Loading mirror speeds from cached hostfile
>  * base: dist1.800hosting.com
>  * elrepo: repos.dfw.lax-noc.com
>  * epel: mirror.utexas.edu
>  * extras: mirror.thelinuxfix.com
>  * updates: dallas.tx.mirror.xygenhosting.com
> Warning: No matches found for: shlock
> No matches found
> On Thu, Jul 9, 2015 at 12:17 PM, Marc MERLIN m...@merlins.org wrote:
>> On Thu, Jul 09, 2015 at 02:26:55PM +0200, Martin Steigerwald wrote:
>>> Hi! I see Alex, the developer of btrbk, posted here once about btrfs
>>> send and receive, but well, any other users of btrbk¹? What are your
>>> experiences? I consider switching to it from my home grown rsync
>>> based backup script. Well, I may try it for one of my BTRFS volumes
>>> in addition to the rsync backup for now. I would like to give all
>>> options on the command line, but maybe it can completely replace my
>>> current script if I put everything in its configuration.
>>> Any other handy BTRFS backup solutions?
>> I use my own which I wrote :)
>> http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
>> http://marc.merlins.org/linux/scripts/btrfs-subvolume-backup
>> Marc
>> --
>> A mouse is a device used to point at the xterm you want to type in - A.S.R.
>> Microsoft is to operating systems what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Anyone tried out btrbk yet?
In my research, I've found btrbk and btrfs-sxbackup certainly to be the leading contenders in terms of feature completeness. sanoid [1] will be another interesting possibility once btrfs compatibility is added (currently zfs only). I just wish I'd discovered all these before I went to all the effort of creating snazzer [1] :) I've been meaning to split some stuff out of snazzer that might be generically useful to other folks, such as filesystem cloning of all subvols/snapshots via send/receive, and it seems as if it should be possible to automatically prune any idiosyncratic snapshot naming convention - I just haven't found the time to write unit tests. [1] https://github.com/jimsalterjrs/sanoid [2] https://github.com/csirac2/snazzer On 10 July 2015 at 11:38, Donald Pearson donaldwhpear...@gmail.com wrote: ... and I just found your other block about stealing shlock out of inn. Officially embarassed! On Thu, Jul 9, 2015 at 8:35 PM, Donald Pearson donaldwhpear...@gmail.com wrote: Marc, I thought I'd yours a try, and I'm probably embarassing myself here but I'm running in to this issue. Centos 7. 
[RFC PATCH 0/2] Btrfs partial csum support
This patchset adds partial csum support for btrfs. Partial csum takes full advantage of the 32-byte csum space inside the tree block, while still maintaining backward compatibility with old kernels.

The overall idea is like the following, on a 16K leaf:

[Old tree block csum]
0     4     8     12    16    20    24    28    32
--------------------------------------------------
|csum |           unused, all 0                  |
--------------------------------------------------
Csum is the crc32 of the whole tree block data.

[New tree block csum]
--------------------------------------------------
|csum0|csum1|csum2|csum3|csum4|csum5|csum6|csum7|
--------------------------------------------------
Where csum0 is the same as the old one, the crc32 of the whole tree block data, and csum1~csum7 store the crc32 of each eighth part. Taking the example of a 16K leafsize:

csum1: crc32 of BTRFS_CSUM_SIZE~4K
csum2: crc32 of 4K~6K
...
csum7: crc32 of 14K~16K

When the nodesize is small, like 4K, partial csum is completely useless. But when the nodesize grows, like 32K, each partial csum covers just one page, making scrub able to judge which page is OK even without reading out the whole tree block. It also adds the possibility of fixing cases where corruption happens on all mirrors but in different parts. Such cases should be more likely if the nodesize goes beyond 16K.

Qu Wenruo (1):
  btrfs: csum: Introduce partial csum for tree block.

Zhao Lei (1):
  btrfs: scrub: Add support partial csum

 fs/btrfs/disk-io.c |  74 ---
 fs/btrfs/scrub.c   | 207 -
 2 files changed, 255 insertions(+), 26 deletions(-)

-- 
2.4.5
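As a rough illustration of the split described above, here is a small Python sketch. This is not the kernel code: btrfs actually uses crc32c while this uses zlib's plain crc32, and all function names are invented for the example. It computes csum0 over the whole block, csum1 from BTRFS_CSUM_SIZE to the end of the second eighth, and csum2..csum7 over one eighth each, then shows how a scrub-like check could narrow a mismatch down to one part of the block:

```python
import zlib

BTRFS_CSUM_SIZE = 32  # bytes of csum space per slot in the block header


def tree_block_csums(block: bytes) -> list[int]:
    """Compute csum0..csum7 for one tree block, following the RFC's split.

    csum0 covers the whole block; csum1 covers BTRFS_CSUM_SIZE up to the
    end of the second eighth (the csum area itself sits at the start);
    csum2..csum7 each cover one eighth of the block.
    """
    eighth = len(block) // 8
    csums = [zlib.crc32(block)]                                  # csum0
    csums.append(zlib.crc32(block[BTRFS_CSUM_SIZE:2 * eighth]))  # csum1
    for i in range(2, 8):                                        # csum2..7
        csums.append(zlib.crc32(block[i * eighth:(i + 1) * eighth]))
    return csums


def find_bad_parts(block: bytes, stored: list[int]) -> list[int]:
    """Return the indices (1..7) of partial csums that mismatch, i.e.
    which parts of the block a scrub could flag without needing the
    whole-block csum0 to match."""
    fresh = tree_block_csums(block)
    return [i for i in range(1, 8) if fresh[i] != stored[i]]
```

With a 32K nodesize each eighth is one 4K page, which is what lets scrub point at a single bad page rather than the whole tree block.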
[PATCH] Btrfs: remove unused mutex from struct 'btrfs_fs_info'
The code using the 'ordered_extent_flush_mutex' mutex was removed by the commit below:

- 8d875f95da43c6a8f18f77869f2ef26e9594fecc
  btrfs: disable strict file flushes for renames and truncates

But the mutex still lives in struct 'btrfs_fs_info'. So this patch removes the mutex from struct 'btrfs_fs_info' and its initialization code.

Signed-off-by: Byongho Lee bhlee.ker...@gmail.com
---
 fs/btrfs/ctree.h   | 6 ------
 fs/btrfs/disk-io.c | 1 -
 2 files changed, 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac314e14188..cdde6d541b3a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1518,12 +1518,6 @@ struct btrfs_fs_info {
 	 */
 	struct mutex ordered_operations_mutex;
 
-	/*
-	 * Same as ordered_operations_mutex except this is for ordered extents
-	 * and not the operations.
-	 */
-	struct mutex ordered_extent_flush_mutex;
-
 	struct rw_semaphore commit_root_sem;
 
 	struct rw_semaphore cleanup_work_sem;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e5aad7f535aa..6ba584714c51 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2608,7 +2608,6 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->ordered_operations_mutex);
-	mutex_init(&fs_info->ordered_extent_flush_mutex);
 	mutex_init(&fs_info->tree_log_mutex);
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
-- 
2.4.5
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
Slightly off topic: do these bugs exist in systems that converted from ext4 to btrfs using kernel 3.13 and then upgraded to kernel 4.1?

On Thu, Jul 9, 2015 at 4:09 AM, Chris Murphy li...@colorremedies.com wrote:
On Thu, Jun 25, 2015 at 8:08 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
A quick code search leads me to inline extent. So, if you still have the original ext* image, would you please try reverting to ext* and then converting it to btrfs again? But this time, please convert with the --no-inline option, and see if this removes the problem.

Using -n at convert time does not make a difference for the btrfs-convert bugs I've opened:
https://bugzilla.kernel.org/show_bug.cgi?id=101191
https://bugzilla.kernel.org/show_bug.cgi?id=101181
https://bugzilla.kernel.org/show_bug.cgi?id=101221
https://bugzilla.kernel.org/show_bug.cgi?id=101231

The last one I just discovered happens much sooner and is easier to reproduce than the other two. It's a scrub right after a successful btrfs-convert that btrfs check says is OK. But the scrub ends with two separate oopses, multiple call traces, and a spectacularly hard kernel panic (ssh and even the console dies). So I think btrfs-convert has a bug, but then the kernel code is not gracefully handling it at all either, and crashes badly with a scrub, and less badly with balance. However, the file system is still OK despite the scrub crash. With the balance failure, the file system is too badly damaged and btrfs check and btrfs-image fail.

-- 
Chris Murphy
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, Jul 9, 2015 at 8:20 AM, james harvey jamespharve...@gmail.com wrote:
Request for a new btrfs subvolume subcommand: clone or fork [-i qgroupid] source [dest/]name

Create a subvolume name in dest, which is a clone or fork of source. If dest is not given, subvolume name will be created in the current directory.

Options
-i qgroupid
Add the newly created subvolume to a qgroup. This option can be given multiple times.

This would (I think) be equivalent to:
* btrfs subvolume create dest-subvolume
* cp -ax --reflink=always source-subvolume/* dest-subvolume/

What's wrong with btrfs subvolume snapshot?

-- 
Fajar
Re: size 2.73TiB used 240.97GiB after balance
On 2015-07-08 15:06, Donald Pearson wrote:
I wouldn't use dd. I would use recover to get the data if at all possible, then you can experiment with trying to fix the degraded condition live. If you have any chance of getting data from the pool, you reduce that chance every time you make a change. If btrfs did the balance like you said, it wouldn't be raid5. What you just described is raid4, where only one drive holds parity data. I can't say that I actually know for a fact that btrfs doesn't do this, but I'd be shocked, and some dev would need to eat their underwear, if the balance job didn't distribute the parity also.

That is correct, it does distribute the parity among all the member drives. That said, it would still have to modify the existing drives even if it did put the parity on just the new drive, because raid{4,5,6} are defined as _striped_ data with parity, not mirrored (ie, if you just removed the parity, you'd have a raid0, not a raid1).
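To make the raid4-versus-raid5 distinction above concrete, here is a toy Python sketch. This is not btrfs's actual chunk allocator; the rotation formula is just a left-symmetric-style example and the function names are made up for illustration:

```python
def parity_disk_raid4(stripe: int, ndisks: int) -> int:
    """RAID4: one dedicated parity disk (here, always the last one)."""
    return ndisks - 1


def parity_disk_raid5(stripe: int, ndisks: int) -> int:
    """RAID5 (left-symmetric style): the parity block rotates across
    all member disks, moving one disk per stripe."""
    return (ndisks - 1 - stripe) % ndisks


def parity_xor(blocks: list[bytes]) -> bytes:
    """Parity is the bytewise XOR of the blocks in the stripe; XOR of
    the surviving blocks plus parity rebuilds a single lost block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)
```

The rebuild property is why a balance onto a new disk must restripe existing members: the data itself is striped across all disks, not mirrored, so parity placement alone is not the whole story.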
Re: size 2.73TiB used 240.97GiB after balance
On 2015-07-08 18:16, Donald Pearson wrote:
Basically, I wouldn't trust the drive that's already showing signs of failure to survive a dd. It isn't completely full, so the recovery is less load. That's just the way I see it. But I see your point of trying to get drive images now to hedge against failures. Unfortunately those errors are over my head, so hopefully someone else has insights.

A better option if you want a block-level copy would probably be ddrescue (available in almost every distro in a package of the same name); it's designed for recovering as much data as possible from failed disks (and gives a much nicer status display than plain old dd). If you do go for a block-level copy, however, make certain that no more than one of the copies is visible to the system at any given time, especially when the filesystem is mounted, otherwise things _WILL_ get exponentially worse.
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On 2015-07-09 02:22, Fajar A. Nugraha wrote:
On Thu, Jul 9, 2015 at 8:20 AM, james harvey jamespharve...@gmail.com wrote:
Request for a new btrfs subvolume subcommand: clone or fork [-i qgroupid] source [dest/]name
Create a subvolume name in dest, which is a clone or fork of source. If dest is not given, subvolume name will be created in the current directory.
Options
-i qgroupid
Add the newly created subvolume to a qgroup. This option can be given multiple times.
Would (I think):
* btrfs subvolume create dest-subvolume
* cp -ax --reflink=always source-subvolume/* dest-subvolume/

What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Austin S Hemmelgarn wrote (ao):
On 2015-07-09 08:41, Sander wrote:
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander

No, there is a bit in the subvolume metadata that says whether it's considered a snapshot or not. Internally, they are handled identically, but it does come into play when you consider things like btrfs subvolume show -s (which only lists snapshots), which in turn means that certain tasks are more difficult to script robustly.

I stand corrected. Thanks for the info.

Sander
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On 2015-07-09 08:41, Sander wrote:
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander

No, there is a bit in the subvolume metadata that says whether it's considered a snapshot or not. Internally, they are handled identically, but it does come into play when you consider things like btrfs subvolume show -s (which only lists snapshots), which in turn means that certain tasks are more difficult to script robustly.
Re: kernel crash on btrfs device delete missing
I was finally able to remove the missing device. I updated the bug report, but in case anyone else has this problem I wanted to update here as well.

I deleted all snapshot subvolumes on the pool (between 20 and 30), and was then able to delete the missing device without issue. This took two tries, because the first time I did not wait for btrfs-cleaner to finish actually deleting the subvolumes (as it does this in the background). The second time I deleted the subvolumes, then waited until all disk activity (reported by iotop) had ceased, and re-mounted the pool to be sure. After that the rebalance/delete worked without issue.

I am not certain if this is because of a bug with rebalancing snapshots, or because some bad data that was causing the segfault just happened to be in the snapshots.

On Tue, Jul 7, 2015 at 12:45 PM, David Wilhelm thefe...@gmail.com wrote:
Thanks. I've submitted it as issue 101141 https://bugzilla.kernel.org/show_bug.cgi?id=101141

That looks like the kind of thing you need a developer for. You've already reported it here, but sticking a copy of what you've discovered so far into bugzilla.kernel.org may help it not to get lost.

Hugo.

-- 
Hugo Mills | I don't like the look of it, I tell you.
hugo@... carfax.org.uk | Well, stop looking at it, then.
http://carfax.org.uk/ | PGP: E2AB1DE4 | The Goons
Re: [PATCH trivial] Btrfs: Spelling s/consitent/consistent/
On Mon, Jul 06, 2015 at 03:38:11PM +0200, Geert Uytterhoeven wrote:
Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org

Acked-by: David Sterba dste...@suse.com
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Austin S Hemmelgarn posted on Thu, 09 Jul 2015 08:48:00 -0400 as excerpted:

On 2015-07-09 08:41, Sander wrote:
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander

No, there is a bit in the subvolume metadata that says whether it's considered a snapshot or not. Internally, they are handled identically, but it does come into play when you consider things like btrfs subvolume show -s (which only lists snapshots), which in turn means that certain tasks are more difficult to script robustly.

My use-case doesn't involve subvolumes or snapshots, so I can't check for sure, but... I could have sworn btrfs property -t subvolume can get/set that snapshot bit. I know I saw the discussion, and I think a patch for it, go by, but again, as I don't use them, I haven't tracked closely enough to see if it ever got in.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: [PATCH] Documentation: filesystems: btrfs: Fixed typos and whitespace
On Wed, 08 Jul 2015 10:44:51 -0700 Daniel Grimshaw grims...@linux.vnet.ibm.com wrote:

I am a high school student trying to become familiar with Linux kernel development. The btrfs documentation in Documentation/filesystems had a few typos and errors in whitespace. This patch corrects both of these.

This is a resend of an earlier patch with a corrected patchfile.

Applied to the docs tree, thanks.

Just FYI, if you put lines like the last one above after the '---' line, they won't find their way into the commit changelog, which is preferable. I edited it out.

jon
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
On Thu, Jul 9, 2015 at 4:52 AM, Vytautas D vyt...@gmail.com wrote:
Slightly off topic: do these bugs exist in systems that converted from ext4 to btrfs using kernel 3.13 and then upgraded to kernel 4.1?

I don't recall what btrfs-progs and kernel I last tested ext4 conversion with. I know this is a regression, I just don't know how old it is. I think there's more than one bug here (obviously, since I've filed 4 related bugs in ~24 hours), but I really don't know the scope of the problem. But the case where the recommended procedure not only fails but corrupts the file system, and it can't be fixed or rolled back, is not good. Perhaps the wiki should provide a warning that this is currently broken, status unknown, or something?

-- 
Chris Murphy
Concurrent write access
Hi,

I have a btrfs raid10 which is connected to a server hosting multiple virtual machines. Does btrfs support connecting the same subvolumes of the same raid to multiple virtual machines for concurrent read and write? The situation would be the same as, say, mounting user homes from the same nfs share on different machines.

Thanks,
Wolfgang
Odd scrub behavior - Raid5/6
Something I've noticed scrubbing two pools that I have: one is raid6 and the other is raid5. The scrubbing goes along very slowly, and I think it's because there is always one disk that's operating differently from the rest. Which disk it is changes. Here is an iostat of the current scrub, and you can see that /dev/sdj is the odd ball. Below the iostat output are the smart statistics for sdj, and they indicate a healthy drive. And to be sure, I recently ran extended tests twice without incident. Below that is another iostat output, where literally as I'm typing this email the behavior changed to a different drive in the pool, /dev/sdo, and the smart data follows, also showing a healthy drive (and a load cycle count that would make a seagate blush).

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.34   0.00     5.88    93.20    0.00   0.59

Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  r_await  w_await  svctm   %util
sdp        0.00    2.00   0.00  5.00   0.00   0.08     31.47      0.05     9.07     0.00     9.07   8.93    4.47
sda        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdd        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sde        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdr       31.33    0.00  41.67  1.00   4.56   0.02    220.12      0.39     9.05     9.26     0.67   3.50   14.93
sdi       26.33    0.00  65.67  2.00   5.73   0.10    176.47      0.75    11.04    11.36     0.67   3.73   25.23
sdf        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdn       30.33    0.00  58.33  1.33   5.50   0.06    190.97      0.54     8.99     9.19     0.50   3.90   23.27
sdo       33.00    0.00  64.67  1.00   6.10   0.04    191.72      0.73    11.06    11.22     0.67   3.88   25.47
sds       30.33    0.00  59.00  1.67   5.56   0.05    189.45      0.53     8.66     8.90     0.40   3.66   22.23
sdc        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdb        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdk       28.67    0.00  62.33  1.67   5.65   0.08    183.29      2.73    42.72    43.86     0.40   8.20   52.47
sdl        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdu       35.00    0.00  62.00  0.33   6.04   0.00    198.59      0.74    11.88    11.95     0.00   4.41   27.50
sdt       26.33    0.00  61.33  1.00   5.58   0.02    184.00      0.65    12.05    12.24     0.33   3.76   23.43
sdj       34.67    0.00  31.67  0.67   3.79   0.04    242.80    129.66  3822.54  3876.24  1271.50  30.93  100.00
sdq       33.33    0.00  43.33  0.67   4.79   0.03    224.42      0.56    12.68    12.88     0.00   6.02   26.47
sdm        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdh       34.33    0.00  46.00  1.67   5.02   0.10    220.20      0.53    11.17    11.57     0.40   4.45   21.20
sdg       30.00    0.00  48.67  1.67   4.90   0.05    201.11      0.45     8.99     9.29     0.00   3.52   17.70

[root@san01 ~]# smartctl -a /dev/sdj
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD15EARX-00PASB0
Serial Number:    WD-WCAZAK449717
LU WWN Device Id: 5 0014ee 2b27dbe1a
Firmware Version: 51.0AB51
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul 9 16:58:52 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38280) seconds.
Offline
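As an aside, the oddball drive in iostat output like the above can be picked out mechanically. The following is a hypothetical helper, not an existing tool: it flags any device whose iostat await (average I/O wait, in ms) is far above the median of the active devices; the 10x factor is an arbitrary illustration value, not a tuned recommendation.

```python
import statistics


def find_outlier_disks(await_ms: dict[str, float], factor: float = 10.0) -> list[str]:
    """Return devices whose await is more than `factor` times the
    median await of all devices that saw any I/O at all."""
    active = {dev: ms for dev, ms in await_ms.items() if ms > 0}
    if not active:
        return []
    median = statistics.median(active.values())
    return sorted(dev for dev, ms in active.items() if ms > factor * median)
```

Fed the await column from the first iostat sample above (sdj at 3822.54 ms against a median around 11 ms), it singles out sdj even though sdk's 42.72 ms is also elevated.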
Re: Concurrent write access
On Thu, Jul 09, 2015 at 11:34:40PM +0200, Wolfgang Mader wrote:
Hi, I have a btrfs raid10 which is connected to a server hosting multiple virtual machines. Does btrfs support connecting the same subvolumes of the same raid to multiple virtual machines for concurrent read and write? The situation would be the same as, say, mounting user homes from the same nfs share on different machines.

It'll depend on the protocol you use to make the subvolumes visible within the VMs. btrfs subvolumes aren't block devices, so that rules out most of the usual approaches. However, there are two methods I've used which I can confirm will work well: NFS and 9p.

NFS will work as a root filesystem, and will work with any host/guest, as long as there's a network connection between the two.

9p is, at least in theory, faster (particularly with virtio), but won't let you boot with the 9p device as your root FS. You'll need virtualiser support if you want to run a virtio 9p -- I know qemu/kvm supports this; I don't know if anything else supports it.

You can probably use Samba/CIFS as well. It'll be slower than the virtualised 9p, and won't be able to host a root filesystem. I haven't tried this one, because Samba and I get on like a house on fire(*).

Hugo.

(*) Screaming, shouting, people running away, emergency services.

-- 
Hugo Mills | Alert status mauve ocelot: Slight chance of
hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
http://carfax.org.uk/ | PGP: E2AB1DE4 |
Re: Concurrent write access
On Thursday 09 July 2015 22:06:09 Hugo Mills wrote:
On Thu, Jul 09, 2015 at 11:34:40PM +0200, Wolfgang Mader wrote:
Hi, I have a btrfs raid10 which is connected to a server hosting multiple virtual machines. Does btrfs support connecting the same subvolumes of the same raid to multiple virtual machines for concurrent read and write? The situation would be the same as, say, mounting user homes from the same nfs share on different machines.

It'll depend on the protocol you use to make the subvolumes visible within the VMs. btrfs subvolumes aren't block devices, so that rules out most of the usual approaches. However, there are two methods I've used which I can confirm will work well: NFS and 9p. NFS will work as a root filesystem, and will work with any host/guest, as long as there's a network connection between the two. 9p is, at least in theory, faster (particularly with virtio), but won't let you boot with the 9p device as your root FS. You'll need virtualiser support if you want to run a virtio 9p -- I know qemu/kvm supports this; I don't know if anything else supports it.

Thanks for the overview. It is qemu/kvm in fact, so this is an option. Right now, however, I connect the discs as virtual discs, not the file system, and only to one virtual machine.

Best,
Wolfgang

You can probably use Samba/CIFS as well. It'll be slower than the virtualised 9p, and not be able to host a root filesystem. I haven't tried this one, because Samba and I get on like a house on fire(*).

Hugo.

(*) Screaming, shouting, people running away, emergency services.
[BUG] Fails to duplicate metadata/system
Hi,

I've noticed that a single-device partition was using metadata.single and system.single instead of metadata.dup and system.dup. All attempts to force conversion to dup failed. Here is how to reproduce this with an image and some very simple BTRFS commands (Debian stretch):

$ uname -a
Linux asdasd 4.0.0-1-amd64 #1 SMP Debian 4.0.2-1 (2015-05-11) x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v4.0
$ fallocate -l 8G test.img
$ mkdir mnt
$ mkfs.btrfs test.img
$ mount -o loop test.img mnt
$ touch mnt/asdasd
$ btrfs fi df mnt
Data, single: total=8.00MiB, used=64.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=409.56MiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
$ btrfs balance start -v -mconvert=single -sconvert=single -dconvert=single mnt -f
Dumping filters: flags 0xf, state 0x0, force is on
  DATA (flags 0x100): converting, target=281474976710656, soft is off
  METADATA (flags 0x100): converting, target=281474976710656, soft is off
  SYSTEM (flags 0x100): converting, target=281474976710656, soft is off
Done, had to relocate 5 out of 5 chunks
$ btrfs fi df mnt
Data, single: total=832.00MiB, used=256.00KiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
$ btrfs balance start -v -mconvert=dup -sconvert=dup -dconvert=single mnt -f
Dumping filters: flags 0xf, state 0x0, force is on
  DATA (flags 0x100): converting, target=281474976710656, soft is off
  METADATA (flags 0x100): converting, target=32, soft is off
  SYSTEM (flags 0x100): converting, target=32, soft is off
Done, had to relocate 3 out of 3 chunks
$ btrfs fi df mnt
Data, single: total=832.00MiB, used=320.00KiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B

The expected result would be "Metadata, DUP" and "System, DUP", not "Metadata, single" and "System, single".

Some more info:

$ btrfs fi show mnt
Label: none  uuid: b1a70fc4-7c18-4929-9b73-8f8bb328e7de
	Total devices 1 FS bytes used 384.00KiB
	devid    1 size 8.00GiB used 1.09GiB path /dev/loop0

btrfs-progs v4.0
$ btrfs fi usage mnt
Overall:
    Device size:        8.00GiB
    Device allocated:   1.09GiB
    Device unallocated: 6.91GiB
    Device missing:     0.00B
    Used:               384.00KiB
    Free (estimated):   7.72GiB (min: 7.72GiB)
    Data ratio:         1.00
    Metadata ratio:     1.00
    Global reserve:     16.00MiB (used: 0.00B)

Data,single: Size:832.00MiB, Used:256.00KiB
   /dev/loop0  832.00MiB

Metadata,single: Size:256.00MiB, Used:112.00KiB
   /dev/loop0  256.00MiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/loop0  32.00MiB

Unallocated:
   /dev/loop0  6.91GiB

$ btrfs-debug-tree test.img
root tree
leaf 2539634688 items 16 free space 12515 generation 47 owner 1
fs uuid b1a70fc4-7c18-4929-9b73-8f8bb328e7de
chunk uuid c2606900-bfa1-444e-ab4d-3f0b2d31626b
	item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439
		root data bytenr 2539651072 level 0 dirid 0 refs 1 gen 47
		uuid ----
	item 1 key (DEV_TREE ROOT_ITEM 0) itemoff 15405 itemsize 439
		root data bytenr 2539569152 level 0 dirid 0 refs 1 gen 46
		uuid ----
	item 2 key (FS_TREE INODE_REF 6) itemoff 15388 itemsize 17
		inode ref index 0 namelen 7 name: default
	item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439
		root data bytenr 2539356160 level 0 dirid 256 refs 1 gen 42
		uuid ----
		ctransid 6 otransid 0 stransid 0 rtransid 0
	item 4 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 14789 itemsize 160
		inode generation 3 transid 0 size 0 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
	item 5 key (ROOT_TREE_DIR INODE_REF 6) itemoff 14777 itemsize 12
		inode ref index 0 namelen 2 name: ..
	item 6 key (ROOT_TREE_DIR DIR_ITEM 2378154706) itemoff 14740 itemsize 37
		location key (FS_TREE ROOT_ITEM -1) type DIR
		namelen 7 datalen 0 name: default
	item 7 key (CSUM_TREE ROOT_ITEM 0) itemoff 14301 itemsize 439
		root data bytenr 2539667456 level 0 dirid 0 refs 1 gen 47
		uuid ----
	item 8 key (UUID_TREE ROOT_ITEM 0) itemoff 13862 itemsize 439
		root data bytenr 2539208704 level 0 dirid 0 refs 1 gen 41
		uuid be2539ee-1c09-e84c-8ec9-bfe054347ccf
	item
[PATCH] Btrfs: fix order by which delayed references are run
From: Filipe Manana <fdman...@suse.com>

When we have an extent that got N references removed and N new references
added in the same transaction, we must run the insertion of the references
first because otherwise the last removed reference will remove the extent
item from the extent tree, resulting in a failure for the insertions.

This is a regression introduced in the 4.2-rc1 release and this fix just
brings back the behaviour of selecting reference additions before any
reference removals.

The following test case for fstests reproduces the issue:

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1	# failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
	_cleanup_flakey
	rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey

# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_flakey
_require_cloner
_require_metadata_journaling $SCRATCH_DEV

rm -f $seqres.full

_scratch_mkfs >>$seqres.full 2>&1
_init_flakey
_mount_flakey

# Create prealloc extent covering range [160K, 620K[
$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo

# Now write to the last 80K of the prealloc extent plus 40K to the
# unallocated space that immediately follows it. This creates a new extent
# of 40K that spans the range [620K, 660K[.
$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io

# At this point, there are now 2 back references to the prealloc extent in
# our extent tree. Both are for our file offset 160K and one relates to a
# file extent item with a data offset of 0 and a length of 380K, while the
# other relates to a file extent item with a data offset of 380K and a
# length of 80K.

# Make sure everything done so far is durably persisted (all back
# references are in the extent tree, etc).
sync

# Now clone all extents of our file that cover the offset 160K up to its
# eof (660K at this point) into itself at offset 2M. This leaves a hole in
# the file covering the range [660K, 2M[. The prealloc extent will now be
# referenced by the file twice, once for offset 160K and once for offset
# 2M. The 40K extent that follows the prealloc extent will also be
# referenced twice by our file, once for offset 620K and once for offset
# 2M + 460K.
$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
	$SCRATCH_MNT/foo

# Now create one new extent in our file with a size of 100Kb. It will span
# the range [3M, 3M + 100K[. It also will cause creation of a hole spanning
# the range [2M + 460K, 3M[. Our new file size is 3M + 100K.
$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io

# At this point, there are now (in memory) 4 back references to the
# prealloc extent.
#
# Two of them are for file offset 160K, related to file extent items
# matching the file offsets 160K and 540K respectively, with data offsets
# of 0 and 380K respectively, and with lengths of 380K and 80K respectively.
#
# The other two references are for file offset 2M, related to file extent
# items matching the file offsets 2M and 2M + 380K respectively, with data
# offsets of 0 and 380K respectively, and with lengths of 380K and 80K
# respectively.
#
# The 40K extent has 2 back references, one for file offset 620K and the
# other for file offset 2M + 460K.
#
# The 100K extent has a single back reference and it relates to file
# offset 3M.

# Now clone our 100K extent into offset 600K. That offset covers the last
# 20K of the prealloc extent, the whole 40K extent and 40K of the hole
# starting at offset 660K.
$CLONER_PROG -s $((3 * 1024 * 1024)) -d $((600 * 1024)) -l $((100 * 1024)) \
	$SCRATCH_MNT/foo $SCRATCH_MNT/foo

# At this point there's only one reference to the 40K extent, at file
# offset 2M + 460K, we have 4 references for the prealloc extent (2 for
# file offset 160K and 2 for file offset 2M) and 2 references for the 100K
# extent (1 for file offset 3M and a new one for file offset 600K).

# Now fsync our file to make sure all its new data and metadata updates are
# durably persisted and present if a power failure/crash happens after a
# successful fsync and before the next transaction commit.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

echo "File digest before power failure:"
md5sum $SCRATCH_MNT/foo | _filter_scratch

# Silently drop all writes and unmount to simulate a crash/power failure.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey

# Allow writes again, mount to trigger log replay and validate file
# contents. During log replay, the btrfs delayed references implementation
# used to run the deletion of back references before the addition of new
# back references, which made the addition fail as it didn't
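The ordering requirement the fix restores can be illustrated with a toy model. This is not btrfs code; `run_delayed_refs`, the op tuples, and the refcount map are all hypothetical simplifications of the delayed-reference machinery, but they show why the last removal must not run before the insertions when an extent drops and gains the same number of references in one transaction:

```python
def run_delayed_refs(extent_items, ops, additions_first=True):
    """Toy model of running queued delayed references at commit time.

    extent_items maps extent_id -> reference count (the 'extent item').
    ops is a list of ('add' | 'drop', extent_id) pairs queued in one
    transaction. If additions_first is False, ops run in queued order.
    """
    if additions_first:
        # The fix: select all reference additions before any removals.
        ops = sorted(ops, key=lambda op: 0 if op[0] == "add" else 1)
    for action, extent in ops:
        if action == "add":
            if extent not in extent_items:
                # Mirrors the real failure: the extent item was already
                # deleted, so inserting a new back reference has no target.
                raise RuntimeError("extent item gone: cannot add reference")
            extent_items[extent] += 1
        else:  # drop
            extent_items[extent] -= 1
            if extent_items[extent] == 0:
                del extent_items[extent]  # last ref removes the extent item
    return extent_items
```

With 2 drops and 2 adds queued for an extent holding 2 references (as in the test case above), `additions_first=False` deletes the extent item mid-run and the adds fail, while the default ordering leaves the count at 2.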
[PATCH] fstests: btrfs test to exercise shared extent reference accounting
From: Filipe Manana <fdman...@suse.com>

Regression test for adding and dropping an equal number of references for
file extents. Verify that if we drop N references for a file extent and we
also add N new references for that same file extent in the same transaction,
running the delayed references (which always happens at transaction commit
time) does not fail.

The regression was introduced in the 4.2-rc1 Linux kernel and fixed by the
patch titled: "Btrfs: fix order by which delayed references are run".

Signed-off-by: Filipe Manana <fdman...@suse.com>
---
 tests/btrfs/095     | 153
 tests/btrfs/095.out |   9
 tests/btrfs/group   |   1 +
 3 files changed, 163 insertions(+)
 create mode 100755 tests/btrfs/095
 create mode 100644 tests/btrfs/095.out

diff --git a/tests/btrfs/095 b/tests/btrfs/095
new file mode 100755
index 000..e68f2bf
--- /dev/null
+++ b/tests/btrfs/095
@@ -0,0 +1,153 @@
+#! /bin/bash
+# FSQA Test No. 095
+#
+# Regression test for adding and dropping an equal number of references for
+# file extents. Verify that if we drop N references for a file extent and we
+# also add N new references for that same file extent in the same
+# transaction, running the delayed references (always happens at transaction
+# commit time) does not fail.
+#
+# The regression was introduced in the 4.2-rc1 Linux kernel.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdman...@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	_cleanup_flakey
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_dm_flakey
+_require_cloner
+_require_metadata_journaling $SCRATCH_DEV
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_init_flakey
+_mount_flakey
+
+# Create prealloc extent covering range [160K, 620K[
+$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo
+
+# Now write to the last 80K of the prealloc extent plus 40K to the
+# unallocated space that immediately follows it. This creates a new extent
+# of 40K that spans the range [620K, 660K[.
+$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# At this point, there are now 2 back references to the prealloc extent in
+# our extent tree. Both are for our file offset 160K and one relates to a
+# file extent item with a data offset of 0 and a length of 380K, while the
+# other relates to a file extent item with a data offset of 380K and a
+# length of 80K.
+
+# Make sure everything done so far is durably persisted (all back
+# references are in the extent tree, etc).
+sync
+
+# Now clone all extents of our file that cover the offset 160K up to its
+# eof (660K at this point) into itself at offset 2M. This leaves a hole in
+# the file covering the range [660K, 2M[. The prealloc extent will now be
+# referenced by the file twice, once for offset 160K and once for offset
+# 2M. The 40K extent that follows the prealloc extent will also be
+# referenced twice by our file, once for offset 620K and once for offset
+# 2M + 460K.
+$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
+	$SCRATCH_MNT/foo
+
+# Now create one new extent in our file with a size of 100Kb. It will span
+# the range [3M, 3M + 100K[. It also will cause creation of a hole spanning
+# the range [2M + 460K, 3M[. Our new file size is 3M + 100K.
+$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# At this point, there are now (in memory) 4 back references to the prealloc
+# extent.
+#
+# Two of them are for file offset 160K, related to file extent items
+# matching the file offsets 160K and 540K respectively, with data offsets of
+# 0 and 380K respectively, and with lengths of 380K and 80K respectively.
+#
+# The other two references are for file offset 2M, related to file extent
+# items matching the
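The back-reference bookkeeping the test's comments walk through can be tallied in a short sketch. The extent names and the `(file_offset, extent_id)` pairs below are illustrative stand-ins, not btrfs on-disk structures; they model the file's extent items after the first clone and the 100K write, where each file extent item pointing at a physical extent contributes one back reference:

```python
from collections import Counter

K = 1024

# File extent items after: falloc at 160K, write at 540K, clone of
# [160K, 660K[ to 2M, and the new 100K write at 3M (per the comments above).
file_extent_items = [
    (160 * K, "prealloc"),             # data offset 0,    length 380K
    (540 * K, "prealloc"),             # data offset 380K, length 80K
    (620 * K, "40K-extent"),           # extent created by the 540K write
    (2048 * K, "prealloc"),            # clone of the item at 160K
    (2048 * K + 380 * K, "prealloc"),  # clone of the item at 540K
    (2048 * K + 460 * K, "40K-extent"),
    (3072 * K, "100K-extent"),         # new extent from the 3M write
]

# One back reference per file extent item that points at the extent.
backrefs = Counter(extent for _offset, extent in file_extent_items)
```

Tallying gives 4 references to the prealloc extent, 2 to the 40K extent, and 1 to the 100K extent, matching the counts stated in the test's comments before the second clone.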
Re: [PATCH] Btrfs: fix order by which delayed references are run
wrote on 2015/07/09 15:50 +0100:
> From: Filipe Manana <fdman...@suse.com>
>
> When we have an extent that got N references removed and N new references
> added in the same transaction, we must run the insertion of the references
> first because otherwise the last removed reference will remove the extent
> item from the extent tree, resulting in a failure for the insertions.
>
> This is a regression introduced in the 4.2-rc1 release and this fix just
> brings back the behaviour of selecting reference additions before any
> reference removals.

Thanks, Filipe. That's right, it's my fault, I forgot such a case.

Acked-by: Qu Wenruo <quwen...@cn.fujitsu.com>

Thanks,
Qu

> The following test case for fstests reproduces the issue:
> [ test case snipped; identical to the one in the patch quoted above ]
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
One of my patches addressed a problem where a converted btrfs couldn't pass
btrfsck. Not sure if that is the cause, but could you try btrfs-progs
v3.19.1, the one without my btrfs-progs patches and some other newer
convert-related patches, and see the result?

I think this would at least provide a base for bisecting btrfs-progs, if
the bug is in btrfs-progs.

Thanks,
Qu

Chris Murphy wrote on 2015/07/09 15:38 -0600:
> On Thu, Jul 9, 2015 at 4:52 AM, Vytautas D <vyt...@gmail.com> wrote:
>> Slightly off topic: do these bugs exist in systems that converted from
>> ext4 to btrfs using kernel 3.13 and then upgraded to kernel 4.1?
>
> I don't recall what btrfs-progs and kernel I last tested ext4 conversion
> with. I know this is a regression, I just don't know how old it is. I
> think there's more than one bug here (obviously, since I've filed 4
> related bugs in ~24 hours), but I really don't know the scope of the
> problem. But the case where the recommended procedure not only fails but
> corrupts the file system and it can't be fixed or rolled back is not
> good. Perhaps the wiki should provide a warning that this is currently
> broken, status unknown, or something?