Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, Jul 09, 2015 at 01:43:53PM +, Duncan wrote:
> I could have sworn btrfs property -t subvolume can get/set that snapshot
> bit. I know I saw the discussion and I think a patch for it go by, but
> again, as I don't use them, I haven't tracked closely enough to see if
> it ever got in.

Are you thinking of the read-only flag? That's not the same thing as the various UUID properties (e.g. parent) which can be used to determine if a subvolume was made using a snapshot.

   Hugo.

-- 
Hugo Mills | Someone's been throwing dead sheep down my Fun Well
hugo@... carfax.org.uk | http://carfax.org.uk/ | PGP: E2AB1DE4 | Nick Gibbins
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Hugo Mills posted on Thu, 09 Jul 2015 13:54:48 + as excerpted:
> On Thu, Jul 09, 2015 at 01:43:53PM +, Duncan wrote:
>> I could have sworn btrfs property -t subvolume can get/set that
>> snapshot bit. I know I saw the discussion and I think a patch for it
>> go by, but again, as I don't use them, I haven't tracked closely
>> enough to see if it ever got in.
> Are you thinking of the read-only flag? That's not the same thing as
> the various UUID properties (e.g. parent) which can be used to
> determine if a subvolume was made using a snapshot.

Perhaps, but I was sure there was a snapshot property too, because I remember discussion of being able to unset it in order to remove it from the snapshot (only) list. But maybe that's all it was, discussion; it wasn't implemented, and I ended up conflating it with the read-only bit, which /can/ be set/unset that way. Like I said, I can't check, as I don't have any subvolumes/snapshots available to do a listing on and see, and the property manpage doesn't have a properties list to check; it wants you to use the list option to get the list.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, 09 Jul 2015 08:48:00 -0400, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> On 2015-07-09 08:41, Sander wrote:
>> Austin S Hemmelgarn wrote (ao):
>>>> What's wrong with btrfs subvolume snapshot?
>>> Well, personally I would say the fact that once something is tagged
>>> as a snapshot, you can't change it to a regular subvolume without
>>> doing a non-incremental send/receive.
>> A snapshot is a subvolume. There is no such thing as tagged as a
>> snapshot.
>> 	Sander
> No, there is a bit in the subvolume metadata that says whether it's
> considered a snapshot or not. Internally, they are handled identically,
> but it does come into play when you consider things like btrfs
> subvolume show -s (which only lists snapshots), which in turn means
> that certain tasks are more difficult to script robustly.

This sounds like a vestigial leftover from back when snapshots were conceptualized to be somehow functionally different from subvolumes... But as you said, now there is effectively no difference, so that bit is used for what, only to track how a subvolume was created? And to output in the subvolume list if the user passes -s? I'd say that's a pretty oddball feature to even have, since in any case, if you want to distinguish and list only your snapshots, you would typically just name them in a certain way, e.g. /snaps/originalname/datetime.

-- 
With respect,
Roman
Re: Anyone tried out btrbk yet?
On Thu, Jul 09, 2015 at 02:26:55PM +0200, Martin Steigerwald wrote:
> Hi! I see Alex, the developer of btrbk, posted here once about btrfs
> send and receive, but well, any other users of btrbk¹? What are your
> experiences? I consider switching to it from my home grown rsync based
> backup script. Well, I may try it for one of my BTRFS volumes in
> addition to the rsync backup for now. I would like to give all options
> on the command line, but maybe it can completely replace my current
> script if I put everything in its configuration.
> Any other handy BTRFS backup solutions?

I use my own which I wrote :)
http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
http://marc.merlins.org/linux/scripts/btrfs-subvolume-backup

Marc
-- 
A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Anyone tried out btrbk yet?
On Thursday 09 July 2015 14:26:55 you wrote:
> Well, I may try it for one of my BTRFS volumes in addition to the rsync
> backup for now. I would like to give all options on the command line,
> but maybe it can completely replace my current script if I put
> everything in its configuration.
> Any other handy BTRFS backup solutions?

Hi,

I've been using btrfs-sxbackup for a couple of weeks, and it has been working great. Everything is configured on the command line, so that's a plus.

https://pypi.python.org/pypi/btrfs-sxbackup
https://github.com/masc3d/btrfs-sxbackup

-Henri
Re: [PATCH v2] Btrfs: fix list transaction-pending_ordered corruption
On Fri, Jul 03, 2015 at 10:22:08PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana fdman...@suse.com
> Cc: sta...@vger.kernel.org
> Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3")
> Signed-off-by: Filipe Manana fdman...@suse.com

... now for the right patch,

Reviewed-by: David Sterba dste...@suse.com
Re: [PATCH] Btrfs: fix list transaction-pending_ordered corruption
On Fri, Jul 03, 2015 at 08:46:40PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana fdman...@suse.com
> ...
> Cc: sta...@vger.kernel.org
> Fixes: 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current transaction V3")
> Signed-off-by: Filipe Manana fdman...@suse.com

Good catch, and thanks for looking up the offending commit.

Reviewed-by: David Sterba dste...@suse.com
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, Jul 09, 2015 at 08:48:00AM -0400, Austin S Hemmelgarn wrote:
> On 2015-07-09 08:41, Sander wrote:
>> Austin S Hemmelgarn wrote (ao):
>>>> What's wrong with btrfs subvolume snapshot?
>>> Well, personally I would say the fact that once something is tagged
>>> as a snapshot, you can't change it to a regular subvolume without
>>> doing a non-incremental send/receive.
>> A snapshot is a subvolume. There is no such thing as tagged as a
>> snapshot.
> No, there is a bit in the subvolume metadata that says whether it's
> considered a snapshot or not.

Technically it's not really a bit. The snapshot relation is determined by the parent uuid value of a subvolume.

> Internally, they are handled identically, but it does come into play
> when you consider things like btrfs subvolume show -s (which only
> lists snapshots),

That was probably 'btrfs subvol list -s', though the 'subvol show' command prints all snapshots of a given subvolume.

> which in turn means that certain tasks are more difficult to script
> robustly.

I don't deny the interface/output is imperfect for scripting purposes; maybe we can provide filters that would satisfy your use case.
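Since the snapshot relation lives in the parent uuid, scripts can recover it from `btrfs subvolume list -q` output rather than relying on any flag. A rough sketch of such a filter; the sample listing, field layout, and paths below are illustrative assumptions, not guaranteed btrfs-progs output:

```python
# Sketch: treat a subvolume as "not a snapshot" when its parent_uuid
# field is "-" in `btrfs subvolume list -q` output.
# SAMPLE is made-up output; real scripts would read it via subprocess.
SAMPLE = """\
ID 257 gen 10 top level 5 parent_uuid - path home
ID 258 gen 12 top level 5 parent_uuid 7a2b9c1e-0000-0000-0000-000000000000 path snaps/home.20150709
"""

def non_snapshots(listing: str) -> list:
    """Return paths of subvolumes whose parent_uuid field is unset."""
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        if "parent_uuid" in fields and "path" in fields:
            if fields[fields.index("parent_uuid") + 1] == "-":
                paths.append(fields[fields.index("path") + 1])
    return paths

print(non_snapshots(SAMPLE))  # → ['home']
```

In a real script the listing would come from something like `subprocess.run(["btrfs", "subvolume", "list", "-q", mountpoint], capture_output=True, text=True).stdout`.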
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
Chris Murphy wrote on 2015/07/09 18:45 -0600:
> On Thu, Jul 9, 2015 at 6:34 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
>> One of my patches addressed a problem that a converted btrfs can't
>> pass btrfsck. Not sure if that is the cause, but if you can, try
>> btrfs-progs v3.19.1, the one without my btrfs-progs patches and some
>> other newer convert related patches, and see the result? I think this
>> would at least provide the base for bisecting btrfs-progs if the bug
>> is in btrfs-progs.
> I'm happy to regression test with 3.19.1 but I'm confused. After
> conversion, btrfs check (4.1) finds no problems. After the ext2_saved
> snapshot is deleted, btrfsck finds no problems. After defrag, again
> btrfsck finds no problems. After the failed balance, btrfsck finds no
> problems but crashes with Aborted (core dump).

Even if btrfsck reports no error, some btrfs-convert behavior change may lead the kernel to misbehave. But we are not sure whether btrfs-progs or the kernel itself has the bug. Maybe btrfs-convert did something wrong/different triggering the bug, or it's just a kernel regression?

So what I'd like to check is, with 3.19.1 progs (kernel version doesn't change), whether the kernel still fails to do the balance. If the problem still happens, then we can focus on the kernel part, or at least put less effort into btrfs-progs.

> Should I still test 3.19.1?

Yes, please.

Thanks,
Qu
Re: Anyone tried out btrbk yet?
Marc,

I thought I'd give yours a try, and I'm probably embarrassing myself here but I'm running into this issue. Centos 7.

[root@san01 tank]# ./btrfs-subvolume-backup store /mnt2/backups
./btrfs-subvolume-backup: line 177: shlock: command not found
/var/run/btrfs-subvolume-backup held for btrfs-subvolume-backup, quitting
[root@san01 tank]# yum whatprovides shlock
Loaded plugins: changelog, fastestmirror
Loading mirror speeds from cached hostfile
 * base: dist1.800hosting.com
 * elrepo: repos.dfw.lax-noc.com
 * epel: mirror.umd.edu
 * extras: mirrors.usc.edu
 * updates: mirror.keystealth.org
No matches found
[root@san01 tank]# shlock
-bash: shlock: command not found
[root@san01 tank]# yum search all shlock
Loaded plugins: changelog, fastestmirror
Loading mirror speeds from cached hostfile
 * base: dist1.800hosting.com
 * elrepo: repos.dfw.lax-noc.com
 * epel: mirror.utexas.edu
 * extras: mirror.thelinuxfix.com
 * updates: dallas.tx.mirror.xygenhosting.com
Warning: No matches found for: shlock
No matches found

On Thu, Jul 9, 2015 at 12:17 PM, Marc MERLIN m...@merlins.org wrote:
> On Thu, Jul 09, 2015 at 02:26:55PM +0200, Martin Steigerwald wrote:
>> Hi! I see Alex, the developer of btrbk, posted here once about btrfs
>> send and receive, but well, any other users of btrbk¹? What are your
>> experiences? I consider switching to it from my home grown rsync
>> based backup script. Well, I may try it for one of my BTRFS volumes
>> in addition to the rsync backup for now. I would like to give all
>> options on the command line, but maybe it can completely replace my
>> current script if I put everything in its configuration.
>> Any other handy BTRFS backup solutions?
> I use my own which I wrote :)
> http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
> http://marc.merlins.org/linux/scripts/btrfs-subvolume-backup
> Marc
> --
> A mouse is a device used to point at the xterm you want to type in - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
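The failure above is just the locking helper: shlock ships with inn, not with CentOS. The same single-instance guarantee can be had from flock(2); a hypothetical stand-in sketch (the lock path is illustrative, not the script's actual path):

```python
# Sketch of a shlock-style guard: take an exclusive, non-blocking flock
# on a lock file and bail out if another instance already holds it.
import fcntl
import os
import sys

def single_instance(path="/tmp/btrfs-subvolume-backup.lock"):
    """Exit if another process already holds the lock file."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        sys.exit("%s held by another instance, quitting" % path)
    # Keep fd open for the lifetime of the process; the kernel releases
    # the lock automatically when the process exits.
    return fd
```

(util-linux also ships a `flock` command that gives the same behaviour directly from a shell script.)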
[PATCH] btrfs: Remove unused chunk_tree and chunk_objectid from scrub_enumerate_chunks() and scrub_chunk()
From: Zhao Lei zhao...@cn.fujitsu.com

These variables have not been used since the version that introduced them; remove them.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
---
 fs/btrfs/scrub.c | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index eb35176..f552937 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3321,7 +3321,6 @@ out:
 static noinline_for_stack int scrub_chunk(struct scrub_ctx *sctx,
 					  struct btrfs_device *scrub_dev,
-					  u64 chunk_tree, u64 chunk_objectid,
 					  u64 chunk_offset, u64 length,
 					  u64 dev_offset, int is_dev_replace)
 {
@@ -3372,8 +3371,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 	struct btrfs_root *root = sctx->dev_root;
 	struct btrfs_fs_info *fs_info = root->fs_info;
 	u64 length;
-	u64 chunk_tree;
-	u64 chunk_objectid;
 	u64 chunk_offset;
 	int ret;
 	int slot;
@@ -3431,8 +3428,6 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		if (found_key.offset + length <= start)
 			goto skip;

-		chunk_tree = btrfs_dev_extent_chunk_tree(l, dev_extent);
-		chunk_objectid = btrfs_dev_extent_chunk_objectid(l, dev_extent);
 		chunk_offset = btrfs_dev_extent_chunk_offset(l, dev_extent);

 		/*
@@ -3449,8 +3444,8 @@ int scrub_enumerate_chunks(struct scrub_ctx *sctx,
 		dev_replace->cursor_right = found_key.offset + length;
 		dev_replace->cursor_left = found_key.offset;
 		dev_replace->item_needs_writeback = 1;
-		ret = scrub_chunk(sctx, scrub_dev, chunk_tree, chunk_objectid,
-				  chunk_offset, length, found_key.offset,
+		ret = scrub_chunk(sctx, scrub_dev, chunk_offset, length,
+				  found_key.offset,
 				  is_dev_replace);

 		/*
-- 
1.8.5.1
Can't remove missing device
One of my 3TB drives failed (not recognized anymore) recently, so I got two new 4TB drives. I mounted the fs with -o degraded and used btrfs dev add to add the new drives, then I did btrfs dev del missing.

Now delete missing always returns an error:

ERROR: error removing the device 'missing' - Input/output error

According to dmesg, sda returns bad data, but the SMART values for it seem fine. How do I get the FS working again?

Debian/SID, kernel v4.1

# btrfs fi df /srv/
Data, RAID5: total=18.96TiB, used=18.52TiB
System, RAID1: total=32.00MiB, used=2.30MiB
Metadata, RAID1: total=24.06GiB, used=22.09GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs fi show
Label: none  uuid: ----
	Total devices 11 FS bytes used 18.54TiB
	devid    1 size 2.73TiB used 2.56TiB path /dev/sdh
	devid    2 size 2.73TiB used 2.63TiB path /dev/sdg
	devid    3 size 2.73TiB used 2.64TiB path /dev/sdj
	devid    4 size 2.73TiB used 2.60TiB path /dev/sdk
	devid    5 size 2.73TiB used 2.63TiB path /dev/sdb
	devid    6 size 2.73TiB used 2.73TiB path /dev/sda
	devid    9 size 2.73TiB used 2.73TiB path /dev/sdd
	devid   10 size 2.73TiB used 2.73TiB path /dev/sdl
	devid   11 size 3.64TiB used 2.66GiB path /dev/sdc
	devid   12 size 3.64TiB used 2.66GiB path /dev/sde
	*** Some devices missing

btrfs-progs v4.0

# dmesg | tail -n 40
[ 9474.630480] BTRFS warning (device sda): csum failed ino 384 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9474.630487] BTRFS warning (device sda): csum failed ino 384 off 2927919104 csum 729502971 expected csum 57406087
[ 9474.630493] BTRFS warning (device sda): csum failed ino 384 off 2927923200 csum 1688454633 expected csum 4263548653
[ 9474.630495] BTRFS warning (device sda): csum failed ino 384 off 2927927296 csum 3679588162 expected csum 4283532667
[ 9484.066796] BTRFS info (device sda): relocating block group 66338809643008 flags 129
[ 9505.492349] __readpage_endio_check: 6 callbacks suppressed
[ 9505.492356] BTRFS warning (device sda): csum failed ino 385 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9505.492366] BTRFS warning (device sda): csum failed ino 385 off 2927890432 csum 645393967 expected csum 1519548271
[ 9505.492372] BTRFS warning (device sda): csum failed ino 385 off 2927894528 csum 3254966910 expected csum 2168664573
[ 9505.492377] BTRFS warning (device sda): csum failed ino 385 off 2927898624 csum 3464250141 expected csum 1621289634
[ 9505.492382] BTRFS warning (device sda): csum failed ino 385 off 2927902720 csum 2214000308 expected csum 2797028572
[ 9505.492387] BTRFS warning (device sda): csum failed ino 385 off 2927906816 csum 3719155761 expected csum 561200354
[ 9505.492392] BTRFS warning (device sda): csum failed ino 385 off 2927910912 csum 98768328 expected csum 1311354303
[ 9505.492397] BTRFS warning (device sda): csum failed ino 385 off 2927915008 csum 996429330 expected csum 1552366519
[ 9505.492402] BTRFS warning (device sda): csum failed ino 385 off 2927919104 csum 729502971 expected csum 57406087
[ 9505.492407] BTRFS warning (device sda): csum failed ino 385 off 2927923200 csum 1688454633 expected csum 4263548653
[ 9515.428150] BTRFS info (device sda): relocating block group 66338809643008 flags 129
[ 9534.605158] __readpage_endio_check: 7 callbacks suppressed
[ 9534.605165] BTRFS warning (device sda): csum failed ino 386 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9534.605174] BTRFS warning (device sda): csum failed ino 386 off 2927890432 csum 645393967 expected csum 1519548271
[ 9534.605184] BTRFS warning (device sda): csum failed ino 386 off 2927894528 csum 3254966910 expected csum 2168664573
[ 9534.605192] BTRFS warning (device sda): csum failed ino 386 off 2927898624 csum 3464250141 expected csum 1621289634
[ 9534.605194] BTRFS warning (device sda): csum failed ino 386 off 2927902720 csum 2214000308 expected csum 2797028572
[ 9534.605198] BTRFS warning (device sda): csum failed ino 386 off 2927906816 csum 3719155761 expected csum 561200354
[ 9534.605204] BTRFS warning (device sda): csum failed ino 386 off 2927910912 csum 98768328 expected csum 1311354303
[ 9534.605206] BTRFS warning (device sda): csum failed ino 386 off 2927915008 csum 996429330 expected csum 1552366519
[ 9534.605212] BTRFS warning (device sda): csum failed ino 386 off 2927919104 csum 729502971 expected csum 57406087
[ 9534.605215] BTRFS warning (device sda): csum failed ino 386 off 2927923200 csum 1688454633 expected csum 4263548653
[ 9543.317995] BTRFS info (device sda): relocating block group 66338809643008 flags 129
[ 9564.879155] __readpage_endio_check: 7 callbacks suppressed
[ 9564.879161] BTRFS warning (device sda): csum failed ino 387 off 2927886336 csum 1204172668 expected csum 3738892907
[ 9564.879171] BTRFS warning (device sda): csum failed ino 387 off 2927890432 csum 645393967 expected csum
[RFC PATCH 2/2] btrfs: scrub: Add support for partial csum
From: Zhao Lei zhao...@cn.fujitsu.com

Add scrub support for partial csum.

The only challenge is that scrub is done in units of a bio (or page size yet), but partial csum is done in units of 1/8 of nodesize. So a new function, scrub_check_node_checksum, and a new tree block csum check loop are introduced to do the partial csum check while reading the tree block.

Signed-off-by: Zhao Lei zhao...@cn.fujitsu.com
Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/scrub.c | 207 ++-
 1 file changed, 206 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ab58115..0610474 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -307,6 +307,7 @@ static void copy_nocow_pages_worker(struct btrfs_work *work);
 static void __scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_blocked_if_needed(struct btrfs_fs_info *fs_info);
 static void scrub_put_ctx(struct scrub_ctx *sctx);
+static int scrub_check_fsid(u8 fsid[], struct scrub_page *spage);

 static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
@@ -878,6 +879,91 @@ static inline void scrub_put_recover(struct scrub_recover *recover)
 }

 /*
+ * Page_bad arg should be a page including the leaf header
+ *
+ * Return 0 if this header seems correct,
+ * return 1 in other cases
+ */
+static int scrub_check_head(struct scrub_page *spage, u8 *csum)
+{
+	void *mapped_buffer;
+	struct btrfs_header *h;
+
+	mapped_buffer = kmap_atomic(spage->page);
+	h = (struct btrfs_header *)mapped_buffer;
+
+	if (spage->logical != btrfs_stack_header_bytenr(h))
+		goto header_err;
+	if (!scrub_check_fsid(h->fsid, spage))
+		goto header_err;
+	if (memcmp(h->chunk_tree_uuid,
+		   spage->dev->dev_root->fs_info->chunk_tree_uuid,
+		   BTRFS_UUID_SIZE))
+		goto header_err;
+	if (spage->generation != btrfs_stack_header_generation(h))
+		goto header_err;
+
+	if (csum)
+		memcpy(csum, h->csum, sizeof(h->csum));
+
+	kunmap_atomic(mapped_buffer);
+	return 0;
+
+header_err:
+	kunmap_atomic(mapped_buffer);
+	return 1;
+}
+
+/*
+ * return 1 if checksum ok, 0 in other cases
+ */
+static int scrub_check_node_checksum(struct scrub_block *sblock,
+				     int part,
+				     u8 *csum)
+{
+	int offset;
+	int len;
+	u32 crc = ~(u32)0;
+
+	if (part == 0) {
+		offset = BTRFS_CSUM_SIZE;
+		len = sblock->sctx->nodesize - BTRFS_CSUM_SIZE;
+	} else if (part == 1) {
+		offset = BTRFS_CSUM_SIZE;
+		len = sblock->sctx->nodesize * 2 / 8 - BTRFS_CSUM_SIZE;
+	} else {
+		offset = part * sblock->sctx->nodesize / 8;
+		len = sblock->sctx->nodesize / 8;
+	}
+
+	while (len > 0) {
+		int page_num = offset / PAGE_SIZE;
+		int page_data_offset = offset - page_num * PAGE_SIZE;
+		int page_data_len = min(len,
+					(int)(PAGE_SIZE - page_data_offset));
+		u8 *mapped_buffer;
+
+		WARN_ON(page_num >= sblock->page_count);
+
+		if (sblock->pagev[page_num]->io_error)
+			return 0;
+
+		mapped_buffer = kmap_atomic(
+			sblock->pagev[page_num]->page);
+
+		crc = btrfs_csum_data(mapped_buffer + page_data_offset, crc,
+				      page_data_len);
+
+		offset += page_data_len;
+		len -= page_data_len;
+
+		kunmap_atomic(mapped_buffer);
+	}
+	btrfs_csum_final(crc, (char *)&crc);
+	return (crc == ((u32 *)csum)[part]);
+}
+
+/*
  * scrub_handle_errored_block gets called when either verification of the
  * pages failed or the bio failed to read, e.g. with EIO. In the latter
  * case, this function handles all pages in the bio, even though only one
@@ -905,6 +991,9 @@ static int scrub_handle_errored_block(struct scrub_block *sblock_to_check)
 	int success;
 	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
+	u8 node_csum[BTRFS_CSUM_SIZE];
+	int get_right_sum = 0;
+	int per_page_recover_start = 0;

 	BUG_ON(sblock_to_check->page_count < 1);
 	fs_info = sctx->dev_root->fs_info;
@@ -1151,11 +1240,125 @@ nodatasum_case:
 	 * area are unreadable.
 	 */
 	success = 1;
+
+	/*
+	 * maybe some mirror's head is broken,
+	 * we select to use a right head for checksum
+	 */
+	for (mirror_index = 0; mirror_index < BTRFS_MAX_MIRRORS &&
+	     sblocks_for_recheck[mirror_index].page_count > 0;
+	     mirror_index++) {
+		if
[RFC PATCH 1/2] btrfs: csum: Introduce partial csum for tree block.
Introduce the new partial csum mechanism for tree block.

[Old tree block csum]
0     4     8     12    16    20    24    28    32
---------------------------------------------------
|csum |            unused, all 0                  |
---------------------------------------------------
Csum is the crc32 of the whole tree block data.

[New tree block csum]
---------------------------------------------------
|csum0|csum1|csum2|csum3|csum4|csum5|csum6|csum7|
---------------------------------------------------
Where csum0 is the same as the old one, crc32 of the whole tree block data. But csum1~csum7 will store the crc32 of each eighth part.

Take the example of 16K leafsize, then:
csum1: crc32 of BTRFS_CSUM_SIZE~4K
csum2: crc32 of 4K~6K
...
csum7: crc32 of 14K~16K

This gives btrfs the ability not only to detect corruption but also to know where the corruption is, further improving the robustness of btrfs.

Although the best practice would be to introduce a new csum type and put every eighth's crc32 into its corresponding place, the benefit is not worth breaking backward compatibility. So keep csum0 and modify the csum1 range to keep backward compatibility.

Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c | 74 --
 1 file changed, 49 insertions(+), 25 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2ef9a4b..b2d8526 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -271,47 +271,75 @@ void btrfs_csum_final(u32 crc, char *result)
 }

 /*
- * compute the csum for a btree block, and either verify it or write it
- * into the csum field of the block.
+ * Calculate partial crc32 for each part.
+ *
+ * Part should be in [0, 7].
+ * Part 0 is the old crc32 of the whole leaf/node.
+ * Part 1 is the crc32 of 32~ 2/8 of leaf/node.
+ * Part 2 is the crc32 of 3/8 of leaf/node.
+ * Part 3 is the crc32 of 4/8 of leaf/node and so on.
  */
-static int csum_tree_block(struct btrfs_fs_info *fs_info,
-			   struct extent_buffer *buf,
-			   int verify)
+static int csum_tree_block_part(struct extent_buffer *buf,
+				char *result, int part)
 {
-	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
-	char *result = NULL;
+	int offset;
+	int err;
 	unsigned long len;
 	unsigned long cur_len;
-	unsigned long offset = BTRFS_CSUM_SIZE;
-	char *kaddr;
 	unsigned long map_start;
 	unsigned long map_len;
-	int err;
+	char *kaddr;
 	u32 crc = ~(u32)0;
-	unsigned long inline_result;

-	len = buf->len - offset;
+	BUG_ON(part >= 8 || part < 0);
+	BUG_ON(ALIGN(buf->len, 8) != buf->len);
+
+	if (part == 0) {
+		offset = BTRFS_CSUM_SIZE;
+		len = buf->len - offset;
+	} else if (part == 1) {
+		offset = BTRFS_CSUM_SIZE;
+		len = buf->len * 2 / 8 - offset;
+	} else {
+		offset = part * buf->len / 8;
+		len = buf->len / 8;
+	}
+
 	while (len > 0) {
 		err = map_private_extent_buffer(buf, offset, 32,
					&kaddr, &map_start, &map_len);
 		if (err)
-			return 1;
+			return err;
 		cur_len = min(len, map_len - (offset - map_start));
 		crc = btrfs_csum_data(kaddr + offset - map_start, crc,
				      cur_len);
 		len -= cur_len;
 		offset += cur_len;
 	}
-	if (csum_size > sizeof(inline_result)) {
-		result = kzalloc(csum_size, GFP_NOFS);
-		if (!result)
+	btrfs_csum_final(crc, result + BTRFS_CSUM_SIZE * part / 8);
+	return 0;
+}
+
+/*
+ * compute the csum for a btree block, and either verify it or write it
+ * into the csum field of the block.
+ */
+static int csum_tree_block(struct btrfs_fs_info *fs_info,
+			   struct extent_buffer *buf,
+			   int verify)
+{
+	u16 csum_size = btrfs_super_csum_size(fs_info->super_copy);
+	char result[BTRFS_CSUM_SIZE] = {0};
+	int err;
+	int index = 0;
+
+	/* get every part csum */
+	for (index = 0; index < 8; index++) {
+		err = csum_tree_block_part(buf, result, index);
+		if (err)
 			return 1;
-	} else {
-		result = (char *)inline_result;
 	}

-	btrfs_csum_final(crc, result);
-
 	if (verify) {
 		if (memcmp_extent_buffer(buf, result, 0, csum_size)) {
 			u32 val;
@@ -324,15 +352,11 @@ static int csum_tree_block(struct btrfs_fs_info *fs_info,
 			       level %d\n,
 			       fs_info->sb->s_id, buf->start,
 			       val, found, btrfs_header_level(buf));
-			if (result !=
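The range arithmetic in the patch (csum0 over the whole block after the csum area, csum1 from BTRFS_CSUM_SIZE to 2/8, csum2..7 over one eighth each) can be modelled quickly in Python. Note zlib.crc32 here is only a stand-in for the kernel's checksum function, so the values are not what btrfs would store; what the sketch shows is the localisation property: corrupting one eighth changes only csum0 and that part's csum.

```python
# Model of the partial csum layout from the patch; only the byte ranges
# match the patch, the crc function itself is a stand-in.
import zlib

BTRFS_CSUM_SIZE = 32

def partial_csums(block: bytes) -> list:
    """Return the 8 partial crc32s over a tree-block-sized buffer."""
    n = len(block)
    assert n % 8 == 0
    sums = []
    for part in range(8):
        if part == 0:            # whole block after the csum area
            off, ln = BTRFS_CSUM_SIZE, n - BTRFS_CSUM_SIZE
        elif part == 1:          # end of csum area .. 2/8 of the block
            off, ln = BTRFS_CSUM_SIZE, n * 2 // 8 - BTRFS_CSUM_SIZE
        else:                    # one eighth each
            off, ln = part * n // 8, n // 8
        sums.append(zlib.crc32(block[off:off + ln]))
    return sums

block = bytes(range(256)) * 64            # a 16 KiB "node"
good = partial_csums(block)
bad = bytearray(block)
bad[5 * len(block) // 8 + 10] ^= 0xFF     # corrupt a byte in the sixth eighth
changed = [i for i, (a, b) in enumerate(zip(good, partial_csums(bytes(bad))))
           if a != b]
print(changed)  # → [0, 5]
```

So a scrub that stores all eight values can narrow a csum failure down to one eighth of the node instead of just "this node is bad".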
Re: [BUG] Fails to duplicate metadata/system
On Thu, Jul 9, 2015 at 5:34 PM, conc...@web.de wrote:
> Hi, I've noticed that a single device partition was using
> metadata.single and system.single instead of metadata.dup and
> system.dup. All tests to force conversion to dup failed.

Try only -mconvert=dup and without the -f flag and see if it works. I'm pretty sure system chunks are treated in parity with metadata chunks now, so system doesn't need to be separately listed. And -f isn't needed except to reduce redundancy.

If that's not it, I'm going to speculate: maybe try kernel 4.0.6 or higher, as there was a bug in 4.0 that prevented chunk conversions, but I thought that only applied to raid profiles, not single vs dup. The fix for that was commit 153c35b60c72de9fae06c8e2c8b2c47d79d4.

-- 
Chris Murphy
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
On Thu, Jul 9, 2015 at 6:34 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
> One of my patches addressed a problem that a converted btrfs can't
> pass btrfsck. Not sure if that is the cause, but if you can, try
> btrfs-progs v3.19.1, the one without my btrfs-progs patches and some
> other newer convert related patches, and see the result? I think this
> would at least provide the base for bisecting btrfs-progs if the bug
> is in btrfs-progs.

I'm happy to regression test with 3.19.1 but I'm confused. After conversion, btrfs check (4.1) finds no problems. After the ext2_saved snapshot is deleted, btrfsck finds no problems. After defrag, again btrfsck finds no problems. After the failed balance, btrfsck finds no problems but crashes with Aborted (core dump).

Should I still test 3.19.1?

-- 
Chris Murphy
Re: Anyone tried out btrbk yet?
... and I just found your other blog post about stealing shlock out of inn. Officially embarrassed!

On Thu, Jul 9, 2015 at 8:35 PM, Donald Pearson donaldwhpear...@gmail.com wrote:
> Marc,
> I thought I'd give yours a try, and I'm probably embarrassing myself
> here but I'm running into this issue. Centos 7.
> [root@san01 tank]# ./btrfs-subvolume-backup store /mnt2/backups
> ./btrfs-subvolume-backup: line 177: shlock: command not found
> /var/run/btrfs-subvolume-backup held for btrfs-subvolume-backup, quitting
> [root@san01 tank]# yum whatprovides shlock
> Loaded plugins: changelog, fastestmirror
> Loading mirror speeds from cached hostfile
>  * base: dist1.800hosting.com
>  * elrepo: repos.dfw.lax-noc.com
>  * epel: mirror.umd.edu
>  * extras: mirrors.usc.edu
>  * updates: mirror.keystealth.org
> No matches found
> [root@san01 tank]# shlock
> -bash: shlock: command not found
> [root@san01 tank]# yum search all shlock
> Loaded plugins: changelog, fastestmirror
> Loading mirror speeds from cached hostfile
>  * base: dist1.800hosting.com
>  * elrepo: repos.dfw.lax-noc.com
>  * epel: mirror.utexas.edu
>  * extras: mirror.thelinuxfix.com
>  * updates: dallas.tx.mirror.xygenhosting.com
> Warning: No matches found for: shlock
> No matches found
> On Thu, Jul 9, 2015 at 12:17 PM, Marc MERLIN m...@merlins.org wrote:
>> On Thu, Jul 09, 2015 at 02:26:55PM +0200, Martin Steigerwald wrote:
>>> Hi! I see Alex, the developer of btrbk, posted here once about btrfs
>>> send and receive, but well, any other users of btrbk¹? What are your
>>> experiences? I consider switching to it from my home grown rsync
>>> based backup script. Well, I may try it for one of my BTRFS volumes
>>> in addition to the rsync backup for now. I would like to give all
>>> options on the command line, but maybe it can completely replace my
>>> current script if I put everything in its configuration.
>>> Any other handy BTRFS backup solutions?
>> I use my own which I wrote :)
>> http://marc.merlins.org/perso/btrfs/2014-03.html#Btrfs-Tips_-Doing-Fast-Incremental-Backups-With-Btrfs-Send-and-Receive
>> http://marc.merlins.org/linux/scripts/btrfs-subvolume-backup
>> Marc
>> --
>> A mouse is a device used to point at the xterm you want to type in - A.S.R.
>> Microsoft is to operating systems what McDonalds is to gourmet cooking
>> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Anyone tried out btrbk yet?
In my research, I've found btrbk and btrfs-sxbackup certainly to be the leading contenders in terms of feature completeness. sanoid [1] will be another interesting possibility once btrfs compatibility is added (currently zfs only). I just wish I'd discovered all these before I went to all the effort of creating snazzer [1] :) I've been meaning to split some stuff out of snazzer that might be generically useful to other folks, such as filesystem cloning of all subvols/snapshots via send/receive, and it seems as if it should be possible to automatically prune any idiosyncratic snapshot naming convention - I just haven't found the time to write unit tests. [1] https://github.com/jimsalterjrs/sanoid [2] https://github.com/csirac2/snazzer On 10 July 2015 at 11:38, Donald Pearson donaldwhpear...@gmail.com wrote: ... and I just found your other block about stealing shlock out of inn. Officially embarassed! On Thu, Jul 9, 2015 at 8:35 PM, Donald Pearson donaldwhpear...@gmail.com wrote: Marc, I thought I'd yours a try, and I'm probably embarassing myself here but I'm running in to this issue. Centos 7. 
[RFC PATCH 0/2] Btrfs partial csum support
This patchset adds partial csum support for btrfs. Partial csum takes full advantage of the 32-byte csum space inside the tree block, while still maintaining backward compatibility with old kernels.

The overall idea is like the following, on a 16K leaf:

[Old tree block csum]
0     4     8     12    16    20    24    28    32
--------------------------------------------------
|csum |           unused, all 0                  |
--------------------------------------------------
Csum is the crc32 of the whole tree block data.

[New tree block csum]
--------------------------------------------------
|csum0|csum1|csum2|csum3|csum4|csum5|csum6|csum7|
--------------------------------------------------
Where csum0 is the same as the old one, the crc32 of the whole tree block data, and csum1~csum7 store the crc32 of each eighth part. Taking the example of a 16K leafsize:

csum1: crc32 of BTRFS_CSUM_SIZE~4K
csum2: crc32 of 4K~6K
...
csum7: crc32 of 14K~16K

When the nodesize is small, like 4K, partial csum is completely useless. But when the nodesize grows, like 32K, each partial csum covers just one page, making scrub able to judge which page is OK even without reading out the whole tree block. It also adds the possibility of fixing cases where corruption happens on all mirrors but in different parts. Such cases should be more likely if the nodesize goes beyond 16K.

Qu Wenruo (1):
  btrfs: csum: Introduce partial csum for tree block.

Zhao Lei (1):
  btrfs: scrub: Add support partial csum

 fs/btrfs/disk-io.c |  74 ---
 fs/btrfs/scrub.c   | 207 -
 2 files changed, 255 insertions(+), 26 deletions(-)

-- 
2.4.5
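As a rough illustration of the split described above, here is a small Python sketch. This is not the kernel code: btrfs actually uses crc32c while this uses zlib's plain crc32, and all function names are invented for the example. It computes csum0 over the whole block, csum1 from BTRFS_CSUM_SIZE to the end of the second eighth, and csum2..csum7 over one eighth each, then shows how a scrub-like check could narrow a mismatch down to one part of the block:

```python
import zlib

BTRFS_CSUM_SIZE = 32  # bytes of csum space per slot in the block header


def tree_block_csums(block: bytes) -> list[int]:
    """Compute csum0..csum7 for one tree block, following the RFC's split.

    csum0 covers the whole block; csum1 covers BTRFS_CSUM_SIZE up to the
    end of the second eighth (the csum area itself sits at the start);
    csum2..csum7 each cover one eighth of the block.
    """
    eighth = len(block) // 8
    csums = [zlib.crc32(block)]                                  # csum0
    csums.append(zlib.crc32(block[BTRFS_CSUM_SIZE:2 * eighth]))  # csum1
    for i in range(2, 8):                                        # csum2..7
        csums.append(zlib.crc32(block[i * eighth:(i + 1) * eighth]))
    return csums


def find_bad_parts(block: bytes, stored: list[int]) -> list[int]:
    """Return the indices (1..7) of partial csums that mismatch, i.e.
    which parts of the block a scrub could flag without needing the
    whole-block csum0 to match."""
    fresh = tree_block_csums(block)
    return [i for i in range(1, 8) if fresh[i] != stored[i]]
```

With a 32K nodesize each eighth is one 4K page, which is what lets scrub point at a single bad page rather than the whole tree block.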
[PATCH] Btrfs: remove unused mutex from struct 'btrfs_fs_info'
The code using the 'ordered_extent_flush_mutex' mutex was removed by the commit below:

- 8d875f95da43c6a8f18f77869f2ef26e9594fecc
  btrfs: disable strict file flushes for renames and truncates

But the mutex still lives in struct 'btrfs_fs_info'. So this patch removes the mutex from struct 'btrfs_fs_info' and its initialization code.

Signed-off-by: Byongho Lee bhlee.ker...@gmail.com
---
 fs/btrfs/ctree.h   | 6 ------
 fs/btrfs/disk-io.c | 1 -
 2 files changed, 7 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index aac314e14188..cdde6d541b3a 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1518,12 +1518,6 @@ struct btrfs_fs_info {
 	 */
 	struct mutex ordered_operations_mutex;
 
-	/*
-	 * Same as ordered_operations_mutex except this is for ordered extents
-	 * and not the operations.
-	 */
-	struct mutex ordered_extent_flush_mutex;
-
 	struct rw_semaphore commit_root_sem;
 
 	struct rw_semaphore cleanup_work_sem;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index e5aad7f535aa..6ba584714c51 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2608,7 +2608,6 @@ int open_ctree(struct super_block *sb,
 	mutex_init(&fs_info->ordered_operations_mutex);
-	mutex_init(&fs_info->ordered_extent_flush_mutex);
 	mutex_init(&fs_info->tree_log_mutex);
 	mutex_init(&fs_info->chunk_mutex);
 	mutex_init(&fs_info->transaction_kthread_mutex);
-- 
2.4.5
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
Slightly off topic: do these bugs exist in systems that converted from ext4 to btrfs using kernel 3.13 and then upgraded to kernel 4.1?

On Thu, Jul 9, 2015 at 4:09 AM, Chris Murphy li...@colorremedies.com wrote:
On Thu, Jun 25, 2015 at 8:08 PM, Qu Wenruo quwen...@cn.fujitsu.com wrote:
A quick code search leads me to inline extent. So, if you still have the original ext* image, would you please try reverting to ext* and then converting it to btrfs again? But this time, please convert with the --no-inline option, and see if this removes the problem.

Using -n at convert time does not make a difference for the btrfs-convert bugs I've opened:
https://bugzilla.kernel.org/show_bug.cgi?id=101191
https://bugzilla.kernel.org/show_bug.cgi?id=101181
https://bugzilla.kernel.org/show_bug.cgi?id=101221
https://bugzilla.kernel.org/show_bug.cgi?id=101231

The last one I just discovered happens much sooner and is easier to reproduce than the other two. It's a scrub right after a successful btrfs-convert that btrfs check says is OK. But the scrub ends with two separate oopses, multiple call traces, and a spectacularly hard kernel panic (ssh and even the console dies). So I think btrfs-convert has a bug, but then the kernel code is not gracefully handling it at all either, and crashes badly with a scrub, and less badly with balance. However, the file system is still OK despite the scrub crash. With the balance failure, the file system is too badly damaged and btrfs check and btrfs-image fail.

-- 
Chris Murphy
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On Thu, Jul 9, 2015 at 8:20 AM, james harvey jamespharve...@gmail.com wrote:
Request for a new btrfs subvolume subcommand: clone or fork [-i qgroupid] source [dest/]name

Create a subvolume name in dest, which is a clone or fork of source. If dest is not given, subvolume name will be created in the current directory.

Options
-i qgroupid
Add the newly created subvolume to a qgroup. This option can be given multiple times.

This would (I think) be equivalent to:
* btrfs subvolume create dest-subvolume
* cp -ax --reflink=always source-subvolume/* dest-subvolume/

What's wrong with btrfs subvolume snapshot?

-- 
Fajar
Re: size 2.73TiB used 240.97GiB after balance
On 2015-07-08 15:06, Donald Pearson wrote:
I wouldn't use dd. I would use recover to get the data if at all possible, then you can experiment with trying to fix the degraded condition live. If you have any chance of getting data from the pool, you reduce that chance every time you make a change. If btrfs did the balance like you said, it wouldn't be raid5. What you just described is raid4, where only one drive holds parity data. I can't say that I actually know for a fact that btrfs doesn't do this, but I'd be shocked, and some dev would need to eat their underwear, if the balance job didn't distribute the parity also.

That is correct, it does distribute the parity among all the member drives. That said, it would still have to modify the existing drives even if it did put the parity on just the new drive, because raid{4,5,6} are defined as _striped_ data with parity, not mirrored (ie, if you just removed the parity, you'd have a raid0, not a raid1).
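To make the raid4-versus-raid5 distinction above concrete, here is a toy Python sketch. This is not btrfs's actual chunk allocator; the rotation formula is just a left-symmetric-style example and the function names are made up for illustration:

```python
def parity_disk_raid4(stripe: int, ndisks: int) -> int:
    """RAID4: one dedicated parity disk (here, always the last one)."""
    return ndisks - 1


def parity_disk_raid5(stripe: int, ndisks: int) -> int:
    """RAID5 (left-symmetric style): the parity block rotates across
    all member disks, moving one disk per stripe."""
    return (ndisks - 1 - stripe) % ndisks


def parity_xor(blocks: list[bytes]) -> bytes:
    """Parity is the bytewise XOR of the blocks in the stripe; XOR of
    the surviving blocks plus parity rebuilds a single lost block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)
```

The rebuild property is why a balance onto a new disk must restripe existing members: the data itself is striped across all disks, not mirrored, so parity placement alone is not the whole story.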
Re: size 2.73TiB used 240.97GiB after balance
On 2015-07-08 18:16, Donald Pearson wrote:
Basically, I wouldn't trust the drive that's already showing signs of failure to survive a dd. It isn't completely full, so the recovery is less load. That's just the way I see it. But I see your point of trying to get drive images now to hedge against failures. Unfortunately those errors are over my head, so hopefully someone else has insights.

A better option if you want a block-level copy would probably be ddrescue (available in almost every distro in a package of the same name); it's designed for recovering as much data as possible from failed disks (and gives a much nicer status display than plain old dd). If you do go for a block-level copy, however, make certain that no more than one of the copies is visible to the system at any given time, especially when the filesystem is mounted, otherwise things _WILL_ get exponentially worse.
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On 2015-07-09 02:22, Fajar A. Nugraha wrote:
On Thu, Jul 9, 2015 at 8:20 AM, james harvey jamespharve...@gmail.com wrote:
Request for a new btrfs subvolume subcommand: clone or fork [-i qgroupid] source [dest/]name
Create a subvolume name in dest, which is a clone or fork of source. If dest is not given, subvolume name will be created in the current directory.
Options
-i qgroupid
Add the newly created subvolume to a qgroup. This option can be given multiple times.
Would (I think):
* btrfs subvolume create dest-subvolume
* cp -ax --reflink=always source-subvolume/* dest-subvolume/

What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Austin S Hemmelgarn wrote (ao):
On 2015-07-09 08:41, Sander wrote:
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander

No, there is a bit in the subvolume metadata that says whether it's considered a snapshot or not. Internally, they are handled identically, but it does come into play when you consider things like btrfs subvolume show -s (which only lists snapshots), which in turn means that certain tasks are more difficult to script robustly.

I stand corrected. Thanks for the info.

Sander
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
On 2015-07-09 08:41, Sander wrote:
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander

No, there is a bit in the subvolume metadata that says whether it's considered a snapshot or not. Internally, they are handled identically, but it does come into play when you consider things like btrfs subvolume show -s (which only lists snapshots), which in turn means that certain tasks are more difficult to script robustly.
Re: kernel crash on btrfs device delete missing
I was finally able to remove the missing device. I updated the bug report, but in case anyone else has this problem I wanted to update here as well.

I deleted all snapshot subvolumes on the pool (between 20 and 30), and was then able to delete the missing device without issue. This took two tries, because the first time I did not wait for btrfs-cleaner to finish actually deleting the subvolumes (as it does this in the background). The second time I deleted the subvolumes, then waited until all disk activity (reported by iotop) had ceased, and re-mounted the pool to be sure. After that the rebalance/delete worked without issue.

I am not certain if this is because of a bug with rebalancing snapshots, or because some bad data that was causing the segfault just happened to be in the snapshots.

On Tue, Jul 7, 2015 at 12:45 PM, David Wilhelm thefe...@gmail.com wrote:
Thanks. I've submitted it as issue 101141 https://bugzilla.kernel.org/show_bug.cgi?id=101141

That looks like the kind of thing you need a developer for. You've already reported it here, but sticking a copy of what you've discovered so far into bugzilla.kernel.org may help it not to get lost.

Hugo.

-- 
Hugo Mills | I don't like the look of it, I tell you.
hugo@... carfax.org.uk | Well, stop looking at it, then.
http://carfax.org.uk/ | PGP: E2AB1DE4 | The Goons
Re: [PATCH trivial] Btrfs: Spelling s/consitent/consistent/
On Mon, Jul 06, 2015 at 03:38:11PM +0200, Geert Uytterhoeven wrote:
Signed-off-by: Geert Uytterhoeven ge...@linux-m68k.org

Acked-by: David Sterba dste...@suse.com
Re: btrfs subvolume clone or fork (btrfs-progs feature request)
Austin S Hemmelgarn posted on Thu, 09 Jul 2015 08:48:00 -0400 as excerpted:

On 2015-07-09 08:41, Sander wrote:
Austin S Hemmelgarn wrote (ao):
What's wrong with btrfs subvolume snapshot?

Well, personally I would say the fact that once something is tagged as a snapshot, you can't change it to a regular subvolume without doing a non-incremental send/receive.

A snapshot is a subvolume. There is no such thing as tagged as a snapshot.

Sander

No, there is a bit in the subvolume metadata that says whether it's considered a snapshot or not. Internally, they are handled identically, but it does come into play when you consider things like btrfs subvolume show -s (which only lists snapshots), which in turn means that certain tasks are more difficult to script robustly.

My use-case doesn't involve subvolumes or snapshots, so I can't check for sure, but... I could have sworn btrfs property -t subvolume can get/set that snapshot bit. I know I saw the discussion, and I think a patch for it, go by, but again, as I don't use them, I haven't tracked closely enough to see if it ever got in.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: [PATCH] Documentation: filesystems: btrfs: Fixed typos and whitespace
On Wed, 08 Jul 2015 10:44:51 -0700 Daniel Grimshaw grims...@linux.vnet.ibm.com wrote:

I am a high school student trying to become familiar with Linux kernel development. The btrfs documentation in Documentation/filesystems had a few typos and errors in whitespace. This patch corrects both of these.

This is a resend of an earlier patch with a corrected patchfile.

Applied to the docs tree, thanks.

Just FYI, if you put lines like the last one above after the '---' line, they won't find their way into the commit changelog, which is preferable. I edited it out.

jon
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
On Thu, Jul 9, 2015 at 4:52 AM, Vytautas D vyt...@gmail.com wrote:
Slightly off topic: do these bugs exist in systems that converted from ext4 to btrfs using kernel 3.13 and then upgraded to kernel 4.1?

I don't recall what btrfs-progs and kernel I last tested ext4 conversion with. I know this is a regression, I just don't know how old it is. I think there's more than one bug here (obviously, since I've filed 4 related bugs in ~24 hours), but I really don't know the scope of the problem. But the case where the recommended procedure not only fails but corrupts the file system, and it can't be fixed or rolled back, is not good. Perhaps the wiki should provide a warning that this is currently broken, status unknown, or something?

-- 
Chris Murphy
Concurrent write access
Hi,

I have a btrfs raid10 which is connected to a server hosting multiple virtual machines. Does btrfs support connecting the same subvolumes of the same raid to multiple virtual machines for concurrent read and write? The situation would be the same as, say, mounting user homes from the same nfs share on different machines.

Thanks,
Wolfgang
Odd scrub behavior - Raid5/6
Something I've noticed scrubbing two pools that I have: one is raid6 and the other is raid5. The scrubbing goes along very slowly, and I think it's because there is always one disk that's operating differently from the rest. Which disk it is changes. Here is an iostat of the current scrub, and you can see that /dev/sdj is the odd ball. Below the iostat output are the smart statistics for sdj, and they indicate a healthy drive. And to be sure, I recently ran extended tests twice without incident. Below that is another iostat output, where literally as I'm typing this email the behavior changed to a different drive in the pool, /dev/sdo, and the smart data follows, also showing a healthy drive (and a load cycle count that would make a seagate blush).

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.34   0.00     5.88    93.20    0.00   0.59

Device:  rrqm/s  wrqm/s    r/s   w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz    await  r_await  w_await  svctm   %util
sdp        0.00    2.00   0.00  5.00   0.00   0.08     31.47      0.05     9.07     0.00     9.07   8.93    4.47
sda        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdd        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sde        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdr       31.33    0.00  41.67  1.00   4.56   0.02    220.12      0.39     9.05     9.26     0.67   3.50   14.93
sdi       26.33    0.00  65.67  2.00   5.73   0.10    176.47      0.75    11.04    11.36     0.67   3.73   25.23
sdf        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdn       30.33    0.00  58.33  1.33   5.50   0.06    190.97      0.54     8.99     9.19     0.50   3.90   23.27
sdo       33.00    0.00  64.67  1.00   6.10   0.04    191.72      0.73    11.06    11.22     0.67   3.88   25.47
sds       30.33    0.00  59.00  1.67   5.56   0.05    189.45      0.53     8.66     8.90     0.40   3.66   22.23
sdc        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdb        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdk       28.67    0.00  62.33  1.67   5.65   0.08    183.29      2.73    42.72    43.86     0.40   8.20   52.47
sdl        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdu       35.00    0.00  62.00  0.33   6.04   0.00    198.59      0.74    11.88    11.95     0.00   4.41   27.50
sdt       26.33    0.00  61.33  1.00   5.58   0.02    184.00      0.65    12.05    12.24     0.33   3.76   23.43
sdj       34.67    0.00  31.67  0.67   3.79   0.04    242.80    129.66  3822.54  3876.24  1271.50  30.93  100.00
sdq       33.33    0.00  43.33  0.67   4.79   0.03    224.42      0.56    12.68    12.88     0.00   6.02   26.47
sdm        0.00    0.00   0.00  0.00   0.00   0.00      0.00      0.00     0.00     0.00     0.00   0.00    0.00
sdh       34.33    0.00  46.00  1.67   5.02   0.10    220.20      0.53    11.17    11.57     0.40   4.45   21.20
sdg       30.00    0.00  48.67  1.67   4.90   0.05    201.11      0.45     8.99     9.29     0.00   3.52   17.70

[root@san01 ~]# smartctl -a /dev/sdj
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-4.1.0-1.el7.elrepo.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model:     WDC WD15EARX-00PASB0
Serial Number:    WD-WCAZAK449717
LU WWN Device Id: 5 0014ee 2b27dbe1a
Firmware Version: 51.0AB51
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul 9 16:58:52 2015 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (38280) seconds.
Offline
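As an aside, the oddball drive in iostat output like the above can be picked out mechanically. The following is a hypothetical helper, not an existing tool: it flags any device whose iostat await (average I/O wait, in ms) is far above the median of the active devices; the 10x factor is an arbitrary illustration value, not a tuned recommendation.

```python
import statistics


def find_outlier_disks(await_ms: dict[str, float], factor: float = 10.0) -> list[str]:
    """Return devices whose await is more than `factor` times the
    median await of all devices that saw any I/O at all."""
    active = {dev: ms for dev, ms in await_ms.items() if ms > 0}
    if not active:
        return []
    median = statistics.median(active.values())
    return sorted(dev for dev, ms in active.items() if ms > factor * median)
```

Fed the await column from the first iostat sample above (sdj at 3822.54 ms against a median around 11 ms), it singles out sdj even though sdk's 42.72 ms is also elevated.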
Re: Concurrent write access
On Thu, Jul 09, 2015 at 11:34:40PM +0200, Wolfgang Mader wrote:
Hi, I have a btrfs raid10 which is connected to a server hosting multiple virtual machines. Does btrfs support connecting the same subvolumes of the same raid to multiple virtual machines for concurrent read and write? The situation would be the same as, say, mounting user homes from the same nfs share on different machines.

It'll depend on the protocol you use to make the subvolumes visible within the VMs. btrfs subvolumes aren't block devices, so that rules out most of the usual approaches. However, there are two methods I've used which I can confirm will work well: NFS and 9p.

NFS will work as a root filesystem, and will work with any host/guest, as long as there's a network connection between the two.

9p is, at least in theory, faster (particularly with virtio), but won't let you boot with the 9p device as your root FS. You'll need virtualiser support if you want to run a virtio 9p -- I know qemu/kvm supports this; I don't know if anything else supports it.

You can probably use Samba/CIFS as well. It'll be slower than the virtualised 9p, and won't be able to host a root filesystem. I haven't tried this one, because Samba and I get on like a house on fire(*).

Hugo.

(*) Screaming, shouting, people running away, emergency services.

-- 
Hugo Mills | Alert status mauve ocelot: Slight chance of
hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
http://carfax.org.uk/ | PGP: E2AB1DE4 |
Re: Concurrent write access
On Thursday 09 July 2015 22:06:09 Hugo Mills wrote:
On Thu, Jul 09, 2015 at 11:34:40PM +0200, Wolfgang Mader wrote:
Hi, I have a btrfs raid10 which is connected to a server hosting multiple virtual machines. Does btrfs support connecting the same subvolumes of the same raid to multiple virtual machines for concurrent read and write? The situation would be the same as, say, mounting user homes from the same nfs share on different machines.

It'll depend on the protocol you use to make the subvolumes visible within the VMs. btrfs subvolumes aren't block devices, so that rules out most of the usual approaches. However, there are two methods I've used which I can confirm will work well: NFS and 9p. NFS will work as a root filesystem, and will work with any host/guest, as long as there's a network connection between the two. 9p is, at least in theory, faster (particularly with virtio), but won't let you boot with the 9p device as your root FS. You'll need virtualiser support if you want to run a virtio 9p -- I know qemu/kvm supports this; I don't know if anything else supports it.

Thanks for the overview. It is qemu/kvm in fact, so this is an option. Right now, however, I connect the discs as virtual discs, not the file system, and only to one virtual machine.

Best,
Wolfgang

You can probably use Samba/CIFS as well. It'll be slower than the virtualised 9p, and not be able to host a root filesystem. I haven't tried this one, because Samba and I get on like a house on fire(*).

Hugo.

(*) Screaming, shouting, people running away, emergency services.
[BUG] Fails to duplicate metadata/system
Hi,

I've noticed that a single-device partition was using metadata.single and system.single instead of metadata.dup and system.dup. All attempts to force conversion to dup failed. Here is how to reproduce this with an image and some very simple BTRFS commands (Debian stretch):

$ uname -a
Linux asdasd 4.0.0-1-amd64 #1 SMP Debian 4.0.2-1 (2015-05-11) x86_64 GNU/Linux
$ btrfs --version
btrfs-progs v4.0
$ fallocate -l 8G test.img
$ mkdir mnt
$ mkfs.btrfs test.img
$ mount -o loop test.img mnt
$ touch mnt/asdasd
$ btrfs fi df mnt
Data, single: total=8.00MiB, used=64.00KiB
System, DUP: total=8.00MiB, used=16.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=409.56MiB, used=112.00KiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=16.00MiB, used=0.00B
$ btrfs balance start -v -mconvert=single -sconvert=single -dconvert=single mnt -f
Dumping filters: flags 0xf, state 0x0, force is on
  DATA (flags 0x100): converting, target=281474976710656, soft is off
  METADATA (flags 0x100): converting, target=281474976710656, soft is off
  SYSTEM (flags 0x100): converting, target=281474976710656, soft is off
Done, had to relocate 5 out of 5 chunks
$ btrfs fi df mnt
Data, single: total=832.00MiB, used=256.00KiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B
$ btrfs balance start -v -mconvert=dup -sconvert=dup -dconvert=single mnt -f
Dumping filters: flags 0xf, state 0x0, force is on
  DATA (flags 0x100): converting, target=281474976710656, soft is off
  METADATA (flags 0x100): converting, target=32, soft is off
  SYSTEM (flags 0x100): converting, target=32, soft is off
Done, had to relocate 3 out of 3 chunks
$ btrfs fi df mnt
Data, single: total=832.00MiB, used=320.00KiB
System, single: total=32.00MiB, used=16.00KiB
Metadata, single: total=256.00MiB, used=112.00KiB
GlobalReserve, single: total=16.00MiB, used=0.00B

The expected result would be "Metadata, DUP" and "System, DUP", not "Metadata, single" and "System, single".

Some more info:

$ btrfs fi show mnt
Label: none  uuid: b1a70fc4-7c18-4929-9b73-8f8bb328e7de
	Total devices 1 FS bytes used 384.00KiB
	devid    1 size 8.00GiB used 1.09GiB path /dev/loop0

btrfs-progs v4.0
$ btrfs fi usage mnt
Overall:
    Device size:        8.00GiB
    Device allocated:   1.09GiB
    Device unallocated: 6.91GiB
    Device missing:     0.00B
    Used:               384.00KiB
    Free (estimated):   7.72GiB (min: 7.72GiB)
    Data ratio:         1.00
    Metadata ratio:     1.00
    Global reserve:     16.00MiB (used: 0.00B)

Data,single: Size:832.00MiB, Used:256.00KiB
   /dev/loop0  832.00MiB

Metadata,single: Size:256.00MiB, Used:112.00KiB
   /dev/loop0  256.00MiB

System,single: Size:32.00MiB, Used:16.00KiB
   /dev/loop0  32.00MiB

Unallocated:
   /dev/loop0  6.91GiB

$ btrfs-debug-tree test.img
root tree
leaf 2539634688 items 16 free space 12515 generation 47 owner 1
fs uuid b1a70fc4-7c18-4929-9b73-8f8bb328e7de
chunk uuid c2606900-bfa1-444e-ab4d-3f0b2d31626b
	item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439
		root data bytenr 2539651072 level 0 dirid 0 refs 1 gen 47
		uuid ----
	item 1 key (DEV_TREE ROOT_ITEM 0) itemoff 15405 itemsize 439
		root data bytenr 2539569152 level 0 dirid 0 refs 1 gen 46
		uuid ----
	item 2 key (FS_TREE INODE_REF 6) itemoff 15388 itemsize 17
		inode ref index 0 namelen 7 name: default
	item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439
		root data bytenr 2539356160 level 0 dirid 256 refs 1 gen 42
		uuid ----
		ctransid 6 otransid 0 stransid 0 rtransid 0
	item 4 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 14789 itemsize 160
		inode generation 3 transid 0 size 0 block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
	item 5 key (ROOT_TREE_DIR INODE_REF 6) itemoff 14777 itemsize 12
		inode ref index 0 namelen 2 name: ..
	item 6 key (ROOT_TREE_DIR DIR_ITEM 2378154706) itemoff 14740 itemsize 37
		location key (FS_TREE ROOT_ITEM -1) type DIR
		namelen 7 datalen 0 name: default
	item 7 key (CSUM_TREE ROOT_ITEM 0) itemoff 14301 itemsize 439
		root data bytenr 2539667456 level 0 dirid 0 refs 1 gen 47
		uuid ----
	item 8 key (UUID_TREE ROOT_ITEM 0) itemoff 13862 itemsize 439
		root data bytenr 2539208704 level 0 dirid 0 refs 1 gen 41
		uuid be2539ee-1c09-e84c-8ec9-bfe054347ccf
	item
[PATCH] Btrfs: fix order by which delayed references are run
From: Filipe Manana <fdman...@suse.com>

When we have an extent that got N references removed and N new references
added in the same transaction, we must run the insertion of the references
first because otherwise the last removed reference will remove the extent
item from the extent tree, resulting in a failure for the insertions.

This is a regression introduced in the 4.2-rc1 release and this fix just
brings back the behaviour of selecting reference additions before any
reference removals.

The following test case for fstests reproduces the issue:

seq=`basename $0`
seqres=$RESULT_DIR/$seq
echo "QA output created by $seq"
tmp=/tmp/$$
status=1	# failure is the default!
trap "_cleanup; exit \$status" 0 1 2 3 15

_cleanup()
{
	_cleanup_flakey
	rm -f $tmp.*
}

# get standard environment, filters and checks
. ./common/rc
. ./common/filter
. ./common/dmflakey

# real QA test starts here
_need_to_be_root
_supported_fs btrfs
_supported_os Linux
_require_scratch
_require_dm_flakey
_require_cloner
_require_metadata_journaling $SCRATCH_DEV

rm -f $seqres.full

_scratch_mkfs >>$seqres.full 2>&1
_init_flakey
_mount_flakey

# Create prealloc extent covering range [160K, 620K[
$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo

# Now write to the last 80K of the prealloc extent plus 40K to the
# unallocated space that immediately follows it. This creates a new extent
# of 40K that spans the range [620K, 660K[.
$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io

# At this point, there are now 2 back references to the prealloc extent in
# our extent tree. Both are for our file offset 160K and one relates to a
# file extent item with a data offset of 0 and a length of 380K, while the
# other relates to a file extent item with a data offset of 380K and a
# length of 80K.

# Make sure everything done so far is durably persisted (all back
# references are in the extent tree, etc).
sync

# Now clone all extents of our file that cover the offset 160K up to its
# eof (660K at this point) into itself at offset 2M. This leaves a hole in
# the file covering the range [660K, 2M[. The prealloc extent will now be
# referenced by the file twice, once for offset 160K and once for offset
# 2M. The 40K extent that follows the prealloc extent will also be
# referenced twice by our file, once for offset 620K and once for offset
# 2M + 460K.
$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
	$SCRATCH_MNT/foo

# Now create one new extent in our file with a size of 100Kb. It will span
# the range [3M, 3M + 100K[. It also will cause creation of a hole spanning
# the range [2M + 460K, 3M[. Our new file size is 3M + 100K.
$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io

# At this point, there are now (in memory) 4 back references to the
# prealloc extent.
#
# Two of them are for file offset 160K, related to file extent items
# matching the file offsets 160K and 540K respectively, with data offsets
# of 0 and 380K respectively, and with lengths of 380K and 80K respectively.
#
# The other two references are for file offset 2M, related to file extent
# items matching the file offsets 2M and 2M + 380K respectively, with data
# offsets of 0 and 380K respectively, and with lengths of 380K and 80K
# respectively.
#
# The 40K extent has 2 back references, one for file offset 620K and the
# other for file offset 2M + 460K.
#
# The 100K extent has a single back reference and it relates to file
# offset 3M.

# Now clone our 100K extent into offset 600K. That offset covers the last
# 20K of the prealloc extent, the whole 40K extent and 40K of the hole
# starting at offset 660K.
$CLONER_PROG -s $((3 * 1024 * 1024)) -d $((600 * 1024)) -l $((100 * 1024)) \
	$SCRATCH_MNT/foo $SCRATCH_MNT/foo

# At this point there's only one reference to the 40K extent, at file
# offset 2M + 460K, we have 4 references for the prealloc extent (2 for
# file offset 160K and 2 for file offset 2M) and 2 references for the 100K
# extent (1 for file offset 3M and a new one for file offset 600K).

# Now fsync our file to make sure all its new data and metadata updates are
# durably persisted and present if a power failure/crash happens after a
# successful fsync and before the next transaction commit.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

echo "File digest before power failure:"
md5sum $SCRATCH_MNT/foo | _filter_scratch

# Silently drop all writes and unmount to simulate a crash/power failure.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey

# Allow writes again, mount to trigger log replay and validate file
# contents. During log replay, the btrfs delayed references implementation
# used to run the deletion of back references before the addition of new
# back references, which made the addition fail as it didn't
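The ordering requirement the fix restores can be illustrated with a toy model. This is not btrfs code; `run_delayed_refs`, the op tuples, and the refcount map are all hypothetical simplifications of the delayed-reference machinery, but they show why the last removal must not run before the insertions when an extent drops and gains the same number of references in one transaction:

```python
def run_delayed_refs(extent_items, ops, additions_first=True):
    """Toy model of running queued delayed references at commit time.

    extent_items maps extent_id -> reference count (the 'extent item').
    ops is a list of ('add' | 'drop', extent_id) pairs queued in one
    transaction. If additions_first is False, ops run in queued order.
    """
    if additions_first:
        # The fix: select all reference additions before any removals.
        ops = sorted(ops, key=lambda op: 0 if op[0] == "add" else 1)
    for action, extent in ops:
        if action == "add":
            if extent not in extent_items:
                # Mirrors the real failure: the extent item was already
                # deleted, so inserting a new back reference has no target.
                raise RuntimeError("extent item gone: cannot add reference")
            extent_items[extent] += 1
        else:  # drop
            extent_items[extent] -= 1
            if extent_items[extent] == 0:
                del extent_items[extent]  # last ref removes the extent item
    return extent_items
```

With 2 drops and 2 adds queued for an extent holding 2 references (as in the test case above), `additions_first=False` deletes the extent item mid-run and the adds fail, while the default ordering leaves the count at 2.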
[PATCH] fstests: btrfs test to exercise shared extent reference accounting
From: Filipe Manana <fdman...@suse.com>

Regression test for adding and dropping an equal number of references for
file extents. Verify that if we drop N references for a file extent and we
also add N new references for that same file extent in the same transaction,
running the delayed references (which always happens at transaction commit
time) does not fail.

The regression was introduced in the 4.2-rc1 Linux kernel and fixed by the
patch titled: "Btrfs: fix order by which delayed references are run".

Signed-off-by: Filipe Manana <fdman...@suse.com>
---
 tests/btrfs/095     | 153
 tests/btrfs/095.out |   9
 tests/btrfs/group   |   1 +
 3 files changed, 163 insertions(+)
 create mode 100755 tests/btrfs/095
 create mode 100644 tests/btrfs/095.out

diff --git a/tests/btrfs/095 b/tests/btrfs/095
new file mode 100755
index 000..e68f2bf
--- /dev/null
+++ b/tests/btrfs/095
@@ -0,0 +1,153 @@
+#! /bin/bash
+# FSQA Test No. 095
+#
+# Regression test for adding and dropping an equal number of references for
+# file extents. Verify that if we drop N references for a file extent and we
+# also add N new references for that same file extent in the same
+# transaction, running the delayed references (always happens at transaction
+# commit time) does not fail.
+#
+# The regression was introduced in the 4.2-rc1 Linux kernel.
+#
+#---
+#
+# Copyright (C) 2015 SUSE Linux Products GmbH. All Rights Reserved.
+# Author: Filipe Manana <fdman...@suse.com>
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License as
+# published by the Free Software Foundation.
+#
+# This program is distributed in the hope that it would be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write the Free Software Foundation,
+# Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#---
+#
+
+seq=`basename $0`
+seqres=$RESULT_DIR/$seq
+echo "QA output created by $seq"
+tmp=/tmp/$$
+status=1	# failure is the default!
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+_cleanup()
+{
+	_cleanup_flakey
+	rm -f $tmp.*
+}
+
+# get standard environment, filters and checks
+. ./common/rc
+. ./common/filter
+. ./common/dmflakey
+
+# real QA test starts here
+_need_to_be_root
+_supported_fs btrfs
+_supported_os Linux
+_require_scratch
+_require_dm_flakey
+_require_cloner
+_require_metadata_journaling $SCRATCH_DEV
+
+rm -f $seqres.full
+
+_scratch_mkfs >>$seqres.full 2>&1
+_init_flakey
+_mount_flakey
+
+# Create prealloc extent covering range [160K, 620K[
+$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo
+
+# Now write to the last 80K of the prealloc extent plus 40K to the
+# unallocated space that immediately follows it. This creates a new extent
+# of 40K that spans the range [620K, 660K[.
+$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# At this point, there are now 2 back references to the prealloc extent in
+# our extent tree. Both are for our file offset 160K and one relates to a
+# file extent item with a data offset of 0 and a length of 380K, while the
+# other relates to a file extent item with a data offset of 380K and a
+# length of 80K.
+
+# Make sure everything done so far is durably persisted (all back
+# references are in the extent tree, etc).
+sync
+
+# Now clone all extents of our file that cover the offset 160K up to its
+# eof (660K at this point) into itself at offset 2M. This leaves a hole in
+# the file covering the range [660K, 2M[. The prealloc extent will now be
+# referenced by the file twice, once for offset 160K and once for offset
+# 2M. The 40K extent that follows the prealloc extent will also be
+# referenced twice by our file, once for offset 620K and once for offset
+# 2M + 460K.
+$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
+	$SCRATCH_MNT/foo
+
+# Now create one new extent in our file with a size of 100Kb. It will span
+# the range [3M, 3M + 100K[. It also will cause creation of a hole spanning
+# the range [2M + 460K, 3M[. Our new file size is 3M + 100K.
+$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io
+
+# At this point, there are now (in memory) 4 back references to the prealloc
+# extent.
+#
+# Two of them are for file offset 160K, related to file extent items
+# matching the file offsets 160K and 540K respectively, with data offsets of
+# 0 and 380K respectively, and with lengths of 380K and 80K respectively.
+#
+# The other two references are for file offset 2M, related to file extent
+# items matching the
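The back-reference bookkeeping the test's comments walk through can be tallied in a short sketch. The extent names and the `(file_offset, extent_id)` pairs below are illustrative stand-ins, not btrfs on-disk structures; they model the file's extent items after the first clone and the 100K write, where each file extent item pointing at a physical extent contributes one back reference:

```python
from collections import Counter

K = 1024

# File extent items after: falloc at 160K, write at 540K, clone of
# [160K, 660K[ to 2M, and the new 100K write at 3M (per the comments above).
file_extent_items = [
    (160 * K, "prealloc"),             # data offset 0,    length 380K
    (540 * K, "prealloc"),             # data offset 380K, length 80K
    (620 * K, "40K-extent"),           # extent created by the 540K write
    (2048 * K, "prealloc"),            # clone of the item at 160K
    (2048 * K + 380 * K, "prealloc"),  # clone of the item at 540K
    (2048 * K + 460 * K, "40K-extent"),
    (3072 * K, "100K-extent"),         # new extent from the 3M write
]

# One back reference per file extent item that points at the extent.
backrefs = Counter(extent for _offset, extent in file_extent_items)
```

Tallying gives 4 references to the prealloc extent, 2 to the 40K extent, and 1 to the 100K extent, matching the counts stated in the test's comments before the second clone.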
Re: [PATCH] Btrfs: fix order by which delayed references are run
wrote on 2015/07/09 15:50 +0100:
> From: Filipe Manana <fdman...@suse.com>
>
> When we have an extent that got N references removed and N new references
> added in the same transaction, we must run the insertion of the references
> first because otherwise the last removed reference will remove the extent
> item from the extent tree, resulting in a failure for the insertions.
>
> This is a regression introduced in the 4.2-rc1 release and this fix just
> brings back the behaviour of selecting reference additions before any
> reference removals.

Thanks, Filipe. That's right, it's my fault, I forgot such a case.

Acked-by: Qu Wenruo <quwen...@cn.fujitsu.com>

Thanks,
Qu

> The following test case for fstests reproduces the issue:
> [ test case snipped; identical to the one in the patch quoted above ]
Re: btrfs partition converted from ext4 becomes read-only minutes after booting: WARNING: CPU: 2 PID: 2777 at ../fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4b/0x120
One of my patches addressed a problem where a converted btrfs couldn't pass
btrfsck. Not sure if that is the cause, but could you try btrfs-progs
v3.19.1, the one without my btrfs-progs patches and some other newer
convert-related patches, and see the result?

I think this would at least provide a base for bisecting btrfs-progs, if
the bug is in btrfs-progs.

Thanks,
Qu

Chris Murphy wrote on 2015/07/09 15:38 -0600:
> On Thu, Jul 9, 2015 at 4:52 AM, Vytautas D <vyt...@gmail.com> wrote:
>> Slightly off topic: do these bugs exist in systems that converted from
>> ext4 to btrfs using kernel 3.13 and then upgraded to kernel 4.1?
>
> I don't recall what btrfs-progs and kernel I last tested ext4 conversion
> with. I know this is a regression, I just don't know how old it is. I
> think there's more than one bug here (obviously, since I've filed 4
> related bugs in ~24 hours), but I really don't know the scope of the
> problem. But the case where the recommended procedure not only fails but
> corrupts the file system and it can't be fixed or rolled back is not
> good. Perhaps the wiki should provide a warning that this is currently
> broken, status unknown, or something?