Re: question regarding caching
Austin S Hemmelgarn wrote (ao):
> The data is probably still cached in the block layer, so after
> unmounting, you could try 'echo 1 > /proc/sys/vm/drop_caches' before
> mounting again, but make sure to run sync right before doing that,
> otherwise you might lose data.

Lose data? Where did you get this from?

	Sander
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 01/11] btrfs: Add barrier option to support -o remount,barrier
On 3 January 2014 06:10, Qu Wenruo quwen...@cn.fujitsu.com wrote:
> Btrfs can be remounted without barrier, but there is no barrier option,
> so nobody can remount btrfs back with barrier on. Only umount and mount
> again can re-enable barrier. (Quite awkward)
>
> Also the mount options in the document are changed slightly for the
> further pairing-options changes.
>
> Reported-by: Daniel Blueman dan...@quora.org
> Signed-off-by: Qu Wenruo quwen...@cn.fujitsu.com
> Cc: David Sterba dste...@suse.cz
> ---
> changelog:
> v1: Add barrier option
> v2: Change the document style to fit pairing options better
> ---
>  Documentation/filesystems/btrfs.txt | 13 +++++++------
>  fs/btrfs/super.c                    |  8 +++++++-
>  2 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
> index 5dd282d..2d2e016 100644
> --- a/Documentation/filesystems/btrfs.txt
> +++ b/Documentation/filesystems/btrfs.txt
> @@ -38,7 +38,7 @@ Mount Options
>  =============
>
>  When mounting a btrfs filesystem, the following option are accepted.
> -Unless otherwise specified, all options default to off.
> +Options with (*) are default options and will not show in the mount options.
>
>    alloc_start=bytes
>  	Debugging option to force all block allocations above a certain
> @@ -138,12 +138,13 @@ Unless otherwise specified, all options default to off.
>  	Disable support for Posix Access Control Lists (ACLs).  See the
>  	acl(5) manual page for more information about ACLs.
>
> +  barrier(*)
>    nobarrier
> -	Disables the use of block layer write barriers.  Write barriers ensure
> -	that certain IOs make it through the device cache and are on persistent
> -	storage. If used on a device with a volatile (non-battery-backed)
> -	write-back cache, this option will lead to filesystem corruption on a
> -	system crash or power loss.
> +	Disable/enable the use of block layer write barriers.  Write barriers

Please use "Enable/Disable ..." to match the order of the options,
barrier(*) then nobarrier, immediately above.

> +	ensure that certain IOs make it through the device cache and are on
> +	persistent storage. If used on a device with a volatile

And: "... If disabled on a device with a volatile ..." to make more sense
when both the enable and disable options are listed.

> +	(non-battery-backed) write-back cache, this option will lead to
> +	filesystem corruption on a system crash or power loss.
>
>    nodatacow
>  	Disable data copy-on-write for newly created files.
>  	Implies nodatasum,
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index e9c13fb..fe9d8a6 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -323,7 +323,7 @@ enum {
>  	Opt_no_space_cache, Opt_recovery, Opt_skip_balance,
>  	Opt_check_integrity, Opt_check_integrity_including_extent_data,
>  	Opt_check_integrity_print_mask, Opt_fatal_errors, Opt_rescan_uuid_tree,
> -	Opt_commit_interval,
> +	Opt_commit_interval, Opt_barrier,
>  	Opt_err,
>  };
>
> @@ -335,6 +335,7 @@ static match_table_t tokens = {
>  	{Opt_nodatasum, "nodatasum"},
>  	{Opt_nodatacow, "nodatacow"},
>  	{Opt_nobarrier, "nobarrier"},
> +	{Opt_barrier, "barrier"},
>  	{Opt_max_inline, "max_inline=%s"},
>  	{Opt_alloc_start, "alloc_start=%s"},
>  	{Opt_thread_pool, "thread_pool=%d"},
>
> @@ -494,6 +495,11 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
>  			btrfs_clear_opt(info->mount_opt, SSD);
>  			btrfs_clear_opt(info->mount_opt, SSD_SPREAD);
>  			break;
> +		case Opt_barrier:
> +			if (btrfs_test_opt(root, NOBARRIER))
> +				btrfs_info(root->fs_info, "turning on barriers");
> +			btrfs_clear_opt(info->mount_opt, NOBARRIER);
> +			break;
>  		case Opt_nobarrier:
>  			btrfs_info(root->fs_info, "turning off barriers");
>  			btrfs_set_opt(info->mount_opt, NOBARRIER);
> --
> 1.8.5.2

Thanks,
Mike
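With a kernel carrying the Opt_barrier change reviewed above, barriers can be toggled without a umount/mount cycle. A minimal sketch, assuming a btrfs filesystem at the hypothetical mountpoint /mnt/btrfs (the guard makes it a no-op anywhere else):

```shell
MNT=/mnt/btrfs   # hypothetical mountpoint
if command -v btrfs >/dev/null 2>&1 && mountpoint -q "$MNT"; then
    mount -o remount,nobarrier "$MNT"   # turn write barriers off
    grep " $MNT " /proc/mounts          # 'nobarrier' should now be listed
    mount -o remount,barrier "$MNT"     # turn them back on, no umount needed
else
    echo "no btrfs filesystem mounted at $MNT, skipping"
fi
```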
Re: btrfs-transaction blocked for more than 120 seconds
Kai Krakow posted on Fri, 03 Jan 2014 02:24:01 +0100 as excerpted:

> Duncan 1i5t5.dun...@cox.net schrieb:
>> But because a full balance rewrites everything anyway, it'll
>> effectively defrag too.
>
> Is that really true? I thought it just rewrites each distinct extent and
> shuffles chunks around... This would mean it does not merge extents
> together.

While I'm not a coder and they're free to correct me if I'm wrong...

With a full balance (there are now options allowing one to do only data, or only metadata, or for that matter only system, and to do other filtering, say to rebalance only chunks less than 10% used or only those not yet converted to a new raid level, if desired, but we're talking a full balance here), all chunks are rewritten, merging data (or metadata) into fewer chunks if possible, eliminating the then-unused chunks and returning the space they took to the unallocated pool.

Given that everything is being rewritten anyway, a process that can take hours or even days on multi-terabyte spinning-rust filesystems, /not/ doing a file defrag as part of the process would be stupid. So doing a separate defrag and balance isn't necessary. And while we're at it, doing a separate scrub and balance isn't necessary either, for the same reason. (If one copy of the data is invalid and there's another, it'll be used for the rewrite, and redup if necessary during the balance, and the invalid copy will simply be erased. If there's no valid copy, then there will be balance errors, and I believe the chunks containing the bad data are simply not rewritten at all, tho the valid data from them might be rewritten, leaving only the bad data (I'm not sure which, on that), thus allowing the admin to try other tools to clean up or recover from the damage as necessary.)

That's one reason why the balance operation can take so much longer than a straight sequential read/write of the data might indicate: it's doing all that extra work behind the scenes as well.
Tho I'm not sure that it defrags across chunks, particularly if a file's fragments reach across enough chunks that they'd not have been processed by the time a written chunk is full and the balance progresses to the next one. However, given that data chunks are 1 GiB in size, that should still cut down a multi-thousand-extent file to perhaps a few dozen extents, one each per rewritten chunk.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
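For reference, the filtered-balance variants Duncan alludes to look like this with btrfs-progs of this era (mountpoint is hypothetical; the guard keeps this a no-op when no btrfs filesystem is present):

```shell
MNT=/mnt/btrfs   # hypothetical mountpoint
if command -v btrfs >/dev/null 2>&1 && mountpoint -q "$MNT"; then
    btrfs balance start "$MNT"                    # full balance: rewrite every chunk
    btrfs balance start -dusage=10 "$MNT"         # only data chunks less than 10% used
    btrfs balance start -mconvert=raid1 "$MNT"    # only metadata chunks not yet raid1
else
    echo "no btrfs filesystem mounted at $MNT, skipping"
fi
```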
[PATCH] Btrfs: only fua the first superblock when writing supers
We only intend to fua the first superblock in every device, per the comments; fix it.

Signed-off-by: Wang Shilong wangsl.f...@cn.fujitsu.com
---
 fs/btrfs/disk-io.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9417b73..b016657 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3142,7 +3142,10 @@ static int write_dev_supers(struct btrfs_device *device,
 		 * we fua the first super.  The others we allow
 		 * to go down lazy.
 		 */
-		ret = btrfsic_submit_bh(WRITE_FUA, bh);
+		if (i == 0)
+			ret = btrfsic_submit_bh(WRITE_FUA, bh);
+		else
+			ret = btrfsic_submit_bh(WRITE_SYNC, bh);
 		if (ret)
 			errors++;
 	}
--
1.8.3.1
Re: question regarding caching
On 2014-01-03 03:39, Sander wrote:
> Austin S Hemmelgarn wrote (ao):
>> The data is probably still cached in the block layer, so after
>> unmounting, you could try 'echo 1 > /proc/sys/vm/drop_caches' before
>> mounting again, but make sure to run sync right before doing that,
>> otherwise you might lose data.
>
> Lose data? Where did you get this from?
>
> 	Sander

Sorry, misread the documentation; I thought it said "destructive" where it really said "non-destructive". It's still a good idea to run sync before trying to clear the caches though, because dirty objects aren't freeable.
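For reference, the (non-destructive) sequence under discussion: sync first so dirty pages are written back and become clean, freeable cache. Writing the sysctl requires root, so the write is guarded here:

```shell
sync   # write dirty pages back; drop_caches only discards *clean* cache
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 1 > /proc/sys/vm/drop_caches   # 1 = page cache; 2 = dentries+inodes; 3 = both
else
    echo "need root to write /proc/sys/vm/drop_caches" >&2
fi
```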
1be41b78: +18% increased btrfs write throughput
Hi Josef,

FYI. We are doing 0day performance tests and happened to notice that btrfs write throughput increased considerably during the v3.10-11 time frame:

      v3.10            v3.11             v3.12             v3.13-rc6
  ------------  ----------------  ----------------  ----------------
  50619 ~ 1%    +17.0% 59209 ~2%  +18.8% 60159 ~2%  +20.5% 61007 ~0%  lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-btrfs-1dd
  50619         +17.0% 59209      +18.8% 60159      +20.5% 61007      TOTAL iostat.sdd.wkB/s

and it's contributed by commit

    commit 1be41b78bc688fc634bf30965d2be692c99fd11d
    Author:     Josef Bacik jba...@fusionio.com
    AuthorDate: Wed Jun 12 13:56:06 2013 -0400
    Commit:     Josef Bacik jba...@fusionio.com
    CommitDate: Mon Jul 1 08:52:28 2013 -0400

        Btrfs: fix transaction throttling for delayed refs

        Dave has this fs_mark script that can make btrfs abort with a
        sufficient amount of ram.  This is because with more ram we can keep
        more dirty metadata in cache, which in a roundabout way makes for many
        more pending delayed refs.  What happens is we end up not throttling
        the transaction enough, so when we go to commit the transaction when
        we've completely filled the file system we'll abort() because we use
        all of the space in the global reserve and we still have delayed refs
        to run.  To fix this we need to make the delayed ref flushing and the
        transaction throttling dependent upon the number of delayed refs that
        we have instead of how much reserved space is left in the global
        reserve.  With this patch we not only stop aborting transactions but
        we also get a smoother run speed with fs_mark, and it makes us about
        10% faster.  Thanks,

        Reported-by: David Sterba dste...@suse.cz
        Signed-off-by: Josef Bacik jba...@fusionio.com

     fs/btrfs/ctree.h       |  2 ++
     fs/btrfs/extent-tree.c | 61 +++++++++++++++++++++++++++++++++++++++-----
     fs/btrfs/transaction.c | 24 +++++++++++-------
     3 files changed, 69 insertions(+), 18 deletions(-)

[ASCII time-series plots omitted: time.elapsed_time, time.voluntary_context_switches, time.file_system_inputs]
Re: 1be41b78: +18% increased btrfs write throughput
On Fri, 2014-01-03 at 23:54 +0800, fengguang...@intel.com wrote:
> Hi Josef,
>
> FYI. We are doing 0day performance tests and happened to notice that
> btrfs write throughput increased considerably during the v3.10-11 time
> frame:
>
>       v3.10            v3.11             v3.12             v3.13-rc6
>   ------------  ----------------  ----------------  ----------------
>   50619 ~ 1%    +17.0% 59209 ~2%  +18.8% 60159 ~2%  +20.5% 61007 ~0%  lkp-ws02/micro/dd-write/11HDD-JBOD-cfq-btrfs-1dd
>   50619         +17.0% 59209      +18.8% 60159      +20.5% 61007      TOTAL iostat.sdd.wkB/s
>
> and it's contributed by commit
>
>     commit 1be41b78bc688fc634bf30965d2be692c99fd11d
>     Author:     Josef Bacik jba...@fusionio.com
>     AuthorDate: Wed Jun 12 13:56:06 2013 -0400
>     Commit:     Josef Bacik jba...@fusionio.com
>     CommitDate: Mon Jul 1 08:52:28 2013 -0400
>
>         Btrfs: fix transaction throttling for delayed refs

Bonus points for increasing the performance on purpose. Thanks for running these, Wu.

-chris
Status of raid5/6 in 2014?
Back in Feb 2013 there was quite a bit of press about the preliminary raid5/6 implementation in Btrfs. At the time it wasn't useful for anything other than testing, and it's my understanding that this is still the case. I've seen a few git commits and some chatter on this list, but it would appear the developers are largely silent.

Parity-based raid would be a powerful addition to the Btrfs feature stack, and it's the feature I most anxiously await. Are there any milestones planned for 2014?

Keep up the good work...

-- 
-=[dave]=-
Entropy isn't what it used to be.
Re: [PATCH] Btrfs: only fua the first superblock when writing supers
On Fri, Jan 03, 2014 at 06:22:57PM +0800, Wang Shilong wrote:
> We only intend to fua the first superblock in every device, per the
> comments; fix it.

Good catch, this could gain some speedup when there are up to 2 fewer flushes.

There's one thing that's different from current behaviour: without this patch, all the superblocks are written with FUA, now only the first one. So my question is: what if the first fails and the others succeed but do not get flushed immediately?

This is more of a theoretical scenario, and if the 1st superblock write fails, more serious problems can be expected. But let's say the write error of the 1st is transient; do you or others think that it's reasonable to try to write all the remaining sb's with FUA?

david
Re: [PATCH v2 03/11] btrfs: Add nocheck_int mount option.
On Fri, Jan 03, 2014 at 02:10:26PM +0800, Qu Wenruo wrote:
> Add nocheck_int mount option to disable integrity check with remount
> option.
>
> + nocheck_int disables all the debug options above.

I think this option is not needed; the integrity checker is a development feature and used by people who know what they're doing. Besides, this would need to clean up all the data structures that the checker uses (see e.g. btrfsic_unmount, which is called only if the mount option is used).

I see little benefit compared to the amount of work to make sure that disabling the checker functionality in the middle works properly.

david
Re: [PATCH] Btrfs: only fua the first superblock when writing supers
On Fri, 2014-01-03 at 18:03 +0100, David Sterba wrote:
> On Fri, Jan 03, 2014 at 06:22:57PM +0800, Wang Shilong wrote:
> > We only intend to fua the first superblock in every device, per the
> > comments; fix it.
>
> Good catch, this could gain some speedup when there are up to 2 fewer
> flushes.
>
> There's one thing that's different from current behaviour: without this
> patch, all the superblocks are written with FUA, now only the first one.
> So my question is: what if the first fails and the others succeed but do
> not get flushed immediately?
>
> This is more of a theoretical scenario, and if the 1st superblock write
> fails, more serious problems can be expected. But let's say the write
> error of the 1st is transient; do you or others think that it's
> reasonable to try to write all the remaining sb's with FUA?

Not a bad idea: if we get a failure on the first SB, fua the others? I think it does make sense to do the others non-fua, just because they only get used in emergencies anyway.

-chris
Re: [PATCH v2 00/11] btrfs: Add missing pairing mount options.
On 1/3/14, 12:10 AM, Qu Wenruo wrote:
> Some options should be paired to support triggering different functions
> when remounting. This patchset adds these missing pairing mount options.

I think this really would benefit from a regression test which ensures that every remount transition works properly...

Thanks,
-Eric

> changelog:
> v1: Initial commit with only barrier option
> v2: Add other missing pairing options
>
> Qu Wenruo (11):
>   btrfs: Add barrier option to support -o remount,barrier
>   btrfs: Add noautodefrag mount option.
>   btrfs: Add nocheck_int mount option.
>   btrfs: Add nodiscard mount option.
>   btrfs: Add noenospc_debug mount option.
>   btrfs: Add noflushoncommit mount option.
>   btrfs: Add noinode_cache mount option.
>   btrfs: Add acl mount option.
>   btrfs: Add datacow mount option.
>   btrfs: Add datasum mount option.
>   btrfs: Add treelog mount option.
>
>  Documentation/filesystems/btrfs.txt | 56 ++++++++++++++++++++++--------
>  fs/btrfs/super.c                    | 74 ++++++++++++++++++++++++-------
>  2 files changed, 110 insertions(+), 20 deletions(-)
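Eric's suggested regression test could be sketched as a loop over the option pairs from this series, remounting each off/on and flagging failures. The scratch mountpoint is hypothetical, and the pair list assumes the patchset's naming:

```shell
MNT=/mnt/btrfs-test   # hypothetical scratch mountpoint
PAIRS="barrier:nobarrier autodefrag:noautodefrag discard:nodiscard \
flushoncommit:noflushoncommit datacow:nodatacow datasum:nodatasum \
acl:noacl treelog:notreelog"

if mountpoint -q "$MNT" 2>/dev/null; then
    for pair in $PAIRS; do
        on=${pair%:*}; off=${pair#*:}
        mount -o "remount,$off" "$MNT" || echo "FAIL: remount,$off"
        mount -o "remount,$on"  "$MNT" || echo "FAIL: remount,$on"
    done
else
    echo "no scratch filesystem mounted at $MNT, skipping"
fi
```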
Re: [PATCH v2 07/11] btrfs: Add noinode_cache mount option.
On Fri, Jan 03, 2014 at 02:10:30PM +0800, Qu Wenruo wrote:
> Add noinode_cache mount option to disable inode map cache with remount
> option.

This looks almost safe: there's a sync_filesystem call before the filesystem's remount handler, so the transaction gets committed and flushes all the data related to inode_cache. The caching thread keeps running, which is not a serious problem as it'll finish at umount time, only consuming resources.

There's a window between sync_filesystem and successful remount when the INODE_MAP_CACHE bit is set and the cache could be used to get a free ino; then INODE_MAP_CACHE is cleared, but the ino cache remains and is not synced back to disk (normally done from transaction commit via btrfs_unpin_free_ino). I haven't looked at whether something else blocks that from happening.

I'd leave this patch out for now; it probably needs more code updates than just unsetting the bit.

david
Re: [PATCH v2 00/11] btrfs: Add missing pairing mount options.
On Fri, Jan 03, 2014 at 02:10:23PM +0800, Qu Wenruo wrote:
> Some options should be paired to support triggering different functions
> when remounting. This patchset adds these missing pairing mount options.

Thanks!

>   btrfs: Add nocheck_int mount option.
>   btrfs: Add noinode_cache mount option.

Commented separately; imho not to be merged in their current state.

>   btrfs: Add barrier option to support -o remount,barrier
>   btrfs: Add noautodefrag mount option.
>   btrfs: Add nodiscard mount option.
>   btrfs: Add noenospc_debug mount option.
>   btrfs: Add noflushoncommit mount option.
>   btrfs: Add acl mount option.
>   btrfs: Add datacow mount option.
>   btrfs: Add datasum mount option.
>   btrfs: Add treelog mount option.

All ok.

Reviewed-by: David Sterba dste...@suse.cz
Re: Question about ext4 conversion and leaf size
On Fri, Jan 03, 2014 at 12:29:51AM +0000, Holger Hoffstätte wrote:
> Conversion from ext4 works really well and is an important step for
> adoption. After recently converting a large-ish device I noticed dodgy
> performance, even after defragment & rebalance; noticeably different
> from the quite good performance of a newly-created btrfs with 16k leaf
> size, as is the default since recently.
>
> So I went spelunking and found that the btrfs-convert logic indeed uses
> the ext4 block size as leaf size (from #2220):
>
> https://git.kernel.org/cgit/linux/kernel/git/mason/btrfs-progs.git/tree/btrfs-convert.c#n2245
>
> This is typically 4096 bytes and explains the observed performance.
>
> So while I'm basically familiar with btrfs's design, I know nothing
> about the details of the conversion (I'm amazed that it works so well,
> including rollback!), but can/should this not be updated to the new
> default of 16k, or is there a strong necessary correlation between the
> ext4 block size and the newly created btrfs?

The sectorsize has to be the same for ext4 and btrfs, which is 4k (PAGE_SIZE) nowadays. The btrfs metadata block is not limited by that. I've tried to implement dumb simple support for a larger metadata block some time ago

http://repo.or.cz/w/btrfs-progs-unstable/devel.git/commitdiff/337ac35f5a6ebeaee375329084b89ea4a868b4be?hp=704a08cb8ae8735f8538e637a1be822e76e69d3c

but the conversion did not work properly, and I haven't debugged that further.

david
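A quick way to see what a converted filesystem ended up with, assuming the btrfs-show-super tool from btrfs-progs of this era and a hypothetical device name (a fresh filesystem with 16k leaves would instead be created with mkfs.btrfs -l 16384):

```shell
DEV=/dev/sdx1   # hypothetical device holding the converted filesystem
if command -v btrfs-show-super >/dev/null 2>&1 && [ -b "$DEV" ]; then
    btrfs-show-super "$DEV" | grep -E 'leafsize|nodesize'   # expect 4096 after convert
else
    echo "btrfs-progs not installed or $DEV not present, skipping"
fi
```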
Re: [PATCH v2 5/6] Btrfs: use flags instead of the bool variants in delayed node
On Fri, Jan 03, 2014 at 05:27:51PM +0800, Miao Xie wrote:
> On Thu, 2 Jan 2014 18:49:55 +0100, David Sterba wrote:
> > On Thu, Dec 26, 2013 at 01:07:05PM +0800, Miao Xie wrote:
> > > +#define BTRFS_DELAYED_NODE_IN_LIST	0
> > > +#define BTRFS_DELAYED_NODE_INODE_DIRTY	1
> > > +
> > >  struct btrfs_delayed_node {
> > >  	u64 inode_id;
> > >  	u64 bytes_reserved;
> > > @@ -65,8 +68,7 @@ struct btrfs_delayed_node {
> > >  	struct btrfs_inode_item inode_item;
> > >  	atomic_t refs;
> > >  	u64 index_cnt;
> > > -	bool in_list;
> > > -	bool inode_dirty;
> > > +	unsigned long flags;
> > >  	int count;
> > >  };
> >
> > What's the reason to do that? Replacing 2 bools with a bitfield does
> > not seem justified, not from a memory-saving nor from a
> > performance-gain side. Also some of the bit operations imply the lock
> > instruction prefix, so this affects the surrounding items as well. I
> > don't think this is needed, unless you have further plans with the
> > flags item.
>
> Yes, I introduced a flag in the next patch.

That's still 3 bool flags that are quite independent and consume less than the unsigned long anyway. Also the bool flags are something that the compiler understands and can use during optimizations, unlike the obfuscated bit access.

I don't mind using bitfields, but imo it starts to make sense to use them when there are more than a few, like BTRFS_INODE_* or EXTENT_BUFFER_*. The point of my objections is to establish good coding patterns to follow.

david
Re: [block:for-3.14/core] kernel BUG at fs/bio.c:1748
Looks like Kent missed the btrfs endio in the original commit. How about this patch:

- In btrfs_end_bio, call bio_endio_nodec on the restored bio so the bi_remaining is accounted for correctly.

Reported-by: fengguang...@intel.com
Cc: Kent Overstreet k...@daterainc.com
Cc: Jens Axboe ax...@kernel.dk
Signed-off-by: Muthukumar Ratty mut...@gmail.com
---
 fs/btrfs/volumes.c |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index f2130de..edfed52 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5316,7 +5316,11 @@ static void btrfs_end_bio(struct bio *bio, int err)
 		}
 		kfree(bbio);
-		bio_endio(bio, err);
+		/*
+		 * Call endio_nodec on the restored bio so the bi_remaining is
+		 * accounted for correctly
+		 */
+		bio_endio_nodec(bio, err);
 	} else if (!is_orig_bio) {
 		bio_put(bio);
 	}

On Wed, Jan 1, 2014 at 9:31 PM, fengguang...@intel.com wrote:

Greetings,

We hit the below bug when doing write tests to btrfs. Other filesystems (ext4, xfs) work fine. 2 full dmesgs are attached.

196d38bccfcfa32faed8c561868336fdfa0fe8e4 is the first bad commit

commit 196d38bccfcfa32faed8c561868336fdfa0fe8e4
Author:     Kent Overstreet k...@daterainc.com
AuthorDate: Sat Nov 23 18:34:15 2013 -0800
Commit:     Kent Overstreet k...@daterainc.com
CommitDate: Sat Nov 23 22:33:56 2013 -0800

    block: Generic bio chaining

    This adds a generic mechanism for chaining bio completions. This is
    going to be used for a bio_split() replacement, and it turns out to be
    very useful in a fair amount of driver code - a fair number of drivers
    were implementing this in their own roundabout ways, often painfully.

    Note that this means it's no longer to call bio_endio() more than once
    on the same bio! This can cause problems for drivers that save/restore
    bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at
    all - in all but the simplest cases they'd be better off just cloning
    the bio, and immutable biovecs is making bio cloning cheaper.
But for now, we add a bio_endio_nodec() for these cases. Signed-off-by: Kent Overstreet k...@daterainc.com Cc: Jens Axboe ax...@kernel.dk drivers/md/bcache/io.c | 2 +- drivers/md/dm-cache-target.c | 6 drivers/md/dm-snap.c | 1 + drivers/md/dm-thin.c | 8 +++-- drivers/md/dm-verity.c | 2 +- fs/bio-integrity.c | 2 +- fs/bio.c | 76 include/linux/bio.h | 2 ++ include/linux/blk_types.h| 2 ++ 9 files changed, 90 insertions(+), 11 deletions(-) [ 35.466413] random: nonblocking pool is initialized [ 196.918039] [ cut here ] [ 196.919770] kernel BUG at fs/bio.c:1748! [ 196.921505] invalid opcode: [#1] SMP [ 196.921788] Modules linked in: microcode processor [ 196.921788] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc6-01897-g2b48961 #1 [ 196.921788] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 196.921788] task: 8804094acad0 ti: 8804094e8000 task.ti: 8804094e8000 [ 196.921788] RIP: 0010:[811ef01e] [811ef01e] bio_endio+0x1e/0x6a [ 196.921788] RSP: 0018:88041fc83da8 EFLAGS: 00010046 [ 196.921788] RAX: RBX: fffb RCX: 0001802a0002 [ 196.921788] RDX: 0001802a0003 RSI: RDI: 8800299ff9e8 [ 196.921788] RBP: 88041fc83dc0 R08: ea00096cc980 R09: 8804097f5100 [ 196.921788] R10: ea000aeb8280 R11: 8143841e R12: 88025b326780 [ 196.921788] R13: R14: R15: 3000 [ 196.921788] FS: () GS:88041fc8() knlGS: [ 196.921788] CS: 0010 DS: ES: CR0: 8005003b [ 196.921788] CR2: 7f16e7a1948f CR3: 7f85e000 CR4: 06e0 [ 196.921788] Stack: [ 196.921788] 8800299ff9e8 8800299ff9e8 88025b326780 88041fc83de8 [ 196.921788] 81438429 fffb 8803d36e6c00 [ 196.921788] 88041fc83e10 811ef063 8802bae0a1e8 8802bae0a1e8 [ 196.921788] Call Trace: [ 196.921788] IRQ [ 196.921788] [81438429] btrfs_end_bio+0x116/0x11d [ 196.921788] [811ef063] bio_endio+0x63/0x6a [ 196.921788] [814cb712] blk_mq_complete_request+0x89/0xfe [ 196.921788] [814cb79d] __blk_mq_end_io+0x16/0x18 [ 196.921788] [814cb7bf] blk_mq_end_io+0x20/0xb1 [ 196.921788] [815a1ba9] virtblk_done+0xa4/0xf6 [ 196.921788] [8155c463] vring_interrupt+0x7c/0x8a [ 
196.921788] [81107427] handle_irq_event_percpu+0x4a/0x1bc [
Re: btrfs-transaction blocked for more than 120 seconds
First, a big thank you for taking the time to post this very informative message.

On Wed, Jan 01, 2014 at 12:37:42PM +0000, Duncan wrote:
> Apparently the way some distribution installation scripts work results
> in even a brand new installation being highly fragmented. =:^( If in
> addition they don't add autodefrag to the mount options used when
> mounting the filesystem for the original installation, the problem is
> made even worse, since the autodefrag mount option is designed to help
> catch some of this sort of issue, and schedule the affected files for
> auto-defrag by a separate thread.

Assuming you can stomach a bit of occasional performance loss due to autodefrag, is there a reason not to always have this on btrfs filesystems in newer kernels (let's say 3.12+)? Is there even a reason for this not to become a default mount option in newer kernels?

> The NOCOW file attribute. Simple command form:
>
> chattr +C /path/to/file/or/directory

Thank you for that tip, I had been unaware of it 'till now. This will make my virtualbox image directory much happier :)

> Meanwhile, if there's a point at which the file exists in its more or
> less permanent form and won't be written into any longer (a torrented
> file is fully downloaded, or a VM image is backed up), sequentially
> copying it elsewhere (possibly using cp --reflink=never if on the same
> filesystem, to avoid a reflink copy pointing at the same fragmented
> extents!), then deleting the original fragmented version, should
> effectively defragment the file too. And since it's not being written
> into any more at that point, it should stay defragmented.

Or just btrfs filesystem defrag the individual file. I know I can do the cp --reflink=never, but that would generate 100GB of new files and force me to drop all my hourly/daily/weekly snapshots, so a per-file defrag is definitely a better option.
> Finally, there's some more work going into autodefrag now, to hopefully
> increase its performance, and make it work more efficiently on a bit
> larger files as well. The goal is to eliminate the problems with
> systemd's journal, among other things, now that it's known to be a
> common problem, given systemd's widespread use and the fact that both
> systemd and btrfs aim to be the accepted general Linux default within a
> few years.

Is there a good guideline on which kinds of btrfs filesystems autodefrag is likely not a good idea, even if the current code does not have optimal performance? I suppose fragmented files that are deleted soon after being written are a loss, but otherwise it's mostly a win. Am I missing something?

Unfortunately, on an 83GB vdi (virtualbox) file, with 3.12.5, it did a lot of writing and chewed up my 4 CPUs. Then, it started to be hard to move my mouse cursor, and my procmeter graph was barely updating. Next, nothing updated on my X server anymore, not even the seconds in time widgets. But I could still sometimes move my mouse cursor, and I could sometimes see the HD light flicker a bit before going dead again. In other words, the system wasn't fully deadlocked, but btrfs sure got into a state where it was unable to finish the job, and took the kernel down with it (64bit, 8GB of RAM). I waited 2H and it never came out of it; I had to power down the system in the end. Note that this was on a top-of-the-line 500MB/s-write Samsung Evo 840 SSD, not a slow HD.

I think I had enough free space:

Label: 'btrfs_pool1'  uuid: 4850ee22-bf32-4131-a841-02abdb4a5ba6
	Total devices 1 FS bytes used 732.14GB
	devid    1 size 865.01GB used 865.01GB path /dev/dm-0

Is it possibly expected behaviour for defrag to lock up on big files? Should I have had more spare free space for it to work? Other?

On the plus side, the file I was trying to defragment, which hung my system, was not corrupted by the process.

Any idea what I should try from here?
Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
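The two mitigations from this thread as commands, with hypothetical paths. Note that chattr +C only affects data written after the attribute is set, so it is best applied to an empty directory before the images land in it:

```shell
VMDIR="$HOME/vm-images"   # hypothetical VM image directory
mkdir -p "$VMDIR"
if command -v chattr >/dev/null 2>&1; then
    # set NOCOW on the directory so new files inherit it
    chattr +C "$VMDIR" 2>/dev/null || echo "chattr +C failed (not on btrfs?)"
    lsattr -d "$VMDIR" 2>/dev/null || true
fi
# Per-file defrag of an existing, already-fragmented image:
#   btrfs filesystem defragment /path/to/disk.vdi
```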
Re: coredump in btrfsck
On Thu, Jan 02, 2014 at 10:37:28AM -0700, Chris Murphy wrote:
> On Jan 1, 2014, at 3:35 PM, Oliver Mangold o.mang...@gmail.com wrote:
> > On 01.01.2014 22:58, Chris Murphy wrote:
> > > On Jan 1, 2014, at 2:27 PM, Oliver Mangold o.mang...@gmail.com wrote:
> > > > I fear I broke my FS by running btrfsck. I tried 'btrfsck --repair'
> > > > and it fixed several problems but finally crashed with some debug
> > > > message from 'extent-tree.c', so I also tried 'btrfsck --repair
> > > > --init-extent-tree'.
> > > It is sort of a (near) last resort, you know this right? What did you
> > > try before btrfsck? Did you set dmesg -n7, then mount -o recovery,
> > > and if so what was recorded in dmesg?
> > Ehm, actually, no.
>
> https://btrfs.wiki.kernel.org/index.php/FAQ#When_will_Btrfs_have_a_fsck_like_tool.3F
>
> This is a bit dated, but the general idea is to not use repair except on
> advice of a developer, and also there are still some risks. Just a week
> or so ago, one said it was a little dangerous still. So yeah, -o
> recovery should be the first choice.

I was thinking about this: considering that everyone out there has been conditioned to running fsck on any filesystem if there is a problem, and considering btrfs has been different and likely will be for the foreseeable future, I'd like to suggest the following.

In order to accommodate more users trying btrfs, the documentation for btrfsck really needs to be changed. Neither the tool help nor the man page says anything about "this is not the fsck you're looking for", nor points to the wiki above. See:

gandalfthegreat:~# btrfsck
usage: btrfs check [options] <device>

    Check an unmounted btrfs filesystem.
(...)

and man btrfsck.

Would it be possible for whoever maintains btrfs-tools to change both the man page and the help included in the tool to clearly state that running the fsck tool is unlikely to be the right course of action, and to talk about btrfs-zero-log as well as mount -o recovery?

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901 -- To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Is anyone using btrfs send/receive for backups instead of rsync?
On Mon, Dec 30, 2013 at 09:57:40AM -0800, Marc MERLIN wrote: On Mon, Dec 30, 2013 at 10:48:10AM -0700, Chris Murphy wrote: On Dec 30, 2013, at 10:10 AM, Marc MERLIN m...@merlins.org wrote: If one day, it could at least work on a subvolume level (only sync a subvolume), then it would be more useful to me. Maybe later… Maybe I'm missing something, but btrfs send/receive only work on a subvolume level. Never mind, I seem to be the one being dense. I mis-read that you needed to create the filesystem with btrfs receive. Indeed, it's on a subvolume level, so it's actually fine since it does allow over-provisioning after all. Mmmh, but I just realized that on my laptop, I do boot the btrfs copy (currently done with rsync) from time to time (i.e. emergency boot from the HD the SSD was copied to). If I do that, it'll change the filesystem that was created with btrfs receive and break it, preventing further updates, correct? If so, can I get around that by making a boot snapshot after each copy and mount that snapshot for emergency boot instead of the main volume? Thanks, Marc -- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Is anyone using btrfs send/receive for backups instead of rsync?
On Fri, 2014-01-03 at 12:15 -0800, Marc MERLIN wrote: On Mon, Dec 30, 2013 at 09:57:40AM -0800, Marc MERLIN wrote: On Mon, Dec 30, 2013 at 10:48:10AM -0700, Chris Murphy wrote: On Dec 30, 2013, at 10:10 AM, Marc MERLIN m...@merlins.org wrote: If one day, it could at least work on a subvolume level (only sync a subvolume), then it would be more useful to me. Maybe later… Maybe I'm missing something, but btrfs send/receive only work on a subvolume level. Never mind, I seem to be the one being dense. I mis-read that you needed to create the filesystem with btrfs receive. Indeed, it's on a subvolume level, so it's actually fine since it does allow over-provisioning after all. Mmmh, but I just realized that on my laptop, I do boot the btrfs copy (currently done with rsync) from time to time (i.e. emergency boot from the HD the SSD was copied to). If I do that, it'll change the filesystem that was created with btrfs receive and break it, preventing further updates, correct? If so, can I get around that by making a boot snapshot after each copy and mount that snapshot for emergency boot instead of the main volume? Yes that will work. -chris
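The boot-snapshot approach confirmed here can be sketched as a short script. All paths and names below are hypothetical, and the commands are echoed rather than executed so the sketch is safe to run anywhere:

```shell
#!/bin/sh
# Sketch of the backup flow discussed above: replicate a subvolume with
# btrfs send/receive, then snapshot the received copy so an emergency boot
# modifies the snapshot instead of the copy future receives depend on.
SRC=/mnt/ssd      # hypothetical: source filesystem
DST=/mnt/hd       # hypothetical: backup filesystem
STAMP=2014-01-03  # hypothetical date stamp

run() { echo "+ $*"; }  # print commands instead of running them

run "btrfs subvolume snapshot -r $SRC/root $SRC/root.$STAMP"
run "btrfs send $SRC/root.$STAMP | btrfs receive $DST"
# Boot this writable snapshot in an emergency, never the received subvolume:
run "btrfs subvolume snapshot $DST/root.$STAMP $DST/root.boot"
```

An incremental follow-up would pass the previous snapshot as the parent with btrfs send -p; the key point from the thread is that the received subvolume itself must stay unmodified.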
Re: btrfs-transaction blocked for more than 120 seconds
Marc MERLIN posted on Fri, 03 Jan 2014 09:25:06 -0800 as excerpted: First, a big thank you for taking the time to post this very informative message. On Wed, Jan 01, 2014 at 12:37:42PM +, Duncan wrote: Apparently the way some distribution installation scripts work results in even a brand new installation being highly fragmented. =:^( If in addition they don't add autodefrag to the mount options used when mounting the filesystem for the original installation, the problem is made even worse, since the autodefrag mount option is designed to help catch some of this sort of issue, and schedule the affected files for auto-defrag by a separate thread. Assuming you can stomach a bit of occasional performance loss due to autodefrag, is there a reason not to always have this on btrfs filesystems in newer kernels? (let's say 3.12+)? Is there even a reason for this not to become a default mount option in newer kernels? For big internal write files, autodefrag isn't yet well tuned, because it effectively write-magnifies too much, forcing rewrite of the entire file for just a small change. If whatever app is more or less constantly writing those small changes, faster than the file can be rewritten... I don't know where the break-over might be, but certainly, multi-gig sized IO-active VMs images or databases aren't something I'd want to use it with. That's where the NOCOW thing will likely work better. IIRC someone also mentioned problems with autodefrag and an about 3/4 gig systemd journal. My gut feeling (IOW, *NOT* benchmarked!) is that double- digit MiB files should /normally/ be fine, but somewhere in the lower triple digits, write-magnification could well become an issue, depending of course on exactly how much active writing the app is doing into the file. As I said there's more work going into tuning autodefrag ATM, but as it is, I couldn't really recommend making it a global default... 
tho maybe a distro could enable it by default on a no-VM desktop system (as opposed to a server). Certainly I'd recommend most desktop types enable it. The NOCOW file attribute. Simple command form: chattr +C /path/to/file/or/directory Thank you for that tip, I had been unaware of it 'till now. This will make my virtualbox image directory much happier :) I think I said it, but it bears repeating. Once you set that attribute on the dir, you may want to move the files out of the dir (to another partition would make sure the data is actually moved) and back in, so they're effectively new files in the dir. Or use something like cat oldfile > newfile, so you know it's actually creating the new file, not reflinking. That'll ensure the NOCOW takes effect. Unfortunately, on an 83GB vdi (virtualbox) file, with 3.12.5, it did a lot of writing and chewed up my 4 CPUs. Then, it started to be hard to move my mouse cursor and my procmeter graph was barely updating. Next, nothing updated on my X server anymore, not even seconds in time widgets. But, I could still sometimes move my mouse cursor, and I could sometimes see the HD light flicker a bit before going dead again. In other words, the system wasn't fully deadlocked, but btrfs sure got into a state where it was unable to finish the job, and took the kernel down with it (64bit, 8GB of RAM). I waited 2H and it never came out of it, I had to power down the system in the end. Note that this was on a top of the line 500MB/s write Samsung Evo 840 SSD, not a slow HD. That was defrag (the command) or autodefrag (the mount option)? I'd guess defrag (the command). That's fragmentation for you! What did/does filefrag have to say about that file? Were you the one that posted the 6-digit extents? For something that bad, it might be faster to copy/move it off-device (expect it to take awhile) then move it back. 
That way you're only trying to read OR write on the device, not both, and the move elsewhere should defrag it quite a bit, effectively sequential write, then read and write on the move back. But even that might be prohibitive. At some point, you may need to either simply give up on it (if you're lazy), or get down and dirty with the tracing/profiling, working with a dev to figure out where it's spending its time and hopefully get btrfs recoded to work a bit faster for that sort of thing. I think I had enough free space: Label: 'btrfs_pool1' uuid: 4850ee22-bf32-4131-a841-02abdb4a5ba6 Total devices 1 FS bytes used 732.14GB devid 1 size 865.01GB used 865.01GB path /dev/dm-0 Is it expected behaviour for defrag to lock up on big files? Should I have had more spare free space for it to work? Other? From my understanding it's not the file size, but the number of fragments. I'm guessing you simply overwhelmed the system. Ideally you never let it get that bad in the first place. =:^( As I suggested above, you might try the old school method of defrag,
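The set-the-attribute-then-recreate step described in this thread can be sketched like this. Paths and file names are hypothetical, and the commands are echoed rather than executed (chattr +C only has effect on a btrfs filesystem):

```shell
#!/bin/sh
# NOCOW sketch: +C on a directory only affects files created afterwards,
# so each existing file must be rewritten as a genuinely new file.
DIR=/var/lib/vm-images        # hypothetical VM image directory
OLD=/var/lib/vm-images.old    # hypothetical: originals moved aside first

run() { echo "+ $*"; }        # print commands instead of running them

run "chattr +C $DIR"
for f in disk1.vdi disk2.vdi; do   # hypothetical image names
    # 'cat >' forces a real copy; a reflink copy would keep the old COW extents
    run "cat $OLD/$f > $DIR/$f"
done
run "lsattr -d $DIR"   # should show the C attribute set on the directory
```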
btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient). I discovered to my horror during testing today that neither raid1 nor raid10 arrays are fault tolerant of losing an actual disk. mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde mkdir /test mount /dev/vdb /test echo test > /test/test btrfs filesystem sync /test shutdown -hP now After shutting down the VM, I can remove ANY of the drives from the btrfs raid10 array, and be unable to mount the array. In this case, I removed the drive that was at /dev/vde, then restarted the VM. btrfs fi show Label: none uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455 Total devices 4 FS bytes used 156.00KB devid3 size 1.00GB used 212.75MB path /dev/vdd devid3 size 1.00GB used 212.75MB path /dev/vdc devid3 size 1.00GB used 232.75MB path /dev/vdb *** Some devices missing OK, we have three of four raid10 devices present. Should be fine. Let's mount it: mount -t btrfs /dev/vdb /test mount: wrong fs type, bad option, bad superblock on /dev/vdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so What's the kernel log got to say about it? dmesg | tail -n 4 [ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb [ 536.700515] btrfs: disk space caching is enabled [ 536.703491] btrfs: failed to read the system array on vdd [ 536.708337] btrfs: open_ctree failed Same behavior persists whether I create a raid1 or raid10 array, and whether I create it as that raid level using mkfs.btrfs or convert it afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. Also persists even if I both scrub AND sync the array before shutting the machine down and removing one of the disks. What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... has nobody tried actually failing out a disk yet, or what? 
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Am 03.01.2014 23:28, schrieb Jim Salter: I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient). I discovered to my horror during testing today that neither raid1 nor raid10 arrays are fault tolerant of losing an actual disk. mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde mkdir /test mount /dev/vdb /test echo test > /test/test btrfs filesystem sync /test shutdown -hP now After shutting down the VM, I can remove ANY of the drives from the btrfs raid10 array, and be unable to mount the array. In this case, I removed the drive that was at /dev/vde, then restarted the VM. btrfs fi show Label: none uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455 Total devices 4 FS bytes used 156.00KB devid3 size 1.00GB used 212.75MB path /dev/vdd devid3 size 1.00GB used 212.75MB path /dev/vdc devid3 size 1.00GB used 232.75MB path /dev/vdb *** Some devices missing OK, we have three of four raid10 devices present. Should be fine. Let's mount it: mount -t btrfs /dev/vdb /test mount: wrong fs type, bad option, bad superblock on /dev/vdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so What's the kernel log got to say about it? dmesg | tail -n 4 [ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb [ 536.700515] btrfs: disk space caching is enabled [ 536.703491] btrfs: failed to read the system array on vdd [ 536.708337] btrfs: open_ctree failed Same behavior persists whether I create a raid1 or raid10 array, and whether I create it as that raid level using mkfs.btrfs or convert it afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. Also persists even if I both scrub AND sync the array before shutting the machine down and removing one of the disks. What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... 
has nobody tried actually failing out a disk yet, or what? Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) On 01/03/2014 05:43 PM, Joshua Schüler wrote: Am 03.01.2014 23:28, schrieb Jim Salter: I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient). I discovered to my horror during testing today that neither raid1 nor raid10 arrays are fault tolerant of losing an actual disk. mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde mkdir /test mount /dev/vdb /test echo test > /test/test btrfs filesystem sync /test shutdown -hP now After shutting down the VM, I can remove ANY of the drives from the btrfs raid10 array, and be unable to mount the array. In this case, I removed the drive that was at /dev/vde, then restarted the VM. btrfs fi show Label: none uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455 Total devices 4 FS bytes used 156.00KB devid3 size 1.00GB used 212.75MB path /dev/vdd devid3 size 1.00GB used 212.75MB path /dev/vdc devid3 size 1.00GB used 232.75MB path /dev/vdb *** Some devices missing OK, we have three of four raid10 devices present. Should be fine. 
Let's mount it: mount -t btrfs /dev/vdb /test mount: wrong fs type, bad option, bad superblock on /dev/vdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so What's the kernel log got to say about it? dmesg | tail -n 4 [ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb [ 536.700515] btrfs: disk space caching is enabled [ 536.703491] btrfs: failed to read the system array on vdd [ 536.708337] btrfs: open_ctree failed Same behavior persists whether I create a raid1 or raid10 array, and whether I create it as that raid level using mkfs.btrfs or convert it afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. Also persists even if I both scrub AND sync the array before shutting the machine down and removing one of the disks. What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... has nobody tried actually failing out a disk yet, or what? Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Am 03.01.2014 23:56, schrieb Jim Salter: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. don't forget to btrfs device delete missing path See https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB. If your filesystem is more heavily corrupted, then you either need the btrfs tools in your initrd or a rescue cd. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) [snip] Joshua
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 03, 2014 at 05:56:42PM -0500, Jim Salter wrote: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... Use grub's command-line editing to add rootflags=degraded to it. Hugo. which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. I did find the -o degraded argument in the wiki now that you mentioned it - but it's not prominent enough if you ask me. =) On 01/03/2014 05:43 PM, Joshua Schüler wrote: Am 03.01.2014 23:28, schrieb Jim Salter: I'm using Ubuntu 12.04.3 with an up-to-date 3.11 kernel, and the btrfs-progs from Debian Sid (since the ones from Ubuntu are ancient). I discovered to my horror during testing today that neither raid1 nor raid10 arrays are fault tolerant of losing an actual disk. mkfs.btrfs -d raid10 -m raid10 /dev/vdb /dev/vdc /dev/vdd /dev/vde mkdir /test mount /dev/vdb /test echo test > /test/test btrfs filesystem sync /test shutdown -hP now After shutting down the VM, I can remove ANY of the drives from the btrfs raid10 array, and be unable to mount the array. In this case, I removed the drive that was at /dev/vde, then restarted the VM. 
btrfs fi show Label: none uuid: 94af1f5d-6ad2-4582-ab4a-5410c410c455 Total devices 4 FS bytes used 156.00KB devid3 size 1.00GB used 212.75MB path /dev/vdd devid3 size 1.00GB used 212.75MB path /dev/vdc devid3 size 1.00GB used 232.75MB path /dev/vdb *** Some devices missing OK, we have three of four raid10 devices present. Should be fine. Let's mount it: mount -t btrfs /dev/vdb /test mount: wrong fs type, bad option, bad superblock on /dev/vdb, missing codepage or helper program, or other error In some cases useful info is found in syslog - try dmesg | tail or so What's the kernel log got to say about it? dmesg | tail -n 4 [ 536.694363] device fsid 94af1f5d-6ad2-4582-ab4a-5410c410c455 devid 1 transid 7 /dev/vdb [ 536.700515] btrfs: disk space caching is enabled [ 536.703491] btrfs: failed to read the system array on vdd [ 536.708337] btrfs: open_ctree failed Same behavior persists whether I create a raid1 or raid10 array, and whether I create it as that raid level using mkfs.btrfs or convert it afterwards using btrfs balance start -dconvert=raidn -mconvert=raidn. Also persists even if I both scrub AND sync the array before shutting the machine down and removing one of the disks. What's up with this? This is a MASSIVE bug, and I haven't seen anybody else talking about it... has nobody tried actually failing out a disk yet, or what? Hey Jim, keep calm and read the wiki ;) https://btrfs.wiki.kernel.org/ You need to mount with -o degraded to tell btrfs a disk is missing. Joshua -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal)
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... You don't need to edit grub.cfg -- when you boot, grub has an edit option, so you can do it at boot time without having to use a rescue disk. Regardless, the thing you need to edit is the line starting linux, and will look something like this: linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root If there's a rootflags= option already (as above), add ,degraded to the end. If there isn't, add rootflags=degraded. Hugo. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. Add -o degraded to the boot-options in GRUB. -- === Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk === PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk --- Eighth Army Push Bottles Up Germans -- WWII newspaper --- headline (possibly apocryphal)
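The rule described here (append ,degraded to an existing rootflags= option, otherwise add rootflags=degraded) can be captured as a small shell helper. A sketch; add_degraded and the sample kernel lines are hypothetical:

```shell
#!/bin/sh
# add_degraded: rewrite a GRUB 'linux' line per the rule described above.
add_degraded() {
    case "$1" in
        *rootflags=*)  # append to the existing rootflags= option
            printf '%s\n' "$1" | sed 's/rootflags=[^ ]*/&,degraded/' ;;
        *)             # no rootflags= yet: add one
            printf '%s\n' "$1 rootflags=degraded" ;;
    esac
}

add_degraded "linux /vmlinuz root=/dev/vda2 ro rootflags=subvol=fs-root"
# -> linux /vmlinuz root=/dev/vda2 ro rootflags=subvol=fs-root,degraded
add_degraded "linux /vmlinuz root=/dev/vda2 ro"
# -> linux /vmlinuz root=/dev/vda2 ro rootflags=degraded
```

At the GRUB menu the same transformation is done by hand after pressing 'e' on the boot entry, as described in the message above.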
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 3:56 PM, Jim Salter j...@jrs-s.net wrote: I actually read the wiki pretty obsessively before blasting the list - could not successfully find anything answering the question, by scanning the FAQ or by Googling. You're right - mount -t btrfs -o degraded /dev/vdb /test worked fine. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? I'd say that it's not ready for unattended/auto degraded mounting, that this is intended to be a red flag show stopper to get the attention of the user. Before automatic degraded mounts, which md and LVM raid do now, there probably needs to be notification support in desktops, e.g. GNOME will report degraded state for at least md arrays (maybe LVM too, not sure). There's also a list of other multiple device stuff on the to-do list, some of which maybe should be done before auto degraded mount, for example the hot spare work. https://btrfs.wiki.kernel.org/index.php/Project_ideas#Multiple_Devices Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Yep - had just figured that out and successfully booted with it, and was in the process of typing up instructions for the list (and posterity). One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change? I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\ On 01/03/2014 06:18 PM, Hugo Mills wrote: On Fri, Jan 03, 2014 at 06:13:25PM -0500, Jim Salter wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly... You don't need to edit grub.cfg -- when you boot, grub has an edit option, so you can do it at boot time without having to use a rescue disk. Regardless, the thing you need to edit is the line starting linux, and will look something like this: linux /vmlinuz-3.11.0-rc2-dirty root=UUID=1b6ec419-211a-445e-b762-ae7da27b6e8a ro single rootflags=subvol=fs-root If there's a rootflags= option already (as above), add ,degraded to the end. If there isn't, add rootflags=degraded. Hugo. HOWEVER - this won't allow a root filesystem to mount. How do you deal with this if you'd set up a btrfs-raid1 or btrfs-raid10 as your root filesystem? Few things are scarier than seeing the cannot find init message in GRUB and being faced with a BusyBox prompt... which is actually how I initially got my scare; I was trying to do a walkthrough for setting up a raid1 / for an article in a major online magazine and it wouldn't boot at all after removing a device; I backed off and tested with a non root filesystem before hitting the list. 
Add -o degraded to the boot-options in GRUB.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:13 PM, Jim Salter j...@jrs-s.net wrote: Sorry - where do I put this in GRUB? /boot/grub/grub.cfg is still kinda black magic to me, and I don't think I'm supposed to be editing it directly at all anymore anyway, if I remember correctly… Don't edit the grub.cfg directly. At the grub menu, only highlight the entry you want to boot, then hit 'e', and then edit the existing linux/linuxefi line. If you already have rootfs on a subvolume, you'll have an existing parameter on that line rootflags=subvol=rootname and you can change this to rootflags=subvol=rootname,degraded. I would not make this option persistent by putting it permanently in the grub.cfg; although I don't know the consequences of always mounting with degraded even when not necessary, it could have some negative effects (?) Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:25 PM, Jim Salter j...@jrs-s.net wrote: One thing that concerns me is that edits made directly to grub.cfg will get wiped out with every kernel upgrade when update-grub is run - any idea where I'd put this in /etc/grub.d to have a persistent change? /etc/default/grub. I don't recommend making it persistent. At this stage of development, a disk failure should cause mount failure so you're alerted to the problem. I have to tell you, I'm not real thrilled with this behavior either way - it means I can't have the option to automatically mount degraded filesystems without the filesystems in question ALWAYS showing as being mounted degraded, whether the disks are all present and working fine or not. That's kind of blecchy. =\ If you need something that comes up degraded automatically by design as a supported use case, use md (or possibly LVM, which uses different user space tools and monitoring but uses the md kernel driver code and supports raid 0,1,5,6 - quite nifty). I haven't tried this yet, but I think that's also supported with the thin provisioning work, which even if you don't use thin provisioning gets you the significantly more efficient snapshot behavior. Chris Murphy
Re: btrfsck does not fix
On Jan 3, 2014, at 12:41 PM, Hendrik Friedel hend...@friedels.name wrote: Hello, I ran btrfsck on my volume with the repair option. When I re-run it, I get the same errors as before. Did you try mounting with -o recovery first? https://btrfs.wiki.kernel.org/index.php/Problem_FAQ What messages do you get in dmesg when you use recovery? Chris Murphy
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
For anybody else interested, if you want your system to automatically boot a degraded btrfs array, here are my crib notes, verified working: * boot degraded 1. edit /etc/grub.d/10_linux, add degraded to the rootflags GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX} 2. add degraded to options in /etc/fstab also UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1 3. Update and reinstall GRUB to all boot disks update-grub grub-install /dev/vda grub-install /dev/vdb Now you have a system which will automatically start a degraded array. ** Side note: sorry, but I absolutely don't buy the argument that 'the system won't boot without you driving down to its physical location, standing in front of it, and hammering panickily at a BusyBox prompt' is the best way to find out your array is degraded. I'll set up a Nagios module to check for degraded arrays using btrfs fi list instead, thanks... On 01/03/2014 06:06 PM, Freddie Cash wrote: Why is manual intervention even needed? Why isn't the filesystem smart enough to mount in a degraded mode automatically? -- Freddie Cash fjwc...@gmail.com mailto:fjwc...@gmail.com
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Minor correction: you need to close the double-quotes at the end of the GRUB_CMDLINE_LINUX line:

GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}"

On 01/03/2014 06:42 PM, Jim Salter wrote:
> For anybody else interested, if you want your system to automatically
> boot a degraded btrfs array, here are my crib notes, verified working:
>
> * boot degraded
>
> 1. edit /etc/grub.d/10_linux, add degraded to the rootflags
>
>    GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}
>
> 2. add degraded to the options in /etc/fstab also
>
>    UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1
>
> 3. Update and reinstall GRUB to all boot disks
>
>    update-grub
>    grub-install /dev/vda
>    grub-install /dev/vdb
>
> Now you have a system which will automatically start a degraded array.
>
> ** Side note: sorry, but I absolutely don't buy the argument that "the
> system won't boot without you driving down to its physical location,
> standing in front of it, and hammering panickily at a BusyBox prompt" is
> the best way to find out your array is degraded. I'll set up a Nagios
> module to check for degraded arrays using btrfs fi list instead, thanks...
>
> On 01/03/2014 06:06 PM, Freddie Cash wrote:
>> Why is manual intervention even needed? Why isn't the filesystem smart
>> enough to mount in a degraded mode automatically?
Re: coredump in btrfsck
On Jan 3, 2014, at 5:33 AM, Marc MERLIN m...@merlins.org wrote:

> Would it be possible for whoever maintains btrfs-tools to change both
> the man page and the help included in the tool to clearly state that
> running the fsck tool is unlikely to be the right course of action, and
> to talk about btrfs-zero-log as well as mount -o recovery?

The problem FAQ doesn't even mention btrfsck, so I think people are just getting around that page or making assumptions.
https://btrfs.wiki.kernel.org/index.php/Problem_FAQ

Should btrfs check (btrfsck without --repair) work similarly to xfs_repair when the file system is not cleanly unmounted? If an XFS volume is not cleanly unmounted, running xfs_repair will instruct the user to first mount the volume so that the journal is replayed, then umount the volume, then run xfs_repair. A possible variant of this for btrfs check: inform the user that the first step in repairing a problem Btrfs volume is to mount with -o recovery, and point at the Btrfs FAQ url for additional problem-solving recommendations. ?

Chris Murphy
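[Editor's note: the recovery-first ordering discussed above could be sketched as the following command sequence. This is an illustration only, not from the original mail: /dev/sdX and /mnt are placeholders, and btrfs-zero-log is destructive (it discards the log tree, losing the last few seconds of writes), so it belongs strictly after a failed recovery mount, never first.]

```shell
# 1. Let the kernel attempt its own recovery at mount time:
mount -o recovery /dev/sdX /mnt

# 2. Only if the mount fails with log-tree replay errors in dmesg,
#    clear the log tree (destructive: drops the last moments of writes):
btrfs-zero-log /dev/sdX

# 3. Only as a last resort, if the volume still will not mount,
#    run the offline repair tool:
btrfsck --repair /dev/sdX
```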
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Jan 3, 2014, at 4:42 PM, Jim Salter j...@jrs-s.net wrote:

> For anybody else interested, if you want your system to automatically
> boot a degraded btrfs array, here are my crib notes, verified working:
>
> * boot degraded
>
> 1. edit /etc/grub.d/10_linux, add degraded to the rootflags
>
>    GRUB_CMDLINE_LINUX="rootflags=degraded,subvol=${rootsubvol} ${GRUB_CMDLINE_LINUX}

This is the wrong way to solve this. /etc/grub.d/10_linux is subject to being replaced on updates. It is not recommended that it be edited, same as for grub.cfg. The correct way is as I already stated, which is to edit the GRUB_CMDLINE_LINUX= line in /etc/default/grub.

> 2. add degraded to the options in /etc/fstab also
>
>    UUID=bf9ea9b9-54a7-4efc-8003-6ac0b344c6b5 / btrfs defaults,degraded,subvol=@ 0 1

I think it's bad advice to recommend always persistently mounting a good volume with this option. There's a reason why degraded is not the default mount option, and why there isn't yet automatic degraded mount functionality. That fstab contains other errors.

The correct way to automate this, before Btrfs developers get around to it, is to create a systemd unit that checks for the mount failure, determines that there's a missing device, and generates a modified sysroot.mount job that includes degraded.

> Side note: sorry, but I absolutely don't buy the argument that "the
> system won't boot without you driving down to its physical location,
> standing in front of it, and hammering panickily at a BusyBox prompt" is
> the best way to find out your array is degraded.

You're simply dissatisfied with the state of Btrfs development and are suggesting bad hacks as a workaround. That's my argument. Again, if your use case requires automatic degraded mounts, use a technology that's mature and well tested for that use case. Don't expect a lot of sympathy if these bad hacks cause you problems later.

> I'll set up a Nagios module to check for degraded arrays using btrfs fi
> list instead, thanks…

That's a good idea, except that it's show rather than list.
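[Editor's note: a minimal sketch of the /etc/default/grub edit Chris recommends, added for illustration; whatever other flags your distribution already carries in GRUB_CMDLINE_LINUX must be preserved, and whether degraded belongs there at all is exactly what this thread is debating.]

```shell
# /etc/default/grub -- this file survives package updates, unlike the
# generator scripts in /etc/grub.d/ or the generated grub.cfg itself.
# Append rootflags=degraded to the kernel command line:
GRUB_CMDLINE_LINUX="rootflags=degraded"

# Then regenerate grub.cfg, e.g.:
#   update-grub                               # Debian/Ubuntu
#   grub2-mkconfig -o /boot/grub2/grub.cfg    # Fedora/RHEL
```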
Chris Murphy
Re: Status of raid5/6 in 2014?
I personally consider proper RAID6 support, with gracious non-intrusive handling of failing drives and a proper warning mechanism, the most important missing feature of btrfs, and I know this view is shared by many others with software-RAID-based storage systems currently limited by the existing choices on Linux. But having been a (naughty) user of btrfs for the last few months, I fully understand that there are important bugs, performance fixes and issues in the existing state of btrfs that need more immediate attention, as they affect the currently installed base.

I will however stress that the faster the functionality gets implemented, the sooner users like myself can begin using it and reporting issues, and hence the sooner btrfs gets ready for enterprise usage and general deployment.

Regards,
Hans-Kristian Bakke

On 3 January 2014 17:45, Dave d...@thekilempire.com wrote:
> Back in Feb 2013 there was quite a bit of press about the preliminary
> raid5/6 implementation in Btrfs. At the time it wasn't useful for
> anything other than testing, and it's my understanding that this is
> still the case. I've seen a few git commits and some chatter on this
> list, but it would appear the developers are largely silent.
>
> Parity-based raid would be a powerful addition to the Btrfs feature
> stack, and it's the feature I most anxiously await. Are there any
> milestones planned for 2014? Keep up the good work...
>
> --
> -=[dave]=-
> Entropy isn't what it used to be.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On 01/03/2014 07:27 PM, Chris Murphy wrote:
> This is the wrong way to solve this. /etc/grub.d/10_linux is subject to
> being replaced on updates. It is not recommended it be edited, same as
> for grub.cfg. The correct way is as I already stated, which is to edit
> the GRUB_CMDLINE_LINUX= line in /etc/default/grub.

Fair enough - though since I already have to monkey-patch 00_header, I kind of already have an eye on grub.d, so it doesn't seem as onerous as it otherwise would. There is definitely a lot of work that needs to be done on the boot sequence for btrfs, IMO.

> I think it's bad advice to recommend always persistently mounting a good
> volume with this option. There's a reason why degraded is not the
> default mount option, and why there isn't yet automatic degraded mount
> functionality. That fstab contains other errors.

What other errors does it contain? Aside from adding the degraded option, that's a bone-stock fstab entry from an Ubuntu Server installation.

> The correct way to automate this before Btrfs developers get around to
> it is to create a systemd unit that checks for the mount failure,
> determines that there's a missing device, and generates a modified
> sysroot.mount job that includes degraded.

Systemd is not the boot system in use for my distribution, and using it would require me to build a custom kernel, among other things. We're going to have to agree to disagree that that's an appropriate workaround, I think.

> You're simply dissatisfied with the state of Btrfs development and are
> suggesting bad hacks as a workaround. That's my argument. Again, if your
> use case requires automatic degraded mounts, use a technology that's
> mature and well tested for that use case. Don't expect a lot of sympathy
> if these bad hacks cause you problems later.

You're suggesting the wrong alternatives here (mdraid, LVM, etc.) - they don't provide the features that I need or am accustomed to (true snapshots, copy on write, self-correcting redundant arrays, and on down the line).
If you're going to shoo me off, the correct way to do it is to wave me in the direction of ZFS, in which case I can tell you I've been a happy user of ZFS for 5+ years now on hundreds of systems. ZFS and btrfs are literally the *only* options available that do what I want to do, and have been doing for years now. (At least aside from six-figure-and-up proprietary systems, which I have neither the budget nor the inclination for.)

I'm testing btrfs heavily in throwaway virtual environments and in a few small, heavily-monitored test production instances because ZFS on Linux has its own set of problems, both technical and licensing, and I think it's clear btrfs is going to take the lead in the very near future - in many ways, it does already.

>> I'll set up a Nagios module to check for degraded arrays using btrfs fi
>> list instead, thanks…
>
> That's a good idea, except that it's show rather than list.

Yup, that's what I meant all right. I frequently still get the syntax backwards between btrfs fi show and btrfs subv list.
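[Editor's note: a minimal sketch of the Nagios-style check Jim mentions, added for illustration. It parses `btrfs fi show` output, which in this era printed a line containing "Some devices missing" for a degraded filesystem - that output format, and the script name, are assumptions; the exit codes follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).]

```shell
#!/bin/sh
# check_btrfs_degraded: Nagios plugin sketch for degraded btrfs arrays.
# The parsing function reads `btrfs fi show` output on stdin so it can be
# exercised without root, e.g.:
#   btrfs fi show | check_btrfs_degraded
check_btrfs_degraded() {
    # "missing" only appears in the output when a device has dropped out
    if grep -qi 'missing'; then
        echo "CRITICAL: btrfs filesystem has missing devices"
        return 2    # Nagios CRITICAL
    fi
    echo "OK: all btrfs devices present"
    return 0        # Nagios OK
}

# When installed as a plugin, uncomment to run against the live system:
# btrfs fi show | check_btrfs_degraded
```

Whether a missing device should be WARNING or CRITICAL is a policy choice; CRITICAL seems right for a redundancy loss that a second failure would turn into data loss.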
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
On Fri, Jan 3, 2014 at 9:59 PM, Jim Salter j...@jrs-s.net wrote:
> You're suggesting the wrong alternatives here (mdraid, LVM, etc) - they
> don't provide the features that I need or am accustomed to (true
> snapshots, copy on write, self-correcting redundant arrays, and on down
> the line).
>
> If you're going to shoo me off, the correct way to do it is to wave me
> in the direction of ZFS, in which case I can tell you I've been a happy
> user of ZFS for 5+ years now on hundreds of systems. ZFS and btrfs are
> literally the *only* options available that do what I want to do, and
> have been doing for years now. (At least aside from six-figure-and-up
> proprietary systems, which I have neither the budget nor the inclination
> for.)

Jim, there's nothing stopping you from creating a Btrfs filesystem on top of an mdraid array. I'm currently running three WD Red 3TB drives in a raid5 configuration under a Btrfs filesystem. This configuration works pretty well and fills the feature gap you're describing.

I will say, though, that the whole tone of your email chain leaves a bad taste in my mouth; kind of like a poorly adjusted relative who shows up once a year for Thanksgiving and makes everyone feel uncomfortable. I find myself annoyed by the constant disclaimers I read on this list about the experimental status of Btrfs, but it's apparent that this hasn't sunk in for everyone. Your poor budget doesn't a production filesystem make.

I and many others on this list who have been using Btrfs will tell you with no hesitation that, due to the immaturity of the code, Btrfs should be making NO assumptions in the event of a failure, and everything should come to a screeching halt. I've seen it all: the infamous 120-second process hangs, csum errors, multiple separate catastrophic failures (search me on this list). Things are MOSTLY stable, but you simply have to glance at a few weeks of history on this list to see the experimental status is fully justified.
I use Btrfs because of its intoxicating feature set. As an IT director, though, I'd never subject my company to these rigors. If Btrfs on mdraid isn't an acceptable solution for you, then ZFS is the only responsible alternative.

--
-=[dave]=-
Entropy isn't what it used to be.
Re: btrfs raid1 and btrfs raid10 arrays NOT REDUNDANT
Chris Murphy posted on Fri, 03 Jan 2014 16:22:44 -0700 as excerpted:

> I would not make this option persistent by putting it permanently in the
> grub.cfg; although I don't know the consequence of always mounting with
> degraded even if not necessary, it could have some negative effects (?)

Degraded only actually does anything if it's actually needed. On a normal array it'll be a NOOP, so it should be entirely safe for /normal/ operation, but that doesn't mean I'd /recommend/ it for normal operation, since it bypasses checks that are there for a reason, thus silently bypassing information that an admin needs to know before he boots it anyway, in order to recover.

However, I have some other comments to add:

1) Like you, I'm uncomfortable with the whole idea of adding degraded permanently at this point. Mention was made of having to drive down to the data center and actually stand in front of the box if something goes wrong, otherwise. At the moment, for btrfs' development state at this point, fine. Btrfs remains under development, and there are clear warnings about using it without backups one hasn't tested recovery from or is not otherwise prepared to actually use. It's stated in multiple locations on the wiki; it's stated in the kernel btrfs config option, and it's stated in mkfs.btrfs output when you create the filesystem. If after all that people are using it in a remote situation where they're not prepared to drive down to the data center and stab at the keys if they have to, they're using possibly the right filesystem, but at too early a point in its development for their needs at this moment.

2) As the wiki explains, certain configurations require at least a minimum number of devices in order to work undegraded. The example given in the OP was of a 4-device raid10, already the minimum number to work undegraded, with one device dropped out, to below the minimum required number to mount undegraded, so of /course/ it wouldn't mount without that option.
If five or six devices had been used, a device could have been dropped and the remaining number of devices would still be greater than or equal to the minimum needed to run an undegraded raid10, and the result would likely have been different, since there would still be enough devices to mount writable with proper redundancy, even if existing information doesn't have that redundancy until a rebalance is done to take care of the missing device.

Similarly with a raid1 and its minimum two devices. Configure with three, then drop one, and it should still work, as it's above the two-device minimum for a raid1 configuration. Configure with two and drop one, and you'll have to mount degraded (and it'll drop to read-only if it happens in operation), since there's no second device to write the second copy to, as required by raid1.

3) Frankly, this whole thread smells of going off half-cocked, posting before doing the proper research. I know when I took a look at btrfs here, I read up on the wiki, reading the multiple-devices stuff, the FAQ, the problem FAQ, the gotchas, the use cases, the sysadmin guide, the getting started and mount options pages... loading the pages multiple times as I followed links back and forth between them. Because I care about my data and want to understand what I'm doing with it before I do it! And even now I often reread specific parts as I'm trying to help others with questions on this list.

Then I still had some questions about how it worked that I couldn't find answers for on the wiki, and as is traditional with mailing lists and newsgroups before them, I read several weeks' worth of posts (on an archive for lists) before actually posting my questions, to see if they were FAQs already answered on the list. Then and only then did I post the questions to the list, and when I did, it was "Questions I haven't found answers for on the wiki or list," not "THE WORLD IS GOING TO END, OH NOS!!111!!11!111!!!"
Now later on I did post some behavior that had me rather upset, but that was AFTER I had already engaged the list in general, and was pretty sure by that point that what I was seeing was NOT covered on the wiki, and was reasonably new information for at least SOME list users.

4) As a matter of fact, AFAIK that behavior remains relevant today, and may well be of interest to the OP. FWIW my background was Linux kernel md/raid, so I approached the btrfs raid expecting similar behavior. What I found in my testing (and NOT covered on the wiki or in the various documentation, other than in a few threads on list, to this day, AFAIK), however...

Test:

a) Create a two-device btrfs raid1.

b) Mount it and write some data to it.

c) Unmount it, unplug one device, mount degraded the remaining device.

d) Write some data to a test file on it, noting the path/filename and data.

e) Unmount again, switch plugged devices so the formerly unplugged one is now the plugged