Re: How can I get blockdev offsets of btrfs chunks for a file?
On Fri, Jul 15, 2016 at 04:21:31PM -0700, Eric Wheeler wrote:
> Hello all,
>
> We do btrfs subvolume snapshots over time for backups. I would like to
> traverse the files in the subvolumes and find the total unique chunk count
> to calculate total space for a set of subvolumes.

btrfs fi du may help here. Alternatively, qgroups should be able to tell
you for groups of subvols, if it's set up correctly. You shouldn't need
to implement this at a low level yourself...

> This sounds kind of like the beginning of what a deduplicator would do,
> but I just want to count the blocks, so no submission for deduplication.
> I started looking at bedup and other deduplicator code, but the answer to
> this question wasn't obvious (to me, anyway).
>
> Questions:
>
> Is there an ioctl (or some other way) to get the block device offset for a
> file (or file offset) so I can count the unique occurrences?

This is very much an X/Y question. There already exist a couple of
things that are at least close to the thing you actually want to do. :)

Hugo.

> What API documentation should I review?
>
> Can you point me at the ioctl(s) that would handle this?
>
> Thank you for your help!

-- 
Hugo Mills             | Reintarnation: Coming back from the dead as a
hugo@... carfax.org.uk | hillbilly
http://carfax.org.uk/  | PGP: E2AB1DE4
Re: How can I get blockdev offsets of btrfs chunks for a file?
On Fri, Jul 15, 2016 at 04:21:31PM -0700, Eric Wheeler wrote:
> We do btrfs subvolume snapshots over time for backups. I would like to
> traverse the files in the subvolumes and find the total unique chunk count
> to calculate total space for a set of subvolumes.
>
> This sounds kind of like the beginning of what a deduplicator would do,
> but I just want to count the blocks, so no submission for deduplication.
> I started looking at bedup and other deduplicator code, but the answer to
> this question wasn't obvious (to me, anyway).
>
> Questions:
>
> Is there an ioctl (or some other way) to get the block device offset for a
> file (or file offset) so I can count the unique occurrences?

Yes, FIEMAP. You can play with it via "/usr/sbin/filefrag -v". That
/usr/sbin is misleading -- FIEMAP doesn't require root, although its
predecessor did need that: https://bugs.debian.org/819923

> What API documentation should I review?

In kernel sources, Documentation/filesystems/fiemap.txt

Meow!

-- 
An imaginary friend squared is a real enemy.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: How can I get blockdev offsets of btrfs chunks for a file?
No answer here, but mate, if you are involved in anything that will
provide some more automated backup tool for btrfs, you've got a lot of
silent people rooting for you.

> On 16 Jul 2016, at 00:21, Eric Wheeler wrote:
>
> Hello all,
>
> We do btrfs subvolume snapshots over time for backups. I would like to
> traverse the files in the subvolumes and find the total unique chunk count
> to calculate total space for a set of subvolumes.
>
> This sounds kind of like the beginning of what a deduplicator would do,
> but I just want to count the blocks, so no submission for deduplication.
> I started looking at bedup and other deduplicator code, but the answer to
> this question wasn't obvious (to me, anyway).
>
> Questions:
>
> Is there an ioctl (or some other way) to get the block device offset for a
> file (or file offset) so I can count the unique occurrences?
>
> What API documentation should I review?
>
> Can you point me at the ioctl(s) that would handle this?
>
> Thank you for your help!
>
> --
> Eric Wheeler
How can I get blockdev offsets of btrfs chunks for a file?
Hello all,

We do btrfs subvolume snapshots over time for backups. I would like to
traverse the files in the subvolumes and find the total unique chunk
count to calculate total space for a set of subvolumes.

This sounds kind of like the beginning of what a deduplicator would do,
but I just want to count the blocks, so no submission for deduplication.
I started looking at bedup and other deduplicator code, but the answer
to this question wasn't obvious (to me, anyway).

Questions:

Is there an ioctl (or some other way) to get the block device offset for
a file (or file offset) so I can count the unique occurrences?

What API documentation should I review?

Can you point me at the ioctl(s) that would handle this?

Thank you for your help!

--
Eric Wheeler
Btrfs uuid snapshots: orphaned parent_uuid after deleting intermediate subvol
Hello all,

If I create three subvolumes like so:

  # btrfs subvolume create a
  # btrfs subvolume snapshot a b
  # btrfs subvolume snapshot b c

I get a parent-child relationship which can be determined like so:

  # btrfs subvolume list -uq /home/ | grep '[abc]$'
  parent_uuid -                                    uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
  parent_uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 path b
  parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c

Now if I delete 'b', the parent_uuid of 'c' doesn't change to point at 'a':

  # btrfs subvolume delete b
  # btrfs subvolume list -uq /home/ | grep '[abc]$'
  parent_uuid -                                    uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
  parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c

Notice that 'c' still points at b's UUID, but 'b' is missing and the
parent_uuid for 'c' wasn't set to '-' as if it were a root node (like
'a'). Is this an inconsistency? Should child parent_uuids be updated on
delete?

It would be nice to know that 'c' is actually a descendant of 'a', even
after having deleted 'b'. Is there a way to look that up somehow?

This is running 4.1.15, so it's a bit behind. If this is fixed in a
later version then please let me know that too.

Thanks!

--
Eric Wheeler
Re: Status of SMR with BTRFS
Though I'm not a hardcore storage system professional: what disk are you
using? There are two types:

1. SMR managed by device firmware. BTRFS sees that as a normal block
   device ... any problems you get are not related to BTRFS itself ...
2. SMR managed by the host system. BTRFS still sees this as a block
   device ... just emulated by the host system to look normal.

In the case of funky technologies like that, I would research how
exactly data is stored in terms of a "BAND" and experiment with setting
leaf & sector size to match a band, then create a btrfs on this device.
Run stress.sh on it for a couple of days. If you get errors - set up a
two-disk raid1 btrfs file system on standard disks and run stress.sh to
see whether you get errors on that system, to eliminate the possibility
that your system is actually generating the errors.

Then come back and we will see what's going on :)

> On 15 Jul 2016, at 19:29, Hendrik Friedel wrote:
>
> Hello,
>
> I have a 5TB Seagate drive that uses SMR.
>
> I was wondering if BTRFS is usable with this harddrive technology. So,
> first I searched the BTRFS wiki - nothing. Then google.
>
> * I found this: https://bbs.archlinux.org/viewtopic.php?id=203696
>   But this turned out to be an issue not related to BTRFS.
>
> * Then this: http://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf
>   "BTRFS operation matches SMR parameters very closely [...]
>   High number of misaligned write accesses"; points to an issue with
>   btrfs itself.
>
> * Then this: http://superuser.com/questions/962257/fastest-linux-filesystem-on-shingled-disks
>   The BTRFS performance seemed good.
>
> * Finally this: http://www.spinics.net/lists/linux-btrfs/msg48072.html
>   "So you can get mixed results when trying to use the SMR devices but
>   I'd say it will mostly not work.
>   But, btrfs has all the fundamental features in place, we'd have to
>   make adjustments to follow the SMR constraints:"
>   [...]
>   "I have some notes at
>   https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt"
>
> So, now I am wondering what the state is today. "We" (I am happy to do
> that, but not sure of access rights) should also summarize this in the
> wiki.
>
> My use-case, by the way, is back-ups. I am thinking of using some of
> the interesting BTRFS features for this (send/receive, deduplication).
>
> Greetings,
> Hendrik
Re: A lot warnings in dmesg while running thunderbird
On 07/15/2016 03:35 PM, Chris Mason wrote:
> On 07/07/2016 06:24 AM, Gabriel C wrote:
>> Hi,
>>
>> while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
>> other versions) I trigger the following:
>>
>> [ 6393.305675] WARNING: CPU: 6 PID: 5870 at fs/btrfs/inode.c:9306 btrfs_destroy_inode+0x22e/0x2a0 [btrfs]
>
> Every time I've reproduced this, I've hit a warning in extent-tree.c
> about trying to decrement bytes_may_use too far. Then I get enospc on
> every operation.
>
> Josef fixed a few corner cases here with his new enospc changes, and
> I'm not able to trigger (yet) with those applied. Dave Sterba has them
> all in his for-next branch:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
>
> Can you please try that on top of v4.7-rc7?

A few hours later and it reproduced on this kernel too. What must be
happening is we're freeing too many bytes from bytes_may_use. I'll get
tracing in and nail it down.

-chris
Re: A lot warnings in dmesg while running thunderbird
On 07/07/2016 06:24 AM, Gabriel C wrote:
> Hi,
>
> while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
> other versions) I trigger the following:
>
> [ 6393.305675] WARNING: CPU: 6 PID: 5870 at fs/btrfs/inode.c:9306 btrfs_destroy_inode+0x22e/0x2a0 [btrfs]

Every time I've reproduced this, I've hit a warning in extent-tree.c
about trying to decrement bytes_may_use too far. Then I get enospc on
every operation.

Josef fixed a few corner cases here with his new enospc changes, and I'm
not able to trigger (yet) with those applied. Dave Sterba has them all
in his for-next branch:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next

Can you please try that on top of v4.7-rc7?

-chris
Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems
On Fri, Jul 15, 2016 at 12:34:10PM +0530, Chandan Rajendra wrote:
> On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> > On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > > From: Omar Sandoval
> > >
> > > So it turns out that the free space tree bitmap handling has always been
> > > broken on big-endian systems. Totally my bad.
> > >
> > > Patch 1 fixes this. Technically, it's a disk format change for
> > > big-endian systems, but it never could have worked before, so I won't go
> > > through the trouble of any incompat bits. If you've somehow been using
> > > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > > to want to mount with nospace_cache to clear it and wait for this to go
> > > in.
> > >
> > > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > > oversight that patch 1 fixes.
> > >
> > > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > > pass on x86_64 and MIPS.
> >
> > Thanks for fixing this up Omar. Any big-endian friends want to try this
> > out in extended testing and make sure we've nailed it down?
>
> Hi Omar & Chris,
>
> I will run fstests with this patchset applied on ppc64 BE and inform you
> about the results.

Thanks, Chandan! I set up my xfstests for space_cache=v2 by doing:

	mkfs.btrfs "$TEST_DEV"
	mount -o space_cache=v2 "$TEST_DEV" "$TEST_DIR"
	umount "$TEST_DEV"

and adding

	export MOUNT_OPTIONS="-o space_cache=v2"

to local.config. btrfsck also needs the patch here [1]. Thanks again.

1: http://thread.gmane.org/gmane.comp.file-systems.btrfs/58382

-- 
Omar
[PATCH] btrfs-progs: fix btrfsck of space_cache=v2 bitmaps on big-endian
From: Omar Sandoval

Copy le_test_bit() from the kernel and use that for the free space tree
bitmaps.

Signed-off-by: Omar Sandoval
---
Same sort of mistake as in the kernel. Applies to v4.6.1.

 extent_io.c  |  2 +-
 extent_io.h  | 19 +++++++++++++++++++
 kerncompat.h |  3 ++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/extent_io.c b/extent_io.c
index c99d3627e370..d956c5731332 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -889,5 +889,5 @@ void memset_extent_buffer(struct extent_buffer *eb, char c,
 int extent_buffer_test_bit(struct extent_buffer *eb, unsigned long start,
 			   unsigned long nr)
 {
-	return test_bit(nr, (unsigned long *)(eb->data + start));
+	return le_test_bit(nr, (u8 *)eb->data + start);
 }
diff --git a/extent_io.h b/extent_io.h
index a9a7353556a7..94a42bf5e180 100644
--- a/extent_io.h
+++ b/extent_io.h
@@ -49,6 +49,25 @@
 
 #define BLOCK_GROUP_DIRTY EXTENT_DIRTY
 
+/*
+ * The extent buffer bitmap operations are done with byte granularity instead of
+ * word granularity for two reasons:
+ * 1. The bitmaps must be little-endian on disk.
+ * 2. Bitmap items are not guaranteed to be aligned to a word and therefore a
+ *    single word in a bitmap may straddle two pages in the extent buffer.
+ */
+#define BIT_BYTE(nr) ((nr) / BITS_PER_BYTE)
+#define BYTE_MASK ((1 << BITS_PER_BYTE) - 1)
+#define BITMAP_FIRST_BYTE_MASK(start) \
+	((BYTE_MASK << ((start) & (BITS_PER_BYTE - 1))) & BYTE_MASK)
+#define BITMAP_LAST_BYTE_MASK(nbits) \
+	(BYTE_MASK >> (-(nbits) & (BITS_PER_BYTE - 1)))
+
+static inline int le_test_bit(int nr, const u8 *addr)
+{
+	return 1U & (addr[BIT_BYTE(nr)] >> (nr & (BITS_PER_BYTE-1)));
+}
+
 struct btrfs_fs_info;
 
 struct extent_io_tree {
diff --git a/kerncompat.h b/kerncompat.h
index 378f0552edd2..c9b9b79782b9 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -55,7 +55,8 @@
 #define gfp_t int
 #define get_cpu_var(p) (p)
 #define __get_cpu_var(p) (p)
-#define BITS_PER_LONG (__SIZEOF_LONG__ * 8)
+#define BITS_PER_BYTE 8
+#define BITS_PER_LONG (__SIZEOF_LONG__ * BITS_PER_BYTE)
 #define __GFP_BITS_SHIFT 20
 #define __GFP_BITS_MASK ((int)((1 << __GFP_BITS_SHIFT) - 1))
 #define GFP_KERNEL 0
-- 
2.9.0
Re: Data recovery from a linear multi-disk btrfs file system
On 2016-07-15 14:45, Matt wrote:
>> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn wrote:
>> On 2016-07-15 05:51, Matt wrote:
>>> Hello
>>>
>>> I glued together 6 disks in linear lvm fashion (no RAID) to obtain
>>> one large file system (see below). One of the 6 disks failed. What
>>> is the best way to recover from this?
>>
>> The tool you want is `btrfs restore`. You'll need somewhere to put
>> the files from this too of course. That said, given that you had data
>> in raid0 mode, you're not likely to get much other than very small
>> files back out of this, and given other factors, you're not likely to
>> get what you would consider reasonable performance out of this either.
>
> Thanks so much for pointing me towards btrfs-restore. I surely will
> give it a try. Note that the FS is not a RAID0 but a linear ("JBOD")
> configuration. This is why it somehow did not occur to me to try
> btrfs-restore. The good news is that in this configuration the files
> are *not* distributed across disks. We can read most of the files just
> fine. The failed disk was actually smaller than the other five, so we
> should be able to recover more than 5/6 of the data, shouldn't we?
> My trouble is that the IO errors due to the missing disk cripple the
> transfer speed of both rsync and dd_rescue.

Your own 'btrfs fi df' output clearly says that more than 99% of your
data chunks are in a RAID0 profile, hence my statement. Functionally,
this is similar to concatenating all the disks, but it gets better
performance and is a bit harder to recover data from. I hadn't noticed
however that the disks were different sizes, so you should be able to
recover a significant amount of data from it.

>> Your best bet to get a working filesystem again would be to just
>> recreate it from scratch, there's not much else that can be done when
>> you've got a raid0 profile and have lost a disk.
>
> This is what I plan to do if btrfs-restore turns out to be too slow and
> nobody on this list has any better idea. It will, however, require
> transferring >15TB across the Atlantic (this is where the "backups"
> reside). This can be tedious, which is why I would love to avoid it.

Entirely understandable.
Re: Data recovery from a linear multi-disk btrfs file system
> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn wrote:
>
> On 2016-07-15 05:51, Matt wrote:
>> Hello
>>
>> I glued together 6 disks in linear lvm fashion (no RAID) to obtain one
>> large file system (see below). One of the 6 disks failed. What is the
>> best way to recover from this?
>
> The tool you want is `btrfs restore`. You'll need somewhere to put the
> files from this too of course. That said, given that you had data in
> raid0 mode, you're not likely to get much other than very small files
> back out of this, and given other factors, you're not likely to get
> what you would consider reasonable performance out of this either.

Thanks so much for pointing me towards btrfs-restore. I surely will give
it a try. Note that the FS is not a RAID0 but a linear ("JBOD")
configuration. This is why it somehow did not occur to me to try
btrfs-restore. The good news is that in this configuration the files are
*not* distributed across disks. We can read most of the files just fine.
The failed disk was actually smaller than the other five, so we should
be able to recover more than 5/6 of the data, shouldn't we? My trouble
is that the IO errors due to the missing disk cripple the transfer speed
of both rsync and dd_rescue.

> Your best bet to get a working filesystem again would be to just
> recreate it from scratch, there's not much else that can be done when
> you've got a raid0 profile and have lost a disk.

This is what I plan to do if btrfs-restore turns out to be too slow and
nobody on this list has any better idea. It will, however, require
transferring >15TB across the Atlantic (this is where the "backups"
reside). This can be tedious, which is why I would love to avoid it.

Matt
Status of SMR with BTRFS
Hello,

I have a 5TB Seagate drive that uses SMR.

I was wondering if BTRFS is usable with this harddrive technology. So,
first I searched the BTRFS wiki - nothing. Then google.

* I found this: https://bbs.archlinux.org/viewtopic.php?id=203696
  But this turned out to be an issue not related to BTRFS.

* Then this: http://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf
  "BTRFS operation matches SMR parameters very closely [...]
  High number of misaligned write accesses"; points to an issue with
  btrfs itself.

* Then this: http://superuser.com/questions/962257/fastest-linux-filesystem-on-shingled-disks
  The BTRFS performance seemed good.

* Finally this: http://www.spinics.net/lists/linux-btrfs/msg48072.html
  "So you can get mixed results when trying to use the SMR devices but
  I'd say it will mostly not work.
  But, btrfs has all the fundamental features in place, we'd have to
  make adjustments to follow the SMR constraints:"
  [...]
  "I have some notes at
  https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt"

So, now I am wondering what the state is today. "We" (I am happy to do
that, but not sure of access rights) should also summarize this in the
wiki.

My use-case, by the way, is back-ups. I am thinking of using some of the
interesting BTRFS features for this (send/receive, deduplication).

Greetings,
Hendrik
Re: FIDEDUPERANGE with src_length == 0
On Thu, Jul 14, 2016 at 11:16:47AM -0700, Omar Sandoval wrote:
> On Thu, Jul 14, 2016 at 02:12:58PM -0400, Chris Mason wrote:
> > On 07/14/2016 02:06 PM, Darrick J. Wong wrote:
> > > On Wed, Jul 13, 2016 at 03:19:38PM +0200, David Sterba wrote:
> > > > On Tue, Jul 12, 2016 at 10:26:43PM -0700, Darrick J. Wong wrote:
> > > > > On Mon, Jul 11, 2016 at 05:35:37PM -0700, Omar Sandoval wrote:
> > > > > > Hey, Darrick,
> > > > > >
> > > > > > generic/182 is failing on Btrfs for me with the following output:
> > > > > >
> > > > > > --- tests/generic/182.out	2016-07-07 19:51:54.0 -0700
> > > > > > +++ /tmp/fixxfstests/xfstests/results//generic/182.out.bad	2016-07-11 17:28:28.230039216 -0700
> > > > > > @@ -1,12 +1,10 @@
> > > > > >  QA output created by 182
> > > > > >  Create the original files
> > > > > > -dedupe: Extents did not match.
> > > > > >  f4820540fc0ac02750739896fe028d56  TEST_DIR/test-182/file1
> > > > > >  69ad53078a16243d98e21d9f8704a071  TEST_DIR/test-182/file2
> > > > > >  69ad53078a16243d98e21d9f8704a071  TEST_DIR/test-182/file2.chk
> > > > > >  Compare against check files
> > > > > >  Make the original file almost dedup-able
> > > > > > -dedupe: Extents did not match.
> > > > > >  f4820540fc0ac02750739896fe028d56  TEST_DIR/test-182/file1
> > > > > >  158d4e3578b94b89cbb44493a2110fb9  TEST_DIR/test-182/file2
> > > > > >  158d4e3578b94b89cbb44493a2110fb9  TEST_DIR/test-182/file2.chk
> > > > > >
> > > > > > It looks like that test is checking that a dedupe with length == 0 is
> > > > > > treated as a dedupe to EOF, but Btrfs doesn't do that [1]. As far as I
> > > > > > can tell, it never did, but maybe I'm just confused. What was the
> > > > > > behavior when you introduced that test? That seems like a reasonable
> > > > > > thing to do, but I wanted to clear this up before changing/fixing
> > > > > > Btrfs.
> > > > >
> > > > > It's a shortcut that we're introducing in the upcoming XFS
> > > > > implementation, since it shares the same back end as clone/clonerange,
> > > > > which both have this behavior.
> > > >
> > > > The support for zero length does not seem to be mentioned anywhere with
> > > > the dedupe range ioctl [1], so the current implementation is "up to
> > > > spec". That it should be valid is hidden in clone_verify_area where a
> > > > zero length is substituted with OFFSET_MAX
> > > >
> > > > http://lxr.free-electrons.com/source/fs/read_write.c#L1607
> > > >
> > > > So it looks like it's up to the implementation in the filesystem to
> > > > handle that. As the btrfs ioctl was extent-based, a zero length extent
> > > > does not make sense, so this case was not handled. But in your patch
> > > >
> > > > 2b3909f8a7fe94e0234850aa9d120cca15b6e1f7
> > > > btrfs: use new dedupe data function pointer
> > > >
> > > > it was suddenly expected to work. So the missing bits are either 'not
> > > > supported' for zero length or actually implement iteration over the
> > > > whole file.
> > > >
> > > > [1] https://www.mankier.com/2/ioctl_fideduperange
> > >
> > > Well, we can't change the semantics now because there could be programs
> > > that aren't expecting a nonzero return from a length == 0 dedupe, so
> > > like Christoph said, I'll just change generic/182 and make the VFS
> > > wrapper emulate the btrfs behavior so that any subsequent
> > > implementation won't hit this.
> > >
> > > I'll update the clone/clonerange manpages to mention the 0 -> EOF
> > > behavior.
> >
> > It's fine with me if we change btrfs to do the 0->EOF. It's a corner
> > case I'm happy to include.
> >
> > -chris
>
> Yeah, I think it's a nice shortcut. Are there any programs which
> wouldn't want this, though? It's a milder sort of correctness problem
> since dedupe is "safe", but maybe there's some tool which is being dumb
> and trying to dedupe nothing.

The only problem I can see here is a program that calls dedupe with a
length == 0 /and/ doesn't expect a non-zero return value... or gets
confused that bytes_deduped > 0. I don't think duperemove has either of
those problems. Is that the only client?

--D

> -- 
> Omar
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
15.07.2016 19:29, Chris Mason wrote:
>> However I have to point out that this kind of test is very difficult
>> to do: the file-cache could lead to reading old data, so suggestions
>> about how to flush the cache are welcome (I do some syncs, unmount the
>> filesystem and perform "echo 3 > /proc/sys/vm/drop_caches", but
>> sometimes it seems not enough).
>
> O_DIRECT should handle the cache flushing for you.

There is also the BLKFLSBUF ioctl (blockdev --flushbufs at the shell
level).
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
On 2016-07-15 06:39, Andrei Borzenkov wrote:
> 15.07.2016 00:20, Chris Mason wrote:
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy" [1] to further
>>> investigate this bug [2]. Using "btrfs insp phy" I developed a script
>>> to trigger the bug. The bug is not always triggered, but most of the
>>> time it is.
>>>
>>> Basically the script creates a raid5 filesystem (using three
>>> loop devices on three files called disk[123].img); on this filesystem
>
> Are those devices themselves on btrfs? Just to avoid any sort of
> possible side effects?

Good question. However the files are stored on an ext4 filesystem (but I
don't know if this is better or worse).

>>> a file is created. Then using "btrfs insp phy", the physical
>>> placement of the data on the device is computed.
>>>
>>> First the script checks that the data is the right one (for data1,
>>> data2 and parity), then it corrupts the data:
>>>
>>> test1: the parity is corrupted, then scrub is run. Then the (data1,
>>> data2, parity) data on the disk are checked. This test goes fine all
>>> the time.
>>>
>>> test2: data2 is corrupted, then scrub is run. Then the (data1, data2,
>>> parity) data on the disk are checked. This test fails most of the
>>> time: the data on the disk is not correct; the parity is wrong. Scrub
>>> sometimes reports "WARNING: errors detected during scrubbing,
>>> corrected" and sometimes reports "ERROR: there are uncorrectable
>>> errors". But this seems unrelated to whether the data is corrupted
>>> or not.
>>>
>>> test3: like test2, but data1 is corrupted. The results are the same
>>> as above.
>>>
>>> test4: data2 is corrupted, then the file is read. The system doesn't
>>> return an error (the data seems to be fine); but the data2 on the
>>> disk is still corrupted.
>>>
>>> Note: data1, data2, parity are the disk elements of the raid5 stripe.
>>>
>>> Conclusion:
>>>
>>> most of the time, it seems that btrfs-raid5 is not capable of
>>> rebuilding parity and data. Worse, the message returned by scrub is
>>> inconsistent with the status on the disk. The tests didn't fail every
>>> time; this complicates the diagnosis. However my script fails most of
>>> the time.
>>
>> Interesting, thanks for taking the time to write this up. Is the
>> failure specific to scrub? Or is parity rebuild in general also
>> failing in this case?
>
> How do you rebuild parity without scrub as long as all devices appear
> to be present?

I corrupted the data, then I read the file. The data has to be correct
on the basis of the parity. Even in this case I found problems.

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote:
> On 2016-07-14 23:20, Chris Mason wrote:
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy" [1] to further
>>> investigate this bug [2]. Using "btrfs insp phy" I developed a script
>>> to trigger the bug. The bug is not always triggered, but most of the
>>> time it is.
>>>
>>> Basically the script creates a raid5 filesystem (using three
>>> loop devices on three files called disk[123].img); on this filesystem
>>> a file is created. Then using "btrfs insp phy", the physical
>>> placement of the data on the device is computed.
>>>
>>> First the script checks that the data is the right one (for data1,
>>> data2 and parity), then it corrupts the data:
>>>
>>> test1: the parity is corrupted, then scrub is run. Then the (data1,
>>> data2, parity) data on the disk are checked. This test goes fine all
>>> the time.
>>>
>>> test2: data2 is corrupted, then scrub is run. Then the (data1, data2,
>>> parity) data on the disk are checked. This test fails most of the
>>> time: the data on the disk is not correct; the parity is wrong. Scrub
>>> sometimes reports "WARNING: errors detected during scrubbing,
>>> corrected" and sometimes reports "ERROR: there are uncorrectable
>>> errors". But this seems unrelated to whether the data is corrupted
>>> or not.
>>>
>>> test3: like test2, but data1 is corrupted. The results are the same
>>> as above.
>>>
>>> test4: data2 is corrupted, then the file is read. The system doesn't
>>> return an error (the data seems to be fine); but the data2 on the
>>> disk is still corrupted.
>>>
>>> Note: data1, data2, parity are the disk elements of the raid5 stripe.
>>>
>>> Conclusion:
>>>
>>> most of the time, it seems that btrfs-raid5 is not capable of
>>> rebuilding parity and data. Worse, the message returned by scrub is
>>> inconsistent with the status on the disk. The tests didn't fail every
>>> time; this complicates the diagnosis. However my script fails most of
>>> the time.
>>
>> Interesting, thanks for taking the time to write this up. Is the
>> failure specific to scrub? Or is parity rebuild in general also
>> failing in this case?
>
> Test #4 handles this case: I corrupt the data, and when I read it the
> data is good. So parity is used, but the data on the platter is still
> bad.
>
> However I have to point out that this kind of test is very difficult
> to do: the file-cache could lead to reading old data, so suggestions
> about how to flush the cache are welcome (I do some syncs, unmount the
> filesystem and perform "echo 3 > /proc/sys/vm/drop_caches", but
> sometimes it seems not enough).

O_DIRECT should handle the cache flushing for you.

-chris
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
On 2016-07-14 23:20, Chris Mason wrote:
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>> Hi All,
>>
>> I developed a new btrfs command "btrfs insp phy" [1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trigger the bug. The bug is not always triggered, but most of the time it is.
>>
>> Basically the script creates a raid5 filesystem (using three loop devices on three files called disk[123].img); on this filesystem a file is created. Then, using "btrfs insp phy", the physical placement of the data on the devices is computed.
>>
>> First the script checks that the data is correct (for data1, data2 and parity), then it corrupts the data:
>>
>> test1: the parity is corrupted, then scrub is run. Then the (data1, data2, parity) data on the disk are checked. This test passes every time.
>>
>> test2: data2 is corrupted, then scrub is run. Then the (data1, data2, parity) data on the disk are checked. This test fails most of the time: the data on the disk is not correct; the parity is wrong. Scrub sometimes reports "WARNING: errors detected during scrubbing, corrected" and sometimes reports "ERROR: there are uncorrectable errors". But this seems unrelated to whether the data is corrupted or not.
>>
>> test3: like test2, but data1 is corrupted. The results are the same as above.
>>
>> test4: data2 is corrupted, then the file is read. The system doesn't return an error (the data seems to be fine), but data2 on the disk is still corrupted.
>>
>> Note: data1, data2 and parity are the disk elements of the raid5 stripe.
>>
>> Conclusion: most of the time, it seems that btrfs raid5 is not capable of rebuilding parity and data. Worse, the message returned by scrub is inconsistent with the status on the disk. The tests don't fail every time; this complicates the diagnosis. However, my script fails most of the time.
>
> Interesting, thanks for taking the time to write this up. Is the failure specific to scrub? Or is parity rebuild in general also failing in this case?

Test #4 handles this case: I corrupt the data, and when I read it the data is good. So parity is used, but the data on the platter is still bad. However, I have to point out that this kind of test is very difficult to do: the file cache could lead to reading old data, so suggestions about how to flush the cache are welcome (I do some syncs, unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches", but sometimes it seems not to be enough).

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: New btrfs sub command: btrfs inspect physical-find
On 2016-07-14 23:45, Chris Mason wrote:
> On 07/12/2016 05:40 PM, Goffredo Baroncelli wrote:
>> Hi All,
>>
>> the enclosed patch adds a new btrfs sub command: "btrfs inspect physical-find". The aim of this new command is to show the physical placement on the disk of a file. Currently it handles all the profiles (single, dup, raid1/10/5/6). I developed this command in order to show a bug in the btrfs RAID5 profile (see next email).
>
> I've done this manually from time to time, and love the idea of having a helper for it. Can I talk you into adding a way to save the contents of the block without having to use dd? btrfs-map-logical does this now, but not via the search ioctl and not by filename.
>
> say:
>
> btrfs inspect physical-find -c <copy> -o <output> <filename> <offset>

I prefer to add another command to do that (like "btrfs insp physical-dump"). And I will add a constraint like offset % blocksize == 0, in order to avoid handling data spread across different stripes/chunks. However, the copy number has a different meaning depending on the profile:

  single/raid0 -> means nothing
  raid1/raid10 -> means the copy #
  raid5/raid6  -> could mean the parity, i.e.
                  -1 -> first parity (raid5/raid6)
                  -2 -> 2nd parity (raid6 only)

> Looks like you've open coded btrfs_map_logical() below, getting output from the search ioctl. Dave might want that in a more centralized place.

I will take a look.

> Also, please turn:
>
> for(;;) if (foo) { statements }
>
> Into
>
> for(;;) { if (foo) { statements } }
>
> I find that much less error prone.

Ok

> -chris

BR
G.Baroncelli

--
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [PATCH v3] Btrfs: fix unexpected balance crash due to BUG_ON
On Tue, Jul 12, 2016 at 11:24:21AM -0700, Liu Bo wrote:
> Mounting a btrfs filesystem can resume previous balance operations asynchronously. A user got a crash when one drive had some corrupt sectors.
>
> Since balance can cancel itself in case of any error, we can gracefully return errors to upper layers and let balance do the cancel job.
>
> Reported-by: sash
> Signed-off-by: Liu Bo

Reviewed-by: David Sterba
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
On 07/15/2016 11:10 AM, Andrei Borzenkov wrote:
> 15.07.2016 16:20, Chris Mason wrote:
>>>> Interesting, thanks for taking the time to write this up. Is the failure specific to scrub? Or is parity rebuild in general also failing in this case?
>>>
>>> How do you rebuild parity without scrub as long as all devices appear to be present?
>>
>> If one block is corrupted, the crcs will fail and the kernel will rebuild parity when you read the file. You can also use balance instead of scrub.
>
> As we have seen recently, btrfs does not compute, store or verify checksums of RAID56 parity. So if parity is corrupted, the only way to detect and correct it is to use scrub. Balance may work by side effect, because it simply recomputes parity on new data, but it will not fix wrong parity on existing data.

Ah, I misread your question. Yes, this is definitely where scrub is the best tool. But even if we have to add debugging to force parity recomputation, we should see whether the problem is only in scrub or deeper.

-chris
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
15.07.2016 16:20, Chris Mason wrote:
>>> Interesting, thanks for taking the time to write this up. Is the failure specific to scrub? Or is parity rebuild in general also failing in this case?
>>
>> How do you rebuild parity without scrub as long as all devices appear to be present?
>
> If one block is corrupted, the crcs will fail and the kernel will rebuild parity when you read the file. You can also use balance instead of scrub.

As we have seen recently, btrfs does not compute, store or verify checksums of RAID56 parity. So if parity is corrupted, the only way to detect and correct it is to use scrub. Balance may work by side effect, because it simply recomputes parity on new data, but it will not fix wrong parity on existing data.

I agree that if a data block is corrupted it will be detected, but then you do not need to recompute parity in the first place.
[PATCH next] Btrfs: fix comparison in __btrfs_map_block()
Add the missing comparison to op in the expression, which was forgotten when doing the REQ_OP transition.

Fixes: b3d3fa519905 ("btrfs: update __btrfs_map_block for REQ_OP transition")
Signed-off-by: Vincent Stehlé
Cc: Mike Christie
Cc: Jens Axboe
---
Hi,

I saw that issue in linux-next. Not sure if it is too late to squash the fix with commit b3d3fa519905 or not...

Best regards,

Vincent.

 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a69203a..6ee1e36 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
 		}
 	} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
-		if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
+		if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
 		    op == REQ_GET_READ_MIRRORS) {
 			num_stripes = map->num_stripes;
 		} else if (mirror_num) {
-- 
2.8.1
[PATCH] btrfs: remove obsolete part of comment in statfs
The mixed blockgroup reporting has been fixed by commit ae02d1bd070767e109f4a6f1bb1f466e9698a355 ("btrfs: fix mixed block count of available space").

Signed-off-by: David Sterba
---
 fs/btrfs/super.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60e7179ed4b7..135fe88de568 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2030,9 +2030,6 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
  * chunk).
  *
  * If metadata is exhausted, f_bavail will be 0.
- *
- * FIXME: not accurate for mixed block groups, total and free/used are ok,
- * available appears slightly larger.
  */
 static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
-- 
2.7.1
[PATCH] btrfs: hide test-only member under ifdef
Signed-off-by: David Sterba
---
 fs/btrfs/ctree.h       | 2 ++
 fs/btrfs/extent-tree.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4274a7bfdaed..47ad088cfa00 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1179,8 +1179,10 @@ struct btrfs_root {
 	u64 highest_objectid;
 
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 	/* only used with CONFIG_BTRFS_FS_RUN_SANITY_TESTS is enabled */
 	u64 alloc_bytenr;
+#endif
 
 	u64 defrag_trans_start;
 	struct btrfs_key defrag_progress;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a293ab..f043c1f972de 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8142,6 +8142,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 	bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
 						 SKINNY_METADATA);
 
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 	if (btrfs_test_is_dummy_root(root)) {
 		buf = btrfs_init_new_buffer(trans, root, root->alloc_bytenr,
 					    level);
@@ -8149,6 +8150,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 		root->alloc_bytenr += blocksize;
 		return buf;
 	}
+#endif
 
 	block_rsv = use_block_rsv(trans, root, blocksize);
 	if (IS_ERR(block_rsv))
-- 
2.7.1
Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two
On 07/15/2016 12:39 AM, Andrei Borzenkov wrote:
> 15.07.2016 00:20, Chris Mason wrote:
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy" [1] to further investigate this bug [2]. Using "btrfs insp phy" I developed a script to trigger the bug. The bug is not always triggered, but most of the time it is.
>>>
>>> Basically the script creates a raid5 filesystem (using three loop devices on three files called disk[123].img); on this filesystem
>
> Are those devices themselves on btrfs? Just to avoid any sort of possible side effects?
>
>>> a file is created. Then, using "btrfs insp phy", the physical placement of the data on the devices is computed.
>>>
>>> First the script checks that the data is correct (for data1, data2 and parity), then it corrupts the data:
>>>
>>> test1: the parity is corrupted, then scrub is run. Then the (data1, data2, parity) data on the disk are checked. This test passes every time.
>>>
>>> test2: data2 is corrupted, then scrub is run. Then the (data1, data2, parity) data on the disk are checked. This test fails most of the time: the data on the disk is not correct; the parity is wrong. Scrub sometimes reports "WARNING: errors detected during scrubbing, corrected" and sometimes reports "ERROR: there are uncorrectable errors". But this seems unrelated to whether the data is corrupted or not.
>>>
>>> test3: like test2, but data1 is corrupted. The results are the same as above.
>>>
>>> test4: data2 is corrupted, then the file is read. The system doesn't return an error (the data seems to be fine), but data2 on the disk is still corrupted.
>>>
>>> Note: data1, data2 and parity are the disk elements of the raid5 stripe.
>>>
>>> Conclusion: most of the time, it seems that btrfs raid5 is not capable of rebuilding parity and data. Worse, the message returned by scrub is inconsistent with the status on the disk. The tests don't fail every time; this complicates the diagnosis. However, my script fails most of the time.
>>
>> Interesting, thanks for taking the time to write this up. Is the failure specific to scrub? Or is parity rebuild in general also failing in this case?
>
> How do you rebuild parity without scrub as long as all devices appear to be present?

If one block is corrupted, the crcs will fail and the kernel will rebuild parity when you read the file. You can also use balance instead of scrub.

-chris
Re: Data recovery from a linear multi-disk btrfs file system
On 2016-07-15 05:51, Matt wrote:
> Hello
>
> I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this?
>
> Thanks to RAID1 of the metadata I can still access the data residing on the remaining 5 disks after mounting ro,force. What I would like to do now is:
>
> 1) Find out the names of all the files with missing data
> 2) Make the file system fully functional (rw) again.
>
> To achieve 2 I wanted to move the data off the disk. This, however, turns out to be rather difficult.
>
> - rsync does not provide an immediate time-out option in case of an IO error
> - Even when I set the time-out for dd_rescue to a minimum, the transfer speed is still way too low to move the data (> 15TB) off the file system.
>
> Both methods are too slow to move off the data within a reasonable time frame. Does anybody have a suggestion how best to recover from this? (Our backup is incomplete.)
>
> I am looking for either a tool to move off the data: something which gives up immediately in case of an IO error and logs the affected files. Alternatively, I am looking for a btrfs command like "btrfs device delete missing" for a non-RAID multi-disk btrfs filesystem. Would some variant of "btrfs balance" do something helpful?
>
> Any help is appreciated!
>
> Regards, Matt
>
> # btrfs fi show
> Label: none  uuid: d82fff2c-0232-47dd-a257-04c67141fc83
> 	Total devices 6 FS bytes used 16.83TiB
> 	devid    1 size 3.64TiB used 3.47TiB path /dev/sdc
> 	devid    2 size 3.64TiB used 3.47TiB path /dev/sdd
> 	devid    3 size 3.64TiB used 3.47TiB path /dev/sde
> 	devid    4 size 3.64TiB used 3.47TiB path /dev/sdf
> 	devid    5 size 1.82TiB used 1.82TiB path /dev/sdb
> 	*** Some devices missing
>
> # btrfs fi df /work
> Data, RAID0: total=18.31TiB, used=16.80TiB
> Data, single: total=8.00MiB, used=8.00MiB
> System, RAID1: total=8.00MiB, used=896.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, RAID1: total=34.00GiB, used=30.18GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B

The tool you want is `btrfs restore`. You'll need somewhere to put the files from this too, of course. That said, given that you had data in raid0 mode, you're not likely to get much other than very small files back out of this, and given other factors, you're not likely to get what you would consider reasonable performance out of this either. Your best bet to get a working filesystem again would be to just recreate it from scratch; there's not much else that can be done when you've got a raid0 profile and have lost a disk.
Re: [PATCH] btrfs: allocate exact page array size in extent_buffer
On Friday, July 15, 2016 11:44:06 AM David Sterba wrote:
> On Fri, Jul 15, 2016 at 11:47:07AM +0530, Chandan Rajendra wrote:
>> On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
>>> The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE, but this wastes 15 unused pointers on arches with large page size. Eg. on ppc64 this gives 15 * 8 = 120 bytes.
>>
>> The non PAGE_SIZE aligned extent buffer usage in the page straddling tests in test_eb_bitmaps() needs at least one more page. So how about the following ...
>>
>> #define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)
>
> Could the extra page pointer be normally used? Ie. not just for the sake of the tests. I'd rather not waste the bytes. As a compromise, we can do +1 only if the tests are compiled in.

I don't see any other scenario where the extra page pointer gets used. Also, I just ran fstests with your patch applied and self-tests disabled in the kernel configuration. The tests ran fine.

-- 
chandan
Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
Hey Qu, all

On 07/15/2016 05:56 AM, Qu Wenruo wrote:
> The good news is, we have a patch to slightly speed up the mount, by avoiding reading unrelated tree blocks.
>
> In our test environment, it takes 15% less time to mount a fs filled with 16K files (2T used space).
>
> https://patchwork.kernel.org/patch/9021421/

I have a 30TB RAID6 filesystem with compression on and I've seen mount times of up to 20 minutes (!). I don't want to sound unfair: a 15% improvement is good, but not in the league where BTRFS needs to be. Do I understand your comments correctly that further improvement would require a change of the on-disk format?

Thanks and with regards
Christian
Data recovery from a linear multi-disk btrfs file system
Hello

I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large file system (see below). One of the 6 disks failed. What is the best way to recover from this?

Thanks to RAID1 of the metadata I can still access the data residing on the remaining 5 disks after mounting ro,force. What I would like to do now is:

1) Find out the names of all the files with missing data
2) Make the file system fully functional (rw) again.

To achieve 2 I wanted to move the data off the disk. This, however, turns out to be rather difficult.

- rsync does not provide an immediate time-out option in case of an IO error
- Even when I set the time-out for dd_rescue to a minimum, the transfer speed is still way too low to move the data (> 15TB) off the file system.

Both methods are too slow to move off the data within a reasonable time frame. Does anybody have a suggestion how best to recover from this? (Our backup is incomplete.)

I am looking for either a tool to move off the data: something which gives up immediately in case of an IO error and logs the affected files. Alternatively, I am looking for a btrfs command like "btrfs device delete missing" for a non-RAID multi-disk btrfs filesystem. Would some variant of "btrfs balance" do something helpful?

Any help is appreciated!

Regards, Matt

# btrfs fi show
Label: none  uuid: d82fff2c-0232-47dd-a257-04c67141fc83
	Total devices 6 FS bytes used 16.83TiB
	devid    1 size 3.64TiB used 3.47TiB path /dev/sdc
	devid    2 size 3.64TiB used 3.47TiB path /dev/sdd
	devid    3 size 3.64TiB used 3.47TiB path /dev/sde
	devid    4 size 3.64TiB used 3.47TiB path /dev/sdf
	devid    5 size 1.82TiB used 1.82TiB path /dev/sdb
	*** Some devices missing

# btrfs fi df /work
Data, RAID0: total=18.31TiB, used=16.80TiB
Data, single: total=8.00MiB, used=8.00MiB
System, RAID1: total=8.00MiB, used=896.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=34.00GiB, used=30.18GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
Re: [PATCH] Btrfs-progs: fix btrfs-map-logical to only print extent mapping info
On Fri, Jul 15, 2016 at 10:22:52AM +0800, Qu Wenruo wrote:
> At 07/15/2016 09:40 AM, Liu Bo wrote:
>> I have a valid btrfs image which contains,
>> ...
>> 	item 10 key (1103101952 BLOCK_GROUP_ITEM 1288372224) itemoff 15947 itemsize 24
>> 		block group used 655360 chunk_objectid 256 flags DATA|RAID5
>> 	item 11 key (1103364096 EXTENT_ITEM 131072) itemoff 15894 itemsize 53
>> 		extent refs 1 gen 11 flags DATA
>> 		extent data backref root 5 objectid 258 offset 0 count 1
>> 	item 12 key (1103888384 EXTENT_ITEM 262144) itemoff 15841 itemsize 53
>> 		extent refs 1 gen 15 flags DATA
>> 		extent data backref root 1 objectid 256 offset 0 count 1
>> 	item 13 key (1104281600 EXTENT_ITEM 262144) itemoff 15788 itemsize 53
>> 		extent refs 1 gen 15 flags DATA
>> 		extent data backref root 1 objectid 257 offset 0 count 1
>> ...
>>
>> The extent [1103364096, 131072) has length 131072, but if we run
>>
>> "btrfs-map-logical -l 1103364096 -b $((65536 * 3)) /dev/sda"
>>
>> it will return mapping infos of non-existing extents.
>>
>> It's because it assumes that extents are contiguous on logical addresses, which is not true; after one loop (cur_logical += cur_len) and mapping the next extent, we can get an extent that is out of our search range, and we end up with a negative @real_len and printing all mapping infos till the end of the disk.
>>
>> Signed-off-by: Liu Bo
>
> Reviewed-by: Qu Wenruo

Applied, thanks.
Re: [PATCH] btrfs: allocate exact page array size in extent_buffer
On Fri, Jul 15, 2016 at 11:47:07AM +0530, Chandan Rajendra wrote:
> On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
>> The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE, but this wastes 15 unused pointers on arches with large page size. Eg. on ppc64 this gives 15 * 8 = 120 bytes.
>
> The non PAGE_SIZE aligned extent buffer usage in the page straddling tests in test_eb_bitmaps() needs at least one more page. So how about the following ...
>
> #define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)

Could the extra page pointer be normally used? Ie. not just for the sake of the tests. I'd rather not waste the bytes. As a compromise, we can do +1 only if the tests are compiled in.
Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems
On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> On 07/14/2016 07:31 PM, Omar Sandoval wrote:
>> From: Omar Sandoval
>>
>> So it turns out that the free space tree bitmap handling has always been broken on big-endian systems. Totally my bad.
>>
>> Patch 1 fixes this. Technically, it's a disk format change for big-endian systems, but it never could have worked before, so I won't go through the trouble of any incompat bits. If you've somehow been using space_cache=v2 on a big-endian system (I doubt anyone is), you're going to want to mount with nospace_cache to clear it and wait for this to go in.
>>
>> Patch 2 fixes a similar error in the sanity tests (it's the same as the v2 I posted here [1]) and patch 3 expands the sanity tests to catch the oversight that patch 1 fixes.
>>
>> Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests pass on x86_64 and MIPS.
>
> Thanks for fixing this up Omar. Any big-endian friends want to try this out in extended testing and make sure we've nailed it down?

Hi Omar & Chris,

I will run fstests with this patchset applied on ppc64 BE and inform you about the results.

-- 
chandan
Re: mount btrfs takes 30 minutes, btrfs check runs out of memory
On Fri, 15 Jul 2016 13:24:45 +0800, Qu Wenruo wrote:
>> as for defrag, all my partitions are already on autodefrag, so I assume that should be good. Or is manual once in a while a good idea as well?
>
> AFAIK autodefrag will only help if you're doing appending writes.
>
> A manual one will help more, but since btrfs has problems defragging extents shared by different subvolumes, I doubt the effect if you have a lot of subvolumes/snapshots.

"btrfs fi defrag" is said to only defrag metadata if you point it at directories only, without recursion. It could maybe help that case without unsharing the extents:

find /btrfs-subvol0 -type d -print0 | xargs -0 btrfs fi defrag

-- 
Regards,
Kai

Replies to list-only preferred.
Re: [PATCH] btrfs: allocate exact page array size in extent_buffer
On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
> The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE, but this wastes 15 unused pointers on arches with large page size. Eg. on ppc64 this gives 15 * 8 = 120 bytes.

The non PAGE_SIZE aligned extent buffer usage in the page straddling tests in test_eb_bitmaps() needs at least one more page. So how about the following ...

#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)

> Signed-off-by: David Sterba
> ---
>  fs/btrfs/ctree.h     | 6 --
>  fs/btrfs/extent_io.c | 2 ++
>  fs/btrfs/extent_io.h | 8 +++-
>  3 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4274a7bfdaed..f914f6187753 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -66,12 +66,6 @@ struct btrfs_ordered_sum;
>  #define BTRFS_COMPAT_EXTENT_TREE_V0
>  
>  /*
> - * the max metadata block size. This limit is somewhat artificial,
> - * but the memmove costs go through the roof for larger blocks.
> - */
> -#define BTRFS_MAX_METADATA_BLOCKSIZE 65536
> -
> -/*
>   * we can actually store much bigger names, but lets not confuse the rest
>   * of linux
>   */
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 75533adef998..6f468a1842e6 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4660,6 +4660,8 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
>  	/*
>  	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
>  	 */
> +	BUILD_BUG_ON(INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE
> +		!= BTRFS_MAX_METADATA_BLOCKSIZE);
>  	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE >
>  		MAX_INLINE_EXTENT_BUFFER_SIZE);
>  	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index c0c1c4fef6ce..edfa1a0ab82b 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -4,6 +4,12 @@
>  #include <linux/rbtree.h>
>  #include "ulist.h"
>  
> +/*
> + * The maximum metadata block size. This limit is somewhat artificial,
> + * but the memmove costs go through the roof for larger blocks.
> + */
> +#define BTRFS_MAX_METADATA_BLOCKSIZE (65536U)
> +
>  /* bits for the extent state */
>  #define EXTENT_DIRTY		(1U << 0)
>  #define EXTENT_WRITEBACK	(1U << 1)
> @@ -118,7 +124,7 @@ struct extent_state {
>  #endif
>  };
>  
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> +#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE)
>  #define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
>  struct extent_buffer {
>  	u64 start;

-- 
chandan