Re: How can I get blockdev offsets of btrfs chunks for a file?

2016-07-15 Thread Hugo Mills
On Fri, Jul 15, 2016 at 04:21:31PM -0700, Eric Wheeler wrote:
> Hello all,
> 
> We do btrfs subvolume snapshots over time for backups.  I would like to 
> traverse the files in the subvolumes and find the total unique chunk count 
> to calculate total space for a set of subvolumes.

   btrfs fi du may help here. Alternatively, qgroups should be able to
tell you for groups of subvols, if it's set up correctly. You
shouldn't need to implement this at a low level yourself...

> This sounds kind of like the beginning of what a deduplicator would do, 
> but I just want to count the blocks, so no submission for deduplication.  
> I started looking at bedup and other deduplicator code, but the answer to 
> this question wasn't obvious (to me, anyway).
> 
> Questions:
> 
> Is there an ioctl (or some other way) to get the block device offset for a 
> file (or file offset) so I can count the unique occurrences?

   This is very much an X/Y question. There already exist a couple of
things that are at least close to the thing you actually want to
do. :)

   Hugo.

> What API documentation should I review?
> 
> Can you point me at the ioctl(s) that would handle this?
> 
> 
> Thank you for your help!
> 
> 

-- 
Hugo Mills | Reintarnation: Coming back from the dead as a
hugo@... carfax.org.uk | hillbilly
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: How can I get blockdev offsets of btrfs chunks for a file?

2016-07-15 Thread Adam Borowski
On Fri, Jul 15, 2016 at 04:21:31PM -0700, Eric Wheeler wrote:
> We do btrfs subvolume snapshots over time for backups.  I would like to 
> traverse the files in the subvolumes and find the total unique chunk count 
> to calculate total space for a set of subvolumes.
> 
> This sounds kind of like the beginning of what a deduplicator would do, 
> but I just want to count the blocks, so no submission for deduplication.  
> I started looking at bedup and other deduplicator code, but the answer to 
> this question wasn't obvious (to me, anyway).
> 
> Questions:
> 
> Is there an ioctl (or some other way) to get the block device offset for a 
> file (or file offset) so I can count the unique occurrences?

Yes, FIEMAP.

You can play with it via "/usr/sbin/filefrag -v".  That /usr/sbin is
misleading -- FIEMAP doesn't require root, although its predecessor (FIBMAP)
did; see https://bugs.debian.org/819923

> What API documentation should I review?

In kernel sources, Documentation/filesystems/fiemap.txt
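To make that concrete, here is a minimal sketch of driving FIEMAP from C (an
illustration added for reference, not code from the thread; the two-pass
"count first, then fetch" pattern is just one common convention, and error
handling is kept to a minimum):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* First call with fm_extent_count == 0 only counts the extents. */
    struct fiemap probe = { .fm_length = ~0ULL, .fm_flags = FIEMAP_FLAG_SYNC };
    if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0) { perror("FIEMAP"); return 1; }

    __u32 n = probe.fm_mapped_extents;
    struct fiemap *fm = calloc(1, sizeof(*fm) + n * sizeof(struct fiemap_extent));
    if (!fm) { perror("calloc"); return 1; }
    fm->fm_length = ~0ULL;
    fm->fm_flags = FIEMAP_FLAG_SYNC;
    fm->fm_extent_count = n;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

    /* fe_physical/fe_length identify each mapped extent; collecting these
     * across a set of snapshots and de-duplicating them is one way to
     * approximate a "unique extent" count. */
    for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
        struct fiemap_extent *fe = &fm->fm_extents[i];
        printf("logical %llu physical %llu length %llu flags 0x%x\n",
               (unsigned long long)fe->fe_logical,
               (unsigned long long)fe->fe_physical,
               (unsigned long long)fe->fe_length,
               fe->fe_flags);
    }

    free(fm);
    close(fd);
    return 0;
}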


Meow!
-- 
An imaginary friend squared is a real enemy.


Re: How can I get blockdev offsets of btrfs chunks for a file?

2016-07-15 Thread Tomasz Kusmierz
No answer here, but mate, if you are involved in anything that will provide a
more automated backup tool for btrfs, you've got a lot of silent people rooting
for you.

> On 16 Jul 2016, at 00:21, Eric Wheeler  wrote:
> 
> Hello all,
> 
> We do btrfs subvolume snapshots over time for backups.  I would like to 
> traverse the files in the subvolumes and find the total unique chunk count 
> to calculate total space for a set of subvolumes.
> 
> This sounds kind of like the beginning of what a deduplicator would do, 
> but I just want to count the blocks, so no submission for deduplication.  
> I started looking at bedup and other deduplicator code, but the answer to 
> this question wasn't obvious (to me, anyway).
> 
> Questions:
> 
> Is there an ioctl (or some other way) to get the block device offset for a 
> file (or file offset) so I can count the unique occurrences?
> 
> What API documentation should I review?
> 
> Can you point me at the ioctl(s) that would handle this?
> 
> 
> Thank you for your help!
> 
> 
> --
> Eric Wheeler


How can I get blockdev offsets of btrfs chunks for a file?

2016-07-15 Thread Eric Wheeler
Hello all,

We do btrfs subvolume snapshots over time for backups.  I would like to 
traverse the files in the subvolumes and find the total unique chunk count 
to calculate total space for a set of subvolumes.

This sounds kind of like the beginning of what a deduplicator would do, 
but I just want to count the blocks, so no submission for deduplication.  
I started looking at bedup and other deduplicator code, but the answer to 
this question wasn't obvious (to me, anyway).

Questions:

Is there an ioctl (or some other way) to get the block device offset for a 
file (or file offset) so I can count the unique occurrences?

What API documentation should I review?

Can you point me at the ioctl(s) that would handle this?


Thank you for your help!


--
Eric Wheeler


Btrfs uuid snapshots: orphaned parent_uuid after deleting intermediate subvol

2016-07-15 Thread Eric Wheeler
Hello all,

If I create three subvolumes like so:

# btrfs subvolume create a
# btrfs subvolume snapshot a b
# btrfs subvolume snapshot b c

I get a parent-child relationship which can be determined like so:

# btrfs subvolume list -uq /home/ |grep [abc]$
parent_uuid - uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
parent_uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 path b
parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c

Now if I delete 'b', the parent_uuid of 'c' doesn't change to point at 'a':

# btrfs subvolume delete b
# btrfs subvolume list -uq /home/ |grep [abc]$
parent_uuid - uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c

Notice that 'c' still points at b's UUID, but 'b' is missing and the 
parent_uuid for 'c' wasn't set to '-' as if it were a root node (like 'a').

Is this an inconsistency?  Should a child's parent_uuid be updated on delete?

It would be nice to know that 'c' is actually a descendant of 'a', even 
after having deleted 'b'.  Is there a way to look that up somehow?


This is running 4.1.15, so it's a bit behind.  If this is fixed in a later 
version then please let me know that too.  Thanks!


--
Eric Wheeler


Re: Status of SMR with BTRFS

2016-07-15 Thread Tomasz Kusmierz
Though I’m not a hardcore storage-system professional:

What disk are you using? There are two types:
1. SMR managed by the device firmware. BTRFS sees that as a normal block device … 
problems you get are not related to BTRFS itself …
2. SMR managed by the host system; BTRFS still sees this as a block device … 
just emulated by the host system to look normal. 

In case of funky technologies like that I would research how exactly data is 
stored in terms of “BANDs” and experiment with setting the leaf & sector size to 
match a band, then create a btrfs on this device. 
Run stress.sh on it for a couple of days.
If you get errors, set up a two-standard-disk raid1 btrfs file system and
run stress.sh to see whether you get errors on that system as well, to eliminate 
the possibility that your system is actually generating the errors. 

Then come back and we will see what’s going on :)


> On 15 Jul 2016, at 19:29, Hendrik Friedel  wrote:
> 
> Hello,
> 
> I have a 5TB Seagate drive that uses SMR.
> 
> I was wondering if BTRFS is usable with this hard drive technology. So first 
> I searched the BTRFS wiki - nothing. Then Google.
> 
> * I found this: https://bbs.archlinux.org/viewtopic.php?id=203696
> But this turned out to be an issue not related to BTRFS.
> 
> * Then this: http://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf
>  " BTRFS operation matches SMR parameters very closely [...]
> 
> High number of misaligned write accesses ; points to an issue with btrfs 
> itself
> 
> 
> * Then this: 
> http://superuser.com/questions/962257/fastest-linux-filesystem-on-shingled-disks
> The BTRFS performance seemed good.
> 
> 
> * Finally this: http://www.spinics.net/lists/linux-btrfs/msg48072.html
> "So you can get mixed results when trying to use the SMR devices but I'd say 
> it will mostly not work.
> But, btrfs has all the fundamental features in place, we'd have to make
> adjustments to follow the SMR constraints:"
> [...]
> I have some notes at
> https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt;
> 
> 
> So now I am wondering what the state is today. "We" (I am happy to do that, 
> but not sure of access rights) should also summarize this in the wiki.
> My use case, by the way, is backups. I am thinking of using some of the 
> interesting BTRFS features for this (send/receive, deduplication)
> 
> Greetings,
> Hendrik
> 
> 


Re: A lot warnings in dmesg while running thunderbird

2016-07-15 Thread Chris Mason

On 07/15/2016 03:35 PM, Chris Mason wrote:



On 07/07/2016 06:24 AM, Gabriel C wrote:

Hi,

while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
other versions)
I trigger the following:


[ 6393.305675] WARNING: CPU: 6 PID: 5870 at fs/btrfs/inode.c:9306
btrfs_destroy_inode+0x22e/0x2a0 [btrfs]


Every time I've reproduced this, I've hit a warning in extent-tree.c
about trying to decrement bytes_may_use too far.  Then I get enospc
on every operation.

Josef fixed a few corner cases here with his new enospc changes, and I'm
not able to trigger (yet) with those applied.  Dave Sterba has them all
in his for-next branch:

git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next

Can you please try that on top of v4.7-rc7?


A few hours later and it reproduced on this kernel too.  What must be 
happening is we're freeing too many bytes from bytes_may_use.


I'll get tracing in and nail it down.

-chris


Re: A lot warnings in dmesg while running thunderbird

2016-07-15 Thread Chris Mason



On 07/07/2016 06:24 AM, Gabriel C wrote:

Hi,

while running thunderbird on linux 4.6.3 and 4.7.0-rc6 (didn't test
other versions)
I trigger the following:


[ 6393.305675] WARNING: CPU: 6 PID: 5870 at fs/btrfs/inode.c:9306
btrfs_destroy_inode+0x22e/0x2a0 [btrfs]


Every time I've reproduced this, I've hit a warning in extent-tree.c 
about trying to decrement bytes_may_use too far.  Then I get enospc
on every operation.

Josef fixed a few corner cases here with his new enospc changes, and I'm 
not able to trigger (yet) with those applied.  Dave Sterba has them all 
in his for-next branch:


git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next

Can you please try that on top of v4.7-rc7?

-chris


Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-15 Thread Omar Sandoval
On Fri, Jul 15, 2016 at 12:34:10PM +0530, Chandan Rajendra wrote:
> On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> > On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > > From: Omar Sandoval 
> > >
> > > So it turns out that the free space tree bitmap handling has always been
> > > broken on big-endian systems. Totally my bad.
> > >
> > > Patch 1 fixes this. Technically, it's a disk format change for
> > > big-endian systems, but it never could have worked before, so I won't go
> > > through the trouble of any incompat bits. If you've somehow been using
> > > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > > to want to mount with nospace_cache to clear it and wait for this to go
> > > in.
> > >
> > > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > > oversight that patch 1 fixes.
> > >
> > > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > > pass on x86_64 and MIPS.
> > 
> > Thanks for fixing this up Omar.  Any big endian friends want to try this 
> > out in extended testing and make sure we've nailed it down?
> >
> 
> Hi Omar & Chris,
> 
> I will run fstests with this patchset applied on ppc64 BE and inform you about
> the results.
> 

Thanks, Chandan! I set up my xfstests for space_cache=v2 by doing:

mkfs.btrfs "$TEST_DEV"
mount -o space_cache=v2 "$TEST_DEV" "$TEST_DIR"
umount "$TEST_DEV"

and adding

export MOUNT_OPTIONS="-o space_cache=v2"

to local.config. btrfsck also needs the patch here [1].

Thanks again.

1: http://thread.gmane.org/gmane.comp.file-systems.btrfs/58382

-- 
Omar


[PATCH] btrfs-progs: fix btrfsck of space_cache=v2 bitmaps on big-endian

2016-07-15 Thread Omar Sandoval
From: Omar Sandoval 

Copy le_test_bit() from the kernel and use that for the free space tree
bitmaps.

Signed-off-by: Omar Sandoval 
---
Same sort of mistake as in the kernel. Applies to v4.6.1.

 extent_io.c  |  2 +-
 extent_io.h  | 19 +++
 kerncompat.h |  3 ++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/extent_io.c b/extent_io.c
index c99d3627e370..d956c5731332 100644
--- a/extent_io.c
+++ b/extent_io.c
@@ -889,5 +889,5 @@ void memset_extent_buffer(struct extent_buffer *eb, char c,
 int extent_buffer_test_bit(struct extent_buffer *eb, unsigned long start,
   unsigned long nr)
 {
-   return test_bit(nr, (unsigned long *)(eb->data + start));
+   return le_test_bit(nr, (u8 *)eb->data + start);
 }
diff --git a/extent_io.h b/extent_io.h
index a9a7353556a7..94a42bf5e180 100644
--- a/extent_io.h
+++ b/extent_io.h
@@ -49,6 +49,25 @@
 
 #define BLOCK_GROUP_DIRTY EXTENT_DIRTY
 
+/*
+ * The extent buffer bitmap operations are done with byte granularity instead of
+ * word granularity for two reasons:
+ * 1. The bitmaps must be little-endian on disk.
+ * 2. Bitmap items are not guaranteed to be aligned to a word and therefore a
+ *single word in a bitmap may straddle two pages in the extent buffer.
+ */
+#define BIT_BYTE(nr) ((nr) / BITS_PER_BYTE)
+#define BYTE_MASK ((1 << BITS_PER_BYTE) - 1)
+#define BITMAP_FIRST_BYTE_MASK(start) \
+   ((BYTE_MASK << ((start) & (BITS_PER_BYTE - 1))) & BYTE_MASK)
+#define BITMAP_LAST_BYTE_MASK(nbits) \
+   (BYTE_MASK >> (-(nbits) & (BITS_PER_BYTE - 1)))
+
+static inline int le_test_bit(int nr, const u8 *addr)
+{
+   return 1U & (addr[BIT_BYTE(nr)] >> (nr & (BITS_PER_BYTE-1)));
+}
+
 struct btrfs_fs_info;
 
 struct extent_io_tree {
diff --git a/kerncompat.h b/kerncompat.h
index 378f0552edd2..c9b9b79782b9 100644
--- a/kerncompat.h
+++ b/kerncompat.h
@@ -55,7 +55,8 @@
 #define gfp_t int
 #define get_cpu_var(p) (p)
 #define __get_cpu_var(p) (p)
-#define BITS_PER_LONG (__SIZEOF_LONG__ * 8)
+#define BITS_PER_BYTE 8
+#define BITS_PER_LONG (__SIZEOF_LONG__ * BITS_PER_BYTE)
 #define __GFP_BITS_SHIFT 20
 #define __GFP_BITS_MASK ((int)((1 << __GFP_BITS_SHIFT) - 1))
 #define GFP_KERNEL 0
-- 
2.9.0
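As a standalone illustration of why the byte-granular helper is endian-safe
(my own userspace sketch, not part of the patch; native_test_bit() below is a
simplified stand-in for the kernel's word-based test_bit()):

#include <stdio.h>
#include <string.h>

#define BITS_PER_BYTE 8
#define BITS_PER_LONG (__SIZEOF_LONG__ * BITS_PER_BYTE)

/* Byte-granular, little-endian bit test, as in the patch above. */
static int le_test_bit(int nr, const unsigned char *addr)
{
    return 1U & (addr[nr / BITS_PER_BYTE] >> (nr & (BITS_PER_BYTE - 1)));
}

/* Simplified stand-in for a native-endian, word-based test_bit(). */
static int native_test_bit(int nr, const unsigned long *addr)
{
    return 1U & (addr[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG));
}

int main(void)
{
    /* On-disk layout: bit 0 set means the lowest bit of byte 0. */
    unsigned char disk[sizeof(unsigned long)] = { 0x01 };
    unsigned long word;

    memcpy(&word, disk, sizeof(word));

    /* le_test_bit() sees bit 0 set regardless of host endianness.
     * native_test_bit() only agrees on little-endian hosts: on big-endian,
     * bit 0 of the native word lives in the last byte of the buffer. */
    printf("le_test_bit(0)     = %d\n", le_test_bit(0, disk));
    printf("native_test_bit(0) = %d\n", native_test_bit(0, &word));
    return 0;
}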



Re: Data recovery from a linear multi-disk btrfs file system

2016-07-15 Thread Austin S. Hemmelgarn

On 2016-07-15 14:45, Matt wrote:



On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn  wrote:

On 2016-07-15 05:51, Matt wrote:

Hello

I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large 
file system (see below).  One of the 6 disk failed. What is the best way to 
recover from this?


The tool you want is `btrfs restore`.  You'll need somewhere to put the files 
from this too of course.  That said, given that you had data in raid0 mode, 
you're not likely to get much other than very small files back out of this, and 
given other factors, you're not likely to get what you would consider 
reasonable performance out of this either.


Thanks so much for pointing me towards btrfs-restore. I surely will give it a try.  
Note that the FS is not a RAID0 but a linear (“JBOD”) configuration. This is why 
it somehow did not occur to me to try btrfs-restore.  The good news is that in this 
configuration the files are *not* distributed across disks. We can read most of 
the files just fine.  The failed disk was actually smaller than the other five, so 
we should be able to recover more than 5/6 of the data, shouldn’t we?  My 
trouble is that the IO errors due to the missing disk cripple the transfer speed of 
both rsync and dd_rescue.
Your own 'btrfs fi df' output clearly says that more than 99% of your 
data chunks are in a RAID0 profile, hence my statement.  Functionally, 
this is similar to concatenating all the disks, but it gets better 
performance and is a bit harder to recover data from.  I hadn't noticed, 
however, that the disks were different sizes, so you should be able to 
recover a significant amount of data from it.



Your best bet to get a working filesystem again would be to just recreate it 
from scratch, there's not much else that can be done when you've got a raid0 
profile and have lost a disk.


This is what I plan to do if btrfs-restore turns out to be too slow and 
nobody on this list has any better idea.  It will, however, require transferring 
>15TB across the Atlantic (this is where the “backup” resides).  This can be 
tedious, which is why I would love to avoid it.

Entirely understandable.



Re: Data recovery from a linear multi-disk btrfs file system

2016-07-15 Thread Matt

> On 15 Jul 2016, at 14:10, Austin S. Hemmelgarn  wrote:
> 
> On 2016-07-15 05:51, Matt wrote:
>> Hello
>> 
>> I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large 
>> file system (see below).  One of the 6 disk failed. What is the best way to 
>> recover from this?
>> 
> The tool you want is `btrfs restore`.  You'll need somewhere to put the files 
> from this too of course.  That said, given that you had data in raid0 mode, 
> you're not likely to get much other than very small files back out of this, 
> and given other factors, you're not likely to get what you would consider 
> reasonable performance out of this either.

Thanks so much for pointing me towards btrfs-restore. I surely will give it a 
try.  Note that the FS is not a RAID0 but a linear (“JBOD”) configuration. This 
is why it somehow did not occur to me to try btrfs-restore.  The good news is 
that in this configuration the files are *not* distributed across disks. We 
can read most of the files just fine.  The failed disk was actually smaller 
than the other five, so we should be able to recover more than 5/6 of the 
data, shouldn’t we?  My trouble is that the IO errors due to the missing disk 
cripple the transfer speed of both rsync and dd_rescue.

> Your best bet to get a working filesystem again would be to just recreate it 
> from scratch, there's not much else that can be done when you've got a raid0 
> profile and have lost a disk.

This is what I plan to do if btrfs-restore turns out to be too slow 
and nobody on this list has any better idea.  It will, however, require 
transferring >15TB across the Atlantic (this is where the “backup” resides).  
This can be tedious, which is why I would love to avoid it.

Matt



Status of SMR with BTRFS

2016-07-15 Thread Hendrik Friedel

Hello,

I have a 5TB Seagate drive that uses SMR.

I was wondering if BTRFS is usable with this hard drive technology. So 
first I searched the BTRFS wiki - nothing. Then Google.


* I found this: https://bbs.archlinux.org/viewtopic.php?id=203696
But this turned out to be an issue not related to BTRFS.

* Then this: 
http://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf

  " BTRFS operation matches SMR parameters very closely [...]

 High number of misaligned write accesses ; points to an issue with 
btrfs itself



* Then this: 
http://superuser.com/questions/962257/fastest-linux-filesystem-on-shingled-disks

The BTRFS performance seemed good.


* Finally this: http://www.spinics.net/lists/linux-btrfs/msg48072.html
"So you can get mixed results when trying to use the SMR devices but I'd 
say it will mostly not work.

But, btrfs has all the fundamental features in place, we'd have to make
adjustments to follow the SMR constraints:"
[...]
I have some notes at
https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt;


So now I am wondering what the state is today. "We" (I am happy to do 
that, but not sure of access rights) should also summarize this in the wiki.
My use case, by the way, is backups. I am thinking of using some of the 
interesting BTRFS features for this (send/receive, deduplication)


Greetings,
Hendrik




Re: FIDEDUPERANGE with src_length == 0

2016-07-15 Thread Darrick J. Wong
On Thu, Jul 14, 2016 at 11:16:47AM -0700, Omar Sandoval wrote:
> On Thu, Jul 14, 2016 at 02:12:58PM -0400, Chris Mason wrote:
> > 
> > 
> > On 07/14/2016 02:06 PM, Darrick J. Wong wrote:
> > > On Wed, Jul 13, 2016 at 03:19:38PM +0200, David Sterba wrote:
> > > > On Tue, Jul 12, 2016 at 10:26:43PM -0700, Darrick J. Wong wrote:
> > > > > On Mon, Jul 11, 2016 at 05:35:37PM -0700, Omar Sandoval wrote:
> > > > > > Hey, Darrick,
> > > > > > 
> > > > > > generic/182 is failing on Btrfs for me with the following output:
> > > > > > 
> > > > > > --- tests/generic/182.out   2016-07-07 19:51:54.0 -0700
> > > > > > +++ /tmp/fixxfstests/xfstests/results//generic/182.out.bad  
> > > > > > 2016-07-11 17:28:28.230039216 -0700
> > > > > > @@ -1,12 +1,10 @@
> > > > > >  QA output created by 182
> > > > > >  Create the original files
> > > > > > -dedupe: Extents did not match.
> > > > > >  f4820540fc0ac02750739896fe028d56  TEST_DIR/test-182/file1
> > > > > >  69ad53078a16243d98e21d9f8704a071  TEST_DIR/test-182/file2
> > > > > >  69ad53078a16243d98e21d9f8704a071  TEST_DIR/test-182/file2.chk
> > > > > >  Compare against check files
> > > > > >  Make the original file almost dedup-able
> > > > > > -dedupe: Extents did not match.
> > > > > >  f4820540fc0ac02750739896fe028d56  TEST_DIR/test-182/file1
> > > > > >  158d4e3578b94b89cbb44493a2110fb9  TEST_DIR/test-182/file2
> > > > > >  158d4e3578b94b89cbb44493a2110fb9  TEST_DIR/test-182/file2.chk
> > > > > > 
> > > > > > It looks like that test is checking that a dedupe with length == 0 
> > > > > > is
> > > > > > treated as a dedupe to EOF, but Btrfs doesn't do that [1]. As far 
> > > > > > as I
> > > > > > can tell, it never did, but maybe I'm just confused. What was the
> > > > > > behavior when you introduced that test? That seems like a reasonable
> > > > > > thing to do, but I wanted to clear this up before changing/fixing 
> > > > > > Btrfs.
> > > > > 
> > > > > It's a shortcut that we're introducing in the upcoming XFS 
> > > > > implementation,
> > > > > since it shares the same back end as clone/clonerange, which both have
> > > > > this behavior.
> > > > 
> > > > The support for zero length does not seem to be mentioned anywhere with
> > > > the dedupe range ioctl [1], so the current implementation is "up to
> > > > spec". That it should be valid is hidden in clone_verify_area where a
> > > > zero length is substituted with OFFSET_MAX
> > > > 
> > > > http://lxr.free-electrons.com/source/fs/read_write.c#L1607
> > > > 
> > > > So it looks like it's up to the implementation in the filesystem to
> > > > handle that. As the btrfs ioctl was extent-based, a zero length extent
> > > > does not make sense, so this case was not handled. But in your patch
> > > > 
> > > > 2b3909f8a7fe94e0234850aa9d120cca15b6e1f7
> > > > btrfs: use new dedupe data function pointer
> > > > 
> > > > it was suddenly expected to work. So the missing bits are either 'not
> > > > supported' for zero length or actually implement iteration over the
> > > > whole file.
> > > > 
> > > > [1] 
> > > > https://www.mankier.com/2/ioctl_fideduperange
> > > 
> > > Well, we can't change the semantics now because there could be programs 
> > > that
> > > aren't expecting a nonzero return from a length == 0 dedupe, so like 
> > > Christoph
> > > said, I'll just change generic/182 and make the VFS wrapper emulate the 
> > > btrfs
> > > behavior so that any subsequent implementation won't hit this.
> > > 
> > > I'll update the clone/clonerange manpages to mention the 0 -> EOF 
> > > behavior.
> > 
> > Its fine with me if we change btrfs to do the 0->EOF.  It's a corner case
> > I'm happy to include.
> > 
> > -chris
> 
> Yeah, I think it's a nice shortcut. Are there any programs which
> wouldn't want this, though? It's a milder sort of correctness problem
> since dedupe is "safe", but maybe there's some tool which is being dumb
> and trying to dedupe nothing.

 The only problem I can see here is some program that calls dedupe with
a length == 0 /and/ doesn't expect a non-zero return value... or gets confused
that bytes_deduped > 0.  I don't think duperemove has either of those problems.
Is that the only client?

--D

> 
> -- 
> Omar
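For reference, a rough sketch of how a client issues this ioctl (an added
illustration, not code from the thread; the 1 MiB src_length is an arbitrary
example value, deliberately non-zero given the discussion above):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIDEDUPERANGE, struct file_dedupe_range */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <src> <dest>\n", argv[0]);
        return 1;
    }

    int src = open(argv[1], O_RDONLY);
    int dst = open(argv[2], O_RDWR);
    if (src < 0 || dst < 0) { perror("open"); return 1; }

    /* One destination range; the flexible array holds dest_count entries. */
    struct file_dedupe_range *range =
        calloc(1, sizeof(*range) + sizeof(struct file_dedupe_range_info));
    if (!range) { perror("calloc"); return 1; }

    range->src_offset = 0;
    range->src_length = 1024 * 1024;   /* explicit length, not the 0 shortcut */
    range->dest_count = 1;
    range->info[0].dest_fd = dst;
    range->info[0].dest_offset = 0;

    if (ioctl(src, FIDEDUPERANGE, range) < 0) { perror("FIDEDUPERANGE"); return 1; }

    if (range->info[0].status == FILE_DEDUPE_RANGE_SAME)
        printf("deduped %llu bytes\n",
               (unsigned long long)range->info[0].bytes_deduped);
    else if (range->info[0].status == FILE_DEDUPE_RANGE_DIFFERS)
        printf("ranges differ, nothing deduped\n");
    else
        printf("status %d\n", range->info[0].status);

    free(range);
    close(src);
    close(dst);
    return 0;
}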


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Andrei Borzenkov
15.07.2016 19:29, Chris Mason wrote:
>
>> However I have to point out that this kind of test is very
>> difficult to do: the file cache could lead to reading old data, so
>> suggestions about how to flush the cache are welcome (I do some syncs,
>> unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches",
>> but sometimes it seems not enough).
> 
> O_DIRECT should handle the cache flushing for you.
> 

There is also the BLKFLSBUF ioctl (blockdev --flushbufs at the shell level).
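To make the O_DIRECT suggestion above concrete, a minimal sketch of a
cache-bypassing read (an added illustration; the 4096-byte alignment is an
assumption, real code should query the device's logical block size):

#define _GNU_SOURCE     /* O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define ALIGN 4096      /* assumed; O_DIRECT needs aligned buffer/offset/length */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <file> <offset>\n", argv[0]);
        return 1;
    }

    off_t offset = strtoll(argv[2], NULL, 0) & ~((off_t)ALIGN - 1);
    int fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    void *buf;
    if (posix_memalign(&buf, ALIGN, ALIGN)) { perror("posix_memalign"); return 1; }

    /* The read bypasses the page cache, so it reflects what is actually on
     * the media (modulo the device's own caches). */
    ssize_t n = pread(fd, buf, ALIGN, offset);
    if (n < 0) { perror("pread"); return 1; }

    printf("read %zd bytes at aligned offset %lld\n", n, (long long)offset);

    free(buf);
    close(fd);
    return 0;
}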


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Goffredo Baroncelli
On 2016-07-15 06:39, Andrei Borzenkov wrote:
> 15.07.2016 00:20, Chris Mason wrote:
>>
>>
>> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>>> Hi All,
>>>
>>> I developed a new btrfs command "btrfs insp phy"[1] to further
>>> investigate this bug [2]. Using "btrfs insp phy" I developed a script
>>> to trigger the bug. The bug is not always triggered, but most of time
>>> yes.
>>>
>>> Basically the script create a raid5 filesystem (using three
>>> loop-device on three file called disk[123].img); on this filesystem 
> 
> Are those devices themselves on btrfs? Just to avoid any sort of
> possible side effects?

Good question. However the files are stored on an ext4 filesystem (but I don't 
know if this is better or worse).

> 
>>> it is create a file. Then using "btrfs insp phy", the physical
>>> placement of the data on the device are computed.
>>>
>>> First the script checks that the data are the right one (for data1,
>>> data2 and parity), then it corrupt the data:
>>>
>>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>>> data2, parity) data on the disk are checked. This test goes fine all
>>> the times
>>>
>>> test2: data2 is corrupted, then scrub is ran. Then the (data1, data2,
>>> parity) data on the disk are checked. This test fail most of the time:
>>> the data on the disk is not correct; the parity is wrong. Scrub
>>> sometime reports "WARNING: errors detected during scrubbing,
>>> corrected" and sometime reports "ERROR: there are uncorrectable
>>> errors". But this seems unrelated to the fact that the data is
>>> corrupted or not
>>> test3: like test2, but data1 is corrupted. The result are the same as
>>> above.
>>>
>>>
>>> test4: data2 is corrupted, then the file is read. The system doesn't
>>> return error (the data seems to be fine); but the data2 on the disk is
>>> still corrupted.
>>>
>>>
>>> Note: data1, data2, parity are the disk-element of the raid5 stripe-
>>>
>>> Conclusion:
>>>
>>> most of the time, it seems that btrfs-raid5 is not capable to rebuild
>>> parity and data. Worse the message returned by scrub is incoherent by
>>> the status on the disk. The tests didn't fail every time; this
>>> complicate the diagnosis. However my script fails most of the time.
>>
>> Interesting, thanks for taking the time to write this up.  Is the
>> failure specific to scrub?  Or is parity rebuild in general also failing
>> in this case?
>>
> 
> How do you rebuild parity without scrub as long as all devices appear to
> be present?

I corrupted the data, then I read the file. The data has to be correct on
the basis of the parity. Even in this case I found problems.

> 
> 
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Chris Mason



On 07/15/2016 12:28 PM, Goffredo Baroncelli wrote:

On 2016-07-14 23:20, Chris Mason wrote:



On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:

Hi All,

I developed a new btrfs command "btrfs insp phy"[1] to further
investigate this bug [2]. Using "btrfs insp phy" I developed a
script to trigger the bug. The bug is not always triggered, but
most of time yes.

Basically the script creates a raid5 filesystem (using three
loop devices on three files called disk[123].img); on this filesystem
a file is created. Then using "btrfs insp phy", the physical
placement of the data on the devices is computed.

First the script checks that the data are the right ones (for data1,
data2 and parity), then it corrupts the data:

test1: the parity is corrupted, then scrub is run. Then the (data1,
data2, parity) data on the disk are checked. This test goes fine
every time.

test2: data2 is corrupted, then scrub is run. Then the (data1,
data2, parity) data on the disk are checked. This test fails most of
the time: the data on the disk is not correct; the parity is wrong.
Scrub sometimes reports "WARNING: errors detected during scrubbing,
corrected" and sometimes reports "ERROR: there are uncorrectable
errors". But this seems unrelated to the fact that the data is
corrupted or not.

test3: like test2, but data1 is corrupted. The
results are the same as above.


test4: data2 is corrupted, then the file is read. The system doesn't
return an error (the data seems to be fine); but the data2 on the disk
is still corrupted.


Note: data1, data2, parity are the disk elements of the raid5
stripe.

Conclusion:

most of the time, it seems that btrfs-raid5 is not capable of
rebuilding parity and data. Worse, the message returned by scrub is
inconsistent with the status on the disk. The tests didn't fail every
time; this complicates the diagnosis. However my script fails most
of the time.


Interesting, thanks for taking the time to write this up.  Is the
failure specific to scrub?  Or is parity rebuild in general also
failing in this case?


Test #4 handles this case: I corrupt the data, and when I read
it the data is good. So parity is used but the data on the platter
are still bad.

However I have to point out that this kind of test is very
difficult to do: the file cache could lead to reading old data, so
suggestions about how to flush the cache are welcome (I do some syncs,
unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches",
but sometimes it seems not enough).


O_DIRECT should handle the cache flushing for you.

-chris



Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Goffredo Baroncelli
On 2016-07-14 23:20, Chris Mason wrote:
> 
> 
> On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
>> Hi All,
>> 
>> I developed a new btrfs command "btrfs insp phy"[1] to further
>> investigate this bug [2]. Using "btrfs insp phy" I developed a
>> script to trigger the bug. The bug is not always triggered, but
>> most of time yes.
>> 
>> Basically the script create a raid5 filesystem (using three
>> loop-device on three file called disk[123].img); on this filesystem
>> it is create a file. Then using "btrfs insp phy", the physical
>> placement of the data on the device are computed.
>> 
>> First the script checks that the data are the right one (for data1,
>> data2 and parity), then it corrupt the data:
>> 
>> test1: the parity is corrupted, then scrub is ran. Then the (data1,
>> data2, parity) data on the disk are checked. This test goes fine
>> all the times
>> 
>> test2: data2 is corrupted, then scrub is ran. Then the (data1,
>> data2, parity) data on the disk are checked. This test fail most of
>> the time: the data on the disk is not correct; the parity is wrong.
>> Scrub sometime reports "WARNING: errors detected during scrubbing,
>> corrected" and sometime reports "ERROR: there are uncorrectable
>> errors". But this seems unrelated to the fact that the data is
>> corrupted or not test3: like test2, but data1 is corrupted. The
>> result are the same as above.
>> 
>> 
>> test4: data2 is corrupted, then the file is read. The system doesn't
>> return error (the data seems to be fine); but the data2 on the disk
>> is still corrupted.
>> 
>> 
>> Note: data1, data2, parity are the disk-element of the raid5
>> stripe-
>> 
>> Conclusion:
>> 
>> most of the time, it seems that btrfs-raid5 is not capable to
>> rebuild parity and data. Worse the message returned by scrub is
>> incoherent by the status on the disk. The tests didn't fail every
>> time; this complicate the diagnosis. However my script fails most
>> of the time.
> 
> Interesting, thanks for taking the time to write this up.  Is the
> failure specific to scrub?  Or is parity rebuild in general also
> failing in this case?

Test #4 handles this case: I corrupt the data, and when I read
it the data is good. So parity is used but the data on the platter
are still bad.

However I have to point out that this kind of test is very
difficult to do: the file cache could lead to reading old data, so
suggestions about how to flush the cache are welcome (I do some syncs, 
unmount the filesystem and perform "echo 3 >/proc/sys/vm/drop_caches", 
but sometimes it seems not enough).



> 
> -chris
> 

BR
G.Baroncelli
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: New btrfs sub command: btrfs inspect physical-find

2016-07-15 Thread Goffredo Baroncelli
On 2016-07-14 23:45, Chris Mason wrote:
> 
> 
> On 07/12/2016 05:40 PM, Goffredo Baroncelli wrote:
>> Hi All,
>> 
>> the enclosed patch adds a new btrfs sub command: "btrfs inspect
>> physical-find". The aim of this new command is to show the physical
>> placement on the disk of a file. Currently it handles all the
>> profiles (single, dup, raid1/10/5/6). I developed this command in
>> order to show some bugs in the btrfs RAID5 profile (see next email).
> 
> I've done this manually from time to time, and love the idea of
> having a helper for it.  Can I talk you into adding a way to save the
> contents of the block without having to use dd?  btrfs-map-logical
> does this now, but not via the search ioctl and not by filename.
> 
> say:
> 
> btrfs inspect physical-find -c  -o   
> offset

I prefer to add another command to do that (like btrfs insp physical-dump). And 
I will add a constraint like 
offset % blocksize == 0
in order to avoid handling data spread across different stripes/chunks.

However  has different meaning:

single/raid0 -> means nothing
raid1/raid10 -> means the copy #
raid5/raid6  -> could mean the parity: i.e.
-1 -> first parity (raid5/raid6)
-2 -> 2nd parity (raid6 only)
 
> Looks like you've open coded btrfs_map_logical() below, getting
> output from the search ioctl.  Dave might want that in a more
> centralized place.

I will give it a look.

 
> Also, please turn:
> 
> for(;;) if (foo) { statements }
> 
> Into
> 
> for(;;) { if (foo) { statements } }
> 
> I find that much less error prone.

Ok

> 
> -chris
> 
BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: [PATCH v3] Btrfs: fix unexpected balance crash due to BUG_ON

2016-07-15 Thread David Sterba
On Tue, Jul 12, 2016 at 11:24:21AM -0700, Liu Bo wrote:
> Mounting a btrfs can resume previous balance operations asynchronously.
> A user got a crash when one drive had some corrupt sectors.
> 
> Since balance can cancel itself in case of any error, we can gracefully
> return errors to upper layers and let balance do the cancel job.
> 
> Reported-by: sash 
> Signed-off-by: Liu Bo 

Reviewed-by: David Sterba 


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Chris Mason



On 07/15/2016 11:10 AM, Andrei Borzenkov wrote:

15.07.2016 16:20, Chris Mason пишет:


Interesting, thanks for taking the time to write this up.  Is the
failure specific to scrub?  Or is parity rebuild in general also failing
in this case?



How do you rebuild parity without scrub as long as all devices appear to
be present?


If one block is corrupted, the crcs will fail and the kernel will
rebuild parity when you read the file.  You can also use balance instead
of scrub.



As we have seen recently, btrfs does not compute, store or verify
checksums of RAID56 parity. So if parity is corrupted, the only way to
detect and correct it is to use scrub. Balance may work as a side effect,
because it simply recomputes parity on new data, but it will not fix
wrong parity on existing data.


Ah, I misread your question.  Yes, this is definitely where scrub is the 
best tool.  But even if we have to add debugging to force parity 
recomputation, we should see if the problem is only in scrub or deeper.


-chris


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Andrei Borzenkov
15.07.2016 16:20, Chris Mason wrote:
>>>
>>> Interesting, thanks for taking the time to write this up.  Is the
>>> failure specific to scrub?  Or is parity rebuild in general also failing
>>> in this case?
>>>
>>
>> How do you rebuild parity without scrub as long as all devices appear to
>> be present?
> 
> If one block is corrupted, the crcs will fail and the kernel will
> rebuild parity when you read the file.  You can also use balance instead
> of scrub.
> 

As we have seen recently, btrfs does not compute, store or verify
checksums of RAID56 parity. So if parity is corrupted, the only way to
detect and correct it is to use scrub. Balance may work as a side effect,
because it simply recomputes parity on new data, but it will not fix
wrong parity on existing data.

I agree that if a data block is corrupted it will be detected, but then
you do not need to recompute parity in the first place.


[PATCH next] Btrfs: fix comparison in __btrfs_map_block()

2016-07-15 Thread Vincent Stehlé
Add missing comparison to op in expression, which was forgotten when doing
the REQ_OP transition.

Fixes: b3d3fa519905 ("btrfs: update __btrfs_map_block for REQ_OP transition")
Signed-off-by: Vincent Stehlé 
Cc: Mike Christie 
Cc: Jens Axboe 
---


Hi,

I saw that issue in linux next.

Not sure if it is too late to squash the fix with commit b3d3fa519905 or
not...

Best regards,

Vincent.


 fs/btrfs/volumes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a69203a..6ee1e36 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5533,7 +5533,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int op,
}
 
} else if (map->type & BTRFS_BLOCK_GROUP_DUP) {
-   if (op == REQ_OP_WRITE || REQ_OP_DISCARD ||
+   if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD ||
op == REQ_GET_READ_MIRRORS) {
num_stripes = map->num_stripes;
} else if (mirror_num) {
-- 
2.8.1
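For context, the bug being fixed is the classic always-true condition: a bare
constant in an || chain is evaluated for truthiness instead of being compared
against op. A tiny standalone illustration (with made-up constant values, not
the kernel's definitions):

#include <stdio.h>

#define REQ_OP_WRITE   1    /* illustrative values only */
#define REQ_OP_DISCARD 3

int main(void)
{
    int op = 0;    /* a read: neither WRITE nor DISCARD */

    /* Buggy form: REQ_OP_DISCARD is a nonzero constant, so this branch is
     * taken for every op. */
    if (op == REQ_OP_WRITE || REQ_OP_DISCARD)
        printf("buggy check taken for op=%d\n", op);

    /* Fixed form: each operand compares against op. */
    if (op == REQ_OP_WRITE || op == REQ_OP_DISCARD)
        printf("fixed check taken for op=%d\n", op);
    else
        printf("fixed check correctly skipped for op=%d\n", op);

    return 0;
}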



[PATCH] btrfs: remove obsolete part of comment in statfs

2016-07-15 Thread David Sterba
The mixed blockgroup reporting has been fixed by commit
ae02d1bd070767e109f4a6f1bb1f466e9698a355
"btrfs: fix mixed block count of available space"

Signed-off-by: David Sterba 
---
 fs/btrfs/super.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 60e7179ed4b7..135fe88de568 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2030,9 +2030,6 @@ static int btrfs_calc_avail_data_space(struct btrfs_root *root, u64 *free_bytes)
  * chunk).
  *
  * If metadata is exhausted, f_bavail will be 0.
- *
- * FIXME: not accurate for mixed block groups, total and free/used are ok,
- * available appears slightly larger.
  */
 static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
-- 
2.7.1



[PATCH] btrfs: hide test-only member under ifdef

2016-07-15 Thread David Sterba
Signed-off-by: David Sterba 
---
 fs/btrfs/ctree.h   | 2 ++
 fs/btrfs/extent-tree.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 4274a7bfdaed..47ad088cfa00 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1179,8 +1179,10 @@ struct btrfs_root {
 
u64 highest_objectid;
 
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
/* only used with CONFIG_BTRFS_FS_RUN_SANITY_TESTS is enabled */
u64 alloc_bytenr;
+#endif
 
u64 defrag_trans_start;
struct btrfs_key defrag_progress;
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 82b912a293ab..f043c1f972de 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8142,6 +8142,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
bool skinny_metadata = btrfs_fs_incompat(root->fs_info,
 SKINNY_METADATA);
 
+#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
if (btrfs_test_is_dummy_root(root)) {
buf = btrfs_init_new_buffer(trans, root, root->alloc_bytenr,
level);
@@ -8149,6 +8150,7 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
root->alloc_bytenr += blocksize;
return buf;
}
+#endif
 
block_rsv = use_block_rsv(trans, root, blocksize);
if (IS_ERR(block_rsv))
-- 
2.7.1



Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5: take two

2016-07-15 Thread Chris Mason



On 07/15/2016 12:39 AM, Andrei Borzenkov wrote:

15.07.2016 00:20, Chris Mason wrote:



On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:

Hi All,

I developed a new btrfs command "btrfs insp phy"[1] to further
investigate this bug [2]. Using "btrfs insp phy" I developed a script
to trigger the bug. The bug is not always triggered, but most of time
yes.

Basically the script creates a raid5 filesystem (using three
loop devices on three files called disk[123].img); on this filesystem


Are those devices themselves on btrfs? Just to avoid any sort of
possible side effects?


a file is created. Then using "btrfs insp phy", the physical
placement of the data on the devices is computed.

First the script checks that the data are the right ones (for data1,
data2 and parity), then it corrupts the data:

test1: the parity is corrupted, then scrub is run. Then the (data1,
data2, parity) data on the disk are checked. This test goes fine all
the time.

test2: data2 is corrupted, then scrub is run. Then the (data1, data2,
parity) data on the disk are checked. This test fails most of the time:
the data on the disk is not correct; the parity is wrong. Scrub
sometimes reports "WARNING: errors detected during scrubbing,
corrected" and sometimes reports "ERROR: there are uncorrectable
errors". But this seems unrelated to the fact that the data is
corrupted or not.
test3: like test2, but data1 is corrupted. The results are the same as
above.


test4: data2 is corrupted, then the file is read. The system doesn't
return an error (the data seems to be fine); but the data2 on the disk is
still corrupted.


Note: data1, data2, parity are the disk elements of the raid5 stripe.

Conclusion:

most of the time, it seems that btrfs-raid5 is not capable of rebuilding
parity and data. Worse, the message returned by scrub is inconsistent with
the status on the disk. The tests didn't fail every time; this
complicates the diagnosis. However my script fails most of the time.


Interesting, thanks for taking the time to write this up.  Is the
failure specific to scrub?  Or is parity rebuild in general also failing
in this case?



How do you rebuild parity without scrub as long as all devices appear to
be present?


If one block is corrupted, the crcs will fail and the kernel will 
rebuild parity when you read the file.  You can also use balance instead 
of scrub.


-chris


Re: Data recovery from a linear multi-disk btrfs file system

2016-07-15 Thread Austin S. Hemmelgarn

On 2016-07-15 05:51, Matt wrote:

Hello

I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large 
file system (see below).  One of the 6 disk failed. What is the best way to 
recover from this?

Thanks to RAID1 of the metadata I can still access the data residing on the 
remaining 5 disks after mounting ro,force.  What I would like to do now is to

1) Find out the names of all the files with missing data
2) Make the file system fully functional (rw) again.

To achieve 2 I wanted to move the data off the disk. This, however, turns out to 
be rather difficult.
 - rsync does not provide an immediate time-out option in case of an IO error
 - Even when I set the time-out for dd_rescue to a minimum, the transfer speed 
is still way too low to move the data
 (> 15TB) off the file system.
Both methods are too slow to move off the data within a reasonable time frame.

Does anybody have a suggestion how to best recover from this? (Our backup is 
incomplete).
I am looking for either a tool to move off the  data — something which gives up 
immediately in case of IO error and logs the affected files.
Alternatively I am looking for a btrfs command like  “ btrfs device delete 
missing “ for a non-RAID multi-disk btrfs filesystem.
Would some variant of  "btrfs balance" do something helpful?

Any help is appreciated!

Regards,
Matt

# btrfs fi show
Label: none  uuid: d82fff2c-0232-47dd-a257-04c67141fc83
Total devices 6 FS bytes used 16.83TiB
devid1 size 3.64TiB used 3.47TiB path /dev/sdc
devid2 size 3.64TiB used 3.47TiB path /dev/sdd
devid3 size 3.64TiB used 3.47TiB path /dev/sde
devid4 size 3.64TiB used 3.47TiB path /dev/sdf
devid5 size 1.82TiB used 1.82TiB path /dev/sdb
*** Some devices missing


# btrfs fi df /work
Data, RAID0: total=18.31TiB, used=16.80TiB
Data, single: total=8.00MiB, used=8.00MiB
System, RAID1: total=8.00MiB, used=896.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=34.00GiB, used=30.18GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B
The tool you want is `btrfs restore`.  You'll need somewhere to put the 
files from this too of course.  That said, given that you had data in 
raid0 mode, you're not likely to get much other than very small files 
back out of this, and given other factors, you're not likely to get what 
you would consider reasonable performance out of this either.


Your best bet to get a working filesystem again would be to just 
recreate it from scratch, there's not much else that can be done when 
you've got a raid0 profile and have lost a disk.



Re: [PATCH] btrfs: allocate exact page array size in extent_buffer

2016-07-15 Thread Chandan Rajendra
On Friday, July 15, 2016 11:44:06 AM David Sterba wrote:
> On Fri, Jul 15, 2016 at 11:47:07AM +0530, Chandan Rajendra wrote:
> > On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
> > > The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE,
> > > but this wastes 15 unused pointers on arches with large page size. Eg.
> > > on ppc64 this gives 15 * 8 = 120 bytes.
> > >
> > 
> > The non PAGE_SIZE aligned extent buffer usage in page straddling tests in
> > test_eb_bitmaps() needs at least one more page. So how about the following ...
> > 
> > #define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)
> 
> Could the extra page pointer be normally used? Ie. not just for the sake
> of the tests. I'd rather not waste the bytes. As a compromise, we can do +1
> only if the tests are compiled in.
> 

I don't see any other scenario where the extra page pointer gets used. Also, I
just executed fstests with your patch applied and self-tests disabled in
the kernel configuration. The tests ran fine.

-- 
chandan



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-15 Thread Christian Rohmann
Hey Qu, all

On 07/15/2016 05:56 AM, Qu Wenruo wrote:
> 
> The good news is, we have patch to slightly speedup the mount, by
> avoiding reading out unrelated tree blocks.
> 
> In our test environment, it takes 15% less time to mount a fs filled
> with 16K files(2T used space).
> 
> https://patchwork.kernel.org/patch/9021421/

I have a 30TB RAID6 filesystem with compression on and I've seen mount
times of up to 20 minutes (!).

I don't want to sound unfair: a 15% improvement is good, but not in
the league where BTRFS needs to be.
Do I understand your comments correctly that further improvement would
require a change of the on-disk format?



Thanks and with regards

Christian


Data recovery from a linear multi-disk btrfs file system

2016-07-15 Thread Matt
Hello

I glued together 6 disks in linear lvm fashion (no RAID) to obtain one large 
file system (see below).  One of the 6 disks failed. What is the best way to 
recover from this?

Thanks to RAID1 of the metadata I can still access the data residing on the 
remaining 5 disks after mounting ro,force.  What I would like to do now is to 

1) Find out the names of all the files with missing data
2) Make the file system fully functional (rw) again.

To achieve 2 I wanted to move the data off the disk. This, however, turns out to 
be rather difficult. 
 - rsync does not provide an immediate time-out option in case of an IO error
 - Even when I set the time-out for dd_rescue to a minimum, the transfer speed 
is still way too low to move the data
 (> 15TB) off the file system.
Both methods are too slow to move off the data within a reasonable time frame. 

Does anybody have a suggestion how to best recover from this? (Our backup is 
incomplete).
I am looking for either a tool to move off the data — something which gives up 
immediately in case of IO error and logs the affected files.
Alternatively I am looking for a btrfs command like  “ btrfs device delete 
missing “ for a non-RAID multi-disk btrfs filesystem.
Would some variant of  "btrfs balance" do something helpful?

Any help is appreciated!

Regards,
Matt

# btrfs fi show
Label: none  uuid: d82fff2c-0232-47dd-a257-04c67141fc83
Total devices 6 FS bytes used 16.83TiB
devid1 size 3.64TiB used 3.47TiB path /dev/sdc
devid2 size 3.64TiB used 3.47TiB path /dev/sdd
devid3 size 3.64TiB used 3.47TiB path /dev/sde
devid4 size 3.64TiB used 3.47TiB path /dev/sdf
devid5 size 1.82TiB used 1.82TiB path /dev/sdb
*** Some devices missing


# btrfs fi df /work
Data, RAID0: total=18.31TiB, used=16.80TiB
Data, single: total=8.00MiB, used=8.00MiB
System, RAID1: total=8.00MiB, used=896.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, RAID1: total=34.00GiB, used=30.18GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B



Re: [PATCH] Btrfs-progs: fix btrfs-map-logical to only print extent mapping info

2016-07-15 Thread David Sterba
On Fri, Jul 15, 2016 at 10:22:52AM +0800, Qu Wenruo wrote:
> 
> 
> At 07/15/2016 09:40 AM, Liu Bo wrote:
> > I have a valid btrfs image which contains,
> > ...
> > item 10 key (1103101952 BLOCK_GROUP_ITEM 1288372224) itemoff 15947 itemsize 24
> > 	block group used 655360 chunk_objectid 256 flags DATA|RAID5
> > item 11 key (1103364096 EXTENT_ITEM 131072) itemoff 15894 itemsize 53
> > 	extent refs 1 gen 11 flags DATA
> > 	extent data backref root 5 objectid 258 offset 0 count 1
> > item 12 key (1103888384 EXTENT_ITEM 262144) itemoff 15841 itemsize 53
> > 	extent refs 1 gen 15 flags DATA
> > 	extent data backref root 1 objectid 256 offset 0 count 1
> > item 13 key (1104281600 EXTENT_ITEM 262144) itemoff 15788 itemsize 53
> > 	extent refs 1 gen 15 flags DATA
> > 	extent data backref root 1 objectid 257 offset 0 count 1
> > ...
> >
> > The extent [1103364096, 131072) has length 131072, but if we run
> >
> > "btrfs-map-logical -l 1103364096 -b $((65536 * 3)) /dev/sda"
> >
> > it will return mapping info of non-existing extents.
> >
> > That's because it assumes extents are contiguous in the logical address
> > space, which is not true: after one loop iteration (cur_logical += cur_len)
> > and mapping the next extent, we can get an extent that is outside our
> > search range, end up with a negative @real_len, and print mapping info
> > all the way to the end of the disk.
> >
> > Signed-off-by: Liu Bo 
> 
> Reviewed-by: Qu Wenruo 

Applied, thanks.
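
For readers following the description above, a rough sketch of the kind of
bound the search loop needs. This is not the applied patch; map_one_extent()
and the min()/max() helpers are stand-ins based on the names in the report.

/* Illustrative only: stop once the next extent starts beyond the
 * requested range [logical, logical + bytes), and clamp the printed
 * length to what is left of that range. */
while (cur_logical < logical + bytes) {
        u64 ext_start, ext_len, real_len;

        if (map_one_extent(fs_info, cur_logical, &ext_start, &ext_len) < 0)
                break;                  /* no further extents */
        if (ext_start >= logical + bytes)
                break;                  /* next extent lies outside the range */

        real_len = min(ext_start + ext_len, logical + bytes) -
                   max(ext_start, cur_logical);
        /* ... print the device mappings for real_len bytes here ... */

        cur_logical = ext_start + ext_len;
}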


Re: [PATCH] btrfs: allocate exact page array size in extent_buffer

2016-07-15 Thread David Sterba
On Fri, Jul 15, 2016 at 11:47:07AM +0530, Chandan Rajendra wrote:
> On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
> > The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE,
> > but this wastes 15 unused pointers on arches with large page size. Eg.
> > on ppc64 this gives 15 * 8 = 120 bytes.
> >
> 
> The non-PAGE_SIZE-aligned extent buffer used by the page-straddling tests in
> test_eb_bitmaps() needs at least one more page. So how about the following ...
> 
> #define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)

Could the extra page pointer be used in normal operation, i.e. not just for
the sake of the tests? I'd rather not waste the bytes. As a compromise, we can
do the +1 only if the tests are compiled in.
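
Not a patch, just a sketch of how that compromise could look, assuming the
self-tests stay guarded by the CONFIG_BTRFS_FS_RUN_SANITY_TESTS option:

/* Size the inline page array exactly for BTRFS_MAX_METADATA_BLOCKSIZE,
 * and reserve the extra slot the page-straddling tests need only when
 * the self-tests are built in. */
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)
#else
#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE)
#endif

The BUILD_BUG_ON added in extent_io.c would then have to allow for the extra
slot in the tests-enabled configuration.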


Re: [PATCH 0/3] Btrfs: fix free space tree bitmaps+tests on big-endian systems

2016-07-15 Thread Chandan Rajendra
On Thursday, July 14, 2016 07:47:04 PM Chris Mason wrote:
> On 07/14/2016 07:31 PM, Omar Sandoval wrote:
> > From: Omar Sandoval 
> >
> > So it turns out that the free space tree bitmap handling has always been
> > broken on big-endian systems. Totally my bad.
> >
> > Patch 1 fixes this. Technically, it's a disk format change for
> > big-endian systems, but it never could have worked before, so I won't go
> > through the trouble of any incompat bits. If you've somehow been using
> > space_cache=v2 on a big-endian system (I doubt anyone is), you're going
> > to want to mount with nospace_cache to clear it and wait for this to go
> > in.
> >
> > Patch 2 fixes a similar error in the sanity tests (it's the same as the
> > v2 I posted here [1]) and patch 3 expands the sanity tests to catch the
> > oversight that patch 1 fixes.
> >
> > Applies to v4.7-rc7. No regressions in xfstests, and the sanity tests
> > pass on x86_64 and MIPS.
> 
> Thanks for fixing this up Omar.  Any big endian friends want to try this 
> out in extended testing and make sure we've nailed it down?
>

Hi Omar & Chris,

I will run fstests with this patchset applied on ppc64 BE and inform you about
the results.
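
As background on why this is endian-sensitive at all, here is a standalone
user-space sketch (not code from the patch set): generic word-based bitmap
helpers address bits within an unsigned long, so the byte image that ends up
on disk depends on host byte order.

#include <stdio.h>
#include <string.h>

/* Minimal word-based bitmap "set", using the same addressing scheme the
 * generic kernel set_bit() uses: bit nr within an array of unsigned long. */
static void set_bit_word(unsigned long *map, unsigned int nr)
{
        map[nr / (8 * sizeof(long))] |= 1UL << (nr % (8 * sizeof(long)));
}

int main(void)
{
        unsigned long map[1] = { 0 };
        unsigned char bytes[sizeof(map)];
        unsigned int i;

        set_bit_word(map, 0);           /* logically "the first bit" */
        memcpy(bytes, map, sizeof(map));

        /* On a 64-bit little-endian host this prints 01 00 00 00 00 00 00 00;
         * on big-endian the set bit lands in the last byte instead, so the
         * same logical bitmap has a different byte image on disk. */
        for (i = 0; i < sizeof(map); i++)
                printf("%02x ", bytes[i]);
        printf("\n");
        return 0;
}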


-- 
chandan



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-15 Thread Kai Krakow
On Fri, 15 Jul 2016 13:24:45 +0800, Qu Wenruo wrote:

> > as for defrag, all my partitions are already on
> > autodefrag, so I assume that should be good. Or is manual once in a
> > while a good idea as well?  
> AFAIK autodefrag only helps if you're doing appending writes.
> 
> A manual defrag will help more, but since btrfs has problems defragging
> extents shared by different subvolumes, I doubt its effect if you have
> a lot of subvolumes/snapshots.

"btrfs fi defrag" is said to only defrag metadata if you are pointing
it to directories only without recursion. It could maybe help that case
without unsharing the extents:

find /btrfs-subvol0 -type d -print0 | xargs -0 btrfs fi defrag

-- 
Regards,
Kai

Replies to list-only preferred.



Re: [PATCH] btrfs: allocate exact page array size in extent_buffer

2016-07-15 Thread Chandan Rajendra
On Thursday, July 14, 2016 02:29:32 PM David Sterba wrote:
> The calculation of extent_buffer::pages size was done for 4k PAGE_SIZE,
> but this wastes 15 unused pointers on arches with large page size. Eg.
> on ppc64 this gives 15 * 8 = 120 bytes.
>

The non-PAGE_SIZE-aligned extent buffer used by the page-straddling tests in
test_eb_bitmaps() needs at least one more page. So how about the following ...

#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE + 1)


> Signed-off-by: David Sterba 
> ---
>  fs/btrfs/ctree.h     | 6 ------
>  fs/btrfs/extent_io.c | 2 ++
>  fs/btrfs/extent_io.h | 8 +++++++-
>  3 files changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 4274a7bfdaed..f914f6187753 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -66,12 +66,6 @@ struct btrfs_ordered_sum;
>  #define BTRFS_COMPAT_EXTENT_TREE_V0
> 
>  /*
> - * the max metadata block size.  This limit is somewhat artificial,
> - * but the memmove costs go through the roof for larger blocks.
> - */
> -#define BTRFS_MAX_METADATA_BLOCKSIZE 65536
> -
> -/*
>   * we can actually store much bigger names, but lets not confuse the rest
>   * of linux
>   */
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 75533adef998..6f468a1842e6 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4660,6 +4660,8 @@ __alloc_extent_buffer(struct btrfs_fs_info *fs_info, u64 start,
>  	/*
>  	 * Sanity checks, currently the maximum is 64k covered by 16x 4k pages
>  	 */
> +	BUILD_BUG_ON(INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE
> +		     != BTRFS_MAX_METADATA_BLOCKSIZE);
>  	BUILD_BUG_ON(BTRFS_MAX_METADATA_BLOCKSIZE
>  		     > MAX_INLINE_EXTENT_BUFFER_SIZE);
>  	BUG_ON(len > MAX_INLINE_EXTENT_BUFFER_SIZE);
> diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
> index c0c1c4fef6ce..edfa1a0ab82b 100644
> --- a/fs/btrfs/extent_io.h
> +++ b/fs/btrfs/extent_io.h
> @@ -4,6 +4,12 @@
>  #include <linux/rbtree.h>
>  #include "ulist.h"
> 
> +/*
> + * The maximum metadata block size.  This limit is somewhat artificial,
> + * but the memmove costs go through the roof for larger blocks.
> + */
> +#define BTRFS_MAX_METADATA_BLOCKSIZE (65536U)
> +
>  /* bits for the extent state */
>  #define EXTENT_DIRTY (1U << 0)
>  #define EXTENT_WRITEBACK (1U << 1)
> @@ -118,7 +124,7 @@ struct extent_state {
>  #endif
>  };
> 
> -#define INLINE_EXTENT_BUFFER_PAGES 16
> +#define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE)
>  #define MAX_INLINE_EXTENT_BUFFER_SIZE (INLINE_EXTENT_BUFFER_PAGES * PAGE_SIZE)
>  struct extent_buffer {
>   u64 start;
> 

-- 
chandan
