date:20180625

btrfs raid10 performance

2018-06-25 Thread Sterling Windmill

I am running a single btrfs RAID10 volume of eight LUKS devices, each
using a 2TB SATA hard drive as a backing store. The SATA drives are a
mixture of Seagate and Western Digital drives, some with RPMs ranging
from 5400 to 7200. Each seems to individually performance test where I
would expect for drives of this caliber. They are all attached to an
LSI PCIe SAS controller and configured in JBOD.

I have a relatively beefy quad core Xeon CPU that supports AES-NI and
don't think LUKS is my bottleneck.

Here's some info from the resulting filesystem:

  btrfs fi df /storage
  Data, RAID10: total=6.30TiB, used=6.29TiB
  System, RAID10: total=8.00MiB, used=560.00KiB
  Metadata, RAID10: total=9.00GiB, used=7.64GiB
  GlobalReserve, single: total=512.00MiB, used=0.00B

In general I see good performance, especially read performance which
is enough to regularly saturate my gigabit network when copying files
from this host via samba. Reads are definitely taking advantage of the
multiple copies of data available and spreading the load among all
drives.

Writes aren't quite as rosy, however.

When writing files using dd like in this example:

  dd if=/dev/zero of=tempfile bs=1M count=10240 conv=fdatasync,notrun
c status=progress

And running a command like:

  iostat -m 1

to monitor disk I/O, writes seem to only focus on one of the eight
disks at a time, moving from one drive to the next. This results in a
sustained 55-90 MB/sec throughput depending on which disk is being
written to (remember, some have faster spindle speed than others).

Am I wrong to expect btrfs' RAID10 mode to write to multiple disks
simultaneously and to break larger writes into smaller stripes across
my four pairs of disks?

I had trouble identifying whether btrfs RAID10 is writing (64K?)
stripes or (1GB?) blocks to disk in this mode. The latter might make
more sense based upon what I'm seeing?

Anything else I should be trying to narrow down the bottleneck?

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: Use iocb to derive pos instead of passing a separate parameter

2018-06-25 Thread Misono Tomohiro

On 2018/06/26 1:20, David Sterba wrote:
> On Mon, Jun 25, 2018 at 01:58:58PM +0900, Misono Tomohiro wrote:
>> So, this is the updated version of 
>> https://patchwork.kernel.org/patch/10063039/
>>
>> This time xfstest is ok and
>>  Reviewed-by: Misono Tomohiro 
> 
> Your comment about invalidate_mapping_pages is also ok, right? As
> filemap_fdatawait_range and invalidate_mapping_pages use the same
> start/end of the range.
> 

This time local variable 'pos' is kept to have the same value before and
invalidate_mapping_pages() uses it, so it should be ok.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] btrfs: Use iocb to derive pos instead of passing a separate parameter

2018-06-25 Thread Goldwyn Rodrigues

On 06-25 18:20, David Sterba wrote:
> On Mon, Jun 25, 2018 at 01:58:58PM +0900, Misono Tomohiro wrote:
> > So, this is the updated version of 
> > https://patchwork.kernel.org/patch/10063039/
> > 
> > This time xfstest is ok and
> >  Reviewed-by: Misono Tomohiro 

Yes, thats right.

> 
> Your comment about invalidate_mapping_pages is also ok, right? As
> filemap_fdatawait_range and invalidate_mapping_pages use the same
> start/end of the range.

I did not mess around with other functions which are affected by
iocb->ki_pos (as opposed to local pos). So, this should be safe with
respect to invalidate_mapping_pages().

-- 
Goldwyn
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN

On Mon, Jun 25, 2018 at 01:07:10PM -0400, Austin S. Hemmelgarn wrote:
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> One tip here specifically, if you had to reboot during a balance and the FS
> hangs when it mounts, try mounting with `-o skip_balance`.  That should
> pause the balance instead of resuming it on mount, at which point you should
> also be able to cancel it without it hanging.

Very good tip, I have this in all my mountpoints :)

#LABEL=dshelf2 /mnt/btrfs_pool2 btrfs defaults,compress=lzo,skip_balance,noatime

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()

2018-06-25 Thread David Sterba

On Mon, May 14, 2018 at 06:35:48PM +0200, David Sterba wrote:
> On Fri, May 11, 2018 at 01:30:01PM -0700, Omar Sandoval wrote:
> > On Fri, May 11, 2018 at 09:05:38PM +0100, Al Viro wrote:
> > > On Thu, May 10, 2018 at 11:30:10PM -0700, Omar Sandoval wrote:
> > > >  do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> > > >   struct block_device *bdev, struct iov_iter *iter,
> > > >   get_block_t get_block, dio_iodone_t end_io,
> > > > - dio_submit_t submit_io, int flags)
> > > > + dio_submit_t submit_io, int flags, void *private)
> > > 
> > > Oh, dear...  That's what, 9 arguments?  I agree that the hack in question
> > > is obscene, but so is this ;-/
> > 
> > So looking at these one by one, obviously needed:
> > 
> > - iocb
> > - inode
> > - iter
> > 
> > bdev is almost always inode->i_sb->s_bdev, except for Btrfs :(
> > 
> > These could _maybe_ go in struct kiocb:
> > 
> > - flags could maybe be folded into ki_flags
> > - private could maybe go in iocb->private, but I haven't yet read
> >   through to figure out if we're already using iocb->private for direct
> >   I/O
> 
> I think the kiocb::private can be used for the purpose. There's only one
> user, ext4, that also passes some DIO data around so it would in line
> with the interface AFAICS.

Omar, do you have an update to the patchset? Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Austin S. Hemmelgarn


On 2018-06-25 12:07, Marc MERLIN wrote:

On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote:

In your situation, I would run "btrfs pause ", wait to hear from
a btrfs developer, and not use the volume whatsoever in the meantime.

I would say this is probably good advice.  I don't really know what's going
on here myself actually, though it looks like the balance got stuck (the
output hasn't changed for over 36 hours, unless you've got an insanely slow
storage array, that's extremely unusual (it should only be moving at most
3GB of data per chunk)).


I didn't hear from any developer, so I had to continue.
- btrfs scrub cancel did not work (hang)
- at reboot mounting the filesystem hung, even with 4.17, which is
   disappointing (it should not hang)
- mount -o recovery still hung
- mount -o ro did not hang though
One tip here specifically, if you had to reboot during a balance and the 
FS hangs when it mounts, try mounting with `-o skip_balance`.  That 
should pause the balance instead of resuming it on mount, at which point 
you should also be able to cancel it without it hanging.


Sigh, why is my FS corrupted again?
Anyway, back to
btrfs check --repair
and, it took all my 32GB of RAM on a system I can't add more RAM to, so
I'm hosed. I'll note in passing (and it's not ok at all) that check
--repair after a 20 to 30mn pause, takes all the kernel RAM more quickly
than the system can OOM or log anything, and just deadlocks it.
This is repeateable and totally not ok :(

I'm now left with btrfs-progs git master, and lowmem which finally does
a bit of repair.
So far:
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2
enabling repair mode
WARNING: low-memory mode repair support is only partial
Checking filesystem on /dev/mapper/dshelf2
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Fixed 0 roots.
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]

At the rate it's going, it'll probably take days though, it's already been 36H

Marc



--
To unsubscribe from this list: send the line "unsubscribe

[PATCH v2] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

2018-06-25 Thread Chris Mason

The vm_fault_t conversion commit introduced a ret2 variable for tracking
the integer return values from internal btrfs functions.  It was
sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
and had been removed from the radix.  Something like this:

ret2 = btrfs_delalloc_reserve_space() // returns zero on success

lock_page(page)
if (page->mapping != inode->i_mapping)
goto out_unlock;

...

out_unlock:
if (!ret2) {
...
return VM_FAULT_LOCKED;
}

This ends up triggering this WARNING in btrfs_destroy_inode()
WARN_ON(BTRFS_I(inode)->block_rsv.size);

xfstests generic/095 was able to reliably reproduce the errors.

Since out_unlock: is only used for errors, this fix moves it below the
if (!ret2) check we use to return VM_FAULT_LOCKED for success.

Fixes: a528a2415087 (btrfs: change return type of btrfs_page_mkwrite to 
vm_fault_t)
Signed-off-by: Chris Mason 
---

 Changes since v1: don't set the vmfault_t 'ret' to zero, just move our
 goto taret around around instead.

 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 193f933..63f713a 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9036,13 +9036,14 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 
unlock_extent_cached(io_tree, page_start, page_end, _state);
 
-out_unlock:
if (!ret2) {
btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
sb_end_pagefault(inode->i_sb);
extent_changeset_free(data_reserved);
return VM_FAULT_LOCKED;
}
+
+out_unlock:
unlock_page(page);
 out:
btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unsolvable technical issues?

2018-06-25 Thread Hugo Mills

On Mon, Jun 25, 2018 at 06:43:38PM +0200, waxhead wrote:
[snip]
> I hope I am not asking for too much (but I know I probably am), but
> I suggest that having a small snippet of information on the status
> page showing a little bit about what is either currently the
> development focus , or what people are known for working at would be
> very valuable for users and it may of course work both ways, such as
> exciting people or calming them down. ;)
> 
> For example something simple like a "development focus" list...
> 2018-Q4: (planned) Renaming the grotesque "RAID" terminology
> 2018-Q3: (planned) Magical feature X
> 2018-Q2: N-Way mirroring
> 2018-Q1: Feature work "RAID"5/6
> 
> I think it would be good for people living their lives outside as it
> would perhaps spark some attention from developers and perhaps even
> media as well.

   I started doing this a couple of years ago, but it turned out to be
impossible to keep even vaguely accurate or up to date, without going
round and bugging the developers individually on a per-release
basis. I don't think it's going to happen.

   Hugo.

-- 
Hugo Mills | emacs: Emacs Makes A Computer Slow.
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |


signature.asc
Description: Digital signature

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN

On Mon, Jun 25, 2018 at 06:24:37PM +0200, Hans van Kranenburg wrote:
> >> output hasn't changed for over 36 hours, unless you've got an insanely slow
> >> storage array, that's extremely unusual (it should only be moving at most
> >> 3GB of data per chunk)).
> > 
> > I didn't hear from any developer, so I had to continue.
> > - btrfs scrub cancel did not work (hang)
> 
> Did you mean balance cancel? It waits until the current block group is
> finished.

Yes, I meant that, thanks for correcting me.  And you're correct that
because it was hung, cancel wasn't going to go anywhere.
At least my filesystem was still working at the time (as in IO was going
on just fine)

> > - at reboot mounting the filesystem hung, even with 4.17, which is
> >   disappointing (it should not hang)
> > - mount -o recovery still hung
> > - mount -o ro did not hang though
> > 
> > Sigh, why is my FS corrupted again?
> 
> Again? Do you think balance is corrupting the filesystem? Or have there
> been previous btrfs check --repair operations which made smaller
> problems bigger in the past?

Honestly, I don't fully remember at this point, I keep notes, but not
detailled enough and it's been a little while.
I know I've had to delete/recreate this filesystem twice already over
the last years, but I'm not fully certain I remember when this one was
last wiped.
Yes, I do run balance along with scrub once a month:

btrfs balance start -musage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
# After metadata, let's do data:
btrfs balance start -dusage=0 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
btrfs balance start -dusage=20 -v $mountpoint 2>&1 | grep -Ev "$FILTER"
echo btrfs scrub start -Bd $mountpoint
ionice -c 3 nice -10 btrfs scrub start -Bd $mountpoint

Hard to say if balance has damaged my filesystem over time, but it's
definitely possible.

> Am I right to interpret the messages below, and see that you have
> extents that are referenced hundreds of times?

I'm not certain, but it's a backup server with many blocks that are the same, 
so it 
could be some COW stuff, even if I didn't run any dedupe commands myself.

> Is there heavy snapshotting or deduping going on in this filesystem? If
> so, it's not surprising balance will get a hard time moving extents
> around, since it has to update all of the metadata for each extent again
> in hundreds of places.

There is some snapshotting, but maybe around 20 or so per subvolume, not 
hundreds.

> Did you investigate what balance was doing if it takes long? Is is using
> cpu all the time, or is it reading from disk slowly (random reads) or is
> it writing to disk all the time at full speed?

I couldn't see what it was doing, but it's running in the kernel, is it not?
(or can you just strace the user space command?)
Either way, it's too late for that now, and given that it didn't make progress 
of
a single block in 36H, I'm assuming it was well deadlocked.

Thanks for the reply.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unsolvable technical issues?

2018-06-25 Thread waxhead





David Sterba wrote:

On Fri, Jun 22, 2018 at 01:13:31AM +0200, waxhead wrote:

According to this:

https://stratis-storage.github.io/StratisSoftwareDesign.pdf
Page 4 , section 1.2

It claims that BTRFS still have significant technical issues that may
never be resolved.
Could someone shed some light on exactly what these technical issues
might be?! What are BTRFS biggest technical problems?


The subject you write is 'unsolvable', which I read as 'impossible to
solve', eg. on the design level. I'm not aware of such issues.

Alright , so I interpret this as there is no showstopper regarding 
implementation of existing and planned features...



If this is about issues that are difficult either to implement or
getting right, there are a few known ones.

Alright again, and I interpret this as there might be some code that is 
not flexible enough and changing that might affect working / stable 
parts of the code so therefore other solutions are looked at which is 
not that uncommon for software. Apart from not listing the known issues 
I think I got my questions answered :) and now it is perhaps finally 
appropriate to file a request at the Stratis bugtracker to ask what 
specifically they are referring to.



If you forget about the "RAID"5/6 like features then the only annoyances
that I have with BTRFS so far is...

1. Lack of per subvolume "RAID" levels
2. Lack of not using the deviceid to re-discover and re-add dropped devices

And that's about it really...


This could quickly turn into 'my faviourite bug/feature' list that can
be very long. The most asked for are raid56, and performance of qgroups.

Qu Wenruo improved some of the core problems and Jeff is working on the
performance problem. So there are people working on that.

On the raid56 front, there were some recent updates that fixed some
bugs, but the fix for write hole is still missing so we can't raise the
status yet.  I have some some good news but nobody should get too
excited until the code lands.

I have prototype for the N-copy raid (where N is 3 or 4).  This will
provide the underlying infrastructure for the raid5/6 logging mechanism,
the rest can be taken from Liu Bo's patchset sent some time ago.  In the
end the N-copy can be used for data and metadata too, independently and
flexibly switched via the balance filters. This will cost one
incompatibility bit.


I hope I am not asking for too much (but I know I probably am), but I 
suggest that having a small snippet of information on the status page 
showing a little bit about what is either currently the development 
focus , or what people are known for working at would be very valuable 
for users and it may of course work both ways, such as exciting people 
or calming them down. ;)


For example something simple like a "development focus" list...
2018-Q4: (planned) Renaming the grotesque "RAID" terminology
2018-Q3: (planned) Magical feature X
2018-Q2: N-Way mirroring
2018-Q1: Feature work "RAID"5/6

I think it would be good for people living their lives outside as it 
would perhaps spark some attention from developers and perhaps even 
media as well.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] btrfs: remove warnings superseded by refcount_t usage

2018-06-25 Thread David Sterba

There are several WARN_ON calls that catch incrorrect reference counter
updates, but this is what the refcount_t does already:

* refcount_inc from 0 will warn
* refcount_dec_and_test from 0 will warn

Signed-off-by: David Sterba 
---
 fs/btrfs/delayed-ref.h | 1 -
 fs/btrfs/extent_map.c  | 2 +-
 fs/btrfs/transaction.c | 1 -
 fs/btrfs/volumes.c | 1 -
 4 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index ea1aecb6a50d..16ee92c541bf 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -210,7 +210,6 @@ btrfs_free_delayed_extent_op(struct btrfs_delayed_extent_op 
*op)
 
 static inline void btrfs_put_delayed_ref(struct btrfs_delayed_ref_node *ref)
 {
-   WARN_ON(refcount_read(>refs) == 0);
if (refcount_dec_and_test(>refs)) {
WARN_ON(ref->in_tree);
switch (ref->type) {
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 6648d55e5339..6808ad4ed3c9 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -72,7 +72,7 @@ void free_extent_map(struct extent_map *em)
 {
if (!em)
return;
-   WARN_ON(refcount_read(>refs) == 0);
+
if (refcount_dec_and_test(>refs)) {
WARN_ON(extent_map_in_tree(em));
WARN_ON(!list_empty(>list));
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 4485eae41e88..f6b68099b767 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -41,7 +41,6 @@ static const unsigned int 
btrfs_blocked_trans_types[TRANS_STATE_MAX] = {
 
 void btrfs_put_transaction(struct btrfs_transaction *transaction)
 {
-   WARN_ON(refcount_read(>use_count) == 0);
if (refcount_dec_and_test(>use_count)) {
BUG_ON(!list_empty(>list));
WARN_ON(!RB_EMPTY_ROOT(>delayed_refs.href_root));
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e034ad9e23b4..f64d26b4f278 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5330,7 +5330,6 @@ static struct btrfs_bio *alloc_btrfs_bio(int 
total_stripes, int real_stripes)
 
 void btrfs_get_bbio(struct btrfs_bio *bbio)
 {
-   WARN_ON(!refcount_read(>refs));
refcount_inc(>refs);
 }
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Hans van Kranenburg

On 06/25/2018 06:07 PM, Marc MERLIN wrote:
> On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote:
>>> In your situation, I would run "btrfs pause ", wait to hear from
>>> a btrfs developer, and not use the volume whatsoever in the meantime.
>> I would say this is probably good advice.  I don't really know what's going
>> on here myself actually, though it looks like the balance got stuck (the
>> output hasn't changed for over 36 hours, unless you've got an insanely slow
>> storage array, that's extremely unusual (it should only be moving at most
>> 3GB of data per chunk)).
> 
> I didn't hear from any developer, so I had to continue.
> - btrfs scrub cancel did not work (hang)

Did you mean balance cancel? It waits until the current block group is
finished.

> - at reboot mounting the filesystem hung, even with 4.17, which is
>   disappointing (it should not hang)
> - mount -o recovery still hung
> - mount -o ro did not hang though
> 
> Sigh, why is my FS corrupted again?

Again? Do you think balance is corrupting the filesystem? Or have there
been previous btrfs check --repair operations which made smaller
problems bigger in the past?

> Anyway, back to 
> btrfs check --repair
> and, it took all my 32GB of RAM on a system I can't add more RAM to, so
> I'm hosed. I'll note in passing (and it's not ok at all) that check
> --repair after a 20 to 30mn pause, takes all the kernel RAM more quickly
> than the system can OOM or log anything, and just deadlocks it.
> This is repeateable and totally not ok :(
> 
> I'm now left with btrfs-progs git master, and lowmem which finally does
> a bit of repair.
> So far:
> gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2  
> enabling repair mode  
> WARNING: low-memory mode repair support is only partial  
> Checking filesystem on /dev/mapper/dshelf2  
> UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d  
> Fixed 0 roots.  

Am I right to interpret the messages below, and see that you have
extents that are referenced hundreds of times?

Is there heavy snapshotting or deduping going on in this filesystem? If
so, it's not surprising balance will get a hard time moving extents
around, since it has to update all of the metadata for each extent again
in hundreds of places.

Did you investigate what balance was doing if it takes long? Is is using
cpu all the time, or is it reading from disk slowly (random reads) or is
it writing to disk all the time at full speed?

K

> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 3407872) wanted: 3, have: 4
> Created new chunk [18457780224000 1073741824]
> Delete backref in extent [84302495744 69632]
> ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 3407872) wanted: 3, have: 4
> Delete backref in extent [84302495744 69632]
> ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 114540544) wanted: 181, have: 240
> Delete backref in extent [125712527360 12214272]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 126754816) wanted: 68, have: 115
> Delete backref in extent [125730848768 5111808]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 131866624) wanted: 115, have: 143
> Delete backref in extent [125736914944 6037504]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 148234240) wanted: 302, have: 431
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 148234240) wanted: 356, have: 433
> Delete backref in extent [129952120832 20242432]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 180371456) wanted: 161, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 180371456) wanted: 162, have: 240
> Delete backref in extent [134925357056 11829248]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
> owner: 374857, offset: 192200704) wanted: 170, have: 249
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
> owner: 374857, offset: 192200704) wanted: 172, have: 251
> Delete backref in extent [147895111680 12345344]
> ERROR: extent[150850146304,

Re: [PATCH] btrfs: Use iocb to derive pos instead of passing a separate parameter

2018-06-25 Thread David Sterba

On Mon, Jun 25, 2018 at 01:58:58PM +0900, Misono Tomohiro wrote:
> So, this is the updated version of 
> https://patchwork.kernel.org/patch/10063039/
> 
> This time xfstest is ok and
>  Reviewed-by: Misono Tomohiro 

Your comment about invalidate_mapping_pages is also ok, right? As
filemap_fdatawait_range and invalidate_mapping_pages use the same
start/end of the range.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

2018-06-25 Thread Marc MERLIN

On Tue, Jun 19, 2018 at 12:58:44PM -0400, Austin S. Hemmelgarn wrote:
> > In your situation, I would run "btrfs pause ", wait to hear from
> > a btrfs developer, and not use the volume whatsoever in the meantime.
> I would say this is probably good advice.  I don't really know what's going
> on here myself actually, though it looks like the balance got stuck (the
> output hasn't changed for over 36 hours, unless you've got an insanely slow
> storage array, that's extremely unusual (it should only be moving at most
> 3GB of data per chunk)).

I didn't hear from any developer, so I had to continue.
- btrfs scrub cancel did not work (hang)
- at reboot mounting the filesystem hung, even with 4.17, which is
  disappointing (it should not hang)
- mount -o recovery still hung
- mount -o ro did not hang though

Sigh, why is my FS corrupted again?
Anyway, back to 
btrfs check --repair
and, it took all my 32GB of RAM on a system I can't add more RAM to, so
I'm hosed. I'll note in passing (and it's not ok at all) that check
--repair after a 20 to 30mn pause, takes all the kernel RAM more quickly
than the system can OOM or log anything, and just deadlocks it.
This is repeateable and totally not ok :(

I'm now left with btrfs-progs git master, and lowmem which finally does
a bit of repair.
So far:
gargamel:~# btrfs check --mode=lowmem --repair -p /dev/mapper/dshelf2  
enabling repair mode  
WARNING: low-memory mode repair support is only partial  
Checking filesystem on /dev/mapper/dshelf2  
UUID: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d  
Fixed 0 roots.  
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 21872, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Created new chunk [18457780224000 1073741824]
Delete backref in extent [84302495744 69632]
ERROR: extent[84302495744, 69632] referencer count mismatch (root: 22911, 
owner: 374857, offset: 3407872) wanted: 3, have: 4
Delete backref in extent [84302495744 69632]
ERROR: extent[125712527360, 12214272] referencer count mismatch (root: 21872, 
owner: 374857, offset: 114540544) wanted: 181, have: 240
Delete backref in extent [125712527360 12214272]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 21872, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125730848768, 5111808] referencer count mismatch (root: 22911, 
owner: 374857, offset: 126754816) wanted: 68, have: 115
Delete backref in extent [125730848768 5111808]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 21872, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[125736914944, 6037504] referencer count mismatch (root: 22911, 
owner: 374857, offset: 131866624) wanted: 115, have: 143
Delete backref in extent [125736914944 6037504]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 21872, 
owner: 374857, offset: 148234240) wanted: 302, have: 431
Delete backref in extent [129952120832 20242432]
ERROR: extent[129952120832, 20242432] referencer count mismatch (root: 22911, 
owner: 374857, offset: 148234240) wanted: 356, have: 433
Delete backref in extent [129952120832 20242432]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 21872, 
owner: 374857, offset: 180371456) wanted: 161, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[134925357056, 11829248] referencer count mismatch (root: 22911, 
owner: 374857, offset: 180371456) wanted: 162, have: 240
Delete backref in extent [134925357056 11829248]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 21872, 
owner: 374857, offset: 192200704) wanted: 170, have: 249
Delete backref in extent [147895111680 12345344]
ERROR: extent[147895111680, 12345344] referencer count mismatch (root: 22911, 
owner: 374857, offset: 192200704) wanted: 172, have: 251
Delete backref in extent [147895111680 12345344]
ERROR: extent[150850146304, 17522688] referencer count mismatch (root: 21872, 
owner: 374857, offset: 217653248) wanted: 348, have: 418
Delete backref in extent [150850146304 17522688]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 22911, 
owner: 374857, offset: 235175936) wanted: 555, have: 1449
Deleted root 2 item[156909494272, 178, 5476627808561673095]
ERROR: extent[156909494272, 55320576] referencer count mismatch (root: 21872, 
owner: 374857, offset: 235175936) wanted: 556, have: 1452
Deleted root 2 item[156909494272, 178, 7338474132555182983]

At the rate it's going, it'll probably take days though, it's already been 36H

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/   | PGP 7F55D5F27AAF9D08
--
To unsubscribe from this list: send the line "unsubscribe

Re: [PATCH 2/2] btrfs: Add graceful handling of V0 extents

2018-06-25 Thread David Sterba

On Mon, Jun 25, 2018 at 11:24:50AM +0300, Nikolay Borisov wrote:
> Following the removal of the v0 handling code let's be courteous and
> print an error message when such extents are handled. In the cases
> where we have a transaction just abort it, otherwise just call
> btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO.
> 
> Signed-off-by: Nikolay Borisov 
> ---
>  fs/btrfs/extent-tree.c | 44 
>  fs/btrfs/print-tree.c  | 11 ---
>  fs/btrfs/relocation.c  | 36 +++-
>  3 files changed, 79 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 4129831523a2..131773528683 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -870,8 +870,17 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle 
> *trans,
>   num_refs = btrfs_extent_refs(leaf, ei);
>   extent_flags = btrfs_extent_flags(leaf, ei);
>   } else {
> - BUG();
> + ret = -EINVAL;
> + btrfs_err(fs_info,
> + "Unsupported V0 extent filesystem detected. Aborting. Please 
> re-create your filesystem with a newer kernel");

Can you please add a helper that prints the message so there aren't
several different copies of that? Also btrfs_err will add the last '\n',
so it's not needed.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unsolvable technical issues?

2018-06-25 Thread David Sterba

On Sat, Jun 23, 2018 at 05:11:52AM +, Duncan wrote:
> > According to this:
> > 
> > https://stratis-storage.github.io/StratisSoftwareDesign.pdf Page 4 ,
> > section 1.2
> > 
> > It claims that BTRFS still have significant technical issues that may
> > never be resolved.
> > Could someone shed some light on exactly what these technical issues
> > might be?! What are BTRFS biggest technical problems?
> > 
> > If you forget about the "RAID"5/6 like features then the only annoyances
> > that I have with BTRFS so far is...
> > 
> > 1. Lack of per subvolume "RAID" levels
> > 2. Lack of not using the deviceid to re-discover and re-add dropped
> > devices
> > 
> > And that's about it really...
> 
> ... And those both have solutions on the roadmap, with RFC patches 
> already posted for #2 (tho I'm not sure they use devid) altho 
> realistically they're likely to take years to appear and be tested to 
> stability.  Meanwhile...
> 
> While as the others have said you really need to go to the author to get 
> what was referred to, and I agree, I can speculate a bit.  While this 
> *is* speculation, admittedly somewhat uninformed as I don't claim to be a 
> dev, and I'd actually be interested in what others think so don't be 
> afraid to tell me I haven't a clue, as long as you say why... based on 
> several years reading the list now...
> 
> 1) When I see btrfs "technical issue that may never be resolved", the #1 
> first thing I think of, that AFAIK there are _definitely_ no plans to 
> resolve, because it's very deeply woven into the btrfs core by now, is...
> 
> Filesystem UUID Identification.

> Btrfs takes the UU bit of Universally 
> Unique quite literally, assuming they really *are* unique, at least on 
> that system, and uses them to identify the possibly multiple devices that 
> may be components of the filesystem, a problem most filesystems don't 
> have to deal with since they're single-device-only.  Because btrfs uses 
> this supposedly unique ID to ID devices that belong to the filesystem, it 
> can get *very* mixed up, with results possibly including dataloss, if it 
> sees devices that don't actually belong to a filesystem with the same UUID 
> as a mounted filesystem.
> 
> But technologies such as LVM allow cloning devices and these additional 
> devices naturally have the same filesystem metadata, including filesystem 
> UUID, as the original.  Making the problem worse is udev with its plug-n-
> play style detection, which will normally trigger a btrfs device scan, 
> thus making btrfs aware of new devices containing (a component of) a 
> btrfs, as soon as udev detects the device.

The automatic scanning is partially making it hard and would require
either extending the scanning mechanim to distinguish automatic and
manual scan, and using that information in kernel.

Right now, a cloned device will not be added to the filesystem UUID set
if the fs is mounted, but otherwise it's up to the administrator. The
misisng bit is possibly a way to tell the kernel module to 'forget' a
device (forget and never auto-scan).

> 2) Subvolume and (more technically) reflink-aware defrag.

There was a discussion in the mailinglist recently, some additions to
the interface were requested. The code to avoid the OOM exists but the
original author is not apparently interested and noone else has that
high enough in the todo list.

> 3) N-way-mirroring.

I have a prototype code for that, 3-copy and 4-copy type of profile.
Doning a fully dynamic N-way would become a messs once thre are mixed
N-way chunks for different N. Adding N=5 would not be too hard, but I'm
not sure if this makes sense.

The raid5 write-hole log will build on top of that, but the code has not
been written yet, other than the separate device logging sent by Liu Bo. 

> 4) (Until relatively recently, and still in terms of scaling) Quotas.

That's ongoing WIP, as qgroups touch the core parts.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

2018-06-25 Thread Chris Mason

On 25 Jun 2018, at 9:54, David Sterba wrote:

> On Mon, Jun 25, 2018 at 06:45:32AM -0700, Chris Mason wrote:

>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 193f933..38403aa 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -9013,6 +9013,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>>  goto out_unlock;
>>  }
>>  ret2 = 0;
>> +ret = 0;
>
> That's something I'd rather avoid, now there are two error values that
> need to be tracked. Shouldn't the 'ret2 = 0' be removed completely?


Yeah, the whole thing hurts my eyes.  Fixing.

>
>>
>>  /* page is wholly or partially inside EOF */
>>  if (page_start + PAGE_SIZE > size)
>> @@ -9037,7 +9038,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>>  unlock_extent_cached(io_tree, page_start, page_end, _state);
>>
>>  out_unlock:
>> -if (!ret2) {
>> +if (!ret) {
>
> I'm not sure is safe, the new ret is one of the VM_FAULT_ values and
> it's not 0 by default and in fact I don't see anything that would set it
> to 0 directly or indirectly. Which means that the code in the out: label
> also needs to be revisited:

Good point, ok.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

2018-06-25 Thread David Sterba

On Mon, Jun 25, 2018 at 06:45:32AM -0700, Chris Mason wrote:
> The vm_fault_t conversion commit introduced a ret2 variable for
> tracking the integer return values from internal btrfs functions.  It
> was sometimes returning VM_FAULT_LOCKED for pages that were actually
> invalid and had been removed from the radix.  Something like this:
> 
>   ret2 = btrfs_delalloc_reserve_space() // returns zero on success
> 
>   lock_page(page)
>   if (page->mapping != inode->i_mapping)
>   goto out_unlock;
> 
> ...
> 
> out_unlock:
>   if (!ret2) {
>   ...
>   return VM_FAULT_LOCKED;
>   }
> 
> This ends up triggering this WARNING in btrfs_destroy_inode()
>   WARN_ON(BTRFS_I(inode)->block_rsv.size);
> 
> The fix used here is to use ret instead of ret2 in our check for success.
> 
> Fixes: a528a2415087 (btrfs: change return type of btrfs_page_mkwrite to 
> vm_fault_t)
> Signed-off-by: Chris Mason 
> ---
>  fs/btrfs/inode.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 193f933..38403aa 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -9013,6 +9013,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   goto out_unlock;
>   }
>   ret2 = 0;
> + ret = 0;

That's something I'd rather avoid, now there are two error values that
need to be tracked. Shouldn't the 'ret2 = 0' be removed completely?

>  
>   /* page is wholly or partially inside EOF */
>   if (page_start + PAGE_SIZE > size)
> @@ -9037,7 +9038,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   unlock_extent_cached(io_tree, page_start, page_end, _state);
>  
>  out_unlock:
> - if (!ret2) {
> + if (!ret) {

I'm not sure is safe, the new ret is one of the VM_FAULT_ values and
it's not 0 by default and in fact I don't see anything that would set it
to 0 directly or indirectly. Which means that the code in the out: label
also needs to be revisited:

9016 out:
9017 btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 
0));
9018 btrfs_delalloc_release_space(inode, data_reserved, page_start,
9019  reserved_space, (ret != 0));

>   btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
>   sb_end_pagefault(inode->i_sb);
>   extent_changeset_free(data_reserved);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-25 Thread Chris Mason


On 25 Jun 2018, at 7:10, David Sterba wrote:


On Fri, Jun 22, 2018 at 05:25:54PM -0400, Chris Mason wrote:

The bug came here:

commit a528a24150870c5c16cbbbec69dba7e992b08456
Author: Souptick Joarder 
Date:   Wed Jun 6 19:54:44 2018 +0530

 btrfs: change return type of btrfs_page_mkwrite to vm_fault_t

When page->mapping != mapping, we improperly return success because 
ret2

is zero on goto out_unlock.

I'm running a fix through a full xfstests and I'll post soon.


The ret/ret2 pattern is IMO used in the wrong way, the ret2 is some
temporary value and it should be set and tested next to each other, 
not

holding the state accross the function.

The fix I'd propose is to avoid relying on it and add a separate exit
block, similar to out_noreserve:

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8981,7 +8981,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault 
*vmf)

ret = VM_FAULT_SIGBUS;
goto out_unlock;
}
-   ret2 = 0;

/* page is wholly or partially inside EOF */
if (page_start + PAGE_SIZE > size)
@@ -9004,14 +9003,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault 
*vmf)
BTRFS_I(inode)->last_log_commit = 
BTRFS_I(inode)->root->last_log_commit;


unlock_extent_cached(io_tree, page_start, page_end, 
_state);

-
-out_unlock:
-   if (!ret2) {
-   btrfs_delalloc_release_extents(BTRFS_I(inode), 
PAGE_SIZE, true);

-   sb_end_pagefault(inode->i_sb);
-   extent_changeset_free(data_reserved);
-   return VM_FAULT_LOCKED;
-   }
unlock_page(page);


VM_FAULT_LOCKED is where we return success.  The out_unlock goto is 
confusing because it's actually only used for failure, but the goto 
lands right above the if (everything actually worked) {} test.


On top of the patch I sent today, moving out_unlock: after the if would 
make it more clear.


-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

2018-06-25 Thread Chris Mason

The vm_fault_t conversion commit introduced a ret2 variable for
tracking the integer return values from internal btrfs functions.  It
was sometimes returning VM_FAULT_LOCKED for pages that were actually
invalid and had been removed from the radix.  Something like this:

ret2 = btrfs_delalloc_reserve_space() // returns zero on success

lock_page(page)
if (page->mapping != inode->i_mapping)
goto out_unlock;

...

out_unlock:
if (!ret2) {
...
return VM_FAULT_LOCKED;
}

This ends up triggering this WARNING in btrfs_destroy_inode()
WARN_ON(BTRFS_I(inode)->block_rsv.size);

The fix used here is to use ret instead of ret2 in our check for success.

Fixes: a528a2415087 (btrfs: change return type of btrfs_page_mkwrite to 
vm_fault_t)
Signed-off-by: Chris Mason 
---
 fs/btrfs/inode.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 193f933..38403aa 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9013,6 +9013,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
goto out_unlock;
}
ret2 = 0;
+   ret = 0;
 
/* page is wholly or partially inside EOF */
if (page_start + PAGE_SIZE > size)
@@ -9037,7 +9038,7 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
unlock_extent_cached(io_tree, page_start, page_end, _state);
 
 out_unlock:
-   if (!ret2) {
+   if (!ret) {
btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
sb_end_pagefault(inode->i_sb);
extent_changeset_free(data_reserved);
-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unsolvable technical issues?

2018-06-25 Thread David Sterba

On Fri, Jun 22, 2018 at 01:13:31AM +0200, waxhead wrote:
> According to this:
> 
> https://stratis-storage.github.io/StratisSoftwareDesign.pdf
> Page 4 , section 1.2
> 
> It claims that BTRFS still have significant technical issues that may 
> never be resolved.
> Could someone shed some light on exactly what these technical issues 
> might be?! What are BTRFS biggest technical problems?

The subject you write is 'unsolvable', which I read as 'impossible to
solve', eg. on the design level. I'm not aware of such issues.

If this is about issues that are difficult either to implement or
getting right, there are a few known ones.

> If you forget about the "RAID"5/6 like features then the only annoyances 
> that I have with BTRFS so far is...
> 
> 1. Lack of per subvolume "RAID" levels
> 2. Lack of not using the deviceid to re-discover and re-add dropped devices
> 
> And that's about it really...

This could quickly turn into 'my faviourite bug/feature' list that can
be very long. The most asked for are raid56, and performance of qgroups.

Qu Wenruo improved some of the core problems and Jeff is working on the
performance problem. So there are people working on that.

On the raid56 front, there were some recent updates that fixed some
bugs, but the fix for write hole is still missing so we can't raise the
status yet.  I have some some good news but nobody should get too
excited until the code lands.

I have prototype for the N-copy raid (where N is 3 or 4).  This will
provide the underlying infrastructure for the raid5/6 logging mechanism,
the rest can be taken from Liu Bo's patchset sent some time ago.  In the
end the N-copy can be used for data and metadata too, independently and
flexibly switched via the balance filters. This will cost one
incompatibility bit.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: unsolvable technical issues?

2018-06-25 Thread Austin S. Hemmelgarn


On 2018-06-24 16:22, Goffredo Baroncelli wrote:

On 06/23/2018 07:11 AM, Duncan wrote:

waxhead posted on Fri, 22 Jun 2018 01:13:31 +0200 as excerpted:


According to this:

https://stratis-storage.github.io/StratisSoftwareDesign.pdf Page 4 ,
section 1.2

It claims that BTRFS still have significant technical issues that may
never be resolved.
Could someone shed some light on exactly what these technical issues
might be?! What are BTRFS biggest technical problems?

If you forget about the "RAID"5/6 like features then the only annoyances
that I have with BTRFS so far is...

1. Lack of per subvolume "RAID" levels
2. Lack of not using the deviceid to re-discover and re-add dropped
devices

And that's about it really...


... And those both have solutions on the roadmap, with RFC patches
already posted for #2 (tho I'm not sure they use devid) altho
realistically they're likely to take years to appear and be tested to
stability.  Meanwhile...

While as the others have said you really need to go to the author to get
what was referred to, and I agree, I can speculate a bit.  While this
*is* speculation, admittedly somewhat uninformed as I don't claim to be a
dev, and I'd actually be interested in what others think so don't be
afraid to tell me I haven't a clue, as long as you say why... based on
several years reading the list now...

1) When I see btrfs "technical issue that may never be resolved", the #1
first thing I think of, that AFAIK there are _definitely_ no plans to
resolve, because it's very deeply woven into the btrfs core by now, is...

Filesystem UUID Identification.  Btrfs takes the UU bit of Universally
Unique quite literally, assuming they really *are* unique, at least on
that system, and uses them to identify the possibly multiple devices that
may be components of the filesystem, a problem most filesystems don't
have to deal with since they're single-device-only.  Because btrfs uses
this supposedly unique ID to ID devices that belong to the filesystem, it
can get *very* mixed up, with results possibly including dataloss, if it
sees devices that don't actually belong to a filesystem with the same UUID
as a mounted filesystem.


As partial workaround you can disable udev btrfs rules and then do a "btrfs dev 
scan" manually only for the device which you need. The you can mount the filesystem. 
Unfortunately you cannot mount two filesystem with the same UUID. However I have to point 
out that also LVM/dm might have problems if you clone a PV
You don't even need `btrfs dev scan` if you just specify the exact set 
of devices in the mount options.  The `device=` mount option tells the 
kernel to check that device during the mount process.


Also, while LVM does have 'issues' with cloned PV's, it fails safe (by 
refusing to work on VG's that have duplicate PV's), while BTRFS fails 
very unsafely (by randomly corrupting data).


[...]
der say 3-5 (or 5-7, or whatever)

years.  These could arguably include:

2) Subvolume and (more technically) reflink-aware defrag.

It was there for a couple kernel versions some time ago, but "impossibly"
slow, so it was disabled until such time as btrfs could be made to scale
rather better in this regard.


Did you try something like that with XFS+DM snapshot ? No you can't, because 
defrag in XFS cannot traverse snapshot (and I have to suppose that defrag 
cannot be effective on a dm-snapshot at all)..
What I am trying to point out is that even tough btrfs is not the fastest 
filesystem (and for some workload is VERY slow), when you compare it when few 
snapshots were presents LVM/dm is a lot slower.

IMHO most of the complaint which affect BTRFS, are due to the fact that with 
BTRFS an user can quite easily exploit a lot of features and their 
combinations. When a the slowness issue appears when some advance features 
combinations are used (i.e. multiple disks profile and (a lot of ) snapshots), 
this is reported as a BTRFS failure. But in fact even LVM/dm is very slow when 
the snapshot is used.
I still contend that the biggest issue WRT reflink-aware defrag was that 
it was not optional.  The only way to get the old defrag behavior was to 
boot a kernel that didn't have reflink-aware defrag support.  IOW, 
_everyone_ had to deal with the performance issues, not just the people 
who wanted to use reflink-aware defrag.





There's no hint yet as to when that might actually be, if it will _ever_
be, so this can arguably be validly added to the "may never be resolved"
list.

3) N-way-mirroring.


[...]
This is not an issue, but a not implemented feature

If you're looking at feature parity with competitors, it's an issue.


4) (Until relatively recently, and still in terms of scaling) Quotas.

Until relatively recently, quotas could arguably be added to the list.
They were rewritten multiple times, and until recently, appeared to be
effectively eternally broken.


Even tough what you are reporting is correct, I have to point out that the quota in BTRFS 
is more

Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

2018-06-25 Thread David Sterba

On Fri, Jun 22, 2018 at 05:25:54PM -0400, Chris Mason wrote:
> The bug came here:
> 
> commit a528a24150870c5c16cbbbec69dba7e992b08456
> Author: Souptick Joarder 
> Date:   Wed Jun 6 19:54:44 2018 +0530
> 
>  btrfs: change return type of btrfs_page_mkwrite to vm_fault_t
> 
> When page->mapping != mapping, we improperly return success because ret2 
> is zero on goto out_unlock.
> 
> I'm running a fix through a full xfstests and I'll post soon.

The ret/ret2 pattern is IMO used in the wrong way, the ret2 is some
temporary value and it should be set and tested next to each other, not
holding the state accross the function.

The fix I'd propose is to avoid relying on it and add a separate exit
block, similar to out_noreserve:

--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8981,7 +8981,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
ret = VM_FAULT_SIGBUS;
goto out_unlock;
}
-   ret2 = 0;
 
/* page is wholly or partially inside EOF */
if (page_start + PAGE_SIZE > size)
@@ -9004,14 +9003,6 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
BTRFS_I(inode)->last_log_commit = BTRFS_I(inode)->root->last_log_commit;
 
unlock_extent_cached(io_tree, page_start, page_end, _state);
-
-out_unlock:
-   if (!ret2) {
-   btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
-   sb_end_pagefault(inode->i_sb);
-   extent_changeset_free(data_reserved);
-   return VM_FAULT_LOCKED;
-   }
unlock_page(page);
 out:
btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, (ret != 0));
@@ -9021,6 +9012,12 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
sb_end_pagefault(inode->i_sb);
extent_changeset_free(data_reserved);
return ret;
+
+out_unlock:
+   btrfs_delalloc_release_extents(BTRFS_I(inode), PAGE_SIZE, true);
+   sb_end_pagefault(inode->i_sb);
+   extent_changeset_free(data_reserved);
+   return VM_FAULT_LOCKED;
 }
 
 static int btrfs_truncate(struct inode *inode, bool skip_writeback)
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] Btrfs: fix physical offset reported by fiemap for inline extents

2018-06-25 Thread Nikolay Borisov




On 20.06.2018 12:02, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 


> So fix this by ensuring the physical offset is always set to 0 when we
> are processing an extent with a special block_start value.
> 
> Fixes: 9d311e11fc1f ("Btrfs: fiemap: pass correct bytenr when fm_extent_count 
> is zero")
> Signed-off-by: Filipe Manana 

At first I was "wtf" since I had missed the u64 cast. After realising
that the constants are in fact always parsed as positive numbers then it
makes sense.

Reviewed-by: Nikolay Borisov 

> ---
> 
> v2: Set the physical offset to 0 for other extent maps with a special
> block_start value as well.
> 
>  fs/btrfs/extent_io.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 8e4a7cdbc9f5..1aa91d57404a 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -4545,8 +4545,11 @@ int extent_fiemap(struct inode *inode, struct 
> fiemap_extent_info *fieinfo,
>   offset_in_extent = em_start - em->start;
>   em_end = extent_map_end(em);
>   em_len = em_end - em_start;
> - disko = em->block_start + offset_in_extent;
>   flags = 0;
> + if (em->block_start < EXTENT_MAP_LAST_BYTE)
> + disko = em->block_start + offset_in_extent;
> + else
> + disko = 0;
>  
>   /*
>* bump off for our next call to get_extent
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs: remove -ERANGE check and avoid errno overriding in btrfs_get_acl()

2018-06-25 Thread Nikolay Borisov




On 25.06.2018 11:44, Nikolay Borisov wrote:
> 
> 
> On 23.06.2018 09:38, Chengguang Xu wrote:
>> Remove -ERANGE error check because there is no chance to get into
>> this condition and meanwhile avoid overriding errno to -EIO in
>> btrfs_get_acl().
> 
> This is way too terse. The reason why we can't get an ERANGE error is
> because we first call btrfs_getxattr to get the length of the attribute,
> then we do a subsequent call with the size from the first call. Between
> the 2 calls the size shouldn't change.
> 
> Furthermore, I see one more bad practice in this code - the first call
> to btrfs_getxattr is not using the value buffer hence it should be
> passing NULL to make this obvious, instead it's passing an empty string
> form the rodata section.

While at it, I guess you could also remove the BUG() in the default case
of the switch statement in btrfs_get_acl. A simple return
ERR_PTR(-EINVAL) should suffice.
> 
>>
>> Signed-off-by: Chengguang Xu 
>> ---
>> v2:
>> - Avoid errno overriding instead of print error message in error case.
>> - Change commit log for better understanding.
>>
>>  fs/btrfs/acl.c | 10 +-
>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
>> index 15e1dfef56a5..b71a875036af 100644
>> --- a/fs/btrfs/acl.c
>> +++ b/fs/btrfs/acl.c
>> @@ -40,13 +40,13 @@ struct posix_acl *btrfs_get_acl(struct inode *inode, int 
>> type)
>>  return ERR_PTR(-ENOMEM);
>>  size = btrfs_getxattr(inode, name, value, size);
>>  }
>> -if (size > 0) {
>> +if (size > 0)
>>  acl = posix_acl_from_xattr(_user_ns, value, size);
>> -} else if (size == -ERANGE || size == -ENODATA || size == 0) {
>> +else if (size == -ENODATA || size == 0)
>>  acl = NULL;
>> -} else {
>> -acl = ERR_PTR(-EIO);
>> -}
>> +else
>> +acl = ERR_PTR(size);
>> +
>>  kfree(value);
>>  
>>  return acl;
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] btrfs: remove -ERANGE check and avoid errno overriding in btrfs_get_acl()

2018-06-25 Thread Nikolay Borisov




On 23.06.2018 09:38, Chengguang Xu wrote:
> Remove -ERANGE error check because there is no chance to get into
> this condition and meanwhile avoid overriding errno to -EIO in
> btrfs_get_acl().

This is way too terse. The reason why we can't get an ERANGE error is
because we first call btrfs_getxattr to get the length of the attribute,
then we do a subsequent call with the size from the first call. Between
the 2 calls the size shouldn't change.

Furthermore, I see one more bad practice in this code - the first call
to btrfs_getxattr is not using the value buffer hence it should be
passing NULL to make this obvious, instead it's passing an empty string
form the rodata section.

> 
> Signed-off-by: Chengguang Xu 
> ---
> v2:
> - Avoid errno overriding instead of print error message in error case.
> - Change commit log for better understanding.
> 
>  fs/btrfs/acl.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
> index 15e1dfef56a5..b71a875036af 100644
> --- a/fs/btrfs/acl.c
> +++ b/fs/btrfs/acl.c
> @@ -40,13 +40,13 @@ struct posix_acl *btrfs_get_acl(struct inode *inode, int 
> type)
>   return ERR_PTR(-ENOMEM);
>   size = btrfs_getxattr(inode, name, value, size);
>   }
> - if (size > 0) {
> + if (size > 0)
>   acl = posix_acl_from_xattr(_user_ns, value, size);
> - } else if (size == -ERANGE || size == -ENODATA || size == 0) {
> + else if (size == -ENODATA || size == 0)
>   acl = NULL;
> - } else {
> - acl = ERR_PTR(-EIO);
> - }
> + else
> + acl = ERR_PTR(size);
> +
>   kfree(value);
>  
>   return acl;
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] btrfs: Remove V0 extent support

2018-06-25 Thread Nikolay Borisov

The v0 compat code was introduced in commit 5d4f98a28c7d
("Btrfs: Mixed back reference  (FORWARD ROLLING FORMAT CHANGE)") 9
years ago, which was merged in 2.6.31. This means that the code is
there to support filesystems which are _VERY_ old and if you are using
btrfs on such an old kernel, you have much bigger problems. This coupled
with the fact that no one is likely testing/maintining this code likely
means it has bugs lurking. All things considered I think 43 kernel
releases later it's high time this remnant of the past got removed.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/ctree.c   |   6 +-
 fs/btrfs/ctree.h   |   2 -
 fs/btrfs/extent-tree.c | 209 +
 fs/btrfs/print-tree.c  |  30 +--
 fs/btrfs/relocation.c  | 151 +--
 5 files changed, 4 insertions(+), 394 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 6879697520d5..d1273ec2e0d5 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -888,11 +888,7 @@ int btrfs_block_can_be_shared(struct btrfs_root *root,
 btrfs_root_last_snapshot(>root_item) ||
 btrfs_header_flag(buf, BTRFS_HEADER_FLAG_RELOC)))
return 1;
-#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
-   if (test_bit(BTRFS_ROOT_REF_COWS, >state) &&
-   btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
-   return 1;
-#endif
+
return 0;
 }
 
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e671a1fcbbec..bc52bf7ac572 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -55,8 +55,6 @@ struct btrfs_ordered_sum;
 
 #define BTRFS_OLDEST_GENERATION0ULL
 
-#define BTRFS_COMPAT_EXTENT_TREE_V0
-
 /*
  * the max metadata block size.  This limit is somewhat artificial,
  * but the memmove costs go through the roof for larger blocks.
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0f7e797236dc..4129831523a2 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -870,17 +870,7 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle 
*trans,
num_refs = btrfs_extent_refs(leaf, ei);
extent_flags = btrfs_extent_flags(leaf, ei);
} else {
-#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
-   struct btrfs_extent_item_v0 *ei0;
-   BUG_ON(item_size != sizeof(*ei0));
-   ei0 = btrfs_item_ptr(leaf, path->slots[0],
-struct btrfs_extent_item_v0);
-   num_refs = btrfs_extent_refs_v0(leaf, ei0);
-   /* FIXME: this isn't correct for data */
-   extent_flags = BTRFS_BLOCK_FLAG_FULL_BACKREF;
-#else
BUG();
-#endif
}
BUG_ON(num_refs == 0);
} else {
@@ -1039,89 +1029,6 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle 
*trans,
  * tree block info structure.
  */
 
-#ifdef BTRFS_COMPAT_EXTENT_TREE_V0
-static int convert_extent_item_v0(struct btrfs_trans_handle *trans,
- struct btrfs_fs_info *fs_info,
- struct btrfs_path *path,
- u64 owner, u32 extra_size)
-{
-   struct btrfs_root *root = fs_info->extent_root;
-   struct btrfs_extent_item *item;
-   struct btrfs_extent_item_v0 *ei0;
-   struct btrfs_extent_ref_v0 *ref0;
-   struct btrfs_tree_block_info *bi;
-   struct extent_buffer *leaf;
-   struct btrfs_key key;
-   struct btrfs_key found_key;
-   u32 new_size = sizeof(*item);
-   u64 refs;
-   int ret;
-
-   leaf = path->nodes[0];
-   BUG_ON(btrfs_item_size_nr(leaf, path->slots[0]) != sizeof(*ei0));
-
-   btrfs_item_key_to_cpu(leaf, , path->slots[0]);
-   ei0 = btrfs_item_ptr(leaf, path->slots[0],
-struct btrfs_extent_item_v0);
-   refs = btrfs_extent_refs_v0(leaf, ei0);
-
-   if (owner == (u64)-1) {
-   while (1) {
-   if (path->slots[0] >= btrfs_header_nritems(leaf)) {
-   ret = btrfs_next_leaf(root, path);
-   if (ret < 0)
-   return ret;
-   BUG_ON(ret > 0); /* Corruption */
-   leaf = path->nodes[0];
-   }
-   btrfs_item_key_to_cpu(leaf, _key,
- path->slots[0]);
-   BUG_ON(key.objectid != found_key.objectid);
-   if (found_key.type != BTRFS_EXTENT_REF_V0_KEY) {
-   path->slots[0]++;
-   continue;
-   }
-   ref0 = btrfs_item_ptr(leaf, path->slots[0],
- struct btrfs_extent_ref_v0);
-

[PATCH 0/2] Remove v0 extent support

2018-06-25 Thread Nikolay Borisov

It's unlikely there is anyone using that or even if they are they have bigger 
problems than this patchset :). After all, v0 was introduced 9 years ago and 
it was already conditionally compiled by the time BTRFS was merged in the
upstream kernel. The patches themselves are really simple - patch 1 removes 
all the code within ifdef guards. Patch 2, in turn, ads graceful handling by 
aborting transaction where it makes sense or calling btrfs_handle_fs_error and 
printing an informative error message. 

Nikolay Borisov (2):
  btrfs: Remove V0 extent support
  btrfs: Add graceful handling of V0 extents

 fs/btrfs/ctree.c   |   6 +-
 fs/btrfs/ctree.h   |   2 -
 fs/btrfs/extent-tree.c | 239 +++--
 fs/btrfs/print-tree.c  |  35 ++--
 fs/btrfs/relocation.c  | 181 ++---
 5 files changed, 70 insertions(+), 393 deletions(-)

-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] btrfs: Add graceful handling of V0 extents

2018-06-25 Thread Nikolay Borisov

Following the removal of the v0 handling code let's be courteous and
print an error message when such extents are handled. In the cases
where we have a transaction just abort it, otherwise just call
btrfs_handle_fs_error. Both cases result in the FS being re-mounted RO.

Signed-off-by: Nikolay Borisov 
---
 fs/btrfs/extent-tree.c | 44 
 fs/btrfs/print-tree.c  | 11 ---
 fs/btrfs/relocation.c  | 36 +++-
 3 files changed, 79 insertions(+), 12 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4129831523a2..131773528683 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -870,8 +870,17 @@ int btrfs_lookup_extent_info(struct btrfs_trans_handle 
*trans,
num_refs = btrfs_extent_refs(leaf, ei);
extent_flags = btrfs_extent_flags(leaf, ei);
} else {
-   BUG();
+   ret = -EINVAL;
+   btrfs_err(fs_info,
+   "Unsupported V0 extent filesystem detected. Aborting. Please 
re-create your filesystem with a newer kernel");
+   if (trans)
+   btrfs_abort_transaction(trans, ret);
+   else
+   btrfs_handle_fs_error(fs_info, ret, NULL);
+
+   goto out_free;
}
+
BUG_ON(num_refs == 0);
} else {
num_refs = 0;
@@ -1302,6 +1311,11 @@ static noinline int remove_extent_data_ref(struct 
btrfs_trans_handle *trans,
ref2 = btrfs_item_ptr(leaf, path->slots[0],
  struct btrfs_shared_data_ref);
num_refs = btrfs_shared_data_ref_count(leaf, ref2);
+   } else if (key.type == BTRFS_EXTENT_REF_V0_KEY) {
+   btrfs_err(fs_info,
+ "Unsupported V0 extent filesystem detected. Aborting. 
Please re-create your filesystem with a newer kernel");
+   btrfs_abort_transaction(trans, -EINVAL);
+   return -EINVAL;
} else {
BUG();
}
@@ -1334,6 +1348,8 @@ static noinline u32 extent_data_ref_count(struct 
btrfs_path *path,
 
leaf = path->nodes[0];
btrfs_item_key_to_cpu(leaf, , path->slots[0]);
+
+   BUG_ON(key.type == BTRFS_EXTENT_REF_V0_KEY);
if (iref) {
/*
 * If type is invalid, we should have bailed out earlier than
@@ -1549,7 +1565,13 @@ int lookup_inline_extent_backref(struct 
btrfs_trans_handle *trans,
 
leaf = path->nodes[0];
item_size = btrfs_item_size_nr(leaf, path->slots[0]);
-   BUG_ON(item_size < sizeof(*ei));
+   if (item_size < sizeof(*ei)) {
+   err = -EINVAL;
+   btrfs_err(fs_info,
+   "Unsupported V0 extent filesystem detected. Aborting. Please 
re-create your filesystem with a newer kernel");
+   btrfs_abort_transaction(trans, err);
+   goto out;
+   }
 
ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item);
flags = btrfs_extent_flags(leaf, ei);
@@ -2282,7 +2304,15 @@ static int run_delayed_extent_op(struct 
btrfs_trans_handle *trans,
 
leaf = path->nodes[0];
item_size = btrfs_item_size_nr(leaf, path->slots[0]);
-   BUG_ON(item_size < sizeof(*ei));
+
+   if (item_size < sizeof(*ei)) {
+   err = -EINVAL;
+   btrfs_err(fs_info,
+   "Unsupported V0 extent filesystem detected. Aborting. Please 
re-create your filesystem with a newer kernel");
+   btrfs_abort_transaction(trans, err);
+   goto out;
+   }
+
ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_extent_item);
__run_delayed_extent_op(extent_op, leaf, ei);
 
@@ -6815,7 +6845,13 @@ static int __btrfs_free_extent(struct btrfs_trans_handle 
*trans,
 
leaf = path->nodes[0];
item_size = btrfs_item_size_nr(leaf, extent_slot);
-   BUG_ON(item_size < sizeof(*ei));
+   if (item_size < sizeof(*ei)) {
+   ret = -EINVAL;
+   btrfs_err(info,
+   "Unsupported V0 extent filesystem detected. Aborting. Please 
re-create your filesystem with a newer kernel");
+   btrfs_abort_transaction(trans, ret);
+   goto out;
+   }
ei = btrfs_item_ptr(leaf, extent_slot,
struct btrfs_extent_item);
if (owner_objectid < BTRFS_FIRST_FREE_OBJECTID &&
diff --git a/fs/btrfs/print-tree.c b/fs/btrfs/print-tree.c
index b03040b84fe9..cd4387fc766c 100644
--- a/fs/btrfs/print-tree.c
+++ b/fs/btrfs/print-tree.c
@@ -52,8 +52,11 @@ static void print_extent_item(struct extent_buffer *eb, int 
slot, int type)
u64 offset;
int ref_index = 0;
 
-   if (item_size < sizeof(*ei))
-   BUG();
+   if (item_size <

btrfs raid10 performance

Re: [PATCH] btrfs: Use iocb to derive pos instead of passing a separate parameter

Re: [PATCH] btrfs: Use iocb to derive pos instead of passing a separate parameter

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

Re: [PATCH 1/3] fs: add initial bh_result->b_private value to __blockdev_direct_IO()

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

[PATCH v2] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

Re: unsolvable technical issues?

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

Re: unsolvable technical issues?

[PATCH] btrfs: remove warnings superseded by refcount_t usage

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

Re: [PATCH] btrfs: Use iocb to derive pos instead of passing a separate parameter

Re: btrfs balance did not progress after 12H, hang on reboot, btrfs check --repair kills the system still

Re: [PATCH 2/2] btrfs: Add graceful handling of V0 extents

Re: unsolvable technical issues?

Re: [PATCH] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

Re: [PATCH] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

[PATCH] Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion

Re: unsolvable technical issues?

Re: unsolvable technical issues?

Re: [PATCH RFC 0/2] Btrfs: fix file data corruptions due to lost dirty bits

Re: [PATCH v2] Btrfs: fix physical offset reported by fiemap for inline extents

Re: [PATCH v2] btrfs: remove -ERANGE check and avoid errno overriding in btrfs_get_acl()

Re: [PATCH v2] btrfs: remove -ERANGE check and avoid errno overriding in btrfs_get_acl()

[PATCH 1/2] btrfs: Remove V0 extent support

[PATCH 0/2] Remove v0 extent support

[PATCH 2/2] btrfs: Add graceful handling of V0 extents

29 matches

Site Navigation

Mail list logo

Footer information