Re: shall distros run btrfsck on boot?

2015-11-25 Thread Martin Steigerwald
Am Mittwoch, 25. November 2015, 07:32:34 CET schrieb Austin S Hemmelgarn:
> On 2015-11-24 17:26, Eric Sandeen wrote:
> > On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:
> >> if the system was
> >> shut down cleanly, you're fine barring software bugs, but if it
> >> crashed, you should be running a check on the FS.
> > 
> > Um, no...
> > 
> > The *entire point* of having a journaling filesystem is that after a
> > crash or power loss, a journal replay on next mount will bring the
> > metadata into a consistent state.
> 
> OK, first, that was in reference to BTRFS, not ext4, and BTRFS is a COW
> filesystem, not a journaling one, which is an important distinction as
> mentioned by Hugo in his reply.  Second, there are two reasons that you
> should be running a check even of a journaled filesystem when the system
> crashes (this also applies to COW filesystems, and anything else that
> relies on atomicity of write operations for consistency):
> 
> 1. Disks don't atomically write anything bigger than a sector, and may
> not even atomically write the sector itself.  This means that it's
> possible to get a partial write to the journal, which in turn has
> significant potential to put the metadata in an inconsistent state when
> the journal gets replayed (IIRC, ext4 has a journal_checksum mount
> option that is supposed to mitigate this possibility).  This sounds like
> something that shouldn't happen all that often, but on a busy
> filesystem, the probability scales with the size of the
> journal relative to the size of the FS.
> 
> 2. If the system crashed, all code running on it immediately before the
> crash is instantly suspect, and you have no way to know for certain that
> something didn't cause random garbage to be written to the disk.  On top
> of this, hardware is potentially suspect, and when your hardware is
> misbehaving, then all bets as to consistency are immediately off.

In the case of shaky hardware, an fsck run can report bogus results, i.e. 
problems where there are none, or vice versa. If I suspected defective memory 
or a disk controller, I would only check the device on different hardware, 
especially before attempting to repair any possible issues.
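Austin's first point — that a torn write to the journal can poison the replay
unless each journal record carries its own checksum — can be sketched with a
toy replay loop. This is purely an illustrative model; the record format and
the crc32 choice are mine, not ext4's actual jbd2 on-disk format:

```python
import zlib

def replay_journal(records):
    """Toy journal replay: apply records in order, but stop at the first
    record whose stored checksum does not match its payload (e.g. a torn
    or partial write), instead of replaying garbage into the metadata."""
    applied = []
    for payload, stored_csum in records:
        if zlib.crc32(payload) != stored_csum:
            break  # torn write detected: discard this and every later record
        applied.append(payload)
    return applied

def record(payload):
    # helper: a well-formed journal record (payload plus its checksum)
    return (payload, zlib.crc32(payload))

journal = [
    record(b"update inode 7"),
    # this record was torn mid-write: payload damaged after csum was computed
    (b"upd\x00te inode 8", zlib.crc32(b"update inode 8")),
    record(b"update inode 9"),
]
# replay applies only the records before the torn one
assert replay_journal(journal) == [b"update inode 7"]
```

Without the per-record checksum, the torn record would be replayed as-is,
which is exactly the inconsistency described above.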


-- 
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: btrfs check help

2015-11-25 Thread Vincent Olivier
I should probably point out that there is 64GB of RAM on this machine and it’s 
a dual Xeon (LGA2011-3) system. Also, only Btrfs is served via Samba, and the 
kernel panic was caused by Btrfs (as per what I remember from the log on the 
screen just before I rebooted) and happened in the middle of the night when 
zero (0) clients were connected.

You will find below the full “btrfs check” log for each device in the order it 
is listed by “btrfs fi show”.

Can I get a strong confirmation that I should run with the “--repair” option on 
each device? Thanks.

Vincent


Checking filesystem on /dev/sdk
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdp
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdi
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdq
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdh
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdm
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdj
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [.]
checking free space cache [.]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdo
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [O]
checking free space cache [.]
checking fs roots [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 2881050910
file data blocks allocated: 19445786390528
referenced 20138885959680
Checking filesystem on /dev/sdg
UUID: 6a742786-070d-4557-9e67-c73b84967bf5
checking extents [o]
checking free space cache [o]
root 5 inode 1341670 errors 400, nbytes wrong
root 11406 inode 1341670 errors 400, nbytes wrong

found 19328980191604 bytes used err is 1
total csum bytes: 18849205856
total tree bytes: 27393392640
total fs tree bytes: 4452958208
total extent tree bytes: 3075571712
btree space waste bytes: 

Re: [4.3-rc4] scrubbing aborts before finishing

2015-11-25 Thread Martin Steigerwald
Am Samstag, 31. Oktober 2015, 12:10:37 CET schrieb Martin Steigerwald:
> Am Donnerstag, 22. Oktober 2015, 10:41:15 CET schrieb Martin Steigerwald:
> > I get this:
> > 
> > merkaba:~> btrfs scrub status -d /
> > scrub status for […]
> > scrub device /dev/mapper/sata-debian (id 1) history
> > 
> > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after
> > 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > scrub device /dev/dm-2 (id 2) history
> > 
> > scrub started at Thu Oct 22 10:05:49 2015 and was aborted after
> > 00:01:30
> > total bytes scrubbed: 23.81GiB with 0 errors
> > 
> > For / scrub aborts for sata SSD immediately.
> > 
> > For /home scrub aborts for both SSDs at some time.
> > 
> > merkaba:~> btrfs scrub status -d /home
> > scrub status for […]
> > scrub device /dev/mapper/msata-home (id 1) history
> > 
> > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after
> > 00:01:31
> > total bytes scrubbed: 22.03GiB with 0 errors
> > 
> > scrub device /dev/dm-3 (id 2) history
> > 
> > scrub started at Thu Oct 22 10:09:37 2015 and was aborted after
> > 00:03:34
> > total bytes scrubbed: 53.30GiB with 0 errors
> > 
> > Also single volume BTRFS is affected:
> > 
> > merkaba:~> btrfs scrub status /daten
> > scrub status for […]
> > 
> > scrub started at Thu Oct 22 10:36:38 2015 and was aborted after
> > 00:00:00
> > total bytes scrubbed: 0.00B with 0 errors
> > 
> > No errors in dmesg, btrfs device stat or smartctl -a.
> > 
> > Any known issue?
> 
> I am still seeing this in 4.3-rc7. It so happens that on one SSD BTRFS
> doesn't even start scrubbing, but in the end the scrub is aborted anyway.
> 
> I do not see any other issue so far. But I would really like to be able to
> scrub my BTRFS filesystems completely again. Any hints? Any further
> information needed?
> 
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:20
> total bytes scrubbed: 5.27GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:25
> total bytes scrubbed: 6.59GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:00:00
> total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) status
> scrub started at Sat Oct 31 11:58:45 2015, running for 00:01:25
> total bytes scrubbed: 21.97GiB with 0 errors
> merkaba:~> btrfs scrub status -d /
> scrub status for […]
> scrub device /dev/dm-5 (id 1) history
> scrub started at Sat Oct 31 11:58:45 2015 and was aborted after
> 00:00:00 total bytes scrubbed: 0.00B with 0 errors
> scrub device /dev/mapper/msata-debian (id 2) history
> scrub started at Sat Oct 31 11:58:45 2015 and was aborted after
> 00:01:32 total bytes scrubbed: 23.63GiB with 0 errors
> 
> 
> For the sake of it I am going to btrfs check one of the filesystem where
> BTRFS aborts scrubbing (which is all of the laptop filesystems, not only
> the RAID 1 one).
> 
> I will use the /daten filesystem as I can unmount it during laptop runtime
> easily. There scrubbing aborts immediately:
> 
> merkaba:~> btrfs scrub start /daten
> scrub started on /daten, fsid […] (pid=13861)
> merkaba:~> btrfs scrub status /daten
> scrub status for […]
> scrub started at Sat Oct 31 12:04:25 2015 and was aborted after
> 00:00:00 total bytes scrubbed: 0.00B with 0 errors
> 
> It is single device:
> 
> merkaba:~> btrfs fi sh /daten
> Label: 'daten'  uuid: […]
> Total devices 1 FS bytes used 227.23GiB
> devid1 size 230.00GiB used 230.00GiB path
> /dev/mapper/msata-daten
> 
> btrfs-progs v4.2.2
> merkaba:~> btrfs fi df /daten
> Data, single: total=228.99GiB, used=226.79GiB
> System, single: total=4.00MiB, used=48.00KiB
> Metadata, single: total=1.01GiB, used=449.50MiB
> GlobalReserve, single: total=160.00MiB, used=0.00B
> 
> 
> I do not see any output in btrfs check that points to any issue:
> 
> merkaba:~> btrfs check /dev/msata/daten
> Checking filesystem on /dev/msata/daten
> UUID: 7918274f-e2ec-4983-bbb0-aa93ef95fcf7
> checking extents
> checking free space cache
> checking fs roots
> checking csums
> checking root refs
> found 243936530607 bytes used err is 0
> total 

Re: [auto-]defrag, nodatacow - general suggestions? (was: btrfs: poor performance on deleting many large files?)

2015-11-25 Thread Hugo Mills
On Thu, Nov 26, 2015 at 01:23:59AM +0100, Christoph Anton Mitterer wrote:
> 2) Why does nodatacow imply nodatasum and can that ever be decoupled?

   Answering the second part first, no, it can't.

   The issue is that nodatacow bypasses the transactional nature of
the FS, making changes to live data immediately. This then means that
if you modify a nodatacow file, the csum for that modified section is
out of date, and won't be back in sync again until the latest
transaction is committed. So you can end up with an inconsistent
filesystem if there's a crash between the two events.
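Hugo's explanation can be illustrated with a toy model (my own simplification,
not btrfs's actual on-disk logic): the in-place data write lands immediately,
the matching csum only lands at transaction commit, and a crash in between
leaves a mismatch even though the data itself is intact:

```python
import zlib

class ToyFS:
    """Toy model of a nodatacow write racing a transaction commit."""
    def __init__(self, data):
        self.data = data                      # "on disk" file contents
        self.csum = zlib.crc32(data)          # committed checksum
        self.pending_csum = None              # csum waiting for tx commit

    def nodatacow_write(self, data):
        self.data = data                      # overwrites live data at once
        self.pending_csum = zlib.crc32(data)  # not yet durable

    def commit_transaction(self):
        if self.pending_csum is not None:
            self.csum = self.pending_csum
            self.pending_csum = None

    def crash(self):
        self.pending_csum = None              # uncommitted state is lost

    def verify(self):
        return zlib.crc32(self.data) == self.csum

fs = ToyFS(b"old contents")
fs.nodatacow_write(b"new contents")
fs.crash()                # crash between the in-place write and the commit
assert not fs.verify()    # intact data now looks corrupt
```

That mismatch is indistinguishable from real corruption, which is why
nodatacow has to imply nodatasum.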

> For me the checksumming is actually the most important part of btrfs
> (not that I wouldn't like its other features as well)... so turning it
> off is something I really would want to avoid.
> 
> Plus it opens questions like: When there are no checksums, how can it
> (in the RAID cases) decide which block is the good one in case of
> corruptions?

   It doesn't decide -- both copies look equally good, because there's
no checksum, so if you read the data, the FS will return whatever data
was on the copy it happened to pick.


> 3) When I would actually disable datacow for e.g. a subvolume that
> holds VMs or DBs... what are all the implications?
> Obviously no checksumming, but what happens if I snapshot such a
> subvolume or if I send/receive it?

   After snapshotting, modifications are CoWed precisely once, and
then it reverts to nodatacow again. This means that making a snapshot
of a nodatacow object will cause it to fragment as writes are made to
it.
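That "CoWed precisely once" behaviour can be modelled in a few lines (again my
own sketch with invented names, not real btrfs code): after a snapshot, the
file and the snapshot share an extent; the first write breaks the sharing by
allocating a new extent, and later writes go in place again:

```python
class NodatacowFile:
    """Toy model of a nodatacow file's extent sharing across a snapshot."""
    _next_extent = 0

    def __init__(self):
        self.extent = self._alloc()
        self.shared = False

    @classmethod
    def _alloc(cls):
        cls._next_extent += 1
        return cls._next_extent

    def snapshot(self):
        snap = NodatacowFile.__new__(NodatacowFile)
        snap.extent = self.extent   # snapshot shares the current extent
        snap.shared = True
        self.shared = True
        return snap

    def write(self):
        if self.shared:
            self.extent = self._alloc()  # CoW exactly once: new extent
            self.shared = False
        # otherwise: plain in-place overwrite, extent unchanged

f = NodatacowFile()
s = f.snapshot()
e0 = f.extent
f.write()                              # first write after snapshot: CoW
assert f.extent != e0 and s.extent == e0
e1 = f.extent
f.write()                              # later writes are in place again
assert f.extent == e1
```

Each such one-off CoW lands somewhere new on disk, which is the fragmentation
effect described above.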

> I'd expect that then some kind of CoW needs to take place or does that
> simply not work?
> 
> 
> 4) Duncan mentioned that defrag (and I guess that's also for auto-
> defrag) isn't ref-link aware...
> Isn't that somehow a complete showstopper?

   It is, but the one attempt at dealing with it caused massive data
corruption, and it was turned off again. autodefrag, however, has
always been snapshot aware and snapshot safe, and would be the
recommended approach here. (Actually, it was broken in the same
incident I just described -- but fixed again when the broken patches
were reverted).

> As soon as one uses snapshot, and would defrag or auto defrag any of
> them, space usage would just explode, perhaps to the extent of ENOSPC,
> and rendering the fs effectively useless.
> 
> That sounds to me like, either I can't use ref-links, which are crucial
> not only to snapshots but every file I copy with cp --reflink auto ...
> or I can't defrag... which however will sooner or later cause quite
> some fragmentation issues on btrfs?
> 
> 
> 5) Especially keeping (4) in mind but also the other comments in from
> Duncan and Austin...
> Is auto-defrag now recommended to be generally used?

   Absolutely, yes.

   It's late for me, and this email was longer than I expected, so
I'm going to stop here, but I'll try to pick it up again and answer
your other questions tomorrow.

   Hugo.

> Are both auto-defrag and defrag considered stable to be used? Or are
> there other implications, like when I use compression
> 
> 
> 6) Does defragmentation work with compression? Or is it just filefrag
> which can't cope with it?
> 
> Any other combinations or things with the typicaly btrfs technologies
> (cow/nowcow, compression, snapshots, subvols, compressions, defrag,
> balance) that one can do but which lead to unexpected problems (I, for
> example, wouldn't have expected that defragmentation isn't ref-link
> aware... still kinda shocked ;) )
> 
> For example, when I do a balance and change the compression, and I have
> multiple snaphots or files within one subvol that share their blocks...
> would that also lead to copies being made and the space growing
> possibly dramatically?
> 
> 
> 7) How does free-space defragmentation happen (or is there even such a
> thing)?
> For example, when I have my big qemu images, *not* using nodatacow, and
> I copy the image e.g. with qemu-img old.img new.img ... and delete the
> old then.
> Then I'd expect that the new.img is more or less not fragmented,... but
> will my free space (from the removed old.img) still be completely
> messed up sooner or later driving me into problems?
> 
> 
> 8) why does a balance not also defragment? Since everything is anyway
> copied... why not defragmenting it?
> I somehow would have hoped that a balance cleans up all kinds of
> things,... like free space issues and also fragmentation.
> 
> 
> Given all these issues,... fragmentation, situations in which space may
> grow dramatically where the end-user/admin may not necessarily expect
> it (e.g. the defrag or the balance+compression case?)... btrfs seem to
> require much more in-depth knowledge and especially care (that even
> depends on the type of data) on the end-user/admin side than the
> traditional filesystems.
> Are there for example any general recommendations what to regularly to
> do keep the fs in a clean and proper shape (and I don't count "start
> with a fresh one and 

Re: [PATCH 00/25] Btrfs-convert rework to support native separate

2015-11-25 Thread Qu Wenruo



David Sterba wrote on 2015/11/25 13:42 +0100:

On Tue, Nov 24, 2015 at 04:50:00PM +0800, Qu Wenruo wrote:

It seems the conflict is quite large: your reiserfs support is based on
the old behavior, just like the old ext2 code: custom extent allocation.



I'm afraid the rebase will take a lot of time since I'm a complete
newbie regarding reiserfs... :(


Yeah, the ext2 callbacks are abstracted and replaced by reiserfs
implementations, and the abstraction is quite direct. This might be a
problem with merging your patchset.


The abstraction is better than I expected, and should be quite handy to 
use.


Although a lot of my code will have to be changed to use it.




I may need to change a lot of direct ext2 calls to generic ones, and may
even change the generic function calls (no alloc/free, only free space
lookup).

And some (maybe a lot) of the reiserfs code may be removed during the rework.


As long as the conversion support stays, it's not a problem of course. I
don't have a complete picture of all the actual merging conflicts, but
the idea is to provide the callback abstraction v2 to allow ext2 and
reiser plus allow all the changes of this patchset.



Glad to hear that.

BTW, which reiserfs progs headers are you using?

It seems that the headers you are using are quite different from what my 
distribution provides, and this makes compiling impossible.


For example, in my /usr/include/reiserfs, there is no io.h, no reiserfs.h.

No structure named reiserfs_key, but only key.

Not sure if my progsreiserfs is too old or if there is some other reason.

What progsreiserfs are you using?

Thanks,
Qu




Re: btrfs check help

2015-11-25 Thread Henk Slager
[...]
> Can I get a strong confirmation that I should run with the “--repair” option on 
> each device? Thanks.
>
> Vincent
>
>
> Checking filesystem on /dev/sdk
> UUID: 6a742786-070d-4557-9e67-c73b84967bf5
> checking extents [o]
> checking free space cache [.]
> root 5 inode 1341670 errors 400, nbytes wrong
> root 11406 inode 1341670 errors 400, nbytes wrong
[...]

I just remembered that I have seen this kind of error before; luckily, I
found the btrfs check output (August 2015) on a backup of an old
snapshot. In my case it was on a raid5 fs from November 2013: 7 small
txt files (each several hundred bytes), and the 7 errors were repeated for
about 10 snapshots. I did   # find . -inum   to find
the files. 2 of the 7 were still in the latest/actual subvol and I
just recreated them.
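For anyone wanting to script the lookup step Henk describes, the same
`find . -inum` search can be done in a few lines of Python (the mountpoint
here is a throwaway stand-in; on btrfs the same inode number can recur in
every snapshot/subvolume, so expect one match per snapshot):

```python
import os
import tempfile

def find_by_inode(mountpoint, inum):
    """Walk a tree and return every path whose inode number matches,
    roughly equivalent to `find <mountpoint> -inum <inum>`."""
    matches = []
    for root, dirs, files in os.walk(mountpoint):
        for name in files + dirs:
            path = os.path.join(root, name)
            try:
                if os.lstat(path).st_ino == inum:
                    matches.append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return matches

# demo on a temporary directory standing in for a real mountpoint
d = tempfile.mkdtemp()
path = os.path.join(d, "f.txt")
open(path, "w").close()
assert path in find_by_inode(d, os.lstat(path).st_ino)
```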

The errors from the older snapshots are still there, as far as I
remember from the last btrfs check I did (with kernel 4.3.0, tools
4.3.x). The fs was converted to raid10 3 months ago. As I also got
other fake errors (as in this
https://www.mail-archive.com/linux-btrfs%40vger.kernel.org/msg48325.html
), I won't run a repair until I see proof that this 'errors 400,
nbytes wrong' is a risk to file-server stability.
I also note that on an archive clone fs with these 10 old snapshots
(created via send|receive), there is no error.

In your case, it is likely just 1 small file in the root subvolume (5) and the
same allocation in the other subvol (11406), so maybe you can fix this
like I did and avoid running '--repair'.


[PATCH V1.1 2/7] btrfs-progs: add kernel alias for each of the features in the list

2015-11-25 Thread Anand Jain
We should have maintained the same feature names across the progs UI and the
sysfs UI. For example, progs mixed-bg is /sys/fs/btrfs/features/mixed_groups
in sysfs. As these are already released UIs, there is not much that can be
done about it, except creating the aliases and making users aware of them.

Add kernel alias for each of the features in the list.

eg: The string within () is the sysfs name for the same feature

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg (mixed_groups)           - mixed data and metadata block groups (0x4, 2.6.37)
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
raid56 (raid56)                   - raid56 extended format (0x80, 3.9)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

btrfs-convert -O list-all
Filesystem features available:
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

Signed-off-by: Anand Jain 
---
V1.1 add signed-off-by

 utils.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 0163915..6d2675d 100644
--- a/utils.c
+++ b/utils.c
@@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
 void btrfs_list_all_fs_features(u64 mask_disallowed)
 {
int i;
+   u64 feature_per_sysfs;
+
+   btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
 
fprintf(stderr, "Filesystem features available:\n");
for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
char *is_default = "";
+   char name[256];
 
if (mkfs_features[i].flag & mask_disallowed)
continue;
if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
is_default = ", default";
-   fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
-   mkfs_features[i].name,
+   if (mkfs_features[i].flag & feature_per_sysfs)
+   sprintf(name, "%s (%s)",
+   mkfs_features[i].name, mkfs_features[i].name_ker);
+   else
+   sprintf(name, "%s", mkfs_features[i].name);
+   fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
+   name,
mkfs_features[i].desc,
mkfs_features[i].flag,
mkfs_features[i].min_ker_ver,
-- 
2.6.2



[RFC PATCH] Btrfs: improve performance on dbench

2015-11-25 Thread Liu Bo
Kent Overstreet posted some dbench test numbers in the announcement of
bcachefs[1], in which btrfs's performance is much worse than that of
ext4 and xfs, especially in the case of multiple threads.

This difference can be observed on fast storage. I ran 'dbench -t10 64'
with a 1.6T NVMe disk.
Processor: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Memory: 504G

I took some time to dig into it; perf shows that in the case of multiple
threads we spend most of our cpu cycles on the spin_lock_irqsave() and
spin_unlock_irqrestore() pair, which is called by wait_event() in btree
locking.

72.84%  72.84%  dbench  [kernel.vmlinux]  [k] native_queued_spin_lock_slowpath
         |
         ---native_queued_spin_lock_slowpath
            |
            |--71.64%-- _raw_spin_lock_irqsave
            |           |
            |           |--52.17%-- prepare_to_wait_event
            |           |           |
            |           |           |--94.33%-- btrfs_tree_lock
            |           |           |           |
            |           |           |           |--99.10%-- btrfs_lock_root_node
            |           |           |           |           btrfs_search_slot
            |           |           |           |           |
            |           |           |           |           |--26.44%-- btrfs_lookup_dir_item
            |           |           |           |           |           |
            |           |           |           |           |           |--99.31%-- __btrfs_unlink_inode

That there is serious contention on the btree lock can also be shown another
way: if you use a subvolume instead of a directory for each dbench client,
the numbers in the multi-threaded case are considerably better,
for 64 clients,

Throughput 5904.71 MB/sec  64 clients  64 procs  max_latency=816.715 ms

I did a few things to avoid waiting for blocking writers and readers:

1) Use path->leave_spinning=1 as much as possible, this leaves us holding
   spinning lock after searching btree.

2) Find out the cases that we don't have to take blocking lock, for example,
   we don't need blocking lock when parent node has more than 1/4 items it
   can hold.

3) Avoid unnecessary "goto again", eg. on btree root level,
   just update write_lock_level if we already hold BTRFS_WRITE_LOCK.

4) Remove btrfs_set_path_blocking() in btrfs_clear_path_blocking(), this
   contributes to a large part of improved numbers, this function is
   introduced to avoid a lockdep warning, but after I turned lockdep on,
   xfstests didn't report such a warning.

5) Make btrfs_search_forward to use non-sleepable function to find eb, this
   fixes a deadlock with previous changes.
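The common pattern behind points 1) and 2) — keep the cheap spinning lock and
only pay for the blocking (sleepable) lock when the node might actually be
restructured — can be caricatured as follows. This is my own illustrative
sketch, not the kernel code; `needs_split` merely stands in for the 1/4-full
heuristic mentioned above:

```python
import threading

class ToyNodeLock:
    """Caricature of btrfs btree locking: a cheap 'spinning' mode for
    short read-mostly sections, upgraded to a 'blocking' mode only when
    the caller may restructure the node."""
    def __init__(self):
        self._lock = threading.Lock()
        self.mode = None

    def lock_spinning(self):
        self._lock.acquire()
        self.mode = "spinning"

    def set_blocking(self):
        # in the kernel this lets other waiters sleep instead of spin;
        # here we only record the (more expensive) mode switch
        self.mode = "blocking"

    def unlock(self):
        self.mode = None
        self._lock.release()

def search_step(lock, nitems, capacity):
    """Upgrade to the blocking lock only when the node is full enough
    that an insert might split it (stand-in for the 1/4 heuristic)."""
    lock.lock_spinning()
    needs_split = nitems > capacity - capacity // 4
    if needs_split:
        lock.set_blocking()
    mode = lock.mode
    lock.unlock()
    return mode

l = ToyNodeLock()
assert search_step(l, nitems=10, capacity=100) == "spinning"
assert search_step(l, nitems=90, capacity=100) == "blocking"
```

The fewer steps that escalate to the blocking mode, the fewer waiters end up
in the wait_event() path that the perf profile above shows dominating.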

Here are the end results for 64 clients; compared with vanilla 4.2, btrfs runs
15x faster but with higher latency.

                    tput(MB/sec)   max_latency(ms)
xfs                      2742.93            21.855
ext4                     7182.92            19.053
btrfs+subvol w/o         5904.71           816.715
btrfs+dir w/o             122.778          718.674
*btrfs+dir w             1715.77          1366.981

I've marked it as RFC since I'm not confident on the lockdep part.

Any comments are welcome!

[1]: https://lkml.org/lkml/2015/8/21/22

Signed-off-by: Liu Bo 
---
 fs/btrfs/ctree.c | 79 +++-
 fs/btrfs/dir-item.c  |  1 +
 fs/btrfs/file-item.c |  3 +-
 fs/btrfs/file.c  |  7 -
 fs/btrfs/inode-map.c |  2 ++
 fs/btrfs/inode.c |  3 ++
 fs/btrfs/orphan.c|  2 ++
 fs/btrfs/root-tree.c |  1 +
 fs/btrfs/xattr.c |  2 ++
 9 files changed, 84 insertions(+), 16 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5b8e235..a27dbae 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -87,7 +87,6 @@ noinline void btrfs_clear_path_blocking(struct btrfs_path *p,
else if (held_rw == BTRFS_READ_LOCK)
held_rw = BTRFS_READ_LOCK_BLOCKING;
}
-   btrfs_set_path_blocking(p);
 
for (i = BTRFS_MAX_LEVEL - 1; i >= 0; i--) {
if (p->nodes[i] && p->locks[i]) {
@@ -2536,8 +2535,16 @@ setup_nodes_for_search(struct btrfs_trans_handle *trans,
 
if (*write_lock_level < level + 1) {
*write_lock_level = level + 1;
-   btrfs_release_path(p);
-   goto again;
+
+   ASSERT(p->locks[level] == BTRFS_WRITE_LOCK ||
+  p->locks[level] == BTRFS_READ_LOCK);
+
+   /* if it's not the root node or the lock is not WRITE_LOCK */
+   if ((level < BTRFS_MAX_LEVEL - 1 && p->nodes[level + 1]) ||
+   p->locks[level] != BTRFS_WRITE_LOCK) {
+   btrfs_release_path(p);
+   goto again;
+   }
}
 

Re: [PATCH 2/7] btrfs-progs: add kernel alias for each of the features in the list

2015-11-25 Thread Anand Jain



Liu Bo wrote:

On Wed, Nov 25, 2015 at 08:08:15PM +0800, Anand Jain wrote:

We should have maintained the same feature names across the progs UI and the
sysfs UI. For example, progs mixed-bg is /sys/fs/btrfs/features/mixed_groups
in sysfs. As these are already released UIs, there is not much that can be
done about it, except creating the aliases and making users aware of them.

Add kernel alias for each of the features in the list.

eg: The string within () is the sysfs name for the same feature

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg (mixed_groups)           - mixed data and metadata block groups (0x4, 2.6.37)
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
raid56 (raid56)                   - raid56 extended format (0x80, 3.9)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)

btrfs-convert -O list-all
Filesystem features available:
extref (extended_iref)            - increased hardlink limit per file to 65536 (0x40, 3.7, default)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes (no_holes)               - no explicit hole extents for files (0x200, 3.14)


You missed a Signed-off-by here.


 oh no. thanks for the catch.


Thanks,

-liubo

---
  utils.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 0163915..6d2675d 100644
--- a/utils.c
+++ b/utils.c
@@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
  void btrfs_list_all_fs_features(u64 mask_disallowed)
  {
int i;
+   u64 feature_per_sysfs;
+
+   btrfs_features_allowed_by_sysfs(&feature_per_sysfs);

fprintf(stderr, "Filesystem features available:\n");
for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
char *is_default = "";
+   char name[256];

if (mkfs_features[i].flag & mask_disallowed)
continue;
if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
is_default = ", default";
-   fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
-   mkfs_features[i].name,
+   if (mkfs_features[i].flag & feature_per_sysfs)
+   sprintf(name, "%s (%s)",
+   mkfs_features[i].name, mkfs_features[i].name_ker);
+   else
+   sprintf(name, "%s", mkfs_features[i].name);
+   fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
+   name,
mkfs_features[i].desc,
mkfs_features[i].flag,
mkfs_features[i].min_ker_ver,
--
2.6.2






Re: [auto-]defrag, nodatacow - general suggestions? (was: btrfs: poor performance on deleting many large files?)

2015-11-25 Thread Christoph Anton Mitterer
Hey.

I've worried before about the topics Mitch has raised.
Some questions.

1) AFAIU, the fragmentation problem exists especially for those files
that see many random writes, especially, but not limited to, big files.
That databases and VMs are affected by this is probably broadly
known by now (well, at least by people on this list).
But I'd guess there are n other cases where such IO patterns can happen
which one simply never notices, while the btrfs continues to degrade.

So is there any general approach towards this?

And what are the actual possible consequences? Is it just that fs gets
slower (due to the fragmentation) or may I even run into other issues
to the point the space is eaten up or the fs becomes basically
unusable?

This is especially important for me, because for some VMs and even DBs
I wouldn't want to use nodatacow, because I want to have the
checksumming. (i.e. those cases where data integrity is much more
important than security)


2) Why does nodatacow imply nodatasum and can that ever be decoupled?
For me the checksumming is actually the most important part of btrfs
(not that I wouldn't like its other features as well)... so turning it
off is something I really would want to avoid.

Plus it raises questions like: when there are no checksums, how can it
(in the RAID cases) decide which block is the good one in case of
corruption?


3) If I actually disable datacow for e.g. a subvolume that holds VMs
or DBs... what are all the implications?
Obviously no checksumming, but what happens if I snapshot such a
subvolume or send/receive it?
I'd expect that some kind of CoW then needs to take place, or does that
simply not work?


4) Duncan mentioned that defrag (and I guess that also goes for
auto-defrag) isn't ref-link aware...
Isn't that somehow a complete showstopper?

As soon as one uses snapshots and defrags or auto-defrags any of them,
space usage would just explode, perhaps to the extent of ENOSPC,
rendering the fs effectively useless.

That sounds to me like either I can't use ref-links, which are crucial
not only for snapshots but for every file I copy with cp --reflink=auto,
or I can't defrag... which, however, will sooner or later cause quite
some fragmentation issues on btrfs?


5) Especially keeping (4) in mind, but also the other comments from
Duncan and Austin...
Is auto-defrag now recommended for general use?
Are both auto-defrag and defrag considered stable enough to use? Or are
there other implications, e.g. when I use compression?


6) Does defragmentation work with compression? Or is it just filefrag
which can't cope with it?
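(What I've gathered so far, as a sketch with example path and
compression algorithm: defrag does accept a compression option, so the
two at least combine on the command line:)

```shell
# Recursive defragment; -c additionally (re)compresses the rewritten
# extents, so the rewritten data ends up both defragmented and compressed
btrfs filesystem defragment -r -v -clzo /home/user/data
```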

Are there any other combinations of the typical btrfs technologies
(cow/nocow, compression, snapshots, subvolumes, defrag, balance) that
one can use but which lead to unexpected problems? (I, for example,
wouldn't have expected that defragmentation isn't ref-link aware...
still kinda shocked ;) )

For example, when I do a balance and change the compression, and I have
multiple snapshots or files within one subvolume that share their
blocks... would that also lead to copies being made and the space
possibly growing dramatically?


7) How does free-space defragmentation happen (or is there even such a
thing)?
For example, say I have my big qemu images, *not* using nodatacow, and
I copy an image e.g. with qemu-img convert old.img new.img and delete
the old one afterwards.
Then I'd expect that new.img is more or less unfragmented, but will my
free space (from the removed old.img) still be completely messed up,
sooner or later driving me into problems?


8) Why does a balance not also defragment? Since everything is copied
anyway... why not defragment it at the same time?
I somehow would have hoped that a balance cleans up all kinds of
things, like free-space issues and also fragmentation.


Given all these issues... fragmentation, situations in which space may
grow dramatically where the end-user/admin may not necessarily expect
it (e.g. the defrag or the balance+compression case?)... btrfs seems to
require much more in-depth knowledge, and especially care (which even
depends on the type of data), on the end-user/admin side than the
traditional filesystems.
Are there, for example, any general recommendations for what to do
regularly to keep the fs in a clean and proper shape (and I don't count
"start with a fresh one and copy the data over" as a valid approach)?
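By "regularly", I mean something along these lines (a sketch of what
I'd naively run from cron; mount point and usage thresholds are just
examples, not a recommendation from anyone):

```shell
# Verify checksums of all data and metadata; in RAID profiles a bad
# copy can be repaired from the good one
btrfs scrub start /mnt/data

# Rewrite only mostly-empty chunks instead of everything, to reclaim
# and compact allocated-but-unused space
btrfs balance start -dusage=50 -musage=50 /mnt/data
```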


Thanks,
Chris.



Re: 4.2.6: livelock in recovery (free_reloc_roots)?

2015-11-25 Thread Lukas Pirl
On 11/21/2015 10:01 PM, Alexander Fougner wrote as excerpted:
> This is fixed in btrfs-progs 4.3.1, that allows you to delete a
> device again by the 'missing' keyword.

Thanks Alexander! I just found the thread reporting the bug but not the
patch with the corresponding btrfs-tools version it was merged in.

Lukas


Re: [PATCH 0/7] Let user specify the kernel version for features

2015-11-25 Thread Qu Wenruo



Anand Jain wrote on 2015/11/25 20:08 +0800:

Sometimes users may want a btrfs filesystem to be usable on multiple
kernel versions. A simple example: a USB drive can be used with multiple
systems running different kernel versions. Or, in a data center, a SAN
LUN could be mounted on any system with a different kernel version.

Thanks for providing comments and feedback.
Following up on them, here below is a set of patches which introduce a
way to specify a kernel version, so that the default features can be
set based on what features were supported by that kernel version.


With the new -O comp= option, the concern about users who want to make
a btrfs for a newer kernel is hugely reduced.


But I still prefer such feature alignment to be done only when
specified by the user, instead of automatically (yeah, I've already
said this several times though).
A warning should be enough for the user; sometimes too much automation
is not good, especially for tests.


A lot of btrfs-progs changes, like the recent disabling of mixed-bg for
small volumes, have already caused regressions in the generic/077
testcase.

And Dave is already fed up with such problems from btrfs...

Such auto-detection especially will make the default behavior more
unstable; at least to me it's not a good idea.




Besides this, I'm curious how other filesystems' user tools handle such
kernel mismatches, or do they at all?


Thanks,
Qu




First of all, to let users know which features were supported at which
kernel version, patch 1/7 updates -O list-all to list each feature
with its version.

As we didn't keep the sysfs and progs feature names consistent, to
avoid confusion patch 2/7 additionally displays the sysfs feature name
in the list-all output.

Next, Patch 3,4,5/7 are helper functions.

Patches 6,7/7 provide the -O comp= option for mkfs.btrfs and
btrfs-convert respectively.
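A sketch of the intended usage, going by the patch subjects above (the
-O comp= option is part of this proposed patchset, not a released
feature, and the device name is a placeholder):

```shell
# List all features together with the kernel version they appeared in
mkfs.btrfs -O list-all

# Proposed: restrict default features to what a 3.10 kernel supports
mkfs.btrfs -O comp=3.10 /dev/sdX
```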

Thanks, Anand

Anand Jain (7):
   btrfs-progs: show the version for -O list-all
   btrfs-progs: add kernel alias for each of the features in the list
   btrfs-progs: make is_numerical non static
   btrfs-progs: check for numerical in version_to_code()
   btrfs-progs: introduce framework version to features
   btrfs-progs: add -O comp= option for mkfs.btrfs
   btrfs-progs: add -O comp= option for btrfs-convert

  btrfs-convert.c | 21 +
  cmds-replace.c  | 11 ---
  mkfs.c  | 24 ++--
  utils.c | 58 -
  utils.h |  2 ++
  5 files changed, 98 insertions(+), 18 deletions(-)






Re: [PATCH] Btrfs: fix error path when failing to submit bio for direct IO write

2015-11-25 Thread Filipe Manana
On Wed, Nov 25, 2015 at 5:58 PM, Liu Bo  wrote:
> On Tue, Nov 24, 2015 at 05:25:18PM +, fdman...@kernel.org wrote:
>> From: Filipe Manana 
>>
>> Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit
>> bio for direct IO") fixed problems with the error handling code after we
>> fail to submit a bio for direct IO. However there were 2 problems that it
>> did not address when the failure is due to memory allocation failures for
>> direct IO writes:
>>
>> 1) We considered that there could be only one ordered extent for the whole
>>IO range, which is not always true, as we can have multiple;
>>
>> 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
>>which can make other tasks running btrfs_wait_logged_extents() hang
>>forever, since they wait for that bit to be set. The general assumption
>>is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
>>and it precedes setting the bit BTRFS_ORDERED_COMPLETE.
>>
>> Fix these issues by moving part of the btrfs_endio_direct_write() handler
>> into a new helper function and having that new helper function called when
>> we fail to allocate memory to submit the bio (and its private object) for
>> a direct IO write.
>>
>> Signed-off-by: Filipe Manana 
>> ---
>>  fs/btrfs/inode.c | 54 +++---
>>  1 file changed, 27 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index f82d1f4..4f8560c 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio)
>>   bio_put(bio);
>>  }
>>
>> -static void btrfs_endio_direct_write(struct bio *bio)
>> +static void btrfs_endio_direct_write_update_ordered(struct inode *inode,
>> + const u64 offset,
>> + const u64 bytes,
>> + const int uptodate)
>>  {
>> - struct btrfs_dio_private *dip = bio->bi_private;
>> - struct inode *inode = dip->inode;
>>   struct btrfs_root *root = BTRFS_I(inode)->root;
>>   struct btrfs_ordered_extent *ordered = NULL;
>> - u64 ordered_offset = dip->logical_offset;
>> - u64 ordered_bytes = dip->bytes;
>> - struct bio *dio_bio;
>> + u64 ordered_offset = offset;
>> + u64 ordered_bytes = bytes;
>>   int ret;
>>
>>  again:
>>   ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
>>  &ordered_offset,
>>  ordered_bytes,
>> -!bio->bi_error);
>> +uptodate);
>>   if (!ret)
>>   goto out_test;
>>
>> @@ -8023,13 +8023,22 @@ out_test:
>>* our bio might span multiple ordered extents.  If we haven't
>>* completed the accounting for the whole dio, go back and try again
>>*/
>> - if (ordered_offset < dip->logical_offset + dip->bytes) {
>> - ordered_bytes = dip->logical_offset + dip->bytes -
>> - ordered_offset;
>> + if (ordered_offset < offset + bytes) {
>> + ordered_bytes = offset + bytes - ordered_offset;
>>   ordered = NULL;
>>   goto again;
>>   }
>> - dio_bio = dip->dio_bio;
>> +}
>> +
>> +static void btrfs_endio_direct_write(struct bio *bio)
>> +{
>> + struct btrfs_dio_private *dip = bio->bi_private;
>> + struct bio *dio_bio = dip->dio_bio;
>> +
>> + btrfs_endio_direct_write_update_ordered(dip->inode,
>> + dip->logical_offset,
>> + dip->bytes,
>> + !bio->bi_error);
>>
>>   kfree(dip);
>>
>> @@ -8365,24 +8374,15 @@ free_ordered:
>>   dip = NULL;
>>   io_bio = NULL;
>>   } else {
>> - if (write) {
>> - struct btrfs_ordered_extent *ordered;
>> -
>> - ordered = btrfs_lookup_ordered_extent(inode,
>> -   file_offset);
>> - set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
>> - /*
>> -  * Decrements our ref on the ordered extent and removes
>> -  * the ordered extent from the inode's ordered tree,
>> -  * doing all the proper resource cleanup such as for 
>> the
>> -  * reserved space and waking up any waiters for this
>> -  * ordered extent (through 
>> btrfs_remove_ordered_extent).
>> -  */
>> - btrfs_finish_ordered_io(ordered);
>> - } else {
>> + if (write)
>> +   

[PATCH v2] Btrfs: fix error path when failing to submit bio for direct IO write

2015-11-25 Thread fdmanana
From: Filipe Manana 

Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit
bio for direct IO") fixed problems with the error handling code after we
fail to submit a bio for direct IO. However there were 2 problems that it
did not address when the failure is due to memory allocation failures for
direct IO writes:

1) We considered that there could be only one ordered extent for the whole
   IO range, which is not always true, as we can have multiple;

2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
   which can make other tasks running btrfs_wait_logged_extents() hang
   forever, since they wait for that bit to be set. The general assumption
   is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
   and it precedes setting the bit BTRFS_ORDERED_COMPLETE.

Fix these issues by moving part of the btrfs_endio_direct_write() handler
into a new helper function and having that new helper function called when
we fail to allocate memory to submit the bio (and its private object) for
a direct IO write.

Signed-off-by: Filipe Manana 
---

V2: Fixed wrong uptodate value passed to helper
btrfs_endio_direct_write_update_ordered() (1 vs 0).

 fs/btrfs/inode.c | 54 +++---
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f82d1f4..66106b4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio)
bio_put(bio);
 }
 
-static void btrfs_endio_direct_write(struct bio *bio)
+static void btrfs_endio_direct_write_update_ordered(struct inode *inode,
+   const u64 offset,
+   const u64 bytes,
+   const int uptodate)
 {
-   struct btrfs_dio_private *dip = bio->bi_private;
-   struct inode *inode = dip->inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_ordered_extent *ordered = NULL;
-   u64 ordered_offset = dip->logical_offset;
-   u64 ordered_bytes = dip->bytes;
-   struct bio *dio_bio;
+   u64 ordered_offset = offset;
+   u64 ordered_bytes = bytes;
int ret;
 
 again:
ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
   &ordered_offset,
   ordered_bytes,
-  !bio->bi_error);
+  uptodate);
if (!ret)
goto out_test;
 
@@ -8023,13 +8023,22 @@ out_test:
 * our bio might span multiple ordered extents.  If we haven't
 * completed the accounting for the whole dio, go back and try again
 */
-   if (ordered_offset < dip->logical_offset + dip->bytes) {
-   ordered_bytes = dip->logical_offset + dip->bytes -
-   ordered_offset;
+   if (ordered_offset < offset + bytes) {
+   ordered_bytes = offset + bytes - ordered_offset;
ordered = NULL;
goto again;
}
-   dio_bio = dip->dio_bio;
+}
+
+static void btrfs_endio_direct_write(struct bio *bio)
+{
+   struct btrfs_dio_private *dip = bio->bi_private;
+   struct bio *dio_bio = dip->dio_bio;
+
+   btrfs_endio_direct_write_update_ordered(dip->inode,
+   dip->logical_offset,
+   dip->bytes,
+   !bio->bi_error);
 
kfree(dip);
 
@@ -8365,24 +8374,15 @@ free_ordered:
dip = NULL;
io_bio = NULL;
} else {
-   if (write) {
-   struct btrfs_ordered_extent *ordered;
-
-   ordered = btrfs_lookup_ordered_extent(inode,
- file_offset);
-   set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
-   /*
-* Decrements our ref on the ordered extent and removes
-* the ordered extent from the inode's ordered tree,
-* doing all the proper resource cleanup such as for the
-* reserved space and waking up any waiters for this
-* ordered extent (through btrfs_remove_ordered_extent).
-*/
-   btrfs_finish_ordered_io(ordered);
-   } else {
+   if (write)
+   btrfs_endio_direct_write_update_ordered(inode,
+   file_offset,
+   dio_bio->bi_iter.bi_size,
+   0);

Re: [PATCH] Btrfs: fix error path when failing to submit bio for direct IO write

2015-11-25 Thread Liu Bo
On Tue, Nov 24, 2015 at 05:25:18PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit
> bio for direct IO") fixed problems with the error handling code after we
> fail to submit a bio for direct IO. However there were 2 problems that it
> did not address when the failure is due to memory allocation failures for
> direct IO writes:
> 
> 1) We considered that there could be only one ordered extent for the whole
>IO range, which is not always true, as we can have multiple;
> 
> 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
>which can make other tasks running btrfs_wait_logged_extents() hang
>forever, since they wait for that bit to be set. The general assumption
>is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
>and it precedes setting the bit BTRFS_ORDERED_COMPLETE.
> 
> Fix these issues by moving part of the btrfs_endio_direct_write() handler
> into a new helper function and having that new helper function called when
> we fail to allocate memory to submit the bio (and its private object) for
> a direct IO write.
> 
> Signed-off-by: Filipe Manana 
> ---
>  fs/btrfs/inode.c | 54 +++---
>  1 file changed, 27 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index f82d1f4..4f8560c 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio)
>   bio_put(bio);
>  }
>  
> -static void btrfs_endio_direct_write(struct bio *bio)
> +static void btrfs_endio_direct_write_update_ordered(struct inode *inode,
> + const u64 offset,
> + const u64 bytes,
> + const int uptodate)
>  {
> - struct btrfs_dio_private *dip = bio->bi_private;
> - struct inode *inode = dip->inode;
>   struct btrfs_root *root = BTRFS_I(inode)->root;
>   struct btrfs_ordered_extent *ordered = NULL;
> - u64 ordered_offset = dip->logical_offset;
> - u64 ordered_bytes = dip->bytes;
> - struct bio *dio_bio;
> + u64 ordered_offset = offset;
> + u64 ordered_bytes = bytes;
>   int ret;
>  
>  again:
>   ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
>  &ordered_offset,
>  ordered_bytes,
> -!bio->bi_error);
> +uptodate);
>   if (!ret)
>   goto out_test;
>  
> @@ -8023,13 +8023,22 @@ out_test:
>* our bio might span multiple ordered extents.  If we haven't
>* completed the accounting for the whole dio, go back and try again
>*/
> - if (ordered_offset < dip->logical_offset + dip->bytes) {
> - ordered_bytes = dip->logical_offset + dip->bytes -
> - ordered_offset;
> + if (ordered_offset < offset + bytes) {
> + ordered_bytes = offset + bytes - ordered_offset;
>   ordered = NULL;
>   goto again;
>   }
> - dio_bio = dip->dio_bio;
> +}
> +
> +static void btrfs_endio_direct_write(struct bio *bio)
> +{
> + struct btrfs_dio_private *dip = bio->bi_private;
> + struct bio *dio_bio = dip->dio_bio;
> +
> + btrfs_endio_direct_write_update_ordered(dip->inode,
> + dip->logical_offset,
> + dip->bytes,
> + !bio->bi_error);
>  
>   kfree(dip);
>  
> @@ -8365,24 +8374,15 @@ free_ordered:
>   dip = NULL;
>   io_bio = NULL;
>   } else {
> - if (write) {
> - struct btrfs_ordered_extent *ordered;
> -
> - ordered = btrfs_lookup_ordered_extent(inode,
> -   file_offset);
> - set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
> - /*
> -  * Decrements our ref on the ordered extent and removes
> -  * the ordered extent from the inode's ordered tree,
> -  * doing all the proper resource cleanup such as for the
> -  * reserved space and waking up any waiters for this
> -  * ordered extent (through btrfs_remove_ordered_extent).
> -  */
> - btrfs_finish_ordered_io(ordered);
> - } else {
> + if (write)
> + btrfs_endio_direct_write_update_ordered(inode,
> + file_offset,
> + 

Re: [PATCH v2] Btrfs: fix error path when failing to submit bio for direct IO write

2015-11-25 Thread Liu Bo
On Tue, Nov 24, 2015 at 11:35:25PM +, fdman...@kernel.org wrote:
> From: Filipe Manana 
> 
> Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit
> bio for direct IO") fixed problems with the error handling code after we
> fail to submit a bio for direct IO. However there were 2 problems that it
> did not address when the failure is due to memory allocation failures for
> direct IO writes:
> 
> 1) We considered that there could be only one ordered extent for the whole
>IO range, which is not always true, as we can have multiple;
> 
> 2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
>which can make other tasks running btrfs_wait_logged_extents() hang
>forever, since they wait for that bit to be set. The general assumption
>is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
>and it precedes setting the bit BTRFS_ORDERED_COMPLETE.
> 
> Fix these issues by moving part of the btrfs_endio_direct_write() handler
> into a new helper function and having that new helper function called when
> we fail to allocate memory to submit the bio (and its private object) for
> a direct IO write.
> 

You can have

Reviewed-by: Liu Bo 

> Signed-off-by: Filipe Manana 
> ---
> 
> V2: Fixed wrong uptodate value passed to helper
> btrfs_endio_direct_write_update_ordered() (1 vs 0).
> 
>  fs/btrfs/inode.c | 54 +++---
>  1 file changed, 27 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index f82d1f4..66106b4 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio)
>   bio_put(bio);
>  }
>  
> -static void btrfs_endio_direct_write(struct bio *bio)
> +static void btrfs_endio_direct_write_update_ordered(struct inode *inode,
> + const u64 offset,
> + const u64 bytes,
> + const int uptodate)
>  {
> - struct btrfs_dio_private *dip = bio->bi_private;
> - struct inode *inode = dip->inode;
>   struct btrfs_root *root = BTRFS_I(inode)->root;
>   struct btrfs_ordered_extent *ordered = NULL;
> - u64 ordered_offset = dip->logical_offset;
> - u64 ordered_bytes = dip->bytes;
> - struct bio *dio_bio;
> + u64 ordered_offset = offset;
> + u64 ordered_bytes = bytes;
>   int ret;
>  
>  again:
>   ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
>    &ordered_offset,
>  ordered_bytes,
> -!bio->bi_error);
> +uptodate);
>   if (!ret)
>   goto out_test;
>  
> @@ -8023,13 +8023,22 @@ out_test:
>* our bio might span multiple ordered extents.  If we haven't
>* completed the accounting for the whole dio, go back and try again
>*/
> - if (ordered_offset < dip->logical_offset + dip->bytes) {
> - ordered_bytes = dip->logical_offset + dip->bytes -
> - ordered_offset;
> + if (ordered_offset < offset + bytes) {
> + ordered_bytes = offset + bytes - ordered_offset;
>   ordered = NULL;
>   goto again;
>   }
> - dio_bio = dip->dio_bio;
> +}
> +
> +static void btrfs_endio_direct_write(struct bio *bio)
> +{
> + struct btrfs_dio_private *dip = bio->bi_private;
> + struct bio *dio_bio = dip->dio_bio;
> +
> + btrfs_endio_direct_write_update_ordered(dip->inode,
> + dip->logical_offset,
> + dip->bytes,
> + !bio->bi_error);
>  
>   kfree(dip);
>  
> @@ -8365,24 +8374,15 @@ free_ordered:
>   dip = NULL;
>   io_bio = NULL;
>   } else {
> - if (write) {
> - struct btrfs_ordered_extent *ordered;
> -
> - ordered = btrfs_lookup_ordered_extent(inode,
> -   file_offset);
> - set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
> - /*
> -  * Decrements our ref on the ordered extent and removes
> -  * the ordered extent from the inode's ordered tree,
> -  * doing all the proper resource cleanup such as for the
> -  * reserved space and waking up any waiters for this
> -  * ordered extent (through btrfs_remove_ordered_extent).
> -  */
> - btrfs_finish_ordered_io(ordered);
> - } else {
> + if (write)
> +   

Re: btrfs: poor performance on deleting many large files

2015-11-25 Thread Mitchell Fossen
On Mon, 2015-11-23 at 06:29 +, Duncan wrote:

> Using subvolumes was the first recommendation I was going to make, too, 
> so you're on the right track. =:^)
> 
> Also, in case you are using it (you didn't say, but this has been 
> demonstrated to solve similar issues for others so it's worth 
> mentioning), try turning btrfs quota functionality off.  While the devs 
> are working very hard on that feature for btrfs, the fact is that it's 
> simply still buggy and doesn't work reliably anyway, in addition to 
> triggering scaling issues before they'd otherwise occur.  So my 
> recommendation has been, and remains, unless you're working directly with 
> the devs to fix quota issues (in which case, thanks!), if you actually 
> NEED quota functionality, use a filesystem where it works reliably, while 
> if you don't, just turn it off and avoid the scaling and other issues 
> that currently still come with it.
> 

I did indeed have quotas turned on for the home directories! Since they
were mostly there to calculate the space used by everyone (since du -hs
is so slow) and not actually needed to limit anyone, I disabled them.
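For anyone else in the same situation, this is what I mean (mount point
is an example):

```shell
# Stop qgroup accounting entirely
btrfs quota disable /home

# While quotas *are* enabled, per-subvolume usage can be read much
# faster than du -hs:
btrfs qgroup show /home
```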

> As for defrag, that's quite a topic of its own, with complications 
> related to snapshots and the nocow file attribute.  Very briefly, if you 
> haven't been running it regularly or using the autodefrag mount option by 
> default, chances are your available free space is rather fragmented as 
> well, and while defrag may help, it may not reduce fragmentation to the 
> degree you'd like.  (I'd suggest using filefrag to check fragmentation, 
> but it doesn't know how to deal with btrfs compression, and will report 
> heavy fragmentation for compressed files even if they're fine.  Since you 
> use compression, that kind of eliminates using filefrag to actually see 
> what your fragmentation is.)
> Additionally, defrag isn't snapshot aware (they tried it for a few 
> kernels a couple years ago but it simply didn't scale), so if you're 
> using snapshots (as I believe Ubuntu does by default on btrfs, at least 
> taking snapshots for upgrade-in-place), so using defrag on files that 
> exist in the snapshots as well can dramatically increase space usage, 
> since defrag will break the reflinks to the snapshotted extents and 
> create new extents for defragged files.
> 
> Meanwhile, the absolute worst-case fragmentation on btrfs occurs with  
> random-internal-rewrite-pattern files (as opposed to never changed, or 
> append-only).  Common examples are database files and VM images.  For 
> /relatively/ small files, to say 256 MiB, the autodefrag mount option is 
> a reasonably effective solution, but it tends to have scaling issues with 
> files over half a GiB so you can call this a negative recommendation for 
> trying that option with half-gig-plus internal-random-rewrite-pattern 
> files.  There are other mitigation strategies that can be used, but here 
> the subject gets complex so I'll not detail them.  Suffice it to say that 
> if the filesystem in question is used with large VM images or database 
> files and you haven't taken specific fragmentation avoidance measures, 
> that's very likely a good part of your problem right there, and you can 
> call this a hint that further research is called for.
> 
> If your half-gig-plus files are mostly write-once, for example most media 
> files unless you're doing heavy media editing, however, then autodefrag 
> could be a good option in general, as it deals well with such files and 
> with random-internal-rewrite-pattern files under a quarter gig or so.  Be 
> aware, however, that if it's enabled on an already heavily fragmented 
> filesystem (as yours likely is), it's likely to actually make performance 
> worse until it gets things under control.  Your best bet in that case, if 
> you have spare devices available to do so, is probably to create a fresh 
> btrfs and consistently use autodefrag as you populate it from the 
> existing heavily fragmented btrfs.  That way, it'll never have a chance 
> for the fragmentation to build up in the first place, and autodefrag used 
> as a routine mount option should keep it from getting bad in normal use.

Thanks for explaining that! Most of these files are written once and then read
from for the rest of their "lifetime" until the simulations are done and they
get archived/deleted. I'll try leaving autodefrag on and defragging directories
over the holiday weekend when no one is using the server. There is some database
usage, but I turned off COW for its folder and it only gets used sporadically
and shouldn't be a huge factor in day-to-day usage. 

Also, is there a recommendation for the relatime vs. noatime mount
options? I don't believe anything that runs on the server needs file
access times, so if it can help with performance/disk usage I'm fine
with setting it to noatime.
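If I go that route, I assume it's just a one-word change in /etc/fstab,
something like (device UUID and options here are purely illustrative):

```
UUID=xxxx-xxxx  /srv/data  btrfs  defaults,noatime,compress=lzo  0  0
```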

I just tried copying a 70GB folder and then rm -rf it and it didn't appear to
impact performance, and I plan to try some larger 

Re: [PATCH 2/7] btrfs-progs: add kernel alias for each of the features in the list

2015-11-25 Thread Liu Bo
On Wed, Nov 25, 2015 at 08:08:15PM +0800, Anand Jain wrote:
> We should have kept the feature names the same across the progs UI and
> the sysfs UI. For example, progs' mixed-bg is
> /sys/fs/btrfs/features/mixed_groups in sysfs. As these are already
> released UIs, there is not much that can be done about it, except for
> creating the alias and making users aware of it.
> 
> Add kernel alias for each of the features in the list.
> 
> e.g.: The string within () is the sysfs name for the same feature
> 
> mkfs.btrfs -O list-all
> Filesystem features available:
> mixed-bg (mixed_groups)   - mixed data and metadata block groups 
> (0x4, 2.7.37)
> extref (extended_iref)- increased hardlink limit per file to 
> 65536 (0x40, 3.7, default)
> raid56 (raid56)   - raid56 extended format (0x80, 3.9)
> skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 
> 3.10, default)
> no-holes (no_holes)   - no explicit hole extents for files 
> (0x200, 3.14)
> 
> btrfs-convert -O list-all
> Filesystem features available:
> extref (extended_iref)- increased hardlink limit per file to 
> 65536 (0x40, 3.7, default)
> skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 
> 3.10, default)
> no-holes (no_holes)   - no explicit hole extents for files 
> (0x200, 3.14)

You miss a signed-off-by here.

Thanks,

-liubo
> ---
>  utils.c | 13 +++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/utils.c b/utils.c
> index 0163915..6d2675d 100644
> --- a/utils.c
> +++ b/utils.c
> @@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
>  void btrfs_list_all_fs_features(u64 mask_disallowed)
>  {
>   int i;
> + u64 feature_per_sysfs;
> +
> + btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
>  
>   fprintf(stderr, "Filesystem features available:\n");
>   for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
>   char *is_default = "";
> + char name[256];
>  
>   if (mkfs_features[i].flag & mask_disallowed)
>   continue;
>   if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
>   is_default = ", default";
> - fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
> - mkfs_features[i].name,
> + if (mkfs_features[i].flag & feature_per_sysfs)
> + sprintf(name, "%s (%s)",
> + mkfs_features[i].name, 
> mkfs_features[i].name_ker);
> + else
> + sprintf(name, "%s", mkfs_features[i].name);
> + fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
> + name,
>   mkfs_features[i].desc,
>   mkfs_features[i].flag,
>   mkfs_features[i].min_ker_ver,
> -- 
> 2.6.2
> 


Re: Imbalanced RAID1 with three unequal disks

2015-11-25 Thread Hugo Mills
On Wed, Nov 25, 2015 at 12:36:32PM +0100, Mario wrote:
> 
> Hi,
> 
> I pushed a subvolume using send/receive to an 8 TB disk, added
> two 4 TB disks and started a balance with conversion to RAID1.
> 
> Afterwards, I got the following:
> 
>   Total devices 3 FS bytes used 5.40TiB
>   devid1 size 7.28TiB used 4.54TiB path /dev/mapper/yellow4
>   devid2 size 3.64TiB used 3.17TiB path /dev/mapper/yellow1
>   devid3 size 3.64TiB used 3.17TiB path /dev/mapper/yellow2
> 
>   Btrfs v3.17
>   Data, RAID1: total=5.43TiB, used=5.39TiB
>   System, RAID1: total=64.00MiB, used=800.00KiB
>   Metadata, RAID1: total=14.00GiB, used=5.55GiB
>   GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> In my understanding, the data isn't properly balanced and I
> only get around 5.9TB of usable space. As suggested in #btrfs,
> I started a second balance without filters and got this:
> 
>   Total devices 3 FS bytes used 5.40TiB
>   devid1 size 7.28TiB used 5.41TiB path /dev/mapper/yellow4
>   devid2 size 3.64TiB used 2.71TiB path /dev/mapper/yellow1
>   devid3 size 3.64TiB used 2.71TiB path /dev/mapper/yellow2
> 
>   Data, RAID1: total=5.41TiB, used=5.39TiB
>   System, RAID1: total=32.00MiB, used=784.00KiB
>   Metadata, RAID1: total=7.00GiB, used=5.54GiB
>   GlobalReserve, single: total=512.00MiB, used=0.00B
> 
>   /dev/mapper/yellow4  7,3T5,4T  969G   86% /mnt/yellow
> 
> Now, I get 6.3TB of usable space but, in my understanding, I should
> get around 7.28 TB, or am I missing something here? Also, a second
> balance shouldn't change the data distribution, right?

   The first balance, because it was converting, didn't get the final
outcome right. (Possibly an area for further research on appropriate
algorithms for balance). The second one appears to have done the right
thing. It actually threw me slightly, because your free space isn't
equal on all the devices, but on reflection, that's expected in this
case. You have dev 1 double the size of devs 2 and 3, so in each
RAID-1 block group, one chunk will go on dev 1, and the other chunk
will go on one of the other two devices (spread evenly). This means
that *eventually*, they'll hit all devices with equal free space, but
only when everything's filled up completely. It's just on the edge
case between getting equal free space on all devices and having
unusable space (which you'd have if that 8TB drive was any larger).

   So, yes, all normal and all good.

   As to remaining free space from df, I'm fairly sure that the
algorithm for computing free space for df is just plain wrong. I've
spotted it not quite getting the answer right before. That seems to be
the case here, too; just more so. You have something just under 2 TiB
of usable space on the FS, according to my calculations.
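Hugo's chunk-allocation argument above can be sketched as a toy simulation (sizes in whole GiB for brevity; the greedy "two devices with the most free space" rule is a simplification of the real allocator, not its actual code):

```python
def raid1_usable(free_gib):
    """Toy simulation of btrfs RAID1 chunk allocation: each 1 GiB block
    group places one chunk on each of the two devices with the most free
    space (a simplification of the real allocator, good enough to show
    why unequal devices still fill up evenly)."""
    free = list(free_gib)
    usable = 0
    while True:
        # The two devices with the most free space get the mirrored pair.
        a, b = sorted(range(len(free)), key=lambda i: free[i],
                      reverse=True)[:2]
        if free[b] < 1:          # no second device left with room
            return usable
        free[a] -= 1
        free[b] -= 1
        usable += 1              # 1 GiB of user data per mirrored pair

# 8 + 4 + 4: the big device equals the sum of the others, so everything fits
print(raid1_usable([8, 4, 4]))   # 8
# 10 + 4 + 4: the big device is larger than the others combined,
# so 2 GiB of it can never get a mirror partner
print(raid1_usable([10, 4, 4]))  # 8
```

This is exactly the edge case Hugo describes: with an 8 TB device mirrored against 4 TB + 4 TB, all space is eventually usable, but any larger and some of it would be stranded.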

> I'm using kernel v4.3 with a patch [1] from kernel bugzilla [2] for
> the 8 TB SMR drive. The send/receive of a 5 TB subvolume worked
> flawlessly with the patch. Without, I got a lot of errors in dmesg
> within the first 200GB of transferred data. The OS is a x86_64
> Ubuntu 15.04.

   That's useful to know, in case anyone else shows up with write
errors on SMR devices.

   Hugo.

> Thank you!
> Mario
> 
> [1] 
> http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581&id=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=93581

-- 
Hugo Mills | I was cursed with poetry very young. It creates
hugo@... carfax.org.uk | unrealistic expectations.
http://carfax.org.uk/  |   Victor Frankenstein
PGP: E2AB1DE4  |Penny Dreadful




Re: [PATCH 0/7] Let user specify the kernel version for features

2015-11-25 Thread Qu Wenruo



Anand Jain wrote on 2015/11/26 14:07 +0800:



On 11/26/2015 10:02 AM, Qu Wenruo wrote:



Anand Jain wrote on 2015/11/25 20:08 +0800:

Sometimes users may want a btrfs filesystem to be supported on multiple
kernel versions. A simple example: a USB drive can be used with multiple
systems running different kernel versions. Or, in a data center, a SAN
LUN could be mounted on any system with a different kernel version.

Thanks for providing comments and feedback.
Further to that, here below is a set of patches which introduces a way
to specify a kernel version, so that default features can be set based
on what features were supported at that kernel version.


With the new -O comp= option, the concern for users who want to make a
btrfs filesystem for a newer kernel is hugely reduced.


No. Actually, the new -O comp= option does nothing for users who
want to create _a btrfs disk layout which is compatible with more
than one kernel_. There are two examples of this above.


Why can't you give a higher kernel version than the current kernel?




But I still prefer such feature alignment to be done only when specified
by the user, instead of automatically (yes, I've already said this
several times).
A warning should be enough for the user; sometimes being too automatic is not good.


As said before, we need the latest btrfs-progs on older kernels, for the
obvious reason of btrfs-progs bug fixes. We shouldn't have to back-port
fixes in btrfs-progs the way we already do in the btrfs kernel code.
btrfs-progs should work on any kernel with the "default features as
prescribed for that kernel".

Let's say we don't do this automatically: then the latest btrfs-progs
with a default 'mkfs.btrfs && mount' fails. But a user upgrading
btrfs-progs for fsck bug fixes shouldn't find 'default mkfs.btrfs &&
mount' failing, nor should they have to use a "new" set of mkfs options
to create an all-default FS for an LTS kernel.

Basing default features on the btrfs-progs version instead of the
kernel version makes NO sense.


Kernel version never makes sense, especially for non-vanilla kernels.
And unfortunately, most kernels used in stable distributions are not
vanilla.

And that's the *POINT1*.

That's why I stand against kernel version based detection.
You can use stable /sys/fs/btrfs/features/, but kernel version?
Not an option even as fallback.
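
The sysfs interface Qu refers to can be probed directly. Below is a toy sketch of "align defaults against what the running kernel advertises" versus falling back to a warning; this is illustrative only, not the patchset's code, and the feature names used in the test are just examples:

```python
import os

SYSFS_FEATURES = "/sys/fs/btrfs/features"

def kernel_supported_features():
    """Features advertised by the running kernel's btrfs module, or
    None when the module isn't loaded / sysfs is unavailable."""
    try:
        return set(os.listdir(SYSFS_FEATURES))
    except OSError:
        return None

def align_defaults(requested, supported):
    """Drop requested default features the kernel does not advertise.
    With no reliable source of truth (supported is None), keep all
    requested features; the caller would warn instead of silently
    aligning. Returns (kept, dropped)."""
    if supported is None:
        return list(requested), []
    kept = [f for f in requested if f in supported]
    dropped = [f for f in requested if f not in supported]
    return kept, dropped

# Hypothetical demo: a kernel that only advertises extended_iref
print(align_defaults(["extended_iref", "skinny_metadata"],
                     {"extended_iref"}))
```

The key design point in the thread is exactly the `supported is None` branch: sysfs is a stable interface, while a kernel version string (especially on heavily patched distribution kernels) is not a trustworthy fallback.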


And adding a warning about not using the latest features which are not
in their running kernel is pointless.


You didn't get the point of what to WARN.

Not warning the user that they are not using the latest features, but
warning that some features may prevent the fs from being mounted by the
current kernel.



That's _not_ a backward kernel compatible tool.

btrfs-progs should work "for the kernel". We should avoid adding too
much intelligence to btrfs-progs. I have fixed too many issues and
redesigned progs in this area. Too many bugs came mainly from the
approach of copying and maintaining the same code in both btrfs-progs
and the btrfs kernel (ref the wiki and my earlier email). That's a
wrong approach.


Totally agree with this point. There is too much nonsense in btrfs-progs
code copied from the kernel, and due to lack of updates it's very buggy now.

Just check volume.c for allocating data chunk.

But I didn't see the point related to the feature auto align here.


I don't understand: if the purpose of these two isn't the same, what is
the point in maintaining the same code? It won't save effort, mainly
because it's like developing a distributed FS where two parties have to
communicate to stay in sync, which is like using a cannon to shoo a
crow. But if the goal was a FUSE-like kernel-free FS (no one said that,
though), then it's better done as a separate project.


especially for tests.


It depends on what's being tested: the kernel OR progs? It's the kernel, not progs.


No, both kernel and progs. Just from Dave, even with his typo:

"xfstests is not jsut for testing kernel changes - it tests all of
the filesystem utilities for regressions, too. And so when
inadvertant changes in default behaviour occur, it detects those
regressions too."



Automatic alignment will keep the default features constant for a given
kernel version. Further, for testing, using a known set of options is
even better.


Yeah, a known set of options becomes unknown on different kernels,
thanks to the hidden feature alignment, unless you specify it with -O options.


That's the *POINT2*:
Automatic feature alignment by default makes mkfs.btrfs behavior *unpredictable*.

Before automatic feature alignment, QA/end-users only needed to check
the btrfs-progs announcement to know about default behavior changes.


And after it, wow, QA testers will need to check the feature matrix to
know what the default features are on their kernel, not to mention it
may even be wrong due to the even less predictable kernel version.


That's why I strongly recommend making it just a warning rather than
the default behavior.





A lot of btrfs-progs changes, like the recent disabling of mixed-bg for
small volumes, have already caused a regression in the generic/077
testcase. And Dave is already fed up with such problems from btrfs...


I don't know 

Re: bad extent [5993525264384, 5993525280768), type mismatch with chunk

2015-11-25 Thread Laurent Bonnaud
On 25/11/2015 00:46, Qu Wenruo wrote:

> The size seems small enough, I'll try to download it as it's super useful to 
> debug it.

Thanks !

> Nice reproducer.
> Is it 100% reproducible or has a chance to reproduce?

I tried a second time and got a similar kernel backtrace.

> BTW, did you encounter the same btrfsck error "chunk type dismatch" that 
> Christoph reported? 

Yes, that's what drew me to this discussion :>.

I also tried the --repair option and that is perhaps what corrupted my FS.

-- 
Laurent.


[PATCH] Btrfs: fix error path when failing to submit bio for direct IO write

2015-11-25 Thread fdmanana
From: Filipe Manana 

Commit 61de718fceb6 ("Btrfs: fix memory corruption on failure to submit
bio for direct IO") fixed problems with the error handling code after we
fail to submit a bio for direct IO. However there were 2 problems that it
did not address when the failure is due to memory allocation failures for
direct IO writes:

1) We considered that there could be only one ordered extent for the whole
   IO range, which is not always true, as we can have multiple;

2) It did not set the bit BTRFS_ORDERED_IO_DONE in the ordered extent,
   which can make other tasks running btrfs_wait_logged_extents() hang
   forever, since they wait for that bit to be set. The general assumption
   is that regardless of an error, the BTRFS_ORDERED_IO_DONE is always set
   and it precedes setting the bit BTRFS_ORDERED_COMPLETE.

Fix these issues by moving part of the btrfs_endio_direct_write() handler
into a new helper function and having that new helper function called when
we fail to allocate memory to submit the bio (and its private object) for
a direct IO write.

Signed-off-by: Filipe Manana 
---
 fs/btrfs/inode.c | 54 +++---
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index f82d1f4..4f8560c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7995,22 +7995,22 @@ static void btrfs_endio_direct_read(struct bio *bio)
bio_put(bio);
 }
 
-static void btrfs_endio_direct_write(struct bio *bio)
+static void btrfs_endio_direct_write_update_ordered(struct inode *inode,
+   const u64 offset,
+   const u64 bytes,
+   const int uptodate)
 {
-   struct btrfs_dio_private *dip = bio->bi_private;
-   struct inode *inode = dip->inode;
struct btrfs_root *root = BTRFS_I(inode)->root;
struct btrfs_ordered_extent *ordered = NULL;
-   u64 ordered_offset = dip->logical_offset;
-   u64 ordered_bytes = dip->bytes;
-   struct bio *dio_bio;
+   u64 ordered_offset = offset;
+   u64 ordered_bytes = bytes;
int ret;
 
 again:
	ret = btrfs_dec_test_first_ordered_pending(inode, &ordered,
						   &ordered_offset,
   ordered_bytes,
-  !bio->bi_error);
+  uptodate);
if (!ret)
goto out_test;
 
@@ -8023,13 +8023,22 @@ out_test:
 * our bio might span multiple ordered extents.  If we haven't
 * completed the accounting for the whole dio, go back and try again
 */
-   if (ordered_offset < dip->logical_offset + dip->bytes) {
-   ordered_bytes = dip->logical_offset + dip->bytes -
-   ordered_offset;
+   if (ordered_offset < offset + bytes) {
+   ordered_bytes = offset + bytes - ordered_offset;
ordered = NULL;
goto again;
}
-   dio_bio = dip->dio_bio;
+}
+
+static void btrfs_endio_direct_write(struct bio *bio)
+{
+   struct btrfs_dio_private *dip = bio->bi_private;
+   struct bio *dio_bio = dip->dio_bio;
+
+   btrfs_endio_direct_write_update_ordered(dip->inode,
+   dip->logical_offset,
+   dip->bytes,
+   !bio->bi_error);
 
kfree(dip);
 
@@ -8365,24 +8374,15 @@ free_ordered:
dip = NULL;
io_bio = NULL;
} else {
-   if (write) {
-   struct btrfs_ordered_extent *ordered;
-
-   ordered = btrfs_lookup_ordered_extent(inode,
- file_offset);
-   set_bit(BTRFS_ORDERED_IOERR, &ordered->flags);
-   /*
-* Decrements our ref on the ordered extent and removes
-* the ordered extent from the inode's ordered tree,
-* doing all the proper resource cleanup such as for the
-* reserved space and waking up any waiters for this
-* ordered extent (through btrfs_remove_ordered_extent).
-*/
-   btrfs_finish_ordered_io(ordered);
-   } else {
+   if (write)
+   btrfs_endio_direct_write_update_ordered(inode,
+   file_offset,
+   dio_bio->bi_iter.bi_size,
+   1);
+   else
unlock_extent(&BTRFS_I(inode)->io_tree, file_offset,
  

Re: shall distros run btrfsck on boot?

2015-11-25 Thread Austin S Hemmelgarn

On 2015-11-24 17:26, Eric Sandeen wrote:
> On 11/24/15 2:38 PM, Austin S Hemmelgarn wrote:
>> if the system was
>> shut down cleanly, you're fine barring software bugs, but if it
>> crashed, you should be running a check on the FS.
>
> Um, no...
>
> The *entire point* of having a journaling filesystem is that after a
> crash or power loss, a journal replay on next mount will bring the
> metadata into a consistent state.

OK, first, that was in reference to BTRFS, not ext4, and BTRFS is a COW 
filesystem, not a journaling one, which is an important distinction as 
mentioned by Hugo in his reply.  Second, there are two reasons that you 
should be running a check even of a journaled filesystem when the system 
crashes (this also applies to COW filesystems, and anything else that 
relies on atomicity of write operations for consistency):


1. Disks don't atomically write anything bigger than a sector, and may 
not even atomically write the sector itself.  This means that it's 
possible to get a partial write to the journal, which in turn has 
significant potential to put the metadata in an inconsistent state when 
the journal gets replayed (IIRC, ext4 has a journal_checksum mount 
option that is supposed to mitigate this possibility).  This sounds like 
something that shouldn't happen all that often, but on a busy 
filesystem, the probability is exactly proportionate to the size of the 
journal relative to the size of the FS.


2. If the system crashed, all code running on it immediately before the 
crash is instantly suspect, and you have no way to know for certain that 
something didn't cause random garbage to be written to the disk.  On top 
of this, hardware is potentially suspect, and when your hardware is 
misbehaving, then all bets as to consistency are immediately off.
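
Point 1 above is what journal checksumming mitigates. A toy sketch of how a checksummed record lets replay reject a torn (partial) write rather than replaying garbage; this is a deliberately simplified illustration, not ext4's actual on-disk journal format:

```python
import struct
import zlib

def journal_record(payload: bytes) -> bytes:
    """Toy journal record: a header with the payload length and its
    CRC32, followed by the payload itself."""
    return struct.pack("<II", len(payload), zlib.crc32(payload)) + payload

def replay_ok(record: bytes) -> bool:
    """Replay only records whose checksum matches. A torn write leaves
    either a short payload or a CRC mismatch, so the record is
    discarded instead of being replayed into the metadata."""
    if len(record) < 8:
        return False
    length, crc = struct.unpack("<II", record[:8])
    payload = record[8:8 + length]
    return len(payload) == length and zlib.crc32(payload) == crc

rec = journal_record(b"metadata block update")
print(replay_ok(rec))        # True: complete record replays
print(replay_ok(rec[:-5]))   # False: torn write is detected and skipped
```

Without the checksum, the replay code in this sketch would have no way to distinguish a complete record from one that lost its tail in a crash, which is precisely the failure mode described in point 1.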






[PATCH 10/12] Fix btrfs/098 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/098 | 65 ++---
 tests/btrfs/098.out |  7 +-
 2 files changed, 38 insertions(+), 34 deletions(-)

diff --git a/tests/btrfs/098 b/tests/btrfs/098
index 8aef119..4879a90 100755
--- a/tests/btrfs/098
+++ b/tests/btrfs/098
@@ -58,43 +58,49 @@ _scratch_mkfs >>$seqres.full 2>&1
 _init_flakey
 _mount_flakey
 
-# Create our test file with a single 100K extent starting at file offset 800K.
-# We fsync the file here to make the fsync log tree gets a single csum item 
that
-# covers the whole 100K extent, which causes the second fsync, done after the
-# cloning operation below, to not leave in the log tree two csum items covering
-# two sub-ranges ([0, 20K[ and [20K, 100K[)) of our extent.
-$XFS_IO_PROG -f -c "pwrite -S 0xaa 800K 100K"  \
+BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
+# Create our test file with a single 25 block extent starting at file offset
+# mapped by the 200th block. We fsync the file here to make the fsync log tree get a
+# single csum item that covers the whole 25 block extent, which causes the
+# second fsync, done after the cloning operation below, to not leave in the log
+# tree two csum items covering two block sub-ranges ([0, 5[ and [5, 25[)) of 
our
+# extent.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa $((200 * $BLOCK_SIZE)) $((25 * 
$BLOCK_SIZE))" \
-c "fsync" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
+   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
 
-# Now clone part of our extent into file offset 400K. This adds a file extent
-# item to our inode's metadata that points to the 100K extent we created 
before,
-# using a data offset of 20K and a data length of 20K, so that it refers to
-# the sub-range [20K, 40K[ of our original extent.
-$CLONER_PROG -s $((800 * 1024 + 20 * 1024)) -d $((400 * 1024)) \
-   -l $((20 * 1024)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
+# Now clone part of our extent into file offset mapped by 100th block. This 
adds
+# a file extent item to our inode's metadata that points to the 25 block extent
+# we created before, using a data offset of 5 blocks and a data length of 5
+# blocks, so that it refers to the block sub-range [5, 10[ of our original
+# extent.
+$CLONER_PROG -s $(((200 * $BLOCK_SIZE) + (5 * $BLOCK_SIZE))) \
+-d $((100 * $BLOCK_SIZE)) -l $((5 * $BLOCK_SIZE)) \
+$SCRATCH_MNT/foo $SCRATCH_MNT/foo
 
 # Now fsync our file to make sure the extent cloning is durably persisted. This
 # fsync will not add a second csum item to the log tree containing the 
checksums
-# for the blocks in the sub-range [20K, 40K[ of our extent, because there was
+# for the blocks in the block sub-range [5, 10[ of our extent, because there 
was
 # already a csum item in the log tree covering the whole extent, added by the
 # first fsync we did before.
 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
 
-echo "File digest before power failure:"
-md5sum $SCRATCH_MNT/foo | _filter_scratch
+orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
 
 # The fsync log replay first processes the file extent item corresponding to 
the
-# file offset 400K (the one which refers to the [20K, 40K[ sub-range of our 
100K
-# extent) and then processes the file extent item for file offset 800K. It used
-# to happen that when processing the later, it erroneously left in the csum 
tree
-# 2 csum items that overlapped each other, 1 for the sub-range [20K, 40K[ and 1
-# for the whole range of our extent. This introduced a problem where subsequent
-# lookups for the checksums of blocks within the range [40K, 100K[ of our 
extent
-# would not find anything because lookups in the csum tree ended up looking 
only
-# at the smaller csum item, the one covering the subrange [20K, 40K[. This made
-# read requests assume an expected checksum with a value of 0 for those blocks,
-# which caused checksum verification failure when the read operations finished.
+# file offset mapped by 100th block (the one which refers to the [5, 10[ block
+# sub-range of our 25 block extent) and then processes the file extent item for
+# file offset mapped by 200th block. It used to happen that when processing the
+# later, it erroneously left in the csum tree 2 csum items that overlapped each
+# other, 1 for the block sub-range [5, 10[ and 1 for the whole range of our
+# extent. This introduced a problem where subsequent lookups for the checksums
+# of blocks within the block range [10, 25[ of our extent would not find
+# anything because lookups in the csum tree ended up looking only at the 
smaller
+# csum item, the one covering the block subrange [5, 10[. This made read
+# requests assume an expected checksum with a value of 0 for those blocks, 
which
+# caused checksum 

[PATCH 05/12] Fix btrfs/056 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified and _filter_od
filtering functions to print information in terms of file blocks rather than
file offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/056 |  51 ++
 tests/btrfs/056.out | 152 +---
 2 files changed, 90 insertions(+), 113 deletions(-)

diff --git a/tests/btrfs/056 b/tests/btrfs/056
index 66a59b8..6dc3bfd 100755
--- a/tests/btrfs/056
+++ b/tests/btrfs/056
@@ -68,33 +68,42 @@ test_btrfs_clone_fsync_log_recover()
MOUNT_OPTIONS="$MOUNT_OPTIONS $2"
_mount_flakey
 
-   # Create a file with 4 extents and 1 hole, all with a size of 8Kb each.
-   # The hole is in the range [16384, 24576[.
-   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b 8192 0 8192" \
-   -c "pwrite -S 0x02 -b 8192 8192 8192" \
-   -c "pwrite -S 0x04 -b 8192 24576 8192" \
-   -c "pwrite -S 0x05 -b 8192 32768 8192" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
-
-   # Clone destination file, 1 extent of 96kb.
-   $XFS_IO_PROG -f -c "pwrite -S 0xff -b 98304 0 98304" -c "fsync" \
-   $SCRATCH_MNT/bar | _filter_xfs_io
-
-   # Clone second half of the 2nd extent, the 8kb hole, the 3rd extent
+   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
+   EXTENT_SIZE=$((2 * $BLOCK_SIZE))
+
+   # Create a file with 4 extents and 1 hole, all with a size of
+   # 2 blocks each.
+   # The hole is in the block range [4, 5].
+   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b $EXTENT_SIZE 0 $EXTENT_SIZE" \
+   -c "pwrite -S 0x02 -b $EXTENT_SIZE $((2 * $BLOCK_SIZE)) 
$EXTENT_SIZE" \
+   -c "pwrite -S 0x04 -b $EXTENT_SIZE $((6 * $BLOCK_SIZE)) 
$EXTENT_SIZE" \
+   -c "pwrite -S 0x05 -b $EXTENT_SIZE $((8 * $BLOCK_SIZE)) 
$EXTENT_SIZE" \
+   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+   # Clone destination file, 1 extent of 24 blocks.
+   $XFS_IO_PROG -f -c "pwrite -S 0xff -b $((24 * $BLOCK_SIZE)) 0 $((24 * 
$BLOCK_SIZE))" \
+-c "fsync" $SCRATCH_MNT/bar | 
_filter_xfs_io_blocks_modified
+
+   # Clone second half of the 2nd extent, the 2 block hole, the 3rd extent
# and the first half of the 4th extent into file bar.
-   $CLONER_PROG -s 12288 -d 0 -l 24576 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
+   $CLONER_PROG -s $((3 * $BLOCK_SIZE)) -d 0 -l $((6 * $BLOCK_SIZE)) \
+$SCRATCH_MNT/foo $SCRATCH_MNT/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
 
# Test small files too consisting of 1 inline extent
-   $XFS_IO_PROG -f -c "pwrite -S 0x00 -b 3500 0 3500" -c "fsync" \
-   $SCRATCH_MNT/foo2 | _filter_xfs_io
+   EXTENT_SIZE=$(($BLOCK_SIZE - 48))
+   $XFS_IO_PROG -f -c "pwrite -S 0x00 -b $EXTENT_SIZE 0 $EXTENT_SIZE" -c 
"fsync" \
+   $SCRATCH_MNT/foo2 | _filter_xfs_io_blocks_modified
 
-   $XFS_IO_PROG -f -c "pwrite -S 0xcc -b 1000 0 1000" -c "fsync" \
-   $SCRATCH_MNT/bar2 | _filter_xfs_io
+   EXTENT_SIZE=$(($BLOCK_SIZE - 1048))
+   $XFS_IO_PROG -f -c "pwrite -S 0xcc -b $EXTENT_SIZE 0 $EXTENT_SIZE" -c 
"fsync" \
+   $SCRATCH_MNT/bar2 | _filter_xfs_io_blocks_modified
 
# Clone the entire foo2 file into bar2, overwriting all data in bar2
# and increasing its size.
-   $CLONER_PROG -s 0 -d 0 -l 3500 $SCRATCH_MNT/foo2 $SCRATCH_MNT/bar2
+   EXTENT_SIZE=$(($BLOCK_SIZE - 48))
+   $CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo2 
$SCRATCH_MNT/bar2
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar2
 
_flakey_drop_and_remount yes
@@ -102,10 +111,10 @@ test_btrfs_clone_fsync_log_recover()
# Verify the cloned range was persisted by fsync and the log recovery
# code did its work well.
echo "Verifying file bar content"
-   od -t x1 $SCRATCH_MNT/bar
+   od -t x1 $SCRATCH_MNT/bar | _filter_od
 
echo "Verifying file bar2 content"
-   od -t x1 $SCRATCH_MNT/bar2
+   od -t x1 $SCRATCH_MNT/bar2 | _filter_od
 
_unmount_flakey
 
diff --git a/tests/btrfs/056.out b/tests/btrfs/056.out
index 1b77ae3..c4c6b2c 100644
--- a/tests/btrfs/056.out
+++ b/tests/btrfs/056.out
@@ -1,129 +1,97 @@
 QA output created by 056
 Testing without the NO_HOLES feature
-wrote 8192/8192 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset 8192
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset 24576
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset 32768
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 98304/98304 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 3500/3500 bytes at offset 0
-XXX Bytes, X ops; 

[PATCH 09/12] Fix btrfs/097 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/097 | 42 +-
 tests/btrfs/097.out |  7 +--
 2 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/tests/btrfs/097 b/tests/btrfs/097
index d9138ea..915ff9d 100755
--- a/tests/btrfs/097
+++ b/tests/btrfs/097
@@ -57,22 +57,29 @@ mkdir $send_files_dir
 _scratch_mkfs >>$seqres.full 2>&1
 _scratch_mount
 
-# Create our test file with a single extent of 64K starting at file offset 
128K.
-$XFS_IO_PROG -f -c "pwrite -S 0xaa 128K 64K" $SCRATCH_MNT/foo | _filter_xfs_io
+BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
+# Create our test file with a single extent of 16 blocks starting at a file
+# offset mapped by 32nd block.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa $((32 * $BLOCK_SIZE)) $((16 * 
$BLOCK_SIZE))" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
 
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
 
 # Now clone parts of the original extent into lower offsets of the file.
 #
 # The first clone operation adds a file extent item to file offset 0 that 
points
-# to our initial extent with a data offset of 16K. The corresponding data back
-# reference in the extent tree has an offset of 18446744073709535232, which is
-# the result of file_offset - data_offset = 0 - 16K.
-#
-# The second clone operation adds a file extent item to file offset 16K that
-# points to our initial extent with a data offset of 48K. The corresponding 
data
-# back reference in the extent tree has an offset of 18446744073709518848, 
which
-# is the result of file_offset - data_offset = 16K - 48K.
+# to our initial extent with a data offset of 4 blocks. The corresponding data 
back
+# reference in the extent tree has a large value for the 'offset' field, which 
is
+# the result of file_offset - data_offset = 0 - (file offset of 4th block).  
For
+# example in case of 4k block size, it will be 0 - 16k = 18446744073709535232.
+
+# The second clone operation adds a file extent item to file offset mapped by
+# 4th block that points to our initial extent with a data offset of 12
+# blocks. The corresponding data back reference in the extent tree has a large
+# value for the 'offset' field, which is the result of file_offset - 
data_offset
+# = (file offset of 4th block) - (file offset of 12th block). For example in
+# case of 4k block size, it will be 16K - 48K = 18446744073709518848.
 #
 # Those large back reference offsets (result of unsigned arithmetic underflow)
 # confused the back reference walking code (used by an incremental send and
@@ -83,10 +90,10 @@ _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT 
$SCRATCH_MNT/mysnap1
 # "BTRFS error (device sdc): did not find backref in send_root. inode=257, \
 #  offset=0, disk_byte=12845056 found extent=12845056"
 #
-$CLONER_PROG -s $(((128 + 16) * 1024)) -d 0 -l $((16 * 1024)) \
-   $SCRATCH_MNT/foo $SCRATCH_MNT/foo
-$CLONER_PROG -s $(((128 + 48) * 1024)) -d $((16 * 1024)) -l $((16 * 1024)) \
+$CLONER_PROG -s $(((32 + 4) * $BLOCK_SIZE)) -d 0 -l $((4 * $BLOCK_SIZE)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
+$CLONER_PROG -s $(((32 + 12) * $BLOCK_SIZE)) -d $((4 * $BLOCK_SIZE)) \
+-l $((4 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
 
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
 
@@ -94,8 +101,7 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f 
$send_files_dir/1.snap
 _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
-f $send_files_dir/2.snap
 
-echo "File digest in the original filesystem:"
-md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
+orig_hash=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ')
 
 # Now recreate the filesystem by receiving both send streams and verify we get
 # the same file contents that the original filesystem had.
@@ -106,8 +112,10 @@ _scratch_mount
 _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
 _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap
 
-echo "File digest in the new filesystem:"
-md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
+hash=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ')
+if [ $orig_hash != $hash ]; then
+   echo "Btrfs send/receive failed: Mismatching hash values detected."
+fi
 
 status=0
 exit
diff --git a/tests/btrfs/097.out b/tests/btrfs/097.out
index 5e87eb2..c3a19c1 100644
--- a/tests/btrfs/097.out
+++ b/tests/btrfs/097.out
@@ -1,7 +1,2 @@
 QA output created by 097
-wrote 65536/65536 bytes at offset 131072
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digest in the original filesystem:
-6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
-File digest in the new filesystem:
-6c6079335cff141b8a31233ead04cbff  SCRATCH_MNT/mysnap2/foo
+Blocks 
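
The large back-reference offsets quoted in the comments of the patch above are plain unsigned 64-bit underflow, and the quoted values can be checked directly:

```python
U64 = 1 << 64

def u64_sub(a: int, b: int) -> int:
    """Unsigned 64-bit subtraction with wraparound, matching the
    kernel's file_offset - data_offset arithmetic on u64 values."""
    return (a - b) % U64

# First clone: file_offset 0, data offset of 4 blocks (16K at 4k blocks)
print(u64_sub(0, 16 * 1024))          # 18446744073709535232
# Second clone: file_offset 16K, data offset 48K
print(u64_sub(16 * 1024, 48 * 1024))  # 18446744073709518848
```

Both results match the values in the patch's comments; the "huge" offsets are simply small negative differences wrapped around 2^64.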

[PATCH 07/12] Fix btrfs/095 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/095 | 113 +---
 tests/btrfs/095.out |  10 +
 2 files changed, 66 insertions(+), 57 deletions(-)

diff --git a/tests/btrfs/095 b/tests/btrfs/095
index 1b4ba90..e73b14e 100755
--- a/tests/btrfs/095
+++ b/tests/btrfs/095
@@ -63,85 +63,100 @@ _scratch_mkfs >>$seqres.full 2>&1
 _init_flakey
 _mount_flakey
 
-# Create prealloc extent covering range [160K, 620K[
-$XFS_IO_PROG -f -c "falloc 160K 460K" $SCRATCH_MNT/foo
+BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
 
-# Now write to the last 80K of the prealloc extent plus 40K to the unallocated
-# space that immediately follows it. This creates a new extent of 40K that 
spans
-# the range [620K, 660K[.
-$XFS_IO_PROG -c "pwrite -S 0xaa 540K 120K" $SCRATCH_MNT/foo | _filter_xfs_io
+# Create prealloc extent covering file block range [40, 155[
+$XFS_IO_PROG -f -c "falloc $((40 * $BLOCK_SIZE)) $((115 * $BLOCK_SIZE))" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+# Now write to the last 20 blocks of the prealloc extent plus 10 blocks to the
+# unallocated space that immediately follows it. This creates a new extent of 
10
+# blocks that spans the block range [155, 165[.
+$XFS_IO_PROG -c "pwrite -S 0xaa $((135 * $BLOCK_SIZE)) $((30 * $BLOCK_SIZE))" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
 
 # At this point, there are now 2 back references to the prealloc extent in our
-# extent tree. Both are for our file offset 160K and one relates to a file
-# extent item with a data offset of 0 and a length of 380K, while the other
-# relates to a file extent item with a data offset of 380K and a length of 80K.
+# extent tree. Both are for our file offset mapped by the 40th block of the 
file
+# and one relates to a file extent item with a data offset of 0 and a length of
+# 95 blocks, while the other relates to a file extent item with a data offset 
of
+# 95 blocks and a length of 20 blocks.
 
 # Make sure everything done so far is durably persisted (all back references 
are
 # in the extent tree, etc).
 sync
 
-# Now clone all extents of our file that cover the offset 160K up to its eof
-# (660K at this point) into itself at offset 2M. This leaves a hole in the file
-# covering the range [660K, 2M[. The prealloc extent will now be referenced by
-# the file twice, once for offset 160K and once for offset 2M. The 40K extent
-# that follows the prealloc extent will also be referenced twice by our file,
-# once for offset 620K and once for offset 2M + 460K.
-$CLONER_PROG -s $((160 * 1024)) -d $((2 * 1024 * 1024)) -l 0 $SCRATCH_MNT/foo \
-   $SCRATCH_MNT/foo
-
-# Now create one new extent in our file with a size of 100Kb. It will span the
-# range [3M, 3M + 100K[. It also will cause creation of a hole spanning the
-# range [2M + 460K, 3M[. Our new file size is 3M + 100K.
-$XFS_IO_PROG -c "pwrite -S 0xbb 3M 100K" $SCRATCH_MNT/foo | _filter_xfs_io
+# Now clone all extents of our file that cover the file range spanned by 40th
+# block up to its eof (165th block at this point) into itself at 512th
+# block. This leaves a hole in the file covering the block range [165, 512[. 
The
+# prealloc extent will now be referenced by the file twice, once for offset
+# mapped by the 40th block and once for offset mapped by 512th block. The 10
+# blocks extent that follows the prealloc extent will also be referenced twice
+# by our file, once for offset mapped by the 155th block and once for offset
+# (512 block + 115 blocks)
+$CLONER_PROG -s $((40 * $BLOCK_SIZE)) -d $((512 * $BLOCK_SIZE)) -l 0 \
+$SCRATCH_MNT/foo $SCRATCH_MNT/foo
+
+# Now create one new extent in our file with a size of 25 blocks. It will span
+# the block range [768, 768 + 25[. It also will cause creation of a hole
+# spanning the block range [512 + 115, 768[. Our new file size is the file
+# offset mapped by (768 + 25)th block.
+$XFS_IO_PROG -c "pwrite -S 0xbb $((768 * $BLOCK_SIZE)) $((25 * $BLOCK_SIZE))" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
 
 # At this point, there are now (in memory) 4 back references to the prealloc
 # extent.
 #
-# Two of them are for file offset 160K, related to file extent items
-# matching the file offsets 160K and 540K respectively, with data offsets of
-# 0 and 380K respectively, and with lengths of 380K and 80K respectively.
+# Two of them are for the file offset mapped by the 40th block, related to
+# file extent items matching the file offsets mapped by the 40th and 135th
+# blocks respectively, with data offsets of 0 and 95 blocks respectively, and
+# with lengths of 95 and 20 blocks respectively.
 #
-# The other two references are for file offset 2M, related to file extent items
-# matching the file offsets 2M and 2M + 380K respectively, with 

[PATCH 08/12] Fix btrfs/096 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/096 | 45 +
 tests/btrfs/096.out | 15 +--
 2 files changed, 30 insertions(+), 30 deletions(-)

diff --git a/tests/btrfs/096 b/tests/btrfs/096
index f5b3a7f..896a209 100755
--- a/tests/btrfs/096
+++ b/tests/btrfs/096
@@ -51,30 +51,35 @@ rm -f $seqres.full
 _scratch_mkfs >>$seqres.full 2>&1
 _scratch_mount
 
-# Create our test files. File foo has the same 2K of data at offset 4K as file
-# bar has at its offset 0.
-$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
-   -c "pwrite -S 0xbb 4k 2K" \
-   -c "pwrite -S 0xcc 8K 4K" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
+BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
 
-# File bar consists of a single inline extent (2K size).
-$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
-   $SCRATCH_MNT/bar | _filter_xfs_io
+# Create our test files. File foo has the same 2k of data at offset $BLOCK_SIZE
+# as file bar has at its offset 0.
+$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 $BLOCK_SIZE" \
+   -c "pwrite -S 0xbb $BLOCK_SIZE 2k" \
+   -c "pwrite -S 0xcc $(($BLOCK_SIZE * 2)) $BLOCK_SIZE" \
+   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
 
-# Now call the clone ioctl to clone the extent of file bar into file foo at its
-# offset 4K. This made file foo have an inline extent at offset 4K, something
-# which the btrfs code can not deal with in future IO operations because all
-# inline extents are supposed to start at an offset of 0, resulting in all
-# sorts of chaos.
-# So here we validate that the clone ioctl returns an EOPNOTSUPP, which is what
-# it returns for other cases dealing with inlined extents.
-$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
+# File bar consists of a single inline extent (2k in size).
+$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2k" \
+   $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
+
+# Now call the clone ioctl to clone the extent of file bar into file
+# foo at its $BLOCK_SIZE offset. This made file foo have an inline
+# extent at offset $BLOCK_SIZE, something which the btrfs code can not
+# deal with in future IO operations because all inline extents are
+# supposed to start at an offset of 0, resulting in all sorts of
+# chaos.
+# So here we validate that the clone ioctl returns an EOPNOTSUPP,
+# which is what it returns for other cases dealing with inlined
+# extents.
+$CLONER_PROG -s 0 -d $BLOCK_SIZE -l 2048 \
$SCRATCH_MNT/bar $SCRATCH_MNT/foo
 
-# Because of the inline extent at offset 4K, the following write made the
-# kernel crash with a BUG_ON().
-$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
+# Because of the inline extent at offset $BLOCK_SIZE, the following
+# write made the kernel crash with a BUG_ON().
+$XFS_IO_PROG -c "pwrite -S 0xdd $(($BLOCK_SIZE + 2048)) 2k" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
 
 status=0
 exit
diff --git a/tests/btrfs/096.out b/tests/btrfs/096.out
index 235198d..2a4251e 100644
--- a/tests/btrfs/096.out
+++ b/tests/btrfs/096.out
@@ -1,12 +1,7 @@
 QA output created by 096
-wrote 4096/4096 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 2048/2048 bytes at offset 4096
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 4096/4096 bytes at offset 8192
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 2048/2048 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Blocks modified: [0 - 0]
+Blocks modified: [1 - 1]
+Blocks modified: [2 - 2]
+Blocks modified: [0 - 0]
 clone failed: Operation not supported
-wrote 2048/2048 bytes at offset 6144
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Blocks modified: [1 - 1]
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/12] Fix btrfs/055 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified and _filter_od
filtering functions to print information in terms of file blocks rather than
file offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/055 | 128 ++
 tests/btrfs/055.out | 378 +---
 2 files changed, 259 insertions(+), 247 deletions(-)

diff --git a/tests/btrfs/055 b/tests/btrfs/055
index c0dd9ed..1f50850 100755
--- a/tests/btrfs/055
+++ b/tests/btrfs/055
@@ -60,88 +60,110 @@ test_btrfs_clone_with_holes()
_scratch_mkfs "$1" >/dev/null 2>&1
_scratch_mount
 
-   # Create a file with 4 extents and 1 hole, all with a size of 8Kb each.
-   # The hole is in the range [16384, 24576[.
-   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b 8192 0 8192" \
-   -c "pwrite -S 0x02 -b 8192 8192 8192" \
-   -c "pwrite -S 0x04 -b 8192 24576 8192" \
-   -c "pwrite -S 0x05 -b 8192 32768 8192" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
+   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
 
-   # Clone destination file, 1 extent of 96kb.
-   $XFS_IO_PROG -s -f -c "pwrite -S 0xff -b 98304 0 98304" \
-   $SCRATCH_MNT/bar | _filter_xfs_io
+   EXTENT_SIZE=$((2 * $BLOCK_SIZE))
 
-   # Clone 2nd extent, 8Kb hole and 3rd extent of foo into bar.
-   $CLONER_PROG -s 8192 -d 0 -l 24576 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
+   OFFSET=0
+
+   # Create a file with 4 extents and 1 hole, all with 2 blocks each.
+   # The hole is in the block range [4, 5[.
+   $XFS_IO_PROG -s -f -c "pwrite -S 0x01 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+   OFFSET=$(($OFFSET + $EXTENT_SIZE))
+   $XFS_IO_PROG -s -f -c "pwrite -S 0x02 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+   OFFSET=$(($OFFSET + 2 * $EXTENT_SIZE))
+   $XFS_IO_PROG -s -f -c "pwrite -S 0x04 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+   OFFSET=$(($OFFSET + $EXTENT_SIZE))
+   $XFS_IO_PROG -s -f -c "pwrite -S 0x05 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" \
+$SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+   # Clone destination file, 1 extent of 24 blocks.
+   EXTENT_SIZE=$((24 * $BLOCK_SIZE))
+   $XFS_IO_PROG -s -f -c "pwrite -S 0xff -b $EXTENT_SIZE 0 $EXTENT_SIZE" \
+   $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
+
+   # Clone 2nd extent, 2-blocks sized hole and 3rd extent of foo into bar.
+   $CLONER_PROG -s $((2 * $BLOCK_SIZE)) -d 0 -l $((6 * $BLOCK_SIZE)) \
+$SCRATCH_MNT/foo $SCRATCH_MNT/bar
 
# Verify both extents and the hole were cloned.
echo "1) Check both extents and the hole were cloned"
-   od -t x1 $SCRATCH_MNT/bar
+   od -t x1 $SCRATCH_MNT/bar | _filter_od
 
-   # Cloning range starts at the middle of an hole.
-   $CLONER_PROG -s 20480 -d 32768 -l 12288 $SCRATCH_MNT/foo \
-   $SCRATCH_MNT/bar
+   # Cloning range starts at the middle of a hole.
+   $CLONER_PROG -s $((5 * $BLOCK_SIZE)) -d $((8 * $BLOCK_SIZE)) \
+-l $((3 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/bar
 
-   # Verify that half of the hole and the following 8Kb extent were cloned.
-   echo "2) Check half hole and one 8Kb extent were cloned"
-   od -t x1 $SCRATCH_MNT/bar
+   # Verify that half of the hole and the following 2 block extent were cloned.
+   echo "2) Check half hole and the following 2 block extent were cloned"
+   od -t x1 $SCRATCH_MNT/bar | _filter_od
 
-   # Cloning range ends at the middle of an hole.
-   $CLONER_PROG -s 0 -d 65536 -l 20480 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
+   # Cloning range ends at the middle of a hole.
+   $CLONER_PROG -s 0 -d $((16 * $BLOCK_SIZE)) -l $((5 * $BLOCK_SIZE)) \
+$SCRATCH_MNT/foo $SCRATCH_MNT/bar
 
-   # Verify that 2 extents of 8kb and a 4kb hole were cloned.
-   echo "3) Check that 2 extents of 8kb eacg and a 4kb hole were cloned"
-   od -t x1 $SCRATCH_MNT/bar
+   # Verify that 2 extents of 2 blocks each and a 1-block hole were cloned.
+   echo "3) Check that 2 extents of 2 blocks each and a hole of 1 block were cloned"
+   od -t x1 $SCRATCH_MNT/bar | _filter_od
 
-   # Create a 24Kb hole at the end of the source file (foo).
-   $XFS_IO_PROG -c "truncate 65536" $SCRATCH_MNT/foo
+   # Create a 6-block hole at the end of the source file (foo).
+   $XFS_IO_PROG -c "truncate $((16 * $BLOCK_SIZE))" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
sync
 
# Now clone a range that overlaps that hole at the end of the foo file.
-   # It should 

[PATCH 11/12] Fix btrfs/103 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/103 | 47 +--
 tests/btrfs/103.out | 48 
 2 files changed, 41 insertions(+), 54 deletions(-)

diff --git a/tests/btrfs/103 b/tests/btrfs/103
index 3020c86..a807900 100755
--- a/tests/btrfs/103
+++ b/tests/btrfs/103
@@ -56,31 +56,34 @@ test_clone_and_read_compressed_extent()
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount $mount_opts
 
+   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
# Create a test file with a single extent that is compressed (the
# data we write into it is highly compressible no matter which
# compression algorithm is used, zlib or lzo).
-   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"\
-   -c "pwrite -S 0xbb 4K 8K"\
-   -c "pwrite -S 0xcc 12K 4K"   \
-   $SCRATCH_MNT/foo | _filter_xfs_io
+   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K $((1 * $BLOCK_SIZE))" \
+   -c "pwrite -S 0xbb $((1 * $BLOCK_SIZE)) $((2 * $BLOCK_SIZE))" \
+   -c "pwrite -S 0xcc $((3 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \
+   $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
 
# Now clone our extent into an adjacent offset.
-   $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
-   $SCRATCH_MNT/foo $SCRATCH_MNT/foo
+   $CLONER_PROG -s $((1 * $BLOCK_SIZE)) -d $((4 * $BLOCK_SIZE)) \
+-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
 
# Same as before but for this file we clone the extent into a lower
# file offset.
-   $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K" \
-   -c "pwrite -S 0xbb 12K 8K"\
-   -c "pwrite -S 0xcc 20K 4K"\
-   $SCRATCH_MNT/bar | _filter_xfs_io
+   $XFS_IO_PROG -f \
+   -c "pwrite -S 0xaa $((2 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \
+   -c "pwrite -S 0xbb $((3 * $BLOCK_SIZE)) $((2 * $BLOCK_SIZE))" \
+   -c "pwrite -S 0xcc $((5 * $BLOCK_SIZE)) $((1 * $BLOCK_SIZE))" \
+   $SCRATCH_MNT/bar | _filter_xfs_io_blocks_modified
 
-   $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
+   $CLONER_PROG -s $((3 * $BLOCK_SIZE)) -d 0 -l $((2 * $BLOCK_SIZE)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/bar
 
-   echo "File digests before unmounting filesystem:"
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
-   md5sum $SCRATCH_MNT/bar | _filter_scratch
+   foo_orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
+   bar_orig_hash=$(md5sum $SCRATCH_MNT/bar | cut -f 1 -d ' ')
 
# Evicting the inode or clearing the page cache before reading again
# the file would also trigger the bug - reads were returning all bytes
@@ -91,10 +94,18 @@ test_clone_and_read_compressed_extent()
# ranges that point to the same compressed extent.
_scratch_remount
 
-   echo "File digests after mounting filesystem again:"
-   # Must match the same digests we got before.
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
-   md5sum $SCRATCH_MNT/bar | _filter_scratch
+   foo_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
+   bar_hash=$(md5sum $SCRATCH_MNT/bar | cut -f 1 -d ' ')
+
+   if [ $foo_orig_hash != $foo_hash ]; then
+   echo "Read operation failed on $SCRATCH_MNT/foo: "\
+"Mismatching hash values detected."
+   fi
+
+   if [ $bar_orig_hash != $bar_hash ]; then
+   echo "Read operation failed on $SCRATCH_MNT/bar: "\
+"Mismatching hash values detected."
+   fi
 }
 
 echo -e "\nTesting with zlib compression..."
diff --git a/tests/btrfs/103.out b/tests/btrfs/103.out
index f62de2f..964b70f 100644
--- a/tests/btrfs/103.out
+++ b/tests/btrfs/103.out
@@ -1,41 +1,17 @@
 QA output created by 103
 
 Testing with zlib compression...
-wrote 4096/4096 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset 4096
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 4096/4096 bytes at offset 12288
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 4096/4096 bytes at offset 8192
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 8192/8192 bytes at offset 12288
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-wrote 4096/4096 bytes at offset 20480
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digests before unmounting filesystem:
-4b985a45790261a706c3ddbf22c5f765  SCRATCH_MNT/foo
-fd331e6b7a9ab105f48f71b53162d5b5  SCRATCH_MNT/bar
-File digests after 

[PATCH 00/12] Fix Btrfs tests to work on non-4k block sized fs instances

2015-11-25 Thread Chandan Rajendra
This patchset fixes Btrfs tests to work with variable block sizes. This
is based off the RFC patch sent during March of this year
(https://www.marc.info/?l=linux-btrfs&m=142736088310300&w=2).

Currently, some of the tests are written with the assumption that 4k
is the block size of the filesystem instance. On architectures
(e.g. ppc64) with a larger page size (and hence larger block size),
these tests fail because the block boundaries assumed by the tests
are no longer true and hence btrfs_ioctl_clone() (which requires a
block aligned file offset range) returns -EINVAL.
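
The alignment requirement can be made concrete with a small sketch: every
offset and length is computed as a multiple of the block size the filesystem
actually reports (the same `stat -f -c %S` logic as the get_block_size()
helper in patch 1), rather than as a literal byte count. The mount point and
the block multipliers below are illustrative only; MNT defaults to the current
directory so the sketch runs anywhere.

```shell
# Derive clone parameters from the filesystem's reported block size
# instead of hard-coding 4k, so btrfs_ioctl_clone()'s block-alignment
# requirement also holds on larger-block filesystems (e.g. 64k ppc64).
MNT=${MNT:-.}
BLOCK_SIZE=$(stat -f -c %S "$MNT")   # same logic as get_block_size()

SRC_OFF=$((2 * BLOCK_SIZE))          # block-aligned by construction
DST_OFF=$((8 * BLOCK_SIZE))
LEN=$((6 * BLOCK_SIZE))
echo "clone -s $SRC_OFF -d $DST_OFF -l $LEN"
```

Any offset built this way is a block multiple, so the ioctl cannot fail with
-EINVAL for misalignment regardless of the page size of the architecture.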

To fix the issue, this patchset adds three new filter functions:
1. _filter_xfs_io_blocks_modified
2. _filter_xfs_io_pages_modified
3. _filter_od

P.S: Since the changes made are trivial, I could have clubbed all the
patches into two patches. First patch introducing the new filtering
functions and the second patch containing the changes to made to the
tests. If this approach sounds right, I can post version V2 with the
two patches containing the relevant changes.

Chandan Rajendra (12):
  Filter xfs_io and od's output in units of FS block size and the CPU's
page size
  Fix btrfs/017 to work on non-4k block sized filesystems
  Fix btrfs/052 to work on non-4k block sized filesystems
  Fix btrfs/055 to work on non-4k block sized filesystems
  Fix btrfs/056 to work on non-4k block sized filesystems
  Fix btrfs/094 to work on non-4k block sized filesystems
  Fix btrfs/095 to work on non-4k block sized filesystems
  Fix btrfs/096 to work on non-4k block sized filesystems
  Fix btrfs/097 to work on non-4k block sized filesystems
  Fix btrfs/098 to work on non-4k block sized filesystems
  Fix btrfs/103 to work on non-4k block sized filesystems
  Fix btrfs/106 to work on non-4k block sized filesystems

 common/filter   |  52 +
 common/rc   |   5 +
 tests/btrfs/017 |  16 +-
 tests/btrfs/017.out |   3 +-
 tests/btrfs/052 | 127 +++-
 tests/btrfs/052.out | 546 +++-
 tests/btrfs/055 | 128 +++-
 tests/btrfs/055.out | 378 ++--
 tests/btrfs/056 |  51 +++--
 tests/btrfs/056.out | 152 ++-
 tests/btrfs/094 |  78 +---
 tests/btrfs/094.out |  17 +-
 tests/btrfs/095 | 113 ++-
 tests/btrfs/095.out |  10 +-
 tests/btrfs/096 |  45 +++--
 tests/btrfs/096.out |  15 +-
 tests/btrfs/097 |  42 ++--
 tests/btrfs/097.out |   7 +-
 tests/btrfs/098 |  65 ---
 tests/btrfs/098.out |   7 +-
 tests/btrfs/103 |  47 +++--
 tests/btrfs/103.out |  48 ++---
 tests/btrfs/106 |  42 ++--
 tests/btrfs/106.out |  14 +-
 24 files changed, 1018 insertions(+), 990 deletions(-)

-- 
2.1.0



[PATCH 01/12] Filter xfs_io and od's output in units of FS block size and the CPU's page size

2015-11-25 Thread Chandan Rajendra
The helpers will be used to make btrfs tests that assume 4k as the block size
to work on non-4k blocksized filesystem instances as well.

Signed-off-by: Chandan Rajendra 
---
 common/filter | 52 
 common/rc |  5 +
 2 files changed, 57 insertions(+)

diff --git a/common/filter b/common/filter
index af456c9..faa6f82 100644
--- a/common/filter
+++ b/common/filter
@@ -229,6 +229,45 @@ _filter_xfs_io_unique()
 common_line_filter | _filter_xfs_io
 }
 
+_filter_xfs_io_units_modified()
+{
+   UNIT=$1
+   UNIT_SIZE=$2
+
+   $AWK_PROG -v unit="$UNIT" -v unit_size=$UNIT_SIZE '
+   /wrote/ {
+   split($2, bytes, "/")
+
+   bytes_written = strtonum(bytes[1])
+
+   offset = strtonum($NF)
+
+   unit_start = offset / unit_size
+   unit_start = int(unit_start)
+   unit_end = (offset + bytes_written - 1) / unit_size
+   unit_end = int(unit_end)
+
+   printf("%ss modified: [%d - %d]\n", unit, unit_start, unit_end)
+
+   next
+   }
+   '
+}
+
+_filter_xfs_io_blocks_modified()
+{
+   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
+   _filter_xfs_io_units_modified "Block" $BLOCK_SIZE
+}
+
+_filter_xfs_io_pages_modified()
+{
+   PAGE_SIZE=$(get_page_size)
+
+   _filter_xfs_io_units_modified "Page" $PAGE_SIZE
+}
+
 _filter_test_dir()
 {
sed -e "s,$TEST_DEV,TEST_DEV,g" -e "s,$TEST_DIR,TEST_DIR,g"
@@ -323,5 +362,18 @@ _filter_ro_mount() {
-e "s/mount: cannot mount block device/mount: cannot mount/g"
 }
 
+_filter_od()
+{
+   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+   $AWK_PROG -v block_size=$BLOCK_SIZE '
+   /^[0-9]+/ {
+   offset = strtonum("0"$1);
+   $1 = sprintf("%o", offset / block_size);
+   print $0;
+   }
+   /\*/
+   '
+}
+
 # make sure this script returns success
 /bin/true
diff --git a/common/rc b/common/rc
index 4c2f42c..acda6cb 100644
--- a/common/rc
+++ b/common/rc
@@ -3151,6 +3151,11 @@ get_block_size()
echo `stat -f -c %S $1`
 }
 
+get_page_size()
+{
+   echo $(getconf PAGE_SIZE)
+}
+
 init_rc
 
 

-- 
2.1.0

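
The unit arithmetic inside _filter_xfs_io_units_modified can be exercised
standalone. A minimal sketch in plain shell (the sample "wrote" line and the
4k block size are assumptions for illustration, not taken from a real run):

```shell
# Map an xfs_io "wrote" line to the inclusive range of filesystem
# blocks it touched, mirroring _filter_xfs_io_units_modified.
line="wrote 2048/2048 bytes at offset 4096"   # sample xfs_io output
block_size=4096                               # assumed block size

bytes=${line#wrote }          # strip the leading "wrote "
bytes=${bytes%%/*}            # keep the byte count before the '/'
offset=${line##* }            # the last field is the byte offset

unit_start=$((offset / block_size))
unit_end=$(( (offset + bytes - 1) / block_size ))
echo "Blocks modified: [$unit_start - $unit_end]"   # -> [1 - 1]
```

A 2048-byte write at offset 4096 touches only block 1 on a 4k-block
filesystem, which is exactly the "Blocks modified: [1 - 1]" line seen in the
updated 096.out above.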


[PATCH 03/12] Fix btrfs/052 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/052 | 127 +++-
 tests/btrfs/052.out | 546 +++-
 2 files changed, 323 insertions(+), 350 deletions(-)

diff --git a/tests/btrfs/052 b/tests/btrfs/052
index c75193d..55c8332 100755
--- a/tests/btrfs/052
+++ b/tests/btrfs/052
@@ -59,78 +59,105 @@ test_btrfs_clone_same_file()
_scratch_mkfs >/dev/null 2>&1
_scratch_mount $MOUNT_OPTIONS
 
-   # Create a file with 5 extents, 4 of 8Kb each and 1 of 64Kb.
-   $XFS_IO_PROG -f -c "pwrite -S 0x01 -b 8192 0 8192" $SCRATCH_MNT/foo \
-   | _filter_xfs_io
+   BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
+   EXTENT_SIZE=$((2 * $BLOCK_SIZE))
+
+   # Create a file with 5 extents, 4 of 2 blocks each and 1 of 16 blocks.
+   OFFSET=0
+   $XFS_IO_PROG -f -c "pwrite -S 0x01 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
sync
-   $XFS_IO_PROG -c "pwrite -S 0x02 -b 8192 8192 8192" $SCRATCH_MNT/foo \
-   | _filter_xfs_io
+
+   OFFSET=$(($OFFSET + $EXTENT_SIZE))
+   $XFS_IO_PROG -c "pwrite -S 0x02 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
sync
-   $XFS_IO_PROG -c "pwrite -S 0x03 -b 8192 16384 8192" $SCRATCH_MNT/foo \
-   | _filter_xfs_io
+
+   OFFSET=$(($OFFSET + $EXTENT_SIZE))
+   $XFS_IO_PROG -c "pwrite -S 0x03 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
sync
-   $XFS_IO_PROG -c "pwrite -S 0x04 -b 8192 24576 8192" $SCRATCH_MNT/foo \
-   | _filter_xfs_io
+
+   OFFSET=$(($OFFSET + $EXTENT_SIZE))
+   $XFS_IO_PROG -c "pwrite -S 0x04 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
sync
-   $XFS_IO_PROG -c "pwrite -S 0x05 -b 65536 32768 65536" $SCRATCH_MNT/foo \
-   | _filter_xfs_io
+
+   OFFSET=$(($OFFSET + $EXTENT_SIZE))
+   EXTENT_SIZE=$((16 * $BLOCK_SIZE))
+   $XFS_IO_PROG -c "pwrite -S 0x05 -b $EXTENT_SIZE $OFFSET $EXTENT_SIZE" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
sync
 
# Digest of initial content.
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
+   orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
 
# Same source and target ranges - must fail.
-   $CLONER_PROG -s 8192 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
+   $CLONER_PROG -s $((2 * $BLOCK_SIZE)) -d $((2 * $BLOCK_SIZE)) \
+-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
# Check file content didn't change.
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
+   hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
+   if [ $orig_hash != $hash ]; then
+   echo "Cloning same source and target ranges:"\
+"Mismatching hash values detected."
+   fi
 
# Intersection between source and target ranges - must fail too.
-   $CLONER_PROG -s 4096 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
+   # $CLONER_PROG -s 4096 -d 8192 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
+   $CLONER_PROG -s $((1 * $BLOCK_SIZE)) -d $((2 * $BLOCK_SIZE)) \
+-l $((2 * $BLOCK_SIZE)) $SCRATCH_MNT/foo $SCRATCH_MNT/foo
# Check file content didn't change.
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
+   hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
+   if [ $orig_hash != $hash ]; then
+   echo "Cloning intersection between source and target ranges:"\
+"Mismatching hash values detected."
+   fi
 
# Clone an entire extent from a higher range to a lower range.
-   $CLONER_PROG -s 24576 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
-
-   # Check entire file, the 8Kb block at offset 0 now has the same content
-   # as the 8Kb block at offset 24576.
-   od -t x1 $SCRATCH_MNT/foo
+   $CLONER_PROG -s $((6 * $BLOCK_SIZE)) -d 0 -l $((2 * $BLOCK_SIZE)) \
+$SCRATCH_MNT/foo $SCRATCH_MNT/foo
+   # Check entire file, 0th and 1st blocks now have the same content
+   # as the 6th and 7th blocks.
+   od -t x1 $SCRATCH_MNT/foo | _filter_od
 
# Clone an entire extent from a lower range to a higher range.
-   $CLONER_PROG -s 8192 -d 16384 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo
-
-   # Check entire file, the 8Kb block at offset 0 now has the same content
-   # as the 8Kb block at offset 24576, and the 8Kb block at offset 16384
-   # now has the same content as the 8Kb block at offset 8192.
-   od -t x1 $SCRATCH_MNT/foo
-
-   # 

[PATCH 02/12] Fix btrfs/017 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/017 | 16 
 tests/btrfs/017.out |  3 +--
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/tests/btrfs/017 b/tests/btrfs/017
index f8855e3..34c5f0a 100755
--- a/tests/btrfs/017
+++ b/tests/btrfs/017
@@ -63,13 +63,21 @@ rm -f $seqres.full
 _scratch_mkfs "--nodesize 65536" >>$seqres.full 2>&1
 _scratch_mount
 
-$XFS_IO_PROG -f -d -c "pwrite 0 8K" $SCRATCH_MNT/foo | _filter_xfs_io
+BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+EXTENT_SIZE=$((2 * $BLOCK_SIZE))
+
+$XFS_IO_PROG -f -d -c "pwrite 0 $EXTENT_SIZE" $SCRATCH_MNT/foo \
+   | _filter_xfs_io_blocks_modified
 
 _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
 
-$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
-$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink
-$CLONER_PROG -s 0 -d 0 -l 8192 $SCRATCH_MNT/foo $SCRATCH_MNT/snap/foo-reflink2
+$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo $SCRATCH_MNT/foo-reflink
+
+$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo \
+$SCRATCH_MNT/snap/foo-reflink
+
+$CLONER_PROG -s 0 -d 0 -l $EXTENT_SIZE $SCRATCH_MNT/foo \
+$SCRATCH_MNT/snap/foo-reflink2
 
 _run_btrfs_util_prog quota enable $SCRATCH_MNT
 _run_btrfs_util_prog quota rescan -w $SCRATCH_MNT
diff --git a/tests/btrfs/017.out b/tests/btrfs/017.out
index f940f3a..503eb88 100644
--- a/tests/btrfs/017.out
+++ b/tests/btrfs/017.out
@@ -1,5 +1,4 @@
 QA output created by 017
-wrote 8192/8192 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Blocks modified: [0 - 1]
 65536 65536
 65536 65536
-- 
2.1.0



Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
On Wednesday 25 Nov 2015 11:51:52 Filipe Manana wrote:
> On Wed, Nov 25, 2015 at 11:47 AM, Chandan Rajendra wrote:
> > On Wednesday 25 Nov 2015 11:11:27 Filipe Manana wrote:
> >> Hi Chandan,
> >> 
> >> I can't agree with this change. We're no longer checking that file
> >> data is correct after the cloning operations. The md5sum checks were
> >> exactly for that. So essentially the test is only verifying the clone
> >> operations don't fail with errors, it no longer checks for data
> >> corruption...
> >> 
> >> Same comment applies to at least a few other patches in the series.
> > 
> > Hello Filipe,
> > 
> > All the tests where we had md5sum being echoed into output have been
> > replaced with code to verify the md5sum values as shown below,
> > 
> > if [ $foo_orig_hash != $foo_hash ]; then
> > 
> > echo "Read operation failed on $SCRATCH_MNT/foo: "\
> > 
> >  "Mismatching hash values detected."
> > 
> > fi
> > 
> > This will cause a diff between the test's ideal output versus the output
> > obtained during the test run.
> 
> Right, it compares the digests before and after some operation (which
> should always match). However we no longer validate that the file
> content is correct before the operation. For some of the tests that is
> more important, like the ones that test read corruption after cloning
> compressed extents.

Filipe, you are right. I will drop the faulty patches and send V2 containing
fixes for only btrfs/017, btrfs/055 and btrfs/056.

Thanks for providing the review comments.

-- 
chandan
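
Filipe's distinction can be illustrated with a self-contained sketch: a digest
compared only against an earlier digest of the same file proves stability,
while a digest compared against the expected byte pattern proves correctness.
The paths here are temporary files and the 4096-byte 0xaa pattern mirrors the
writes used by the tests; nothing touches a real scratch mount:

```shell
# Validate file content against the digest of the *expected* pattern,
# not just against an earlier digest of the same (possibly already
# corrupt) file.
tmp=$(mktemp)
head -c 4096 /dev/zero | tr '\0' '\252' > "$tmp"   # 4096 bytes of 0xaa

expected=$(head -c 4096 /dev/zero | tr '\0' '\252' | md5sum | cut -d' ' -f1)
actual=$(md5sum "$tmp" | cut -d' ' -f1)

if [ "$actual" != "$expected" ]; then
    echo "data corruption detected in $tmp"
fi
rm -f "$tmp"
```

A before/after comparison of $actual with itself would pass even if the write
path had produced garbage; comparing against $expected would not.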



[PATCH 5/7] btrfs-progs: introduce framework version to features

2015-11-25 Thread Anand Jain
As discussed on the mailing list, this provides a framework for mkfs and
btrfs-convert to set the default features according to a given mainline
kernel version.

Suggested-by: David Sterba 
Signed-off-by: Anand Jain 
---
 utils.c | 23 +++
 utils.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/utils.c b/utils.c
index 216efa6..a9b46b8 100644
--- a/utils.c
+++ b/utils.c
@@ -3222,3 +3222,26 @@ int btrfs_features_allowed_by_sysfs(u64 *features)
closedir(dir);
return 0;
 }
+
+int btrfs_features_allowed_by_version(char *version, u64 *features)
+{
+   int i;
+   int code;
+   char *ver = strdup(version);
+
+   *features = 0;
+   code = version_to_code(ver);
+   free(ver);
+   if (code < 0)
+   return code;
+
+   for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
+   ver = strdup(mkfs_features[i].min_ker_ver);
+
+   if (code >= version_to_code(ver))
+   *features |= mkfs_features[i].flag;
+
+   free(ver);
+   }
+   return 0;
+}
diff --git a/utils.h b/utils.h
index cb20d73..1418e84 100644
--- a/utils.h
+++ b/utils.h
@@ -106,6 +106,7 @@ void btrfs_process_fs_features(u64 flags);
 void btrfs_parse_features_to_string(char *buf, u64 flags);
 u64 btrfs_features_allowed_by_kernel(void);
 int btrfs_features_allowed_by_sysfs(u64 *features);
+int btrfs_features_allowed_by_version(char *version, u64 *features);
 
 struct btrfs_mkfs_config {
char *label;
-- 
2.6.2

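
The encoding used by version_to_code() is not shown in this hunk; a plausible
shell model, assuming a KERNEL_VERSION-style packing (this encoding is an
assumption, not taken from the patch), is:

```shell
# Hypothetical model of version_to_code(): pack "x.y.z" into a single
# integer (x<<16 | y<<8 | z) so that plain numeric comparison follows
# kernel release ordering. Missing components default to 0.
version_to_code() {
    IFS=. read -r major minor patch <<EOF
$1
EOF
    : "${minor:=0}" "${patch:=0}"
    echo $(( (major << 16) | (minor << 8) | patch ))
}

# A feature with min_ker_ver "3.10" must not be enabled when the user
# asks for compatibility with kernel 3.7:
if [ "$(version_to_code 3.7)" -ge "$(version_to_code 3.10)" ]; then
    echo "skinny-metadata enabled"
else
    echo "skinny-metadata disabled"
fi
```

With this packing, the `code >= version_to_code(min_ker_ver)` comparison in
btrfs_features_allowed_by_version() behaves exactly like comparing release
numbers component by component.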


[PATCH 1/7] btrfs-progs: show the version for -O list-all

2015-11-25 Thread Anand Jain
Shows min kernel version in the -O list-all output

e.g. (the version is shown within the parentheses):
btrfs-convert -O list-all
Filesystem features available:
extref  - increased hardlink limit per file to 65536 (0x40, 3.7, default)
skinny-metadata - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes- no explicit hole extents for files (0x200, 3.14)

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg- mixed data and metadata block groups (0x4, 2.6.37)
extref  - increased hardlink limit per file to 65536 (0x40, 3.7, default)
raid56  - raid56 extended format (0x80, 3.9)
skinny-metadata - reduced-size metadata extent refs (0x100, 3.10, default)
no-holes- no explicit hole extents for files (0x200, 3.14)

Signed-off-by: Anand Jain 
---
 utils.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/utils.c b/utils.c
index 2710ed7..0163915 100644
--- a/utils.c
+++ b/utils.c
@@ -657,10 +657,11 @@ void btrfs_list_all_fs_features(u64 mask_disallowed)
continue;
if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
is_default = ", default";
-   fprintf(stderr, "%-20s- %s (0x%llx%s)\n",
+   fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
mkfs_features[i].name,
mkfs_features[i].desc,
mkfs_features[i].flag,
+   mkfs_features[i].min_ker_ver,
is_default);
}
 }
-- 
2.6.2



[PATCH 6/7] btrfs-progs: add -O comp= option for mkfs.btrfs

2015-11-25 Thread Anand Jain
This provides a version-based default feature set for mkfs.btrfs through
the new option '-O comp=<x.y.z>', where x.y.z is the minimum kernel
version that should be supported.

Signed-off-by: Anand Jain 
---
 mkfs.c | 24 ++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/mkfs.c b/mkfs.c
index 6cb998b..34ba77d 100644
--- a/mkfs.c
+++ b/mkfs.c
@@ -324,7 +324,9 @@ static void print_usage(int ret)
	fprintf(stderr, "\t-s|--sectorsize SIZE	min block allocation (may not mountable by current kernel)\n");
	fprintf(stderr, "\t-r|--rootdir DIR	the source directory\n");
	fprintf(stderr, "\t-K|--nodiscard  	do not perform whole device TRIM\n");
-	fprintf(stderr, "\t-O|--features LIST	comma separated list of filesystem features, use '-O list-all' to list features\n");
+	fprintf(stderr, "\t-O|--features LIST	comma separated list of filesystem features\n");
+	fprintf(stderr, "\t			use '-O list-all' to list features\n");
+	fprintf(stderr, "\t			use '-O comp=<x.y.z>', x.y.z is the minimum kernel version to be supported\n");
	fprintf(stderr, "\t-U|--uuid UUID	specify the filesystem UUID\n");
	fprintf(stderr, "\t-q|--quiet		no messages except errors\n");
	fprintf(stderr, "\t-V|--version	print the mkfs.btrfs version and exit\n");
@@ -1439,7 +1441,24 @@ int main(int ac, char **av)
case 'O': {
char *orig = strdup(optarg);
char *tmp = orig;
-
+   char *tok;
+
+   tok = strtok(tmp, "=");
+   if (!strcmp(tok, "comp")) {
+   tok = strtok(NULL, "=");
+   if (!tok) {
+   fprintf(stderr,
+   "Provide a version for 'comp=' option; refer to 
'mkfs.btrfs -O list-all'\n");
+   exit(1);
+   }
+   if 
(btrfs_features_allowed_by_version(tok, &features) < 0) {
+   fprintf(stderr, "Wrong version 
format: '%s'\n", tok);
+   exit(1);
+   }
+   features &= BTRFS_MKFS_DEFAULT_FEATURES;
+   goto cont;
+   }
+   tmp = orig;
tmp = btrfs_parse_fs_features(tmp, &features);
if (tmp) {
fprintf(stderr,
@@ -1448,6 +1467,7 @@ int main(int ac, char **av)
free(orig);
exit(1);
}
+cont:
free(orig);
if (features & BTRFS_FEATURE_LIST_ALL) {
btrfs_list_all_fs_features(0);
-- 
2.6.2
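The option handling above hinges on strtok() behavior: the first call splits "comp" off the argument, and the second call (with NULL) yields the version string. A minimal sketch of that dispatch, with a hypothetical helper name (not the btrfs-progs code verbatim):

```c
#include <string.h>
#include <assert.h>

/*
 * Split an option argument of the form "comp=<x.y.z>".
 * Returns the version part ("3.10"), or NULL when the argument is
 * not a comp= request or the version is missing. strtok() modifies
 * the buffer in place, which is why mkfs.c operates on a strdup()ed
 * copy of optarg.
 */
static char *parse_comp_opt(char *arg)
{
	char *tok = strtok(arg, "=");

	if (!tok || strcmp(tok, "comp") != 0)
		return NULL;		/* a normal feature list, e.g. "extref" */
	return strtok(NULL, "=");	/* version string, or NULL if absent */
}
```

A plain feature list such as "extref,^no-holes" contains no leading "comp" token, so the helper returns NULL and the caller falls through to btrfs_parse_fs_features().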



[PATCH 0/7] Let user specify the kernel version for features

2015-11-25 Thread Anand Jain
Sometimes users may want a btrfs filesystem to be usable across multiple
kernel versions. A simple example: a USB drive can be used with multiple
systems running different kernel versions, or in a data center a SAN
LUN could be mounted on systems running different kernel versions.

Thanks for providing comments and feedback.
Following up on that, below is a set of patches which introduces a way
to specify a kernel version so that the default features can be set
based on what was supported at that kernel version.

First of all, to let the user know which features were supported at which
kernel version, patch 1/7 updates -O list-all to list each feature with
its version.

As the sysfs and progs feature names were not kept consistent, patch 2/7
also displays the sysfs feature name in the list-all output to avoid
confusion.

Next, Patch 3,4,5/7 are helper functions.

Patches 6,7/7 provide the -O comp= option for mkfs.btrfs and
btrfs-convert respectively.

Thanks, Anand

Anand Jain (7):
  btrfs-progs: show the version for -O list-all
  btrfs-progs: add kernel alias for each of the features in the list
  btrfs-progs: make is_numerical non static
  btrfs-progs: check for numerical in version_to_code()
  btrfs-progs: introduce framework version to features
  btrfs-progs: add -O comp= option for mkfs.btrfs
  btrfs-progs: add -O comp= option for btrfs-convert

 btrfs-convert.c | 21 +
 cmds-replace.c  | 11 ---
 mkfs.c  | 24 ++--
 utils.c | 58 -
 utils.h |  2 ++
 5 files changed, 98 insertions(+), 18 deletions(-)

-- 
2.6.2



[PATCH 2/7] btrfs-progs: add kernel alias for each of the features in the list

2015-11-25 Thread Anand Jain
We should have kept the feature names the same across the progs UI and the
sysfs UI. For example, progs mixed-bg is /sys/fs/btrfs/features/mixed_groups
in sysfs. As these are already released interfaces, there is not much that
can be done about it, except for creating an alias and making users aware of it.

Add kernel alias for each of the features in the list.

e.g.: the string within () is the sysfs name for the same feature

mkfs.btrfs -O list-all
Filesystem features available:
mixed-bg (mixed_groups)   - mixed data and metadata block groups (0x4, 
2.6.37)
extref (extended_iref)- increased hardlink limit per file to 65536 
(0x40, 3.7, default)
raid56 (raid56)   - raid56 extended format (0x80, 3.9)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 
3.10, default)
no-holes (no_holes)   - no explicit hole extents for files (0x200, 
3.14)

btrfs-convert -O list-all
Filesystem features available:
extref (extended_iref)- increased hardlink limit per file to 65536 
(0x40, 3.7, default)
skinny-metadata (skinny_metadata) - reduced-size metadata extent refs (0x100, 
3.10, default)
no-holes (no_holes)   - no explicit hole extents for files (0x200, 
3.14)
---
 utils.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/utils.c b/utils.c
index 0163915..6d2675d 100644
--- a/utils.c
+++ b/utils.c
@@ -648,17 +648,26 @@ void btrfs_process_fs_features(u64 flags)
 void btrfs_list_all_fs_features(u64 mask_disallowed)
 {
int i;
+   u64 feature_per_sysfs;
+
+   btrfs_features_allowed_by_sysfs(&feature_per_sysfs);
 
fprintf(stderr, "Filesystem features available:\n");
for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
char *is_default = "";
+   char name[256];
 
if (mkfs_features[i].flag & mask_disallowed)
continue;
if (mkfs_features[i].flag & BTRFS_MKFS_DEFAULT_FEATURES)
is_default = ", default";
-   fprintf(stderr, "%-20s- %s (0x%llx, %s%s)\n",
-   mkfs_features[i].name,
+   if (mkfs_features[i].flag & feature_per_sysfs)
+   sprintf(name, "%s (%s)",
+   mkfs_features[i].name, 
mkfs_features[i].name_ker);
+   else
+   sprintf(name, "%s", mkfs_features[i].name);
+   fprintf(stderr, "%-34s- %s (0x%llx, %s%s)\n",
+   name,
mkfs_features[i].desc,
mkfs_features[i].flag,
mkfs_features[i].min_ker_ver,
-- 
2.6.2
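The alias display above composes a combined name into a fixed 256-byte buffer with sprintf(). A hedged sketch of that formatting, using snprintf() as a defensive variant (a suggestion, not what the patch itself does):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Compose the display name "progs-name (sysfs_alias)" as done in
 * btrfs_list_all_fs_features(). snprintf() instead of sprintf()
 * guards the fixed-size buffer against overly long names.
 */
static void feature_display_name(char *buf, size_t len,
				 const char *name, const char *alias)
{
	if (alias)
		snprintf(buf, len, "%s (%s)", name, alias);
	else
		snprintf(buf, len, "%s", name);
}
```

With the alias present the output matches the list-all example in the commit message, e.g. "mixed-bg (mixed_groups)".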



[PATCH 4/7] btrfs-progs: check for numerical in version_to_code()

2015-11-25 Thread Anand Jain
As the version is now being passed by the user, it should be checked
whether it is numerical. We didn't need this before, as the version
wasn't passed by the user. So this is not a bug fix.

Signed-off-by: Anand Jain 
---
 utils.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/utils.c b/utils.c
index 0e66e2b..216efa6 100644
--- a/utils.c
+++ b/utils.c
@@ -3119,14 +3119,18 @@ static int version_to_code(char *v)
 
for (b[i] = strtok_r(v, ".", _b);
b[i] != NULL;
-   b[i] = strtok_r(NULL, ".", _b))
+   b[i] = strtok_r(NULL, ".", _b)) {
+   if (!is_numerical(b[i]))
+   return -EINVAL;
i++;
+   }
 
+   if (b[1] == NULL)
+   return KERNEL_VERSION(atoi(b[0]), 0, 0);
if (b[2] == NULL)
return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), 0);
-   else
-   return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
 
+   return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
 }
 
 static int get_kernel_code()
-- 
2.6.2



[PATCH 3/7] btrfs-progs: make is_numerical non static

2015-11-25 Thread Anand Jain
Signed-off-by: Anand Jain 
---
 cmds-replace.c | 11 ---
 utils.c| 11 +++
 utils.h|  1 +
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/cmds-replace.c b/cmds-replace.c
index 9ab8438..86162b6 100644
--- a/cmds-replace.c
+++ b/cmds-replace.c
@@ -65,17 +65,6 @@ static const char * const replace_cmd_group_usage[] = {
NULL
 };
 
-static int is_numerical(const char *str)
-{
-   if (!(*str >= '0' && *str <= '9'))
-   return 0;
-   while (*str >= '0' && *str <= '9')
-   str++;
-   if (*str != '\0')
-   return 0;
-   return 1;
-}
-
 static int dev_replace_cancel_fd = -1;
 static void dev_replace_sigint_handler(int signal)
 {
diff --git a/utils.c b/utils.c
index 6d2675d..0e66e2b 100644
--- a/utils.c
+++ b/utils.c
@@ -178,6 +178,17 @@ int test_uuid_unique(char *fs_uuid)
return unique;
 }
 
+int is_numerical(const char *str)
+{
+   if (!(*str >= '0' && *str <= '9'))
+   return 0;
+   while (*str >= '0' && *str <= '9')
+   str++;
+   if (*str != '\0')
+   return 0;
+   return 1;
+}
+
 /*
  * @fs_uuid - if NULL, generates a UUID, returns back the new filesystem UUID
  */
diff --git a/utils.h b/utils.h
index af0aa31..cb20d73 100644
--- a/utils.h
+++ b/utils.h
@@ -271,5 +271,6 @@ const char *get_argv0_buf(void);
"-t|--tbytesshow sizes in TiB, or TB with --si"
 
 unsigned int get_unit_mode_from_arg(int *argc, char *argv[], int df_mode);
+int is_numerical(const char *str);
 
 #endif
-- 
2.6.2



[PATCH 7/7] btrfs-progs: add -O comp= option for btrfs-convert

2015-11-25 Thread Anand Jain
Users may want to convert the FS targeting a minimum kernel version, as
they may need to use btrfs on a set of known kernel versions and keep
the disk layout compatible.

Signed-off-by: Anand Jain 
---
 btrfs-convert.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/btrfs-convert.c b/btrfs-convert.c
index b0a998b..01b8940 100644
--- a/btrfs-convert.c
+++ b/btrfs-convert.c
@@ -2879,6 +2879,8 @@ static void print_usage(void)
printf("\t-L|--copy-labeluse label from converted 
filesystem\n");
printf("\t-p|--progress  show converting progress (default)\n");
printf("\t-O|--features LIST comma separated list of filesystem 
features\n");
+   printf("\t use '-O list-all' to list 
features\n");
+   printf("\t use '-O comp=<x.y.z>', x.y.z is 
the minimum kernel version to be supported\n");
printf("\t--no-progress  show only overview, not the detailed 
progress\n");
 }
 
@@ -2970,6 +2972,24 @@ int main(int argc, char *argv[])
case 'O': {
char *orig = strdup(optarg);
char *tmp = orig;
+   char *tok;
+
+   tok = strtok(tmp, "=");
+   if (!strcmp(tok, "comp")) {
+   tok = strtok(NULL, "=");
+   if (!tok) {
+   fprintf(stderr,
+   "Provide a version for 'comp=' option; refer to 
'mkfs.btrfs -O list-all'\n");
+   exit(1);
+   }
+   if 
(btrfs_features_allowed_by_version(tok, &features) < 0) {
+   fprintf(stderr, "Wrong version 
format: '%s'\n", tok);
+   exit(1);
+   }
+   features &= BTRFS_MKFS_DEFAULT_FEATURES;
+   goto cont;
+   }
+   tmp = orig;
 
tmp = btrfs_parse_fs_features(tmp, &features);
if (tmp) {
@@ -2979,6 +2999,7 @@ int main(int argc, char *argv[])
free(orig);
exit(1);
}
+cont:
free(orig);
if (features & BTRFS_FEATURE_LIST_ALL) {
btrfs_list_all_fs_features(
-- 
2.6.2



[PATCH v3 1/5] btrfs-progs: introduce framework to check kernel supported features

2015-11-25 Thread Anand Jain
On newer kernels, the supported features can be read from
  /sys/fs/btrfs/features
however this interface was introduced only in 3.14, and most of the
incompatible FS features were introduced before 3.14.

This patch proposes to maintain a kernel version against each entry in
the feature list; that will be the minimum kernel version needed to use
the feature.

Further, for features added later than 3.14 this list can still be
updated, so it serves as a repository which can be displayed for easy
reference.

Signed-off-by: Anand Jain 
---
v3: Mike pointed out that mixed-bg was from version 2.6.37, update it
v2: Check for condition that what happens when we fail to read kernel
version. Now the code will fail back to use the default as set by
the progs.

 utils.c | 80 -
 utils.h |  1 +
 2 files changed, 76 insertions(+), 5 deletions(-)

diff --git a/utils.c b/utils.c
index b754686..cc0bdfb 100644
--- a/utils.c
+++ b/utils.c
@@ -32,10 +32,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "kerncompat.h"
@@ -567,21 +569,28 @@ out:
return ret;
 }
 
+/*
+ * min_ker_ver: update with the minimum kernel version at which the
+ * feature was integrated into mainline. For the transition period,
+ * that is, a feature not yet in mainline but on the mailing list and
+ * under test, please use "0.0" to indicate that.
+ */
 static const struct btrfs_fs_feature {
const char *name;
u64 flag;
const char *desc;
+   const char *min_ker_ver;
 } mkfs_features[] = {
{ "mixed-bg", BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS,
-   "mixed data and metadata block groups" },
+   "mixed data and metadata block groups", "2.6.37"},
{ "extref", BTRFS_FEATURE_INCOMPAT_EXTENDED_IREF,
-   "increased hardlink limit per file to 65536" },
+   "increased hardlink limit per file to 65536", "3.7"},
{ "raid56", BTRFS_FEATURE_INCOMPAT_RAID56,
-   "raid56 extended format" },
+   "raid56 extended format", "3.9"},
{ "skinny-metadata", BTRFS_FEATURE_INCOMPAT_SKINNY_METADATA,
-   "reduced-size metadata extent refs" },
+   "reduced-size metadata extent refs", "3.10"},
{ "no-holes", BTRFS_FEATURE_INCOMPAT_NO_HOLES,
-   "no explicit hole extents for files" },
+   "no explicit hole extents for files", "3.14"},
/* Keep this one last */
{ "list-all", BTRFS_FEATURE_LIST_ALL, NULL }
 };
@@ -3077,3 +3086,64 @@ unsigned int get_unit_mode_from_arg(int *argc, char 
*argv[], int df_mode)
 
return unit_mode;
 }
+
+static int version_to_code(char *v)
+{
+   int i = 0;
+   char *b[3] = {NULL};
+   char *save_b = NULL;
+
+   for (b[i] = strtok_r(v, ".", _b);
+   b[i] != NULL;
+   b[i] = strtok_r(NULL, ".", _b))
+   i++;
+
+   if (b[2] == NULL)
+   return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), 0);
+   else
+   return KERNEL_VERSION(atoi(b[0]), atoi(b[1]), atoi(b[2]));
+
+}
+
+static int get_kernel_code()
+{
+   int ret;
+   struct utsname utsbuf;
+   char *version;
+
+   ret = uname(&utsbuf);
+   if (ret)
+   return -ret;
+
+   if (!strlen(utsbuf.release))
+   return -EINVAL;
+
+   version = strtok(utsbuf.release, "-");
+
+   return version_to_code(version);
+}
+
+u64 btrfs_features_allowed_by_kernel(void)
+{
+   int i;
+   int local_kernel_code = get_kernel_code();
+   u64 features = 0;
+
+   /*
+* When the system does not provide the kernel version, just
+* return 0; the caller then has to fall back to the defaults
+* set by this btrfs-progs version
+*/
+   if (local_kernel_code <= 0)
+   return 0;
+
+   for (i = 0; i < ARRAY_SIZE(mkfs_features) - 1; i++) {
+   char *ver = strdup(mkfs_features[i].min_ker_ver);
+
+   if (local_kernel_code >= version_to_code(ver))
+   features |= mkfs_features[i].flag;
+
+   free(ver);
+   }
+   return (features);
+}
diff --git a/utils.h b/utils.h
index 192f3d1..9044643 100644
--- a/utils.h
+++ b/utils.h
@@ -104,6 +104,7 @@ void btrfs_list_all_fs_features(u64 mask_disallowed);
 char* btrfs_parse_fs_features(char *namelist, u64 *flags);
 void btrfs_process_fs_features(u64 flags);
 void btrfs_parse_features_to_string(char *buf, u64 flags);
+u64 btrfs_features_allowed_by_kernel(void);
 
 struct btrfs_mkfs_config {
char *label;
-- 
2.6.2
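The mask computation in btrfs_features_allowed_by_kernel() reduces to a table scan: OR together every feature whose minimum kernel is not newer than the running (or requested) kernel. A condensed sketch with a hypothetical table mirroring mkfs_features[], using the incompat flag values shown in list-all and the 2.6.37 value for mixed-bg noted in the v3 changelog:

```c
#include <stddef.h>
#include <assert.h>

#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical condensed table; min_code is a precomputed KVER(). */
struct fs_feature {
	const char *name;
	unsigned long long flag;
	int min_code;
};

static const struct fs_feature feature_table[] = {
	{ "mixed-bg",        0x4,   KVER(2, 6, 37) },
	{ "extref",          0x40,  KVER(3, 7, 0)  },
	{ "raid56",          0x80,  KVER(3, 9, 0)  },
	{ "skinny-metadata", 0x100, KVER(3, 10, 0) },
	{ "no-holes",        0x200, KVER(3, 14, 0) },
};

static unsigned long long features_allowed(int kernel_code)
{
	unsigned long long mask = 0;
	size_t i;

	/* Unknown kernel: return 0 so the caller keeps the progs defaults. */
	if (kernel_code <= 0)
		return 0;
	for (i = 0; i < sizeof(feature_table) / sizeof(feature_table[0]); i++)
		if (kernel_code >= feature_table[i].min_code)
			mask |= feature_table[i].flag;
	return mask;
}
```

For a 3.10 kernel this yields everything except no-holes, which matches the "default" markers shown in the list-all examples above.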



[PATCH 12/12] Fix btrfs/106 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_pages_modified filtering
function to print information in terms of pages rather than file offsets.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/106 | 42 --
 tests/btrfs/106.out | 14 ++
 2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/tests/btrfs/106 b/tests/btrfs/106
index 1670453..a1bf4ec 100755
--- a/tests/btrfs/106
+++ b/tests/btrfs/106
@@ -58,31 +58,37 @@ test_clone_and_read_compressed_extent()
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount $mount_opts
 
-   # Create our test file with a single extent of 64Kb that is going to be
-   # compressed no matter which compression algorithm is used (zlib/lzo).
-   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 64K" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
-
+   PAGE_SIZE=$(get_page_size)
+
+   # Create our test file with 16 pages worth of data in a single extent
+   # that is going to be compressed no matter which compression algorithm
+   # is used (zlib/lzo).
+   $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K $((16 * $PAGE_SIZE))" \
+$SCRATCH_MNT/foo | _filter_xfs_io_pages_modified
+   
# Now clone the compressed extent into an adjacent file offset.
-   $CLONER_PROG -s 0 -d $((64 * 1024)) -l $((64 * 1024)) \
+   $CLONER_PROG -s 0 -d $((16 * $PAGE_SIZE)) -l $((16 * $PAGE_SIZE)) \
$SCRATCH_MNT/foo $SCRATCH_MNT/foo
 
-   echo "File digest before unmount:"
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
+   orig_hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
 
# Remount the fs or clear the page cache to trigger the bug in btrfs.
-   # Because the extent has an uncompressed length that is a multiple of
-   # 16 pages, all the pages belonging to the second range of the file
-   # (64K to 128K), which points to the same extent as the first range
-   # (0K to 64K), had their contents full of zeroes instead of the byte
-   # 0xaa. This was a bug exclusively in the read path of compressed
-   # extents, the correct data was stored on disk, btrfs just failed to
-   # fill in the pages correctly.
+   # Because the extent has an uncompressed length that is a multiple of 16
+   # pages, all the pages belonging to the second range of the file that is
+   # mapped by the page index range [16, 31], which points to the same
+   # extent as the first file range mapped by the page index range [0, 15],
+   # had their contents full of zeroes instead of the byte 0xaa. This was a
+   # bug exclusively in the read path of compressed extents, the correct
+   # data was stored on disk, btrfs just failed to fill in the pages
+   # correctly.
_scratch_remount
 
-   echo "File digest after remount:"
-   # Must match the digest we got before.
-   md5sum $SCRATCH_MNT/foo | _filter_scratch
+   hash=$(md5sum $SCRATCH_MNT/foo | cut -f 1 -d ' ')
+
+   if [ $orig_hash != $hash ]; then
+   echo "Read operation failed on $SCRATCH_MNT/foo: "\
+"Mismatching hash values detected."
+   fi
 }
 
 echo -e "\nTesting with zlib compression..."
diff --git a/tests/btrfs/106.out b/tests/btrfs/106.out
index 692108d..eceabfa 100644
--- a/tests/btrfs/106.out
+++ b/tests/btrfs/106.out
@@ -1,17 +1,7 @@
 QA output created by 106
 
 Testing with zlib compression...
-wrote 65536/65536 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digest before unmount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
-File digest after remount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
+Pages modified: [0 - 15]
 
 Testing with lzo compression...
-wrote 65536/65536 bytes at offset 0
-XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
-File digest before unmount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
-File digest after remount:
-be68df46e3cf60b559376a35f9fbb05d  SCRATCH_MNT/foo
+Pages modified: [0 - 15]
-- 
2.1.0



[PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
This commit makes use of the new _filter_xfs_io_blocks_modified filtering
function to print information in terms of file blocks rather than file
offset.

Signed-off-by: Chandan Rajendra 
---
 tests/btrfs/094 | 78 +++--
 tests/btrfs/094.out | 17 +++-
 2 files changed, 49 insertions(+), 46 deletions(-)

diff --git a/tests/btrfs/094 b/tests/btrfs/094
index 6f6cdeb..868c088 100755
--- a/tests/btrfs/094
+++ b/tests/btrfs/094
@@ -67,36 +67,41 @@ mkdir $send_files_dir
 _scratch_mkfs >>$seqres.full 2>&1
 _scratch_mount "-o compress"
 
-# Create the file with a single extent of 128K. This creates a metadata file
-# extent item with a data start offset of 0 and a logical length of 128K.
-$XFS_IO_PROG -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
-
-# Now rewrite the range 64K to 112K of our file. This will make the inode's
-# metadata continue to point to the 128K extent we created before, but now
-# with an extent item that points to the extent with a data start offset of
-# 112K and a logical length of 16K.
-# That metadata file extent item is associated with the logical file offset
-# at 176K and covers the logical file range 176K to 192K.
-$XFS_IO_PROG -c "pwrite -S 0xbb 64K 112K" -c "fsync" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
-
-# Now rewrite the range 180K to 12K. This will make the inode's metadata
-# continue to point the the 128K extent we created earlier, with a single
-# extent item that points to it with a start offset of 112K and a logical
-# length of 4K.
-# That metadata file extent item is associated with the logical file offset
-# at 176K and covers the logical file range 176K to 180K.
-$XFS_IO_PROG -c "pwrite -S 0xcc 180K 12K" -c "fsync" \
-   $SCRATCH_MNT/foo | _filter_xfs_io
+BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
+
+# Create the file with a single extent of 32 blocks. This creates a metadata
+# file extent item with a data start offset of 0 and a logical length of
+# 32 blocks.
+$XFS_IO_PROG -f -c "pwrite -S 0xaa $((16 * $BLOCK_SIZE)) $((32 * 
$BLOCK_SIZE))" \
+-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+# Now rewrite the block range [16, 28[ of our file. This will make
+# the inode's metadata continue to point to the single 32 block extent
+# we created before, but now with an extent item that points to the
+# extent with a data start offset referring to the 28th block and a
+# logical length of 4 blocks.
+# That metadata file extent item is associated with the block range
+# [44, 48[.
+$XFS_IO_PROG -c "pwrite -S 0xbb $((16 * $BLOCK_SIZE)) $((28 * $BLOCK_SIZE))" \
+-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
+
+
+# Now rewrite the block range [45, 48[. This will make the inode's
+# metadata continue to point to the 32 block extent we created earlier,
+# with a single extent item that points to it with a start offset
+# referring to the 28th block and a logical length of 1 block.
+# That metadata file extent item is associated with the block range
+# [44, 45[.
+$XFS_IO_PROG -c "pwrite -S 0xcc $((45 * $BLOCK_SIZE)) $((3 * $BLOCK_SIZE))" \
+-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
 
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
 
-# Now clone that same region of the 128K extent into a new file, so that it
+# Now clone that same region of the 32 block extent into a new file, so that it
 # gets referenced twice and the incremental send operation below decides to
 # issue a clone operation instead of copying the data.
 touch $SCRATCH_MNT/bar
-$CLONER_PROG -s $((176 * 1024)) -d $((176 * 1024)) -l $((4 * 1024)) \
+$CLONER_PROG -s $((44 * $BLOCK_SIZE)) -d $((44 * $BLOCK_SIZE)) -l $BLOCK_SIZE \
$SCRATCH_MNT/foo $SCRATCH_MNT/bar
 
 _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
@@ -105,10 +110,11 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f 
$send_files_dir/1.snap
 _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
-f $send_files_dir/2.snap
 
-echo "File digests in the original filesystem:"
-md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch
-md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
-md5sum $SCRATCH_MNT/mysnap2/bar | _filter_scratch
+# echo "File digests in the original filesystem:"
+declare -A src_fs_hash
+src_fs_hash[mysnap1_foo]=$(md5sum $SCRATCH_MNT/mysnap1/foo | cut -f 1 -d ' ')
+src_fs_hash[mysnap2_foo]=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ')
+src_fs_hash[mysnap2_bar]=$(md5sum $SCRATCH_MNT/mysnap2/bar | cut -f 1 -d ' ')
 
 # Now recreate the filesystem by receiving both send streams and verify we get
 # the same file contents that the original filesystem had.
@@ -119,10 +125,18 @@ _scratch_mount
 _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/1.snap
 _run_btrfs_util_prog receive $SCRATCH_MNT -f $send_files_dir/2.snap
 
-echo 

Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems

2015-11-25 Thread Filipe Manana
On Wed, Nov 25, 2015 at 11:03 AM, Chandan Rajendra
 wrote:
> This commit makes use of the new _filter_xfs_io_blocks_modified filtering
> function to print information in terms of file blocks rather than file
> offset.
>
> Signed-off-by: Chandan Rajendra 
> ---
>  tests/btrfs/094 | 78 
> +++--
>  tests/btrfs/094.out | 17 +++-
>  2 files changed, 49 insertions(+), 46 deletions(-)
>
> diff --git a/tests/btrfs/094 b/tests/btrfs/094
> index 6f6cdeb..868c088 100755
> --- a/tests/btrfs/094
> +++ b/tests/btrfs/094
> @@ -67,36 +67,41 @@ mkdir $send_files_dir
>  _scratch_mkfs >>$seqres.full 2>&1
>  _scratch_mount "-o compress"
>
> -# Create the file with a single extent of 128K. This creates a metadata file
> -# extent item with a data start offset of 0 and a logical length of 128K.
> -$XFS_IO_PROG -f -c "pwrite -S 0xaa 64K 128K" -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> -
> -# Now rewrite the range 64K to 112K of our file. This will make the inode's
> -# metadata continue to point to the 128K extent we created before, but now
> -# with an extent item that points to the extent with a data start offset of
> -# 112K and a logical length of 16K.
> -# That metadata file extent item is associated with the logical file offset
> -# at 176K and covers the logical file range 176K to 192K.
> -$XFS_IO_PROG -c "pwrite -S 0xbb 64K 112K" -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> -
> -# Now rewrite the range 180K to 12K. This will make the inode's metadata
> -# continue to point the the 128K extent we created earlier, with a single
> -# extent item that points to it with a start offset of 112K and a logical
> -# length of 4K.
> -# That metadata file extent item is associated with the logical file offset
> -# at 176K and covers the logical file range 176K to 180K.
> -$XFS_IO_PROG -c "pwrite -S 0xcc 180K 12K" -c "fsync" \
> -   $SCRATCH_MNT/foo | _filter_xfs_io
> +BLOCK_SIZE=$(get_block_size $SCRATCH_MNT)
> +
> +# Create the file with a single extent of 32 blocks. This creates a metadata
> +# file extent item with a data start offset of 0 and a logical length of
> +# 32 blocks.
> +$XFS_IO_PROG -f -c "pwrite -S 0xaa $((16 * $BLOCK_SIZE)) $((32 * 
> $BLOCK_SIZE))" \
> +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +# Now rewrite the block range [16, 28[ of our file. This will make
> +# the inode's metadata continue to point to the single 32 block extent
> +# we created before, but now with an extent item that points to the
> +# extent with a data start offset referring to the 28th block and a
> +# logical length of 4 blocks.
> +# That metadata file extent item is associated with the block range
> +# [44, 48[.
> +$XFS_IO_PROG -c "pwrite -S 0xbb $((16 * $BLOCK_SIZE)) $((28 * $BLOCK_SIZE))" 
> \
> +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
> +
> +
> +# Now rewrite the block range [45, 48[. This will make the inode's
> +# metadata continue to point to the 32 block extent we created earlier,
> +# with a single extent item that points to it with a start offset
> +# referring to the 28th block and a logical length of 1 block.
> +# That metadata file extent item is associated with the block range
> +# [44, 45[.
> +$XFS_IO_PROG -c "pwrite -S 0xcc $((45 * $BLOCK_SIZE)) $((3 * $BLOCK_SIZE))" \
> +-c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io_blocks_modified
>
>  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>
> -# Now clone that same region of the 128K extent into a new file, so that it
> +# Now clone that same region of the 32 block extent into a new file, so that 
> it
>  # gets referenced twice and the incremental send operation below decides to
>  # issue a clone operation instead of copying the data.
>  touch $SCRATCH_MNT/bar
> -$CLONER_PROG -s $((176 * 1024)) -d $((176 * 1024)) -l $((4 * 1024)) \
> +$CLONER_PROG -s $((44 * $BLOCK_SIZE)) -d $((44 * $BLOCK_SIZE)) -l 
> $BLOCK_SIZE \
> $SCRATCH_MNT/foo $SCRATCH_MNT/bar
>
>  _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
> @@ -105,10 +110,11 @@ _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f 
> $send_files_dir/1.snap
>  _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
> -f $send_files_dir/2.snap
>
> -echo "File digests in the original filesystem:"
> -md5sum $SCRATCH_MNT/mysnap1/foo | _filter_scratch
> -md5sum $SCRATCH_MNT/mysnap2/foo | _filter_scratch
> -md5sum $SCRATCH_MNT/mysnap2/bar | _filter_scratch
> +# echo "File digests in the original filesystem:"
> +declare -A src_fs_hash
> +src_fs_hash[mysnap1_foo]=$(md5sum $SCRATCH_MNT/mysnap1/foo | cut -f 1 -d ' ')
> +src_fs_hash[mysnap2_foo]=$(md5sum $SCRATCH_MNT/mysnap2/foo | cut -f 1 -d ' ')
> +src_fs_hash[mysnap2_bar]=$(md5sum $SCRATCH_MNT/mysnap2/bar | cut -f 1 -d ' ')
>
>  # Now recreate the filesystem by receiving 

Imbalanced RAID1 with three unequal disks

2015-11-25 Thread Mario


Hi,

I pushed a subvolume using send/receive to an 8 TB disk, added
two 4 TB disks and started a balance with conversion to RAID1.

Afterwards, I got the following:

  Total devices 3 FS bytes used 5.40TiB
  devid1 size 7.28TiB used 4.54TiB path /dev/mapper/yellow4
  devid2 size 3.64TiB used 3.17TiB path /dev/mapper/yellow1
  devid3 size 3.64TiB used 3.17TiB path /dev/mapper/yellow2

  Btrfs v3.17
  Data, RAID1: total=5.43TiB, used=5.39TiB
  System, RAID1: total=64.00MiB, used=800.00KiB
  Metadata, RAID1: total=14.00GiB, used=5.55GiB
  GlobalReserve, single: total=512.00MiB, used=0.00B

In my understanding, the data isn't properly balanced and I
only get around 5.9TB of usable space. As suggested in #btrfs,
I started a second balance without filters and got this:

  Total devices 3 FS bytes used 5.40TiB
  devid1 size 7.28TiB used 5.41TiB path /dev/mapper/yellow4
  devid2 size 3.64TiB used 2.71TiB path /dev/mapper/yellow1
  devid3 size 3.64TiB used 2.71TiB path /dev/mapper/yellow2

  Data, RAID1: total=5.41TiB, used=5.39TiB
  System, RAID1: total=32.00MiB, used=784.00KiB
  Metadata, RAID1: total=7.00GiB, used=5.54GiB
  GlobalReserve, single: total=512.00MiB, used=0.00B

  /dev/mapper/yellow4  7,3T5,4T  969G   86% /mnt/yellow

Now, I get 6.3TB of usable space but, in my understanding, I should
get around 7.28 TB, or am I missing something here? Also, a second
balance shouldn't change the data distribution, right?

I'm using kernel v4.3 with a patch [1] from kernel bugzilla [2] for
the 8 TB SMR drive. The send/receive of a 5 TB subvolume worked
flawlessly with the patch. Without, I got a lot of errors in dmesg
within the first 200GB of transferred data. The OS is a x86_64
Ubuntu 15.04.

Thank you!
Mario

[1] 
http://git.kernel.org/cgit/linux/kernel/git/mkp/linux.git/commit/?h=bugzilla-93581=7c4fbd50bfece00abf529bc96ac989dd2bb83ca4

[2] https://bugzilla.kernel.org/show_bug.cgi?id=93581
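For a quick sanity check of the expected capacity: btrfs RAID1 keeps two copies of every chunk on distinct devices, so the usable space is half the total unless one disk is larger than all the others combined, in which case the excess on that disk is unusable. A small sketch of that rule (an approximation that ignores metadata and System chunk overhead):

```c
#include <assert.h>

/*
 * Approximate usable capacity of a btrfs RAID1 array with unequal
 * devices. The largest disk can only hold data that is mirrored on
 * the remaining ones, so anything beyond their combined size is wasted.
 */
static double raid1_usable(const double *dev, int n)
{
	double total = 0.0, largest = 0.0;
	int i;

	for (i = 0; i < n; i++) {
		total += dev[i];
		if (dev[i] > largest)
			largest = dev[i];
	}
	if (largest > total - largest)
		return total - largest;	/* excess on the big disk is unusable */
	return total / 2.0;
}
```

With 7.28 + 3.64 + 3.64 TiB the largest disk exactly matches the sum of the others, so the expected usable space is 14.56 / 2 ≈ 7.28 TiB, supporting the expectation above; the ~6.3 TiB actually observed suggests the balance left the chunks unevenly distributed.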


Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems

2015-11-25 Thread Filipe Manana
On Wed, Nov 25, 2015 at 11:47 AM, Chandan Rajendra
 wrote:
> On Wednesday 25 Nov 2015 11:11:27 Filipe Manana wrote:
>>
>> Hi Chandan,
>>
>> I can't agree with this change. We're no longer checking that file
>> data is correct after the cloning operations. The md5sum checks were
>> exactly for that. So essentially the test is only verifying the clone
>> operations don't fail with errors, it no longer checks for data
>> corruption...
>>
>> Same comment applies to at least a few other patches in the series.
>
> Hello Filipe,
>
> All the tests where we had md5sum being echoed into output have been replaced
> with code to verify the md5sum values as shown below,
>
> if [ $foo_orig_hash != $foo_hash ]; then
> echo "Read operation failed on $SCRATCH_MNT/foo: "\
>  "Mismatching hash values detected."
> fi
>
> This will cause a diff between the test's ideal output and the output
> obtained during the test run.

Right, it compares the digests before and after some operation (which
should always match). However, we no longer validate that the file
content is correct before the operation. For some of the tests that is
more important, like the ones that test read corruption after cloning
compressed extents.
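To make the distinction concrete, here is a minimal, self-contained sketch (file path, content, and variable names are invented for illustration) of checking a file against a known-good digest *before* an operation, rather than only comparing before/after digests of the same file:

```shell
# Hypothetical sketch: validate content against a precomputed known-good
# digest before the operation, so pre-existing corruption is also caught.
tmpfile=$(mktemp)
printf 'hello world\n' > "$tmpfile"

# Golden digest, recorded when the content was known to be correct.
golden=$(printf 'hello world\n' | md5sum | cut -d' ' -f1)

actual=$(md5sum "$tmpfile" | cut -d' ' -f1)
if [ "$actual" != "$golden" ]; then
    echo "Pre-operation corruption detected on $tmpfile"
fi
rm -f "$tmpfile"
```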

>
> In the case of btrfs/094, I have added an associative array to hold the
> md5sums, and the file content verification is performed by the following
> code:
>
> for key in "${!src_fs_hash[@]}"; do
>     if [ ${src_fs_hash[$key]} != ${dst_fs_hash[$key]} ]; then
>         echo "Mismatching hash value detected against \
>             $(echo $key | tr _ /)"
>     fi
> done
>
> --
> chandan
>



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: [PATCH 06/12] Fix btrfs/094 to work on non-4k block sized filesystems

2015-11-25 Thread Chandan Rajendra
On Wednesday 25 Nov 2015 11:11:27 Filipe Manana wrote:
> 
> Hi Chandan,
> 
> I can't agree with this change. We're no longer checking that file
> data is correct after the cloning operations. The md5sum checks were
> exactly for that. So essentially the test is only verifying the clone
> operations don't fail with errors, it no longer checks for data
> corruption...
> 
> Same comment applies to at least a few other patches in the series.

Hello Filipe,

In all the tests where the md5sum was echoed into the output, that has
been replaced with code that verifies the md5sum values, as shown below:

if [ $foo_orig_hash != $foo_hash ]; then
    echo "Read operation failed on $SCRATCH_MNT/foo: "\
         "Mismatching hash values detected."
fi

This will cause a diff between the test's golden output and the output
obtained during the test run.

In the case of btrfs/094, I have added an associative array to hold the
md5sums, and the file content verification is performed by the following
code:

for key in "${!src_fs_hash[@]}"; do
    if [ ${src_fs_hash[$key]} != ${dst_fs_hash[$key]} ]; then
        echo "Mismatching hash value detected against \
            $(echo $key | tr _ /)"
    fi
done
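For context, here is a self-contained sketch of this associative-array scheme (directories and file contents are invented; plain `cp` stands in for the actual send/receive or clone step):

```shell
# Sketch: record per-file digests on the source, compare on the destination.
declare -A src_fs_hash dst_fs_hash

srcdir=$(mktemp -d); dstdir=$(mktemp -d)
printf 'aaa\n' > "$srcdir/f1"
printf 'bbb\n' > "$srcdir/f2"
cp "$srcdir"/f1 "$srcdir"/f2 "$dstdir"/   # stand-in for the real transfer

# Hash each file on both sides, keyed by file name.
for f in f1 f2; do
    src_fs_hash[$f]=$(md5sum "$srcdir/$f" | cut -d' ' -f1)
    dst_fs_hash[$f]=$(md5sum "$dstdir/$f" | cut -d' ' -f1)
done

status=ok
for key in "${!src_fs_hash[@]}"; do
    if [ "${src_fs_hash[$key]}" != "${dst_fs_hash[$key]}" ]; then
        echo "Mismatching hash value detected for $key"
        status=bad
    fi
done
rm -rf "$srcdir" "$dstdir"
```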

-- 
chandan



Re: [PATCH 00/25] Btrfs-convert rework to support native separate

2015-11-25 Thread David Sterba
On Tue, Nov 24, 2015 at 04:50:00PM +0800, Qu Wenruo wrote:
> It seems the conflict is quite huge; your reiserfs support is based on
> the old behavior, just like the old ext2 one: custom extent allocation.

> I'm afraid the rebase will take a lot of time, since I'm a complete
> newbie with reiserfs... :(

Yeah, the ext2 callbacks are abstracted and replaced by reiserfs
implementations, and the abstraction is quite direct. This might be a
problem when merging your patchset.

> I may need to change a lot of the direct ext2 calls to generic ones, and
> may even change the generic function calls (no alloc/free, only free-space
> lookup).
> 
> And some (maybe a lot) of reiserfs codes may be removed during the rework.

As long as the conversion support stays, it's not a problem of course. I
don't have a complete picture of all the actual merge conflicts, but
the idea is to provide the callback abstraction v2 to allow ext2 and
reiserfs, plus allow all the changes of this patchset.


Re: [PATCH 0/7] Let user specify the kernel version for features

2015-11-25 Thread Anand Jain

On 11/26/2015 10:02 AM, Qu Wenruo wrote:

Anand Jain wrote on 2015/11/25 20:08 +0800:

Sometimes users may want a btrfs filesystem to be supported on multiple
kernel versions. A simple example: a USB drive can be used with multiple
systems running different kernel versions. Or, in a data center, a SAN
LUN could be mounted on any system running a different kernel version.

Thanks for providing comments and feedback.
Following up on that, below is a set of patches which introduces a way
to specify a kernel version, so that default features can be set based
on what features were supported by that kernel version.


With the new -O comp= option, the concern of users who want to make a
btrfs filesystem for a newer kernel is hugely reduced.


No. Actually, the new -O comp= option does not address users who want
to create _a btrfs disk layout which is compatible with more than one
kernel_; there are two examples of that above.


But I still prefer such feature alignment to be done only when specified
by the user, instead of automatically (yeah, I've already said this
several times).
A warning should be enough for the user; sometimes too automatic is not good,


As said before, we need the latest btrfs-progs on older kernels, for the
obvious reason of btrfs-progs bug fixes; that way we don't have to
backport fixes to btrfs-progs the way we already do in the btrfs kernel.
btrfs-progs should work on any kernel with the "default features as
prescribed for that kernel".

If we don't do this automatically, then with the latest btrfs-progs a
default 'mkfs.btrfs && mount' fails. But a user upgrading btrfs-progs
for fsck bug fixes shouldn't find a default 'mkfs.btrfs && mount'
failing, nor should they have to use a "new" set of mkfs options to
create an all-default FS for an LTS kernel.

Basing default features on the btrfs-progs version instead of the kernel
version makes NO sense. And adding a warning about not using the latest
features, which are not in the user's running kernel, is pointless.
That's _not_ a backward-kernel-compatible tool.

btrfs-progs should work "for the kernel". We should avoid adding too
much intelligence into btrfs-progs. I have fixed too many issues and
redesigned progs in this area. Too many bugs arose mainly from the
idea of copying and maintaining the same code in btrfs-progs and the
btrfs kernel (ref the wiki and my earlier email). That's a wrong
approach. I don't understand: if the purpose of the two isn't the
same, what is the point in maintaining the same code? It won't save
effort, mainly because it's like developing a distributed FS where
two parties have to communicate to stay in sync. Which is like using
a cannon to shoo a crow.
But if the reason was a fuse-like kernel-free FS (no one said that,
though), then it's better done as a separate project.


especially for tests.


It depends on what's being tested: the kernel or progs? It's the kernel,
not progs. Automatic defaults will keep the default features constant
for a given kernel version. Further, for testing, using a known set of
options is even better.


A lot of btrfs-progs changes, like the recent disabling of mixed-bg for
small volumes, have already caused a regression in the generic/077 test
case. And Dave is already fed up with such problems from btrfs...


I don't know what that regression is about. But in my experience with
some xfstests test cases, xfstests depends too much on CLI output
strings, which is an easy thing to do but the wrong approach.
Those CLI outputs and their format are NOT APIs; they are UIs. Instead,
the tests should use return codes / an FS test interface. This would
leave developers free to make changes; otherwise you need to update the
test cases every time you change the CLI _output_.
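A toy illustration of that point (the command is a hypothetical stand-in, not a real btrfs-progs invocation): assert on the exit status, which is stable, rather than on the human-readable message, which may change between releases.

```shell
# Hypothetical stand-in for a CLI command whose message text may change.
fake_cli() { echo "some human-readable message that may be reworded"; return 0; }

# Fragile: grepping the output string breaks whenever the wording changes.
# Robust: checking the exit status survives UI changes.
if fake_cli > /dev/null; then
    result=pass
else
    result=fail
fi
```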


In particular, such auto-detection will make the default behavior more
unstable; at least it's not a good idea to me.


As above. We design with end users and their use cases in mind, not for
a test suite. If a test suite breaks, fix it.

Thanks, Anand


Besides this, I'm curious how other filesystems' user tools handle such
a kernel mismatch, or do they?



Thanks,
Qu




First of all, to let the user know which features were supported in
which kernel version, Patch 1/7 updates -O list-all to list each feature
with its version.

As we didn't keep the sysfs and progs feature names consistent, to avoid
confusion Patch 2/7 also displays the sysfs feature name in the list-all
output.

Next, Patch 3,4,5/7 are helper functions.

Patches 6,7/7 provide the -O comp= option for mkfs.btrfs and
btrfs-convert respectively.

Thanks, Anand

Anand Jain (7):
   btrfs-progs: show the version for -O list-all
   btrfs-progs: add kernel alias for each of the features in the list
   btrfs-progs: make is_numerical non static
   btrfs-progs: check for numerical in version_to_code()
   btrfs-progs: introduce framework version to features
   btrfs-progs: add -O comp= option for mkfs.btrfs
   btrfs-progs: add -O comp= option for btrfs-convert
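The version comparison these patches rely on presumably reduces to encoding a version string as a single comparable integer. A hedged sketch follows: the function name is borrowed from the patch titles, but the implementation is invented, mirroring the spirit of the kernel's KERNEL_VERSION(a,b,c) macro.

```shell
# Illustrative only: encode "X.Y[.Z]" as an integer so kernel versions
# can be compared numerically, like the kernel's KERNEL_VERSION macro.
version_to_code() {
    local major minor patch
    IFS=. read -r major minor patch <<< "$1"
    : "${minor:=0}" "${patch:=0}"
    echo $(( (major << 16) + (minor << 8) + patch ))
}
```

With such an encoding, `version_to_code 3.10.2` sorts below `version_to_code 4.3`, so default features could be selected by a simple numeric comparison.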

  btrfs-convert.c | 21 +
  cmds-replace.c  | 11 ---
  mkfs.c  | 24 ++--
  utils.c | 58