Re: btrfs-undelete shell-script

2015-10-28 Thread JDAIII

Jörg Walter  syntax-k.de> writes:

>
> Hi,
>
> I tried to recover an accidentally deleted text file from a btrfs volume
> using the trusty old 'grep --text -C 500' method and failed, since the
> filesystem was compressed. So I wrote a shell script that uses btrfs-progs
> for a proper undelete functionality.
>
> Attached is the script that implements a working btrfs-undelete using the
> find-root and restore tools from btrfs-progs. It is fairly complete and
> solid and it even has some command line help. It needs bash and common
> unix utilities (sed, grep, wc, dirname, sort). I have successfully used it
> to recover a couple of files I deleted accidentally and was able to
> recover 2/3 of them just fine. The rest was zero-sized, I assume that's
> because the file blocks have already been reused.
>
> If you like it, feel free to add it to btrfs-progs. I've chosen GPLv2 or
> later as license, as that's what btrfs-progs seems to use.
>
> Please CC me on replies, I am not subscribed (and don't intend to).
>

Hi all, new to btrfs and have an issue. 
Thought I'd hit you folks up for some 
assistance.

First things first. I have exhausted 
Google and tried the btrfs-undelete 
script. I've had no luck recovering my 
directory. I need to recover it asap as 
important files are on the drive also that 
we need and I have it unmounted until we 
figure this out.

I've tried a few blocks and I get the same 
error here.

I've tried btrfs restore as you see here.

$ sudo btrfs restore /dev/sde /media -v -i -t 1060780900352
parent transid verify failed on 1060780900352 wanted 39805 found 39797
parent transid verify failed on 1060780900352 wanted 39805 found 39797
parent transid verify failed on 1060780900352 wanted 39805 found 39797
parent transid verify failed on 1060780900352 wanted 39805 found 39797
Ignoring transid failure
Couldn't setup extent tree
Couldn't read fs root: -2
extent buffer leak: start 1060780900352 len 16384

My setup: I have two 4TB drives in a single btrfs volume in RAID 0. I
know, no need to mention how bad that idea was. My wife is learning Linux
and, well, let's say she's never allowed near the server again. She ran
the command sudo rm -rf /media/Movies while trying to delete a single
file, and that is why we are here.

Well, I had 1.5TB in that folder of 
hundreds of files. I could either spend 
another 2 months ripping all my DVDs again 
or recover my directory. I'm really 
surprised btrfs doesn't have better 
documentation on file recovery. Or a tool 
that can be used for this purpose. First 
thing I did when she told me about it is 
stop all processes writing to that mount,  
remove the btrfs volume from fstab and 
reboot. For some reason umount never works 
for me on the btrfs mounts.

So here is a partial lsblk. sde and sdb1 are both in the volume together.
I'm still not sure why sde shows no partitions.

NAME   MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sdb      8:16   0   3.7T  0 disk
└─sdb1   8:17   0   3.7T  0 part
sde      8:64   0   3.7T  0 disk
sdf      8:80   0   1.8T  0 disk
└─sdf1   8:81   0   1.8T  0 part  /media

Here is my entry in fstab:
UUID=7ffdecf9-af9a-4299-b697-4fc375bac3b1 /media   btrfs   defaults

When I ran sudo btrfs restore -s /dev/sde /tmp it restored some files. I
had to stop it because I can't store 4TB on my root partition. I'd like
to either recover just /media/Movies from sde and sdb1, or restore a
snapshot, but only of that one subdirectory, as other data changes hourly.

So where can I start? I have run btrfs-find-root on /dev/sde and sdb1 and
have the output files, but I don't want to run the btrfs-undelete script
since I get the errors above. I also noticed a strange line in the
btrfs-undelete script I got from here:
http://comments.gmane.org/gmane.comp.file-systems.btrfs/22560

In the script, it references /dev/mapper/queen-home. I've only been using
Linux full time for a year, but I've never heard of that file.

Any ideas, questions, comments? I'm 
desperate so any help I can find would be 
greatly appreciated.
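
On restoring only one directory: btrfs restore has a --path-regex filter and a -D (dry-run) flag. The regex must match every parent directory level as well as the target, so it has a nested form. Below is a minimal sketch of building it. build_path_regex is a hypothetical helper, not part of btrfs-progs. It assumes the directory names contain no regex metacharacters, and that /media was a mount of the top-level subvolume, so /media/Movies is /Movies inside the filesystem. Whether restore can get past the transid failures shown above is a separate question.

```shell
#!/bin/sh
# Hypothetical helper: build the nested regex that
# `btrfs restore --path-regex` expects, so that every parent
# directory level matches as well as the target directory.
# The body runs in a subshell, so the IFS change stays local.
build_path_regex() (
    path=${1#/}          # drop a leading slash
    path=${path%/}       # drop a trailing slash
    regex='^/(|'
    closing=')'
    first=1
    set -f               # no globbing while splitting the path
    IFS=/
    for part in $path; do
        if [ "$first" -eq 1 ]; then
            regex="${regex}${part}"
            first=0
        else
            regex="${regex}(|/${part}"
            closing="${closing})"
        fi
    done
    printf '%s\n' "${regex}(|/.*)${closing}\$"
)

build_path_regex /Movies
```

The result could be tried with a dry run first, for example: sudo btrfs restore -D -v -i --path-regex "$(build_path_regex /Movies)" /dev/sde /path/to/big/disk, where /path/to/big/disk is a placeholder for a target with enough free space.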

How to replicate a Xen VM using BTRFS as the root filesystem.

2015-10-28 Thread Austin S Hemmelgarn
I would put this on the wiki in the stuff about use cases, but I don't 
have a wiki account and don't really have the time or interest right now 
in getting one, so I'm posting it here instead.


This is a rather interesting use case for send/receive that I've never 
seen discussed anywhere else.  In essence, it lets you take a working 
Xen user domain which uses BTRFS as the root filesystem, and use that as 
a template for new VMs without having to deal with seed devices (which 
means you can do it with the source domain online with only minimal 
service degradation).


These steps assume you have a minimal working knowledge of Xen and
BTRFS, as well as access to both Domain-0 and the user domain you are
copying (referred to below as the 'source domain').  I've only used this
myself with Gentoo, but it should work for almost any Linux distro.
I've also only done this with fully paravirtualized domains; it's a lot
more involved to do it with hardware virtualized domains, although PVH
domains should work with exactly the same steps as regular PV domains.
Your source domain's root also needs to be in a subvolume on the root
filesystem; this won't work if it isn't.


1. Create your backend storage devices for the new domain.
2. Attach the storage device (or devices) that will hold the root 
filesystem for the new domain to the source domain.  Using xl, this 
translates to something similar to: 'xl block-attach source 
"/dev/vg/target-disk,raw,xvdz"'
3. From the source domain, use mkfs.btrfs to create a BTRFS filesystem 
on the target disk (be careful, if you specify the wrong disk you may 
cause serious issues).
4. On the source domain, mount both the root filesystem (don't bind 
mount it, and make sure that you are mounting the top-level, not a 
subvolume) and the target filesystem.
5. Create a read-only snapshot of the root subvolume from the source 
filesystem.
6. Use btrfs send piped to btrfs receive to copy the snapshot from the 
source filesystem to the target filesystem.  This will likely take quite 
some time (on the low-end server equivalent hardware I have, it takes 
about 20-25 minutes for a somewhat minimalistic Gentoo installation).
7. While btrfs send/receive is running, prepare the configuration file 
for the target domain (I usually just copy the config from the source 
domain, and then change only what I need to).
8. Once the send/receive operation is complete, use btrfs property set 
to change the snapshot on the target to be writable, then rename it to 
whatever you want the root subvolume to be called on the target system.
9. In the newly created root subvolume, change any system specific 
configuration to what is needed for the new system (at a bare minimum, 
you probably need to change the hostname and networking configuration, 
and should verify that /etc/fstab and /etc/localtime are correct for the 
target system).

10. Unmount the target filesystem in the source domain.
11. Use 'xl block-detach' to detach the target device from the source 
domain.
12. Use your regular tools to start your new domain, log in, and perform 
any final configuration needed.
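
The command-line side of steps 2 to 11 can be sketched as a dry run that only prints the commands. Every name in it is a placeholder rather than something from the steps themselves: the domain name "source", the backing device /dev/vg/target-disk, the frontend name xvdz, the source root device /dev/xvda, the subvolume name "root", the snapshot name "root-tmpl", and the mount points under /mnt (assumed to exist already). Steps 7 and 9 are manual edits and are left out.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 2-11, executes nothing.
# All device, domain, subvolume and mount point names are placeholders.
plan() { printf '%s\n' "$*"; }

replicate_plan() {
    src_dom=$1 backend=$2 vdev=$3
    # step 2: attach the new disk to the running source domain
    plan "dom0#   xl block-attach $src_dom $backend,raw,$vdev"
    # step 3: create the target filesystem from inside the source domain
    plan "source# mkfs.btrfs /dev/$vdev"
    # step 4: mount the top level of the source fs, and the target fs
    plan "source# mount -o subvolid=5 /dev/xvda /mnt/src"
    plan "source# mount /dev/$vdev /mnt/dst"
    # step 5: read-only snapshot of the root subvolume
    plan "source# btrfs subvolume snapshot -r /mnt/src/root /mnt/src/root-tmpl"
    # step 6: copy the snapshot to the target filesystem
    plan "source# btrfs send /mnt/src/root-tmpl | btrfs receive /mnt/dst"
    # step 8: make the received snapshot writable, then rename it
    plan "source# btrfs property set -ts /mnt/dst/root-tmpl ro false"
    plan "source# mv /mnt/dst/root-tmpl /mnt/dst/root"
    # steps 10 and 11: unmount and detach
    plan "source# umount /mnt/dst"
    plan "dom0#   xl block-detach $src_dom $vdev"
}

replicate_plan source /dev/vg/target-disk xvdz
```

Printing instead of executing is deliberate: step 3 destroys whatever is on the device it is pointed at, so it seems safer to review the exact command lines before running any of them by hand.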


Using this methodology, I can have a new Gentoo PV domain running in 
about half an hour, whereas it takes me at least two and a half hours 
(and often much longer than that) when using the regular install process 
for Gentoo.






Re: Questions about FIEMAP

2015-10-28 Thread Duncan
Wang, Zhiye posted on Wed, 28 Oct 2015 09:57:29 + as excerpted:

> Thank you all for your comments.
> 
> A further question is: if I mount a btrfs file system in "readonly"
> mode, will any operation cause the blocks of a file get changed?

Note that both bind-mounts and btrfs subvolume mounts can be used to make 
parts of a filesystem appear in multiple locations in the filesystem 
tree.  Because these different mounts can be separately mounted read-only 
or writable, there's no guarantee that just because a filesystem or part 
of it is read-only mounted in one location, it's read-only mounted 
everywhere it can be accessed, and thus no guarantee that files even on a 
read-only mounted filesystem or subvolume won't actually change out from 
under you.

However, bind-mounts in particular aren't btrfs specific, so just because 
btrfs subvolumes add another case in which the above can be true, doesn't 
mean bind-mounts can't be used on other filesystems to effect the same 
sort of otherwise read-only file instabilities.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: random i/o error without error in dmesg

2015-10-28 Thread Duncan
Szalma László posted on Wed, 28 Oct 2015 09:44:12 +0100 as excerpted:

> The umount/mount ALWAYS solved the problem for me, mount -o remount,ro
> was tried for the first time, but it was not enough. Reboot was not
> needed.
> (kernel 4.2.4)

So that means it's filesystem state that's tracked thru a remount, not 
something like deleted/orphan files that a remount should square away.  
But a full unmount clears the state, without having to unload the btrfs 
kernel module or reboot.

Which at least limits the active zone of the problem.  Unfortunately, I'm 
not a dev and don't have a clue where to go from there... unless it's 
related to the recent delayed-refs bug, in which case I believe a late 
4.3 rc, or 4.3.0 when released, should fix it.  That fix should be CCed 
to stable and thus appear there eventually, but I normally track pre-
releases, not stable, so don't know how long it might be...  And again, I 
don't know it's even related, but it does appear to be the active bug of 
the moment, so in the absence of knowing, one can hope.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



San Blaze

2015-10-28 Thread Zack Coffey
Curious if anyone knows if anything like San Blaze has been used to test
btrfs?


Re: [PATCH v3 06/21] btrfs: delayed_ref: Add new function to record reserved space into delayed ref

2015-10-28 Thread Holger Hoffstätte
On Tue, Oct 27, 2015 at 12:34 PM, Chris Mason  wrote:
> On Tue, Oct 27, 2015 at 05:05:56PM +0800, Qu Wenruo wrote:
>>
>>
>> Chris Mason wrote on 2015/10/27 02:12 -0400:
>> >On Tue, Oct 27, 2015 at 01:48:34PM +0800, Qu Wenruo wrote:
>> Are you testing integration-4.4 from Chris repo?
>> Or 4.3-rc from mainline repo with my qgroup reserve patchset applied?
>> 
>> Although integration-4.4 already merged qgroup reserve patchset, but it's
>> causing some strange bug like over decrease data sinfo->bytes_may_use,
>> mainly in generic/127 testcase.
>> 
>> But if qgroup reserve patchset is rebased to integration-4.3 (I did all my
>> old tests based on that), no generic/127 problem at all.
>> >>>
>> >>>Did I mismerge things?
>> >>>
>> >>>-chris
>> >>>
>> >>Not sure yet.
>> >>
>> >>But at least some patches in 4.3 is not in integration-4.4, like the
>> >>following patch:
>> >>btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode
>> >>size
>> >
>> >Have you tried testing integration-4.4 merged with current Linus git?

Chris, something definitely went wrong with the 4.4-integration
branch, and it's not the point where you merged from Josef. Mainline
has: 0f6925fa2907df58496cabc33fa4677c635e2223 ("btrfs: Avoid truncate
tailing page if fallocate range doesn't exceed inode size"), and that
commit just doesn't exist in 4.4-integration any more. Neither did any
merges touch file.c, so it seems this just got lost for some reason
(rebase? forced push?). It's difficult to say what else might have gone
missing.

-h


Re: [PATCH/RFC] make btrfs subvol mounts appear in /proc/mounts

2015-10-28 Thread Albino B Neto
2015-10-27 20:25 GMT-02:00 Neil Brown :
>

snip

>
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 611b66d73e80..e96c53590f72 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -5621,6 +5621,23 @@ static void btrfs_dentry_release(struct dentry *dentry)

Signed-off-by: ?


  Albino


Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-28 Thread Johannes Henninger
On 27.10.2015 02:06, Qu Wenruo wrote:
>
>
> Johannes Henninger wrote on 2015/10/27 01:15 +0100:
>> On 26.10.2015 08:12, Qu Wenruo wrote:
>>>
>> Thanks a lot for your reply!
>>
>> While remounting the filesystem fixes the issue temporarily, it doesn't
>> take very long for the bug to happen again so it's not really a
>> workaround I can work with.
>>
>> I did recompile the kernel using your patches, but unfortunately the
>> problem still appears.
>>
>> Thanks,
>> Johannes
>>
> Interesting, just touching a file causing EDQUOT is quite a big
> problem.
>
> I'll try to reproduce it with my patchset and see what really caused
> the problem.
> The problem seems to do with snapshot qgroup hacking.
> But I'm not completely sure yet.
>
> BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
> 16K is the correct number with only 6 empty files, just in case.
>
> Thanks,
> Qu

 I ran my example from the first mail again and managed to write 7 files
 this time, "qgroup show" still shows 16kB after sync:

 root@t420:/media/extern/snap# btrfs qg limit -e 50M .
 root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
 touch: cannot touch ‘8’: Disk quota exceeded
 ^C
 root@t420:/media/extern/snap# sync
 root@t420:/media/extern/snap# btrfs qgroup show -pcre .
 qgroupid         rfer         excl     max_rfer     max_excl parent  child
 --------         ----         ----     --------     -------- ------  -----
 0/5          16.00KiB     16.00KiB         none         none ---     ---
 0/257        16.00KiB     16.00KiB         none         none ---     ---
 0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---
 root@t420:/media/extern/snap# btrfs fi sync .
 FSSync '.'
 root@t420:/media/extern/snap# btrfs qgroup show -pcre .
 qgroupid         rfer         excl     max_rfer     max_excl parent  child
 --------         ----         ----     --------     -------- ------  -----
 0/5          16.00KiB     16.00KiB         none         none ---     ---
 0/257        16.00KiB     16.00KiB         none         none ---     ---
 0/258        16.00KiB     16.00KiB         none     50.00MiB ---     ---

 By the way, I don't know if it's relevant, but the problem is not limited
 to exclusive quotas; it also happens when setting a "referenced" limit
 (qgroup limit without "-e").

 Thanks,
 Johannes

>>>
>>> The bug is located, and turns out to be quite a stupid problem caused
>>> by myself.
>>>
>>> I just forgot to include a cleanup patch during rebase AGAIN!!!
>>>
>>> You can apply the following patch to resolve it:
>>> [PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup
>>> double free
>>>
>>> Or just apply the whole patchset:
>>> [4.4][PATCH 0/3] btrfs: Qgroup hotfix
>>>
>>> At least, with the patchset based on Chris' integration-4.4 branch, it
>>> succeeded in touching all the 100 files in my test box.
>>>
>>> Thanks,
>>> Qu
>>>
>>
>> It's working! Thank you so much for fixing this bug, you don't even know
>> how much this has helped me!
>>
>> Thanks!
>> Johannes
>>
> Glad to hear that.
>
> If it's working for you, it would be better to add a 'Tested-by' tag
> for the 3rd patch.
>
> Thanks,
> Qu

Sure! Is there anything I have to do? I'm a kernel and mailing list noob :)

Thanks,
Johannes


[PATCH V8 05/13] Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units

2015-10-28 Thread Chandan Rajendra
In the subpagesize-blocksize scenario, if i_size falls in a block which is
not the last block in the page, then the space to be reserved should be
calculated in sectorsize units rather than for the whole page.

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 35 ++-
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a54dad6..f3f0d91 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8772,15 +8772,28 @@ int btrfs_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
loff_t size;
int ret;
int reserved = 0;
+   u64 reserved_space;
u64 page_start;
u64 page_end;
+   u64 end;
+
+   reserved_space = PAGE_CACHE_SIZE;
 
sb_start_pagefault(inode->i_sb);
page_start = page_offset(page);
page_end = page_start + PAGE_CACHE_SIZE - 1;
+   end = page_end;
 
+   /*
+* Reserving delalloc space after obtaining the page lock can lead to
+* deadlock. For example, if a dirty page is locked by this function
+* and the call to btrfs_delalloc_reserve_space() ends up triggering
+* dirty page write out, then the btrfs_writepage() function could
+* end up waiting indefinitely to get a lock on the page currently
+* being processed by btrfs_page_mkwrite() function.
+*/
ret = btrfs_delalloc_reserve_space(inode, page_start,
-  PAGE_CACHE_SIZE);
+  reserved_space);
if (!ret) {
ret = file_update_time(vma->vm_file);
reserved = 1;
@@ -8814,7 +8827,7 @@ again:
 * we can't set the delalloc bits if there are pending ordered
 * extents.  Drop our locks and wait for them to finish
 */
-   ordered = btrfs_lookup_ordered_extent(inode, page_start);
+   ordered = btrfs_lookup_ordered_range(inode, page_start, page_end);
if (ordered) {
unlock_extent_cached(io_tree, page_start, page_end,
 &cached_state, GFP_NOFS);
@@ -8824,6 +8837,18 @@ again:
goto again;
}
 
+   if (page->index == ((size - 1) >> PAGE_CACHE_SHIFT)) {
+   reserved_space = round_up(size - page_start, root->sectorsize);
+   if (reserved_space < PAGE_CACHE_SIZE) {
+   end = page_start + reserved_space - 1;
+   spin_lock(&BTRFS_I(inode)->lock);
+   BTRFS_I(inode)->outstanding_extents++;
+   spin_unlock(&BTRFS_I(inode)->lock);
+   btrfs_delalloc_release_space(inode, page_start,
+   PAGE_CACHE_SIZE - reserved_space);
+   }
+   }
+
/*
 * XXX - page_mkwrite gets called every time the page is dirtied, even
 * if it was already dirty, so for space accounting reasons we need to
@@ -8831,12 +8856,12 @@ again:
 * is probably a better way to do this, but for now keep consistent with
 * prepare_pages in the normal write path.
 */
-   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, page_end,
+   clear_extent_bit(&BTRFS_I(inode)->io_tree, page_start, end,
  EXTENT_DIRTY | EXTENT_DELALLOC |
  EXTENT_DO_ACCOUNTING | EXTENT_DEFRAG,
  0, 0, &cached_state, GFP_NOFS);
 
-   ret = btrfs_set_extent_delalloc(inode, page_start, page_end,
+   ret = btrfs_set_extent_delalloc(inode, page_start, end,
&cached_state);
if (ret) {
unlock_extent_cached(io_tree, page_start, page_end,
@@ -8875,7 +8900,7 @@ out_unlock:
}
unlock_page(page);
 out:
-   btrfs_delalloc_release_space(inode, page_start, PAGE_CACHE_SIZE);
+   btrfs_delalloc_release_space(inode, page_start, reserved_space);
 out_noreserve:
sb_end_pagefault(inode->i_sb);
return ret;
-- 
2.1.0
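
A worked example of the arithmetic in the patch above may help. The patch first reserves a full page (before taking the page lock, to avoid the deadlock described in its comment) and later, if the page is the one containing i_size, releases whatever lies beyond the last block that i_size touches. The sketch below reproduces only that calculation in shell arithmetic. The 64K page size and 4K sectorsize are illustrative values, reserved_for_last_page is a hypothetical name, and the caller is assumed to have already checked that the page is the one containing i_size.

```shell
#!/bin/sh
# Worked arithmetic for the page that contains i_size.
# Page size and sectorsize below are illustrative (64K pages, 4K blocks).
round_up() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# Hypothetical helper mirroring:
#   reserved_space = round_up(size - page_start, root->sectorsize);
reserved_for_last_page() {
    size=$1 page_start=$2 page_size=$3 sectorsize=$4
    reserved=$(round_up $((size - page_start)) "$sectorsize")
    # The patch only acts when the result is smaller than a page;
    # clamp here so the sketch never reports more than one page.
    [ "$reserved" -lt "$page_size" ] || reserved=$page_size
    echo "$reserved"
}

reserved=$(reserved_for_last_page 5000 0 65536 4096)
echo "reserve $reserved bytes, release $((65536 - reserved)) bytes"
```

With a 5000-byte file on such a system, only the first two 4K blocks stay reserved and the remaining 57344 bytes of the initial page-sized reservation are handed back.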



[PATCH V8 00/13] Btrfs: Pre subpagesize-blocksize cleanups

2015-10-28 Thread Chandan Rajendra
The patches posted along with this cover letter are cleanups made
during the development of subpagesize-blocksize patchset. I believe
that they can be integrated with the mainline kernel. Hence I have
posted them separately from the subpagesize-blocksize patchset.

I have tested the patchset by running xfstests on ppc64 and
x86_64. On ppc64, some of the Btrfs specific tests and generic/255
fail because they assume 4K as the filesystem's block size. I have
fixed some of the test cases. I will fix the rest and mail them to the
fstests mailing list in the near future.

Changes from V7:
1. The second argument passed to btrfs_delalloc_release_space() in
   btrfs_page_mkwrite() was incorrect. Version V8 fixes this.
   
Changes from V6:
1. Rebased on linux-btrfs/integration-4.4 branch. As a result the
   following patches have been trivially modified.
   - Btrfs: __btrfs_buffered_write: Reserve/release extents aligned
     to block size.
   - Btrfs: fallocate: Work with sectorsized blocks.
   - Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units.

Changes from V5:
1. Introduced BTRFS_BYTES_TO_BLKS() helper to compute the number of
   filesystem blocks spanning across a range of bytes. A call to this
   macro replaces code such as "nr_blks = bytes >> inode->i_blkbits".

Changes from V4:
1. Removed the RFC tag.

Changes from V3:
Two new issues have been fixed by the patches,
1. Btrfs: prepare_pages: Retry adding a page to the page cache.
2. Btrfs: Return valid delalloc range when the page does not have
   PG_Dirty flag set or has been invalidated.
IMHO, the above issues are also applicable to the "page size == block
size" scenario but for reasons unknown to me they aren't seen even
when the tests are run for a long time.

Changes from V2:
1. For detecting logical errors, use ASSERT() calls instead of calls to
   BUG_ON().
2. In the patch "Btrfs: Compute and look up csums based on sectorsized
   blocks", fix usage of kmap_atomic/kunmap_atomic such that between the
   kmap_atomic() and kunmap_atomic() calls we do not invoke any function
   that might cause the current task to sleep.
   
Changes from V1:
1. Call round_[down,up]() functions instead of doing hard coded alignment.

Chandan Rajendra (13):
  Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to
block size
  Btrfs: Compute and look up csums based on sectorsized blocks
  Btrfs: Direct I/O read: Work on sectorsized blocks
  Btrfs: fallocate: Work with sectorsized blocks
  Btrfs: btrfs_page_mkwrite: Reserve space in sectorsized units
  Btrfs: Search for all ordered extents that could span across a page
  Btrfs: Use (eb->start, seq) as search key for tree modification log
  Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length
  Btrfs: Limit inline extents to root->sectorsize
  Btrfs: Fix block size returned to user space
  Btrfs: Clean pte corresponding to page straddling i_size
  Btrfs: prepare_pages: Retry adding a page to the page cache
  Btrfs: Return valid delalloc range when the page does not have
PG_Dirty flag set or has been invalidated

 fs/btrfs/ctree.c |  34 +++
 fs/btrfs/ctree.h |   5 +-
 fs/btrfs/extent_io.c |   5 +-
 fs/btrfs/file-item.c |  92 ---
 fs/btrfs/file.c  | 115 
 fs/btrfs/inode.c | 248 ---
 6 files changed, 336 insertions(+), 163 deletions(-)

-- 
2.1.0



[PATCH V8 01/13] Btrfs: __btrfs_buffered_write: Reserve/release extents aligned to block size

2015-10-28 Thread Chandan Rajendra
Currently, the code reserves/releases extents in multiples of PAGE_CACHE_SIZE
units. Fix this by doing reservation/releases in block size units.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.h |  3 +++
 fs/btrfs/file.c  | 43 ++-
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index bc3c711..c9963f6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2260,6 +2260,9 @@ struct btrfs_map_token {
unsigned long offset;
 };
 
+#define BTRFS_BYTES_TO_BLKS(fs_info, bytes) \
+   ((bytes) >> (fs_info)->sb->s_blocksize_bits)
+
 static inline void btrfs_init_map_token (struct btrfs_map_token *token)
 {
token->kaddr = NULL;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 1243205..e31e120 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -499,7 +499,7 @@ int btrfs_dirty_pages(struct btrfs_root *root, struct inode *inode,
loff_t isize = i_size_read(inode);
 
start_pos = pos & ~((u64)root->sectorsize - 1);
-   num_bytes = ALIGN(write_bytes + pos - start_pos, root->sectorsize);
+   num_bytes = round_up(write_bytes + pos - start_pos, root->sectorsize);
 
end_of_last_block = start_pos + num_bytes - 1;
err = btrfs_set_extent_delalloc(inode, start_pos, end_of_last_block,
@@ -1362,16 +1362,19 @@ fail:
 static noinline int
 lock_and_cleanup_extent_if_need(struct inode *inode, struct page **pages,
size_t num_pages, loff_t pos,
+   size_t write_bytes,
u64 *lockstart, u64 *lockend,
struct extent_state **cached_state)
 {
+   struct btrfs_root *root = BTRFS_I(inode)->root;
u64 start_pos;
u64 last_pos;
int i;
int ret = 0;
 
-   start_pos = pos & ~((u64)PAGE_CACHE_SIZE - 1);
-   last_pos = start_pos + ((u64)num_pages << PAGE_CACHE_SHIFT) - 1;
+   start_pos = round_down(pos, root->sectorsize);
+   last_pos = start_pos
+   + round_up(pos + write_bytes - start_pos, root->sectorsize) - 1;
 
if (start_pos < inode->i_size) {
struct btrfs_ordered_extent *ordered;
@@ -1486,6 +1489,7 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 
while (iov_iter_count(i) > 0) {
size_t offset = pos & (PAGE_CACHE_SIZE - 1);
+   size_t sector_offset;
size_t write_bytes = min(iov_iter_count(i),
 nrptrs * (size_t)PAGE_CACHE_SIZE -
 offset);
@@ -1494,6 +1498,8 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
size_t reserve_bytes;
size_t dirty_pages;
size_t copied;
+   size_t dirty_sectors;
+   size_t num_sectors;
 
WARN_ON(num_pages > nrptrs);
 
@@ -1506,7 +1512,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
break;
}
 
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   sector_offset = pos & (root->sectorsize - 1);
+   reserve_bytes = round_up(write_bytes + sector_offset,
+   root->sectorsize);
 
if (BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
 BTRFS_INODE_PREALLOC)) {
@@ -1525,7 +1533,9 @@ static noinline ssize_t __btrfs_buffered_write(struct file *file,
 */
num_pages = DIV_ROUND_UP(write_bytes + offset,
 PAGE_CACHE_SIZE);
-   reserve_bytes = num_pages << PAGE_CACHE_SHIFT;
+   reserve_bytes = round_up(write_bytes
+   + sector_offset,
+   root->sectorsize);
goto reserve_metadata;
}
}
@@ -1559,8 +1569,8 @@ again:
break;
 
ret = lock_and_cleanup_extent_if_need(inode, pages, num_pages,
- pos, &lockstart, &lockend,
- &cached_state);
+   pos, write_bytes, &lockstart,
+   &lockend, &cached_state);
if (ret < 0) {
if (ret == -EAGAIN)
goto again;
@@ -1596,9 +1606,16 @@ again:
 * we still have an outstanding extent for the chunk we actually
 * managed to copy.
 */
-   if (num_pages > dirty_pages) {
-   release_bytes 
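
To make the effect of the reserve_bytes change in the hunks above concrete, here is a small worked comparison in shell arithmetic. It is only a sketch of the two formulas, not kernel code. The 64K page size and 4K sectorsize are illustrative values, and old_reserve_bytes and new_reserve_bytes are hypothetical names for the expressions on the removed and added lines.

```shell
#!/bin/sh
# Compare the old page-granular reservation with the sector-granular one.
# 65536 (page size) and 4096 (sectorsize) are illustrative values.
round_up() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# Removed lines: reserve_bytes = num_pages << PAGE_CACHE_SHIFT, with
# num_pages = DIV_ROUND_UP(write_bytes + offset, PAGE_CACHE_SIZE).
old_reserve_bytes() {
    pos=$1 write_bytes=$2 page_size=$3
    offset=$(( pos % page_size ))            # pos & (PAGE_CACHE_SIZE - 1)
    round_up $(( write_bytes + offset )) "$page_size"
}

# Added lines: reserve_bytes = round_up(write_bytes + sector_offset,
# root->sectorsize).
new_reserve_bytes() {
    pos=$1 write_bytes=$2 sectorsize=$3
    sector_offset=$(( pos % sectorsize ))    # pos & (root->sectorsize - 1)
    round_up $(( write_bytes + sector_offset )) "$sectorsize"
}

echo "old: $(old_reserve_bytes 5000 100 65536)  new: $(new_reserve_bytes 5000 100 4096)"
```

For a 100-byte write at offset 5000, the page-based formula reserves a whole 64K page while the sector-based one reserves a single 4K block. When the page size equals the sectorsize the two formulas give the same result.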

[PATCH V8 04/13] Btrfs: fallocate: Work with sectorsized blocks

2015-10-28 Thread Chandan Rajendra
While at it, this commit changes btrfs_truncate_page() to truncate sectorsized
blocks instead of pages. Hence the function has been renamed to
btrfs_truncate_block().

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.h |  2 +-
 fs/btrfs/file.c  | 44 -
 fs/btrfs/inode.c | 60 ++--
 3 files changed, 55 insertions(+), 51 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index c9963f6..4469254 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3923,7 +3923,7 @@ int btrfs_unlink_subvol(struct btrfs_trans_handle *trans,
struct btrfs_root *root,
struct inode *dir, u64 objectid,
const char *name, int name_len);
-int btrfs_truncate_page(struct inode *inode, loff_t from, loff_t len,
+int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len,
int front);
 int btrfs_truncate_inode_items(struct btrfs_trans_handle *trans,
   struct btrfs_root *root,
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e31e120..81df4fa 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2285,10 +2285,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
int ret = 0;
int err = 0;
unsigned int rsv_count;
-   bool same_page;
+   bool same_block;
bool no_holes = btrfs_fs_incompat(root->fs_info, NO_HOLES);
u64 ino_size;
-   bool truncated_page = false;
+   bool truncated_block = false;
bool updated_inode = false;
 
ret = btrfs_wait_ordered_range(inode, offset, len);
@@ -2296,7 +2296,7 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
return ret;
 
mutex_lock(&inode->i_mutex);
-   ino_size = round_up(inode->i_size, PAGE_CACHE_SIZE);
+   ino_size = round_up(inode->i_size, root->sectorsize);
ret = find_first_non_hole(inode, &offset, &len);
if (ret < 0)
goto out_only_mutex;
@@ -2309,31 +2309,30 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
lockstart = round_up(offset, BTRFS_I(inode)->root->sectorsize);
lockend = round_down(offset + len,
 BTRFS_I(inode)->root->sectorsize) - 1;
-   same_page = ((offset >> PAGE_CACHE_SHIFT) ==
-   ((offset + len - 1) >> PAGE_CACHE_SHIFT));
-
+   same_block = (BTRFS_BYTES_TO_BLKS(root->fs_info, offset))
+   == (BTRFS_BYTES_TO_BLKS(root->fs_info, offset + len - 1));
/*
-* We needn't truncate any page which is beyond the end of the file
+* We needn't truncate any block which is beyond the end of the file
 * because we are sure there is no data there.
 */
/*
-* Only do this if we are in the same page and we aren't doing the
-* entire page.
+* Only do this if we are in the same block and we aren't doing the
+* entire block.
 */
-   if (same_page && len < PAGE_CACHE_SIZE) {
+   if (same_block && len < root->sectorsize) {
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, len, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, len, 0);
} else {
ret = 0;
}
goto out_only_mutex;
}
 
-   /* zero back part of the first page */
+   /* zero back part of the first block */
if (offset < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode, offset, 0, 0);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode, offset, 0, 0);
if (ret) {
mutex_unlock(&inode->i_mutex);
return ret;
@@ -2368,9 +2367,10 @@ static int btrfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
if (!ret) {
/* zero the front end of the last page */
if (tail_start + tail_len < ino_size) {
-   truncated_page = true;
-   ret = btrfs_truncate_page(inode,
-   tail_start + tail_len, 0, 1);
+   truncated_block = true;
+   ret = btrfs_truncate_block(inode,
+   tail_start + tail_len,
+   0, 1);
if (ret)
goto out_only_mutex;
}
@@ -2537,7 +2537,7 @@ out:
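
The switch from same_page to same_block in btrfs_punch_hole() above changes which holes take the "truncate a single unit" shortcut. The sketch below works through one case in shell arithmetic. same_unit is a hypothetical helper that performs the same shift comparison as the patch; bits=16 stands for a 64K page and bits=12 for a 4K block, both illustrative values.

```shell
#!/bin/sh
# Does the byte range [offset, offset + len) stay inside one unit of
# size 2^bits? This is the comparison behind same_page / same_block:
#   (offset >> bits) == ((offset + len - 1) >> bits)
same_unit() {
    offset=$1 len=$2 bits=$3
    [ $(( offset >> bits )) -eq $(( (offset + len - 1) >> bits )) ]
}

if same_unit 4000 200 16; then echo "same page"; else echo "different pages"; fi
if same_unit 4000 200 12; then echo "same block"; else echo "different blocks"; fi
```

A 200-byte hole at offset 4000 sits inside one 64K page but crosses the boundary between the first and second 4K block, so the page-based test would take the shortcut while the block-based test would not.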

[PATCH V8 02/13] Btrfs: Compute and look up csums based on sectorsized blocks

2015-10-28 Thread Chandan Rajendra
Checksums are applicable to sectorsize units. The current code uses
bio->bv_len units to compute and look up checksums. This works on machines
where sectorsize == PAGE_SIZE. This patch makes the checksum computation and
look up code work with sectorsize units.

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/file-item.c | 92 +---
 1 file changed, 59 insertions(+), 33 deletions(-)

diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 58ece65..e2a1cad 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -172,6 +172,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
u64 item_start_offset = 0;
u64 item_last_offset = 0;
u64 disk_bytenr;
+   u64 page_bytes_left;
u32 diff;
int nblocks;
int bio_index = 0;
@@ -220,6 +221,8 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
disk_bytenr = (u64)bio->bi_iter.bi_sector << 9;
if (dio)
offset = logical_offset;
+
+   page_bytes_left = bvec->bv_len;
while (bio_index < bio->bi_vcnt) {
if (!dio)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
@@ -243,7 +246,7 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
if (BTRFS_I(inode)->root->root_key.objectid ==
BTRFS_DATA_RELOC_TREE_OBJECTID) {
set_extent_bits(io_tree, offset,
-   offset + bvec->bv_len - 1,
+   offset + root->sectorsize - 1,
EXTENT_NODATASUM, GFP_NOFS);
} else {

btrfs_info(BTRFS_I(inode)->root->fs_info,
@@ -281,11 +284,17 @@ static int __btrfs_lookup_bio_sums(struct btrfs_root *root,
 found:
csum += count * csum_size;
nblocks -= count;
-   bio_index += count;
+
while (count--) {
-   disk_bytenr += bvec->bv_len;
-   offset += bvec->bv_len;
-   bvec++;
+   disk_bytenr += root->sectorsize;
+   offset += root->sectorsize;
+   page_bytes_left -= root->sectorsize;
+   if (!page_bytes_left) {
+   bio_index++;
+   bvec++;
+   page_bytes_left = bvec->bv_len;
+   }
+
}
}
btrfs_free_path(path);
@@ -432,6 +441,8 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
struct bio_vec *bvec = bio->bi_io_vec;
int bio_index = 0;
int index;
+   int nr_sectors;
+   int i;
unsigned long total_bytes = 0;
unsigned long this_sum_bytes = 0;
u64 offset;
@@ -459,41 +470,56 @@ int btrfs_csum_one_bio(struct btrfs_root *root, struct inode *inode,
if (!contig)
offset = page_offset(bvec->bv_page) + bvec->bv_offset;
 
-   if (offset >= ordered->file_offset + ordered->len ||
-   offset < ordered->file_offset) {
-   unsigned long bytes_left;
-   sums->len = this_sum_bytes;
-   this_sum_bytes = 0;
-   btrfs_add_ordered_sum(inode, ordered, sums);
-   btrfs_put_ordered_extent(ordered);
+   data = kmap_atomic(bvec->bv_page);
 
-   bytes_left = bio->bi_iter.bi_size - total_bytes;
+   nr_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info,
+   bvec->bv_len + root->sectorsize
+   - 1);
+
+   for (i = 0; i < nr_sectors; i++) {
+   if (offset >= ordered->file_offset + ordered->len ||
+   offset < ordered->file_offset) {
+   unsigned long bytes_left;
+
+   kunmap_atomic(data);
+   sums->len = this_sum_bytes;
+   this_sum_bytes = 0;
+   btrfs_add_ordered_sum(inode, ordered, sums);
+   btrfs_put_ordered_extent(ordered);
+
+   bytes_left = bio->bi_iter.bi_size - total_bytes;
+
+   sums = kzalloc(btrfs_ordered_sum_size(root, bytes_left),
+   GFP_NOFS);
+   BUG_ON(!sums); /* -ENOMEM */
+   sums->len = bytes_left;
+   ordered = 

[PATCH V8 08/13] Btrfs: btrfs_submit_direct_hook: Handle map_length < bio vector length

2015-10-28 Thread Chandan Rajendra
In subpagesize-blocksize scenario, map_length can be less than the length of a
bio vector. Such a condition may cause btrfs_submit_direct_hook() to submit a
zero length bio. Fix this by comparing map_length against block size rather
than with bv_len.
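
A simplified userspace model of the block-granular splitting described above may help; it is only a sketch, not the kernel logic. `split_into_bios()` is a hypothetical helper, and it assumes a constant `map_length` for every mapping, which the real code does not.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Walk the vectors one block at a time and start a new "bio" whenever
 * adding another block would exceed map_length. Because the comparison
 * is made per block, a bio is never submitted with zero length, even
 * when map_length is smaller than a vector.
 * Records each bio's length in bio_len and returns the number of bios.
 */
static size_t split_into_bios(const size_t *bv_len, size_t nvec,
                              size_t blocksize, size_t map_length,
                              size_t *bio_len, size_t max_bios)
{
    size_t nbios = 0;
    size_t submit_len = 0;

    assert(blocksize > 0 && map_length >= blocksize);
    for (size_t v = 0; v < nvec; v++) {
        assert(bv_len[v] % blocksize == 0);
        for (size_t off = 0; off < bv_len[v]; off += blocksize) {
            if (submit_len + blocksize > map_length) {
                /* Current bio is full: submit it and start a new one. */
                if (nbios == max_bios)
                    return nbios;
                bio_len[nbios++] = submit_len;
                submit_len = 0;
            }
            submit_len += blocksize;
        }
    }
    /* Submit whatever is left in the last, partially filled bio. */
    if (submit_len && nbios < max_bios)
        bio_len[nbios++] = submit_len;
    return nbios;
}
```

For an 8192-byte vector with a 4096-byte block size and a `map_length` of 4096, the model produces two 4096-byte bios. Comparing against the whole vector length instead would have found the first bio "full" before anything was added to it.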

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 0dbeccb..cbb05d0 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8248,9 +8248,11 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
u64 file_offset = dip->logical_offset;
u64 submit_len = 0;
u64 map_length;
-   int nr_pages = 0;
-   int ret;
+   u32 blocksize = root->sectorsize;
int async_submit = 0;
+   int nr_sectors;
+   int ret;
+   int i;
 
map_length = orig_bio->bi_iter.bi_size;
ret = btrfs_map_block(root->fs_info, rw, start_sector << 9,
@@ -8280,9 +8282,12 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
atomic_inc(&dip->pending_bios);
 
while (bvec <= (orig_bio->bi_io_vec + orig_bio->bi_vcnt - 1)) {
-   if (map_length < submit_len + bvec->bv_len ||
-   bio_add_page(bio, bvec->bv_page, bvec->bv_len,
-bvec->bv_offset) < bvec->bv_len) {
+   nr_sectors = BTRFS_BYTES_TO_BLKS(root->fs_info, bvec->bv_len);
+   i = 0;
+next_block:
+   if (unlikely(map_length < submit_len + blocksize ||
+   bio_add_page(bio, bvec->bv_page, blocksize,
+   bvec->bv_offset + (i * blocksize)) < blocksize)) {
/*
 * inc the count before we submit the bio so
 * we know the end IO handler won't happen before
@@ -8303,7 +8308,6 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
file_offset += submit_len;
 
submit_len = 0;
-   nr_pages = 0;
 
bio = btrfs_dio_bio_alloc(orig_bio->bi_bdev,
  start_sector, GFP_NOFS);
@@ -8321,9 +8325,14 @@ static int btrfs_submit_direct_hook(int rw, struct btrfs_dio_private *dip,
bio_put(bio);
goto out_err;
}
+
+   goto next_block;
} else {
-   submit_len += bvec->bv_len;
-   nr_pages++;
+   submit_len += blocksize;
+   if (--nr_sectors) {
+   i++;
+   goto next_block;
+   }
bvec++;
}
}
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V8 03/13] Btrfs: Direct I/O read: Work on sectorsized blocks

2015-10-28 Thread Chandan Rajendra
The direct I/O read's endio and corresponding repair functions work on
page sized blocks. This commit adds the ability for direct I/O read to work on
subpagesized blocks.

Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 98 +++-
 1 file changed, 75 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a018e47..98d901e 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7764,9 +7764,9 @@ static int btrfs_check_dio_repairable(struct inode *inode,
 }
 
 static int dio_read_error(struct inode *inode, struct bio *failed_bio,
- struct page *page, u64 start, u64 end,
- int failed_mirror, bio_end_io_t *repair_endio,
- void *repair_arg)
+   struct page *page, unsigned int pgoff,
+   u64 start, u64 end, int failed_mirror,
+   bio_end_io_t *repair_endio, void *repair_arg)
 {
struct io_failure_record *failrec;
struct bio *bio;
@@ -7787,7 +7787,9 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
return -EIO;
}
 
-   if (failed_bio->bi_vcnt > 1)
+   if ((failed_bio->bi_vcnt > 1)
+   || (failed_bio->bi_io_vec->bv_len
+   > BTRFS_I(inode)->root->sectorsize))
read_mode = READ_SYNC | REQ_FAILFAST_DEV;
else
read_mode = READ_SYNC;
@@ -7795,7 +7797,7 @@ static int dio_read_error(struct inode *inode, struct bio *failed_bio,
isector = start - btrfs_io_bio(failed_bio)->logical;
isector >>= inode->i_sb->s_blocksize_bits;
bio = btrfs_create_repair_bio(inode, failed_bio, failrec, page,
- 0, isector, repair_endio, repair_arg);
+   pgoff, isector, repair_endio, repair_arg);
if (!bio) {
free_io_failure(inode, failrec);
return -EIO;
@@ -7825,12 +7827,17 @@ struct btrfs_retry_complete {
 static void btrfs_retry_endio_nocsum(struct bio *bio)
 {
struct btrfs_retry_complete *done = bio->bi_private;
+   struct inode *inode;
struct bio_vec *bvec;
int i;
 
if (bio->bi_error)
goto end;
 
+   ASSERT(bio->bi_vcnt == 1);
+   inode = bio->bi_io_vec->bv_page->mapping->host;
+   ASSERT(bio->bi_io_vec->bv_len == BTRFS_I(inode)->root->sectorsize);
+
done->uptodate = 1;
bio_for_each_segment_all(bvec, bio, i)
clean_io_failure(done->inode, done->start, bvec->bv_page, 0);
@@ -7842,25 +7849,35 @@ end:
 static int __btrfs_correct_data_nocsum(struct inode *inode,
   struct btrfs_io_bio *io_bio)
 {
+   struct btrfs_fs_info *fs_info;
struct bio_vec *bvec;
struct btrfs_retry_complete done;
u64 start;
+   unsigned int pgoff;
+   u32 sectorsize;
+   int nr_sectors;
int i;
int ret;
 
+   fs_info = BTRFS_I(inode)->root->fs_info;
+   sectorsize = BTRFS_I(inode)->root->sectorsize;
+
start = io_bio->logical;
done.inode = inode;
 
bio_for_each_segment_all(bvec, &io_bio->bio, i) {
-try_again:
+   nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len);
+   pgoff = bvec->bv_offset;
+
+next_block_or_try_again:
done.uptodate = 0;
done.start = start;
init_completion(&done.done);
 
-   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page, start,
-start + bvec->bv_len - 1,
-io_bio->mirror_num,
-btrfs_retry_endio_nocsum, &done);
+   ret = dio_read_error(inode, &io_bio->bio, bvec->bv_page,
+   pgoff, start, start + sectorsize - 1,
+   io_bio->mirror_num,
+   btrfs_retry_endio_nocsum, &done);
if (ret)
return ret;
 
@@ -7868,10 +7885,15 @@ try_again:
 
if (!done.uptodate) {
/* We might have another mirror, so try again */
-   goto try_again;
+   goto next_block_or_try_again;
}
 
-   start += bvec->bv_len;
+   start += sectorsize;
+
+   if (nr_sectors--) {
+   pgoff += sectorsize;
+   goto next_block_or_try_again;
+   }
}
 
return 0;
@@ -7881,7 +7903,9 @@ static void btrfs_retry_endio(struct bio *bio)
 {
struct btrfs_retry_complete *done = bio->bi_private;
struct btrfs_io_bio *io_bio = btrfs_io_bio(bio);
+   struct inode *inode;
struct bio_vec *bvec;
+   u64 start;
int uptodate;
int ret;
int i;

[PATCH V8 09/13] Btrfs: Limit inline extents to root->sectorsize

2015-10-28 Thread Chandan Rajendra
cow_file_range_inline() limits the size of an inline extent to
PAGE_CACHE_SIZE. This breaks in subpagesize-blocksize scenarios. Fix this by
comparing against root->sectorsize.
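
The visible part of the condition can be restated as a small standalone predicate. This is a sketch only: `can_inline_sketch()` is a hypothetical name, `max_inline_data` stands in for `BTRFS_MAX_INLINE_DATA_SIZE(root)`, and any further checks that fall outside the quoted hunk are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when none of the rejection conditions visible in the
 * patch hunk apply. sectorsize is assumed to be a power of two.
 */
static bool can_inline_sketch(uint64_t start, uint64_t actual_end,
                              uint64_t data_len, uint64_t compressed_size,
                              uint32_t sectorsize, uint64_t max_inline_data)
{
    if (start > 0)
        return false;
    /* The patch compares against sectorsize here, not the page size. */
    if (actual_end > sectorsize)
        return false;
    if (data_len > max_inline_data)
        return false;
    /* Uncompressed data ending exactly on a sector boundary is rejected. */
    if (!compressed_size && (actual_end & (sectorsize - 1)) == 0)
        return false;
    return true;
}
```

With a 4096-byte sector size, a 5000-byte file is rejected by this predicate, whereas a check against a 64 KiB page size would have let it through.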

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index cbb05d0..e2f7699 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -257,7 +257,7 @@ static noinline int cow_file_range_inline(struct btrfs_root *root,
data_len = compressed_size;
 
if (start > 0 ||
-   actual_end > PAGE_CACHE_SIZE ||
+   actual_end > root->sectorsize ||
data_len > BTRFS_MAX_INLINE_DATA_SIZE(root) ||
(!compressed_size &&
(actual_end & (root->sectorsize - 1)) == 0) ||
-- 
2.1.0



[PATCH V8 10/13] Btrfs: Fix block size returned to user space

2015-10-28 Thread Chandan Rajendra
btrfs_getattr() returns PAGE_CACHE_SIZE as the block size. Since
generic_fillattr() already does the right thing (by obtaining block size
from inode->i_blkbits), just remove the statement from btrfs_getattr.
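
From user space, the value in question is what stat(2) reports in `st_blksize`. A minimal sketch for checking it follows; `preferred_io_block_size()` is a hypothetical helper name, and the value it returns depends on the filesystem and kernel in use.

```c
#include <sys/stat.h>

/*
 * Return the st_blksize value reported for path, or -1 if stat() fails.
 */
static long preferred_io_block_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}
```

Calling this on a file that lives on a btrfs mount before and after the change would be one way to observe the difference on a machine where the page size and sector size differ.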

Reviewed-by: Josef Bacik 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e2f7699..af24f8c 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9312,7 +9312,6 @@ static int btrfs_getattr(struct vfsmount *mnt,
 
generic_fillattr(inode, stat);
stat->dev = BTRFS_I(inode)->root->anon_dev;
-   stat->blksize = PAGE_CACHE_SIZE;
 
spin_lock(&BTRFS_I(inode)->lock);
delalloc_bytes = BTRFS_I(inode)->delalloc_bytes;
-- 
2.1.0



[PATCH V8 07/13] Btrfs: Use (eb->start, seq) as search key for tree modification log

2015-10-28 Thread Chandan Rajendra
In subpagesize-blocksize a page can map multiple extent buffers and hence
using (page index, seq) as the search key is incorrect. For example, searching
through tree modification log tree can return an entry associated with the
first extent buffer mapped by the page (if such an entry exists), when we are
actually searching for entries associated with extent buffers that are mapped
at position 2 or more in the page.
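
The collision is easy to reproduce with plain arithmetic. The sketch below assumes a 64 KiB page (shift of 16) and two extent buffers 16 KiB apart; `SKETCH_PAGE_SHIFT` and both helper names are made up for this illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 16    /* assumption: 64 KiB pages */

/* Old key: the logical address shifted down to a page index. */
static uint64_t key_by_page_index(uint64_t eb_start)
{
    return eb_start >> SKETCH_PAGE_SHIFT;
}

/* New key: the logical start address of the extent buffer itself. */
static uint64_t key_by_logical(uint64_t eb_start)
{
    return eb_start;
}
```

Two extent buffers starting at 1 MiB and at 1 MiB + 16 KiB share a page, so `key_by_page_index()` returns 16 for both, while `key_by_logical()` keeps them distinct.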

Reviewed-by: Liu Bo 
Signed-off-by: Chandan Rajendra 
---
 fs/btrfs/ctree.c | 34 +-
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 5b8e235..51ef032 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -311,7 +311,7 @@ struct tree_mod_root {
 
 struct tree_mod_elem {
struct rb_node node;
-   u64 index;  /* shifted logical */
+   u64 logical;
u64 seq;
enum mod_log_op op;
 
@@ -435,11 +435,11 @@ void btrfs_put_tree_mod_seq(struct btrfs_fs_info *fs_info,
 
 /*
  * key order of the log:
- *   index -> sequence
+ *   node/leaf start address -> sequence
  *
- * the index is the shifted logical of the *new* root node for root replace
- * operations, or the shifted logical of the affected block for all other
- * operations.
+ * The 'start address' is the logical address of the *new* root node
+ * for root replace operations, or the logical address of the affected
+ * block for all other operations.
  *
  * Note: must be called with write lock (tree_mod_log_write_lock).
  */
@@ -460,9 +460,9 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
while (*new) {
cur = container_of(*new, struct tree_mod_elem, node);
parent = *new;
-   if (cur->index < tm->index)
+   if (cur->logical < tm->logical)
new = &((*new)->rb_left);
-   else if (cur->index > tm->index)
+   else if (cur->logical > tm->logical)
new = &((*new)->rb_right);
else if (cur->seq < tm->seq)
new = &((*new)->rb_left);
@@ -523,7 +523,7 @@ alloc_tree_mod_elem(struct extent_buffer *eb, int slot,
if (!tm)
return NULL;
 
-   tm->index = eb->start >> PAGE_CACHE_SHIFT;
+   tm->logical = eb->start;
if (op != MOD_LOG_KEY_ADD) {
btrfs_node_key(eb, &tm->key, slot);
tm->blockptr = btrfs_node_blockptr(eb, slot);
@@ -588,7 +588,7 @@ tree_mod_log_insert_move(struct btrfs_fs_info *fs_info,
goto free_tms;
}
 
-   tm->index = eb->start >> PAGE_CACHE_SHIFT;
+   tm->logical = eb->start;
tm->slot = src_slot;
tm->move.dst_slot = dst_slot;
tm->move.nr_items = nr_items;
@@ -699,7 +699,7 @@ tree_mod_log_insert_root(struct btrfs_fs_info *fs_info,
goto free_tms;
}
 
-   tm->index = new_root->start >> PAGE_CACHE_SHIFT;
+   tm->logical = new_root->start;
tm->old_root.logical = old_root->start;
tm->old_root.level = btrfs_header_level(old_root);
tm->generation = btrfs_header_generation(old_root);
@@ -739,16 +739,15 @@ __tree_mod_log_search(struct btrfs_fs_info *fs_info, u64 start, u64 min_seq,
struct rb_node *node;
struct tree_mod_elem *cur = NULL;
struct tree_mod_elem *found = NULL;
-   u64 index = start >> PAGE_CACHE_SHIFT;
 
tree_mod_log_read_lock(fs_info);
tm_root = &fs_info->tree_mod_log;
node = tm_root->rb_node;
while (node) {
cur = container_of(node, struct tree_mod_elem, node);
-   if (cur->index < index) {
+   if (cur->logical < start) {
node = node->rb_left;
-   } else if (cur->index > index) {
+   } else if (cur->logical > start) {
node = node->rb_right;
} else if (cur->seq < min_seq) {
node = node->rb_left;
@@ -1230,9 +1229,10 @@ __tree_mod_log_oldest_root(struct btrfs_fs_info *fs_info,
return NULL;
 
/*
-* the very last operation that's logged for a root is the replacement
-* operation (if it is replaced at all). this has the index of the *new*
-* root, making it the very first operation that's logged for this root.
+* the very last operation that's logged for a root is the
+* replacement operation (if it is replaced at all). this has
+* the logical address of the *new* root, making it the very
+* first operation that's logged for this root.
 */
while (1) {
tm = tree_mod_log_search_oldest(fs_info, root_logical,
@@ -1336,7 +1336,7 @@ __tree_mod_log_rewind(struct btrfs_fs_info *fs_info, struct extent_buffer *eb,
if (!next)
break;
 

Re: Exclusive quota of snapshot exceeded despite no space used

2015-10-28 Thread Qu Wenruo



Johannes Henninger wrote on 2015/10/28 15:02 +0100:

On 27.10.2015 02:06, Qu Wenruo wrote:



Johannes Henninger wrote on 2015/10/27 01:15 +0100:

On 26.10.2015 08:12, Qu Wenruo wrote:



Thanks a lot for your reply!

While remounting the filesystem fixes the issue temporarily, it doesn't
take very long for the bug to happen again, so it's not really a
workaround I can work with.

I did recompile the kernel using your patches, but unfortunately the
problem still appears.

Thanks,
Johannes


Interesting. Hitting EDQUOT just by touching a file is quite a big
problem.

I'll try to reproduce it with my patchset and see what really caused
the problem.
The problem seems to have to do with snapshot qgroup hacking.
But I'm not completely sure yet.

BTW, does "sync; btrfs qgroup show -prce" still show excl as 16K?
16K is the correct number with only 6 empty files, just in case.

Thanks,
Qu


I ran my example from the first mail again and managed to write 7 files
this time; "qgroup show" still shows 16 KiB after sync:

root@t420:/media/extern/snap# btrfs qg limit -e 50M .
root@t420:/media/extern/snap# for file in {1..100}; do touch $file; sleep 5m; done
touch: cannot touch ‘8’: Disk quota exceeded
^C
root@t420:/media/extern/snap# sync
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid rfer     excl     max_rfer max_excl parent child
-------- ----     ----     -------- -------- ------ -----
0/5      16.00KiB 16.00KiB none     none     ---    ---
0/257    16.00KiB 16.00KiB none     none     ---    ---
0/258    16.00KiB 16.00KiB none     50.00MiB ---    ---
root@t420:/media/extern/snap# btrfs fi sync .
FSSync '.'
root@t420:/media/extern/snap# btrfs qgroup show -pcre .
qgroupid rfer     excl     max_rfer max_excl parent child
-------- ----     ----     -------- -------- ------ -----
0/5      16.00KiB 16.00KiB none     none     ---    ---
0/257    16.00KiB 16.00KiB none     none     ---    ---
0/258    16.00KiB 16.00KiB none     50.00MiB ---    ---

By the way, I don't know if it's relevant, but the problem is not limited to
exclusive quotas; it also happens when setting a "referenced" limit
(qgroup limit without "-e").

Thanks,
Johannes



The bug is located, and turns out to be quite a stupid problem caused
by myself.

I just forgot to include a cleanup patch during rebase AGAIN!!!

You can apply the following patch to resolve it:
[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup
double free

Or just apply the whole patchset:
[4.4][PATCH 0/3] btrfs: Qgroup hotfix

At least, with the patchset based on Chris' integration-4.4 branch, it
succeeded in touching all the 100 files in my test box.

Thanks,
Qu



It's working! Thank you so much for fixing this bug, you don't even know
how much this has helped me!

Thanks!
Johannes


Glad to hear that.

If it's working for you, it would be better to add a 'Tested-by' tag
for the 3rd patch.

Thanks,
Qu


Sure! Is there anything I have to do? I'm a kernel and mailing list noob :)

Thanks,
Johannes


Find the mail
"[PATCH 3/3] btrfs: qgroup: Fix a rebase bug which will cause qgroup
double free",
and reply to it with the following contents if you tested the patch:

Tested-by: Johannes Henninger 


Also refer to Documentation/SubmittingPatches in the kernel tree.

Thanks,
Qu


Re: How to replicate a Xen VM using BTRFS as the root filesystem.

2015-10-28 Thread Russell Coker
On Wed, 28 Oct 2015 11:07:20 PM Austin S Hemmelgarn wrote:
> Using this methodology, I can have a new Gentoo PV domain running in 
> about half an hour, whereas it takes me at least two and a half hours 
> (and often much longer than that) when using the regular install process 
> for Gentoo.

On my virtual servers I have a BTRFS subvol /xenstore for the block devices of 
virtual machines.  When I want to duplicate a VM I run
"cp -a --reflink=always /xenstore/A /xenstore/B" which takes a few seconds.

-- 
My Main Blog http://etbe.coker.com.au/
My Documents Blog http://doc.coker.com.au/


RE: Questions about FIEMAP

2015-10-28 Thread Wang, Zhiye
Thank you all for your comments.

A further question: if I mount a btrfs file system in "readonly" mode, can 
any operation still cause the blocks of a file to change?


Regards
Mike



-Original Message-
From: Eric Sandeen [mailto:sand...@redhat.com] 
Sent: Monday, October 12, 2015 9:27 PM
To: Wang, Zhiye; linux-btrfs@vger.kernel.org
Subject: Re: Questions about FIEMAP

On 10/11/15 11:37 PM, Wang, Zhiye wrote:
> Hello everyone,
> 
> After googling a bit, I learned that btrfs supports FIEMAP (as "cp" 
> needs it), but it's not valid for "write" operations.

cp should not be using fiemap any more.  It was for a while, until they 
realized that copying based on fiemap output could lead to corruption because 
things changed between the fiemap call and the actual copy...

> I guess we cannot write to the block device directly after getting the block 
> list using FIEMAP. This is because of:
> 
> 1. The COW feature of btrfs (but this can be disabled using NOCOW)
> 2. File system rebalance
> 3. Defragmentation
> 
> Aren't items #2 and #3 also a problem for "read" operations? For example, after 
> "cp" gets the block list using FIEMAP, a file system rebalance occurs, so the 
> previous FIEMAP result is not valid anymore.
> 
> Or maybe I misunderstood something. Please correct me.

That all may be true for btrfs, but more fundamentally as dsterba said, nothing 
guarantees that the layout won't change *immediately* after your fiemap call.  
This is the case on any filesystem, not just btrfs.

-Eric
