[PATCH] fs: btrfs: Replace -ENOENT by -ERANGE in btrfs_get_acl()

2016-07-02 Thread Salah Triki
size contains the value returned by posix_acl_from_xattr(), which
returns -ERANGE, -ENODATA, zero, or an integer greater than zero. So
replace -ENOENT by -ERANGE.

Signed-off-by: Salah Triki 
---
 fs/btrfs/acl.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/acl.c b/fs/btrfs/acl.c
index 67a6077..53bb7af 100644
--- a/fs/btrfs/acl.c
+++ b/fs/btrfs/acl.c
@@ -55,8 +55,7 @@ struct posix_acl *btrfs_get_acl(struct inode *inode, int type)
 	}
 	if (size > 0) {
 		acl = posix_acl_from_xattr(&init_user_ns, value, size);
-	} else if (size == -ENOENT || size == -ENODATA || size == 0) {
-		/* FIXME, who returns -ENOENT?  I think nobody */
+	} else if (size == -ERANGE || size == -ENODATA || size == 0) {
 		acl = NULL;
 	} else {
 		acl = ERR_PTR(-EIO);
-- 
1.9.1
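To make the effect of the changed condition concrete, here is a minimal Python model (not kernel code) of the three branches of btrfs_get_acl() after the patch. The function name acl_outcome is hypothetical. Python's errno constants stand in for the kernel's, and only the symbols are compared, so the platform's numeric values do not matter.

```python
import errno


def acl_outcome(size):
    """Hypothetical model of the patched branches in btrfs_get_acl().

    `size` stands for the value the C code holds in its `size` variable.
    Returns a label for the branch that would be taken.
    """
    if size > 0:
        return "parse"  # acl = posix_acl_from_xattr(...)
    if size in (-errno.ERANGE, -errno.ENODATA, 0):
        return "no-acl"  # acl = NULL
    return "eio"  # acl = ERR_PTR(-EIO)


for value in (128, 0, -errno.ENODATA, -errno.ERANGE, -errno.ENOENT):
    print(value, acl_outcome(value))
```

With the patch applied, a -ENOENT value no longer yields a NULL ACL. It falls through to the -EIO branch instead.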

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs RAID 10 truncates files over 2G to 4096 bytes.

2016-07-02 Thread Tomasz Kusmierz
Hi,

My setup is that I use one file system for / and /home (on SSD) and a
larger raid 10 for /mnt/share (6 x 2TB).

Today I've discovered that 14 of the files that are supposed to be over
2GB are in fact just 4096 bytes. I've checked the content of those 4KB,
and it seems to contain data that was at the beginning of the files.

I experienced this problem in the past (3 - 4 years ago?) but
attributed it to a different problem that I discussed with you guys here
(corruption due to non-ECC RAM). At that time I deleted the affected
files (56). A similar problem was discovered between one and two years
ago, and I believe I deleted those files too.

I run a scrub on my system periodically (once a month) to eliminate
any errors sneaking in. I believe I ran a balance about half a year ago
to reclaim space after I deleted a large database.

root@noname_server:/mnt/share# btrfs fi show
Label: none  uuid: 060c2345-5d2f-4965-b0a2-47ed2d1a5ba2
Total devices 1 FS bytes used 177.19GiB
devid    3 size 899.22GiB used 360.06GiB path /dev/sde2

Label: none  uuid: d4cd1d5f-92c4-4b0f-8d45-1b378eff92a1
Total devices 6 FS bytes used 4.02TiB
devid    1 size 1.82TiB used 1.34TiB path /dev/sdg1
devid    2 size 1.82TiB used 1.34TiB path /dev/sdh1
devid    3 size 1.82TiB used 1.34TiB path /dev/sdi1
devid    4 size 1.82TiB used 1.34TiB path /dev/sdb1
devid    5 size 1.82TiB used 1.34TiB path /dev/sda1
devid    6 size 1.82TiB used 1.34TiB path /dev/sdf1

root@noname_server:/mnt/share# uname -a
Linux noname_server 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@noname_server:/mnt/share# btrfs --version
btrfs-progs v4.4
root@noname_server:/mnt/share#


The problem is that data on this filesystem changes so slowly that it's
hard to remember historical events ... it's like AWS Glacier. What I
can state with 100% certainty is that:
- affected files are 2GB and over (safe to assume 4GB and over)
- affected files were only read (some not even read), never written
after being put into storage
- In the past I assumed the files were affected because of their size, but I
have quite a few ISO files and some backups of virtual machines ... no
problems there. The problem seems to be confined to one folder & size >
2GB & extension .mkv
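To find more affected files, a minimal Python sketch like the one below lists files with a given extension whose size is exactly 4096 bytes. It assumes, based only on the symptom described above, that truncated files report an st_size of 4096. The function name and default arguments are illustrative. It only reads metadata (os.walk and os.lstat) and never opens the files.

```python
import os
import stat


def find_suspect_files(root, size=4096, suffix=".mkv"):
    """Yield regular files under root ending in suffix that are exactly size bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.lower().endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # metadata only; contents are not read
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size == size:
                yield path
```

For example, `for p in find_suspect_files("/mnt/share"): print(p)` would print each candidate path below the RAID 10 mount point.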


Re: Filesystem locks up, also with older kernel on any action after booting into 4.7-rc4 once

2016-07-02 Thread Hans van Kranenburg

On 07/02/2016 09:40 PM, Hans van Kranenburg wrote:
> On 07/02/2016 09:18 PM, Chris Murphy wrote:
>> On Sat, Jul 2, 2016 at 11:34 AM, Hans van Kranenburg
>>  wrote:
>>> On 07/02/2016 07:14 PM, Hans van Kranenburg wrote:
>>>>
>>>> I just rebooted a VM into a 4.7 kernel. The joy didn't last long. After
>>>> 177 seconds the btrfs data partition (root is on ext4) locked up. Worse,
>>>> it keeps locking up on any action performed, even when rebooting it with
>>>> older kernels again. D: The filesystem initially mounts fine, but then
>>>> locks up again immediately.
>>>>
>>>> Linux stacheldraht 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1
>>>> (2016-06-20) x86_64 GNU/Linux
>>>>
>>>> ps output shows [btrfs-transaction] in D state:
>>>>
>>>> root  1108  0.0  0.0  0  0 ?  D  17:42  0:00  \_ [btrfs-transacti]
>>>>
>>>> From dmesg:
>>>>
>>>> [blah blah blah]
>>>>
>>>> So, something happened inside the fs that makes it lock up every time I
>>>> try to do anything with it...
>>>
>>>
>>> I force-rebooted the poor thing again, and mounted the filesystem ro. It
>>> mounts without any complaint. I can see all files now, I can do sub list
>>> etc...
>>>
>>> So I think I'm going to copy some data to a new filesystem on a new block
>>> device just in case. The thing has to move to new storage anyway; it's
>>> about 100 subvolumes with about 150GB of data, so that's a nice exercise
>>> with send/receive.
>>
>> Two things might be interesting:
>> 1. btrfs check (without repair) to add to the above and see whether it
>> finds any problems.
>> 2. For send, try the -e option if you have related subvolume
>> snapshots. See if this bug is really a bug or user error, or maybe it's
>> fixed.
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=111221
>
> The directory structure is dirvish with my btrfs patches.
>
> These are the subvols:
>
> 2016050802/tree
> 2016051502/tree
>
> So they're all named tree. I cannot just send them all to some location.
> And I cannot rename them, because the fs is mounted ro...


OK, I just moved the latest daily snapshots of all data to a new fs, so 
backups can run on top of it again tonight.


The broken fs is still mounted ro, and I can try to fix it.

Trying to send extra snapshots with send -c fails consistently with 
"parent determination failed for ..." and I'm not going to find out why 
today, I guess.


The backup system on this host works by snapshotting (rw) the tree of 
yesterday and then rsyncing the remote over it, so the snapshots 
probably lose their btrfs-level parent relationship.


Still, it would be nice to be able to use -c to move multiple ones with 
shared data to another fs. To reconstruct the backup snapshot history, I 
would now have to fall back to send/receive + (snapshot + rsync) * 
(N-1), which is not really btrfsish.
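One possible way around both the identical "tree" names and the parent determination failure, sketched here under assumptions, is to receive each snapshot into its own date-named directory and to pass the previous snapshot explicitly with btrfs send -p, so send should not need to work out a parent on its own. The Python sketch below only builds the shell pipelines and runs nothing. The source and destination paths and the function name are hypothetical. It assumes the snapshots are read-only (btrfs send refuses read-write subvolumes) and listed oldest first. How much an incremental stream actually saves depends on how much data the rsync'd snapshots still share.

```python
import shlex


def send_receive_plan(src_root, dst_root, snapshot_dirs):
    """Build shell pipelines that replay snapshot history via send -p.

    snapshot_dirs: snapshot directory names, oldest first, each holding a
    read-only subvolume called "tree" (the dirvish layout described above).
    Each snapshot is received into its own dst_root/<name>/ directory so the
    identically named "tree" subvolumes do not collide.
    """
    plan = []
    previous = None
    for name in snapshot_dirs:
        src = f"{src_root}/{name}/tree"
        dst = f"{dst_root}/{name}"
        plan.append(f"mkdir -p {shlex.quote(dst)}")
        send = ["btrfs", "send"]
        if previous is not None:
            send += ["-p", previous]  # explicit parent: incremental stream
        send.append(src)
        plan.append(
            " ".join(shlex.quote(arg) for arg in send)
            + " | btrfs receive "
            + shlex.quote(dst)
        )
        previous = src
    return plan


for line in send_receive_plan("/mnt/old", "/mnt/new", ["2016050802", "2016051502"]):
    print(line)
```

Generating the commands instead of running them leaves room to review the plan before touching the read-only source filesystem.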


Ah, the send/receive finished, let's try some fun things...

-# btrfs check /dev/xvdc
Checking filesystem on /dev/xvdc
UUID: 49ca0cda-3233-4dac-936b-16265c0937a6
checking extents
checking free space tree
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 157548691476 bytes used err is 0
total csum bytes: 153411888
total tree bytes: 454918144
total fs tree bytes: 264257536
total extent tree bytes: 15941632
btree space waste bytes: 71694806
file data blocks allocated: 190005772288
 referenced 190005731328

Not many exciting explosions happening here.

Maybe the space cache error is a result of switching to space_cache=v2 
while the old space cache is still present?


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Re: Filesystem locks up, also with older kernel on any action after booting into 4.7-rc4 once

2016-07-02 Thread Hans van Kranenburg

On 07/02/2016 09:18 PM, Chris Murphy wrote:
> On Sat, Jul 2, 2016 at 11:34 AM, Hans van Kranenburg
>  wrote:
>> On 07/02/2016 07:14 PM, Hans van Kranenburg wrote:
>>>
>>> I just rebooted a VM into a 4.7 kernel. The joy didn't last long. After
>>> 177 seconds the btrfs data partition (root is on ext4) locked up. Worse,
>>> it keeps locking up on any action performed, even when rebooting it with
>>> older kernels again. D: The filesystem initially mounts fine, but then
>>> locks up again immediately.
>>>
>>> Linux stacheldraht 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1
>>> (2016-06-20) x86_64 GNU/Linux
>>>
>>> ps output shows [btrfs-transaction] in D state:
>>>
>>> root  1108  0.0  0.0  0  0 ?  D  17:42  0:00  \_ [btrfs-transacti]
>>>
>>> From dmesg:
>>>
>>> [blah blah blah]
>>>
>>> So, something happened inside the fs that makes it lock up every time I
>>> try to do anything with it...
>>
>>
>> I force-rebooted the poor thing again, and mounted the filesystem ro. It
>> mounts without any complaint. I can see all files now, I can do sub list
>> etc...
>>
>> So I think I'm going to copy some data to a new filesystem on a new block
>> device just in case. The thing has to move to new storage anyway; it's
>> about 100 subvolumes with about 150GB of data, so that's a nice exercise
>> with send/receive.
>
> Two things might be interesting:
> 1. btrfs check (without repair) to add to the above and see whether it
> finds any problems.
> 2. For send, try the -e option if you have related subvolume
> snapshots. See if this bug is really a bug or user error, or maybe it's
> fixed.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=111221

The directory structure is dirvish with my btrfs patches.

These are the subvols:

2016050802/tree
2016051502/tree

So they're all named tree. I cannot just send them all to some location.
And I cannot rename them, because the fs is mounted ro...


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Re: Filesystem locks up, also with older kernel on any action after booting into 4.7-rc4 once

2016-07-02 Thread Chris Murphy
On Sat, Jul 2, 2016 at 11:34 AM, Hans van Kranenburg
 wrote:
> On 07/02/2016 07:14 PM, Hans van Kranenburg wrote:
>>
>> I just rebooted a VM into a 4.7 kernel. The joy didn't last long. After
>> 177 seconds the btrfs data partition (root is on ext4) locked up. Worse,
>> it keeps locking up on any action performed, even when rebooting it with
>> older kernels again. D: The filesystem initially mounts fine, but then
>> locks up again immediately.
>>
>> Linux stacheldraht 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1
>> (2016-06-20) x86_64 GNU/Linux
>>
>> ps output shows [btrfs-transaction] in D state:
>>
>> root  1108  0.0  0.0  0  0 ?  D  17:42  0:00  \_ [btrfs-transacti]
>>
>> From dmesg:
>>
>> [blah blah blah]
>>
>> So, something happened inside the fs that makes it lock up every time I
>> try to do anything with it...
>
>
> I force-rebooted the poor thing again, and mounted the filesystem ro. It
> mounts without any complaint. I can see all files now, I can do sub list
> etc...
>
> So I think I'm going to copy some data to a new filesystem on a new block
> device just in case. The thing has to move to new storage anyway; it's about
> 100 subvolumes with about 150GB of data, so that's a nice exercise with
> send/receive.

Two things might be interesting:
1. btrfs check (without repair) to add to the above and see whether it
finds any problems.
2. For send, try the -e option if you have related subvolume
snapshots. See if this bug is really a bug or user error, or maybe it's
fixed.

https://bugzilla.kernel.org/show_bug.cgi?id=111221


-- 
Chris Murphy


Re: Cannot balance FS (No space left on device)

2016-07-02 Thread Chris Murphy
On Sat, Jul 2, 2016 at 9:07 AM, Hans van Kranenburg
 wrote:

>
> Also, the behaviour of *always* creating a new empty block group before
> starting to work (which makes it impossible to free up space on a fully
> allocated filesystem with balance) got reverted in:
>
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf25ce518e8ef9d59b292e51193bed2b023a32da
>
> This patch is in 4.5 and 4.7-rc, but *not* in 4.6.

Upstream it first appears in 4.5.7.

-- 
Chris Murphy


Re: btrfs lockup

2016-07-02 Thread Chris Murphy
On Fri, Jul 1, 2016 at 3:50 PM, Grey Christoforo  wrote:
> On Fri, Jul 1, 2016 at 9:51 PM, Chris Murphy  wrote:
>> This is interesting:
>>
>> Jul 01 11:56:40 kernel: BTRFS info (device nvme0n1p5): relocating
>> block group 232511242240 flags 1
>>
>> a. It's an NVMe drive.
>> b. Btrfs at this time is involved in a balance operation of some sort.
> The balance operation is one I started manually. It's a coincidence
> that it's running at this point and I don't believe it's related to
> the lockups because
> 1) I saw the lockups on a previous baobab scan of my
> /var/lib/docker/btrfs/subvolumes when no balance was taking place
> and
> 2) after removing all of docker's subvolumes, the problem has gone away
>>
>>
>> And then what you previously reported, the parts of which I don't follow:
>>
>> Jul 01 12:00:31 kernel: NMI watchdog: BUG: soft lockup - CPU#0 stuck
>> for 22s! [scanner:7121]
> I'd guess this is the kernel's protection mechanism to try to recover
> when some critical (blocking) thread does not return; it looks like it's
> essentially killing the process when a watchdog timer expires.

I suggest filing a bug and posting the URL here. If you can reproduce it
without balance, that's probably cleaner for a developer to follow. Also,
when the blocking happens, if you can do sysrq+t and then something like
'journalctl -k -o short-monotonic > dmesg_sysrqt.log', that helps: the
sysrq+t output often overfills the kernel message buffer, whereas journald
will capture all of it, and -k limits the output to kernel messages.
Attach that to the bug.
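The capture steps suggested above could be scripted roughly as follows. This is a hedged Python sketch, not a tested tool. On a real system it must run as root: writing "t" to /proc/sysrq-trigger requests the same task dump as sysrq+t, and the kernel messages are then saved with `journalctl -k -o short-monotonic`. The function name, the two-second delay and the `run`/`trigger_path` parameters (which exist only so the steps can be exercised without root) are illustrative assumptions.

```python
import subprocess
import time


def capture_blocked_tasks(log_path="dmesg_sysrqt.log",
                          trigger_path="/proc/sysrq-trigger",
                          run=subprocess.run,
                          delay=2.0):
    """Dump all task states via sysrq 't', then save kernel messages from journald."""
    # Same request as pressing sysrq+t on the keyboard; needs root on a real system.
    with open(trigger_path, "w") as trigger:
        trigger.write("t")
    # Give journald a moment to collect the (often very long) dump.
    time.sleep(delay)
    cmd = ["journalctl", "-k", "-o", "short-monotonic"]
    with open(log_path, "w") as out:
        run(cmd, stdout=out, check=True)
    return cmd
```

Run it while the blocking is happening, then attach the resulting log file to the bug.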

It's an open question whether there are kernel debug options to try; I
can't really tell if this is directly Btrfs-related or incidental. The soft
lockup message refers to scanner as the running task, and it comes up more
than once in the message log. No idea what that is. I guess it could
be some unexpected interaction between Btrfs, the VFS and this scanner
task.


-- 
Chris Murphy


Re: Filesystem locks up, also with older kernel on any action after booting into 4.7-rc4 once

2016-07-02 Thread Hans van Kranenburg

On 07/02/2016 07:14 PM, Hans van Kranenburg wrote:
> I just rebooted a VM into a 4.7 kernel. The joy didn't last long. After
> 177 seconds the btrfs data partition (root is on ext4) locked up. Worse,
> it keeps locking up on any action performed, even when rebooting it with
> older kernels again. D: The filesystem initially mounts fine, but then
> locks up again immediately.
>
> Linux stacheldraht 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1
> (2016-06-20) x86_64 GNU/Linux
>
> ps output shows [btrfs-transaction] in D state:
>
> root  1108  0.0  0.0  0  0 ?  D  17:42  0:00  \_ [btrfs-transacti]
>
> From dmesg:
>
> [blah blah blah]
>
> So, something happened inside the fs that makes it lock up every time I
> try to do anything with it...


I force-rebooted the poor thing again, and mounted the filesystem ro. It 
mounts without any complaint. I can see all files now, I can do sub list 
etc...


So I think I'm going to copy some data to a new filesystem on a new 
block device just in case. The thing has to move to new storage anyway; 
it's about 100 subvolumes with about 150GB of data, so that's a nice 
exercise with send/receive.


--
Hans van Kranenburg - System / Network Engineer
T +31 (0)10 2760434 | hans.van.kranenb...@mendix.com | www.mendix.com


Filesystem locks up, also with older kernel on any action after booting into 4.7-rc4 once

2016-07-02 Thread Hans van Kranenburg
I just rebooted a VM into a 4.7 kernel. The joy didn't last long. After 
177 seconds the btrfs data partition (root is on ext4) locked up. Worse, 
it keeps locking up on any action performed, even when rebooting it with 
older kernels again. D: The filesystem initially mounts fine, but then 
locks up again immediately.


Linux stacheldraht 4.7.0-rc4-amd64 #1 SMP Debian 4.7~rc4-1~exp1 
(2016-06-20) x86_64 GNU/Linux


ps output shows [btrfs-transaction] in D state:

root  1108  0.0  0.0  0  0 ?  D  17:42  0:00  \_ [btrfs-transacti]


From dmesg:

[  177.715994] [ cut here ]
[  177.716032] WARNING: CPU: 0 PID: 1108 at /build/linux-vIn3gu/linux-4.7~rc4/fs/btrfs/locking.c:251 btrfs_tree_lock+0x1eb/0x210 [btrfs]
[  177.716037] Modules linked in: binfmt_misc nf_log_ipv6 ip6t_REJECT 
nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter 
ip6table_mangle ip6table_raw ip6_tables nf_log_ipv4 nf_log_common xt_LOG 
xt_limit ipt_REJECT nf_reject_ipv4 xt_tcpudp xt_owner xt_multiport 
xt_conntrack iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ip_tables 
x_tables intel_powerclamp evdev coretemp pcspkr crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel quota_v2 quota_tree loop autofs4 ext4 
ecb crc16 jbd2 mbcache btrfs crc32c_generic xor raid6_pq crc32c_intel 
xen_netfront aesni_intel xen_blkfront aes_x86_64 glue_helper lrw 
gf128mul ablk_helper cryptd
[  177.716090] CPU: 0 PID: 1108 Comm: btrfs-transacti Tainted: G        W       4.7.0-rc4-amd64 #1 Debian 4.7~rc4-1~exp1
[  177.716095]  0200 a4392a01 81312db5 

[  177.716104]   8107896e 880079adb9d8 
88007b940800
[  177.716113]  4000 880079adb9d8 880079887928 


[  177.716121] Call Trace:
[  177.716129]  [] ? dump_stack+0x5c/0x77
[  177.716138]  [] ? __warn+0xbe/0xe0
[  177.716154]  [] ? btrfs_tree_lock+0x1eb/0x210 [btrfs]
[  177.716168]  [] ? btrfs_reserve_extent+0x1b5/0x200 [btrfs]
[  177.716182]  [] ? btrfs_alloc_tree_block+0x167/0x4e0 [btrfs]
[  177.716197]  [] ? __btrfs_cow_block+0x14c/0x5a0 [btrfs]
[  177.716210]  [] ? btrfs_cow_block+0x10b/0x1d0 [btrfs]
[  177.716224]  [] ? commit_cowonly_roots+0x5b/0x2f0 [btrfs]
[  177.716238]  [] ? btrfs_run_delayed_refs+0x203/0x2b0 [btrfs]
[  177.716256]  [] ? btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
[  177.716273]  [] ? btrfs_commit_transaction+0x568/0xa40 [btrfs]
[  177.716290]  [] ? start_transaction+0x95/0x4a0 [btrfs]
[  177.716304]  [] ? transaction_kthread+0x1e9/0x200 [btrfs]
[  177.716319]  [] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
[  177.716328]  [] ? kthread+0xcd/0xf0
[  177.716336]  [] ? ret_from_fork+0x1f/0x40
[  177.716341]  [] ? kthread_create_on_node+0x190/0x190
[  177.716360] ---[ end trace 558c4b7ce67e3503 ]---

And then, repeated every 120 seconds:

[  360.096092] INFO: task btrfs-transacti:1108 blocked for more than 120 seconds.
[  360.096105]   Tainted: G        W       4.7.0-rc4-amd64 #1
[  360.096110] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  360.096120] btrfs-transacti D 88007d016d80     0  1108      2 0x
[  360.096128]  88000292ee80  880078f3bbf0 
880078f3c000
[  360.096136]  880079adba40 880079adba58 880078f3bc08 
880079adba38
[  360.096143]   815cdc11 880079adb9d8 
c01424aa

[  360.096151] Call Trace:
[  360.096162]  [] ? schedule+0x31/0x80
[  360.096193]  [] ? btrfs_tree_lock+0xba/0x210 [btrfs]
[  360.096201]  [] ? wake_atomic_t_function+0x60/0x60
[  360.096215]  [] ? btrfs_alloc_tree_block+0x167/0x4e0 [btrfs]
[  360.096229]  [] ? __btrfs_cow_block+0x14c/0x5a0 [btrfs]
[  360.096241]  [] ? btrfs_cow_block+0x10b/0x1d0 [btrfs]
[  360.096256]  [] ? commit_cowonly_roots+0x5b/0x2f0 [btrfs]
[  360.096269]  [] ? btrfs_run_delayed_refs+0x203/0x2b0 [btrfs]
[  360.096287]  [] ? btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
[  360.096303]  [] ? btrfs_commit_transaction+0x568/0xa40 [btrfs]
[  360.096320]  [] ? start_transaction+0x95/0x4a0 [btrfs]
[  360.096334]  [] ? transaction_kthread+0x1e9/0x200 [btrfs]
[  360.096348]  [] ? btrfs_cleanup_transaction+0x590/0x590 [btrfs]
[  360.096356]  [] ? kthread+0xcd/0xf0
[  360.096362]  [] ? ret_from_fork+0x1f/0x40
[  360.096367]  [] ? kthread_create_on_node+0x190/0x190

I'm surprised to see qgroup mentioned, because I'm quite sure I don't 
use that.


I just force-rebooted the thing. Starting went well, mounting the 
partition went without any error.


But, any operation on the thing locks it up again.

-# btrfs sub list .

[   41.046160] [ cut here ]
[   41.046196] WARNING: CPU: 2 PID: 573 at /build/linux-vIn3gu/linux-4.7~rc4/fs/btrfs/locking.c:251 btrfs_tree_lock+0x1eb/0x210 [btrfs]
[   41.046201] Modules linked in: nf_log_ipv6 ip6t_REJECT 

Re: Cannot balance FS (No space left on device)

2016-07-02 Thread Hans van Kranenburg

On 06/13/2016 02:33 PM, Austin S. Hemmelgarn wrote:
> On 2016-06-10 18:39, Hans van Kranenburg wrote:
>> On 06/11/2016 12:10 AM, ojab // wrote:
>>> On Fri, Jun 10, 2016 at 9:56 PM, Hans van Kranenburg
>>>  wrote:
>>>>
>>>> You can work around it by either adding two disks (like Henk said),
>>>> or by temporarily converting some chunks to single. Just enough to get
>>>> some free space on the first two disks to get a balance going that can
>>>> fill the third one. You don't have to convert all of your data or
>>>> metadata to single!
>>>>
>>>> Something like:
>>>>
>>>> btrfs balance start -v -dconvert=single,limit=10 /mnt/xxx/
>>>
>>>
>>> Unfortunately it fails even if I set limit=1:
>>>
>>> $ sudo btrfs balance start -v -dconvert=single,limit=1 /mnt/xxx/
>>> Dumping filters: flags 0x1, state 0x0, force is off
>>>   DATA (flags 0x120): converting, target=281474976710656, soft is off, limit=1
>>> ERROR: error during balancing '/mnt/xxx/': No space left on device
>>> There may be more info in syslog - try dmesg | tail
>>
>>
>> Ah, apparently the balance operation *always* wants to allocate some new
>> empty space before starting to look more closely at the task you give it...
>
> No, that's not exactly true.  It seems to be a rather common fallacy
> right now that balance repacks data into existing chunks, which is
> absolutely false.  What a balance does is to send everything selected by
> the filters through the allocator again, and specifically prevent any
> existing chunks from being used to satisfy the allocation.  When you
> have 5 data chunks that are 20% used and run 'balance -dlimit=20', it
> doesn't pack it all into the first chunk; it allocates a new chunk,
> packs it all into that, and then frees all the other chunks.  This
> behavior is actually a pretty important property when adding or removing
> devices or converting between profiles, because it's what forces things
> into the new configuration of the filesystem.
>
> In an ideal situation, the limit filters should make it repack into
> existing chunks when specified alone, but currently that's not how it
> works, and I kind of doubt that that will ever be how it works.
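To make the behaviour Austin describes concrete (Hans disputes parts of it below), here is a toy Python model, not the kernel's allocator. The data of each selected chunk goes through the allocator again and may only land in chunks allocated during this run, and each old chunk is freed once it has been emptied. The 1 GiB chunk size, measuring device capacity in whole chunks, and the function name are simplifying assumptions; real allocation is per device and per profile. Under this model, a fully allocated device fails with ENOSPC before anything moves.

```python
import errno

CHUNK_SIZE = 1024 ** 3  # assumption: every data chunk is 1 GiB


def toy_balance(used_per_chunk, capacity_chunks):
    """Toy model: relocate every chunk in used_per_chunk (all selected by filters).

    Data may only be written into chunks allocated during this run, and each
    old chunk is freed right after its contents have been moved.
    capacity_chunks is how many chunks fit on the (simplified) device.
    Returns the used bytes of each newly allocated chunk.
    """
    allocated = len(used_per_chunk)  # chunks currently occupying the device
    new_chunks = []  # fill level of chunks allocated by this balance
    for used in used_per_chunk:
        remaining = used
        while remaining > 0:
            if not new_chunks or new_chunks[-1] == CHUNK_SIZE:
                # Existing chunks are off limits, so a fresh one is needed.
                if allocated >= capacity_chunks:
                    raise OSError(errno.ENOSPC, "No space left on device")
                allocated += 1
                new_chunks.append(0)
            moved = min(CHUNK_SIZE - new_chunks[-1], remaining)
            new_chunks[-1] += moved
            remaining -= moved
        allocated -= 1  # the emptied old chunk is freed
    return new_chunks


five_at_20_pct = [CHUNK_SIZE // 5] * 5
print(toy_balance(five_at_20_pct, capacity_chunks=6))  # one nearly full chunk
try:
    toy_balance(five_at_20_pct, capacity_chunks=5)  # fully allocated device
except OSError as exc:
    print("balance failed:", exc.strerror)
```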


I have to disagree with you here, based on what I see happening. Two 
examples will follow, providing some pudding for the proof.


Also, the behaviour of *always* creating a new empty block group before 
starting to work (which makes it impossible to free up space on a fully 
allocated filesystem with balance) got reverted in:


https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=cf25ce518e8ef9d59b292e51193bed2b023a32da

This patch is in 4.5 and 4.7-rc, but *not* in 4.6.

Script used to provide block group output, using python-btrfs:

-# cat show_block_groups.py
#!/usr/bin/python

from __future__ import print_function
import btrfs
import sys

fs = btrfs.FileSystem(sys.argv[1])
for chunk in fs.chunks():
    print(fs.block_group(chunk.vaddr, chunk.length))

Example 1:

-# uname -a
Linux ichiban 4.5.0-0.bpo.2-amd64 #1 SMP Debian 4.5.4-1~bpo8+1 
(2016-05-13) x86_64 GNU/Linux


-# ./show_block_groups.py /
block group vaddr 86211821568 length 1073741824 flags DATA used 83712 used_pct 78
block group vaddr 87285563392 length 33554432 flags SYSTEM used 16384 used_pct 0
block group vaddr 87319117824 length 1073741824 flags DATA used 1070030848 used_pct 100
block group vaddr 88392859648 length 1073741824 flags DATA used 1057267712 used_pct 98
block group vaddr 89466601472 length 1073741824 flags DATA used 1066360832 used_pct 99
block group vaddr 90540343296 length 268435456 flags METADATA used 238256128 used_pct 89
block group vaddr 90808778752 length 268435456 flags METADATA used 226082816 used_pct 84
block group vaddr 91077214208 length 268435456 flags METADATA used 242548736 used_pct 90
block group vaddr 91345649664 length 268435456 flags METADATA used 218415104 used_pct 81
block group vaddr 91614085120 length 268435456 flags METADATA used 223723520 used_pct 83
block group vaddr 91882520576 length 268435456 flags METADATA used 68272128 used_pct 25
block group vaddr 92150956032 length 1073741824 flags DATA used 1048154112 used_pct 98
block group vaddr 93224697856 length 1073741824 flags DATA used 800985088 used_pct 75
block group vaddr 94298439680 length 1073741824 flags DATA used 62197760 used_pct 6
block group vaddr 95372181504 length 1073741824 flags DATA used 49541120 used_pct 5
block group vaddr 96445923328 length 1073741824 flags DATA used 142856192 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used 102051840 used_pct 10


Now do a balance, to remove the least used block group:

1st terminal:
-# watch -d './show_block_groups.py /'

2nd terminal:
-# btrfs balance start -v -dusage=5 /
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=5
Done, had to relocate 1 out of 17 chunks
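The selection made by -dusage=5 can be reproduced from the script's output. The Python sketch below parses lines in the format printed above and keeps block groups of one type whose used bytes are below the given percentage of their length. Treating the threshold as a strict "less than" on bytes is an assumption about the kernel's usage filter. The function name is hypothetical, and the sample input is simply the last four DATA lines from Example 1.

```python
import re

# Matches the block group lines printed by show_block_groups.py above.
BG_LINE = re.compile(
    r"block group vaddr (\d+) length (\d+) flags (\S+) used (\d+)"
)


def usage_filter(lines, usage_pct, kind="DATA"):
    """Return vaddrs of block groups of `kind` used below usage_pct percent."""
    selected = []
    for line in lines:
        m = BG_LINE.search(line)
        if m is None:
            continue  # not a block group line
        vaddr, length, flags, used = m.groups()
        if kind not in flags.split("|"):
            continue
        threshold = int(length) * usage_pct // 100
        if int(used) < threshold:
            selected.append(int(vaddr))
    return selected


sample = """\
block group vaddr 94298439680 length 1073741824 flags DATA used 62197760 used_pct 6
block group vaddr 95372181504 length 1073741824 flags DATA used 49541120 used_pct 5
block group vaddr 96445923328 length 1073741824 flags DATA used 142856192 used_pct 13
block group vaddr 97519665152 length 1073741824 flags DATA used 102051840 used_pct 10
""".splitlines()

print(usage_filter(sample, 5))  # → [95372181504]
```

The used_pct column is rounded (49541120 bytes is about 4.6% of 1 GiB), which is why a group displayed as 5 can still fall below a 5% threshold.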

After:

-# ./show_block_groups.py /
block group vaddr 86211821568 length 1073741824 flags DATA used 83712 used_pct 78
block group vaddr 87285563392 length 33554432 flags SYSTEM used 

Re: btrfs ops hang indefinitely (process in D state)

2016-07-02 Thread Eugene Crosser
Hi Duncan,

I pretty much understand the risks and do not need them to be explained to me.
When I installed the remote system, the versions were pretty close to the
cutting edge. And the problem looks as if it *could* be the same in the 3.13 and
4.4 kernels.

I wrote here to ask for advice about "live" recovery, *if* you have any, and to
offer debug information, *if* you are interested.

If you have no advice for me and are not interested in the sort of debug
data that I *can* provide, so be it...

Regards,

Eugene

On 07/02/2016 01:54 PM, Duncan wrote:
> Eugene Crosser posted on Sat, 02 Jul 2016 12:49:53 +0300 as excerpted:
> 
>> Enter the second system. It is a rented physical server in a datacenter
>> with two hard disks, joined into a single root btrfs (/dev/sd[ab]1 are
>> swap partitions):
>>
>> root@dehost:~# uname -a
>> Linux dehost 3.13.0-91-generic [...]
>> root@dehost:~# btrfs --version
>> Btrfs v3.12
>> root@dehost:~#
> 
> v3.12 userspace and v3.13 kernel are both ancient history in btrfs terms, 
> far too old to provide anything useful in terms of debugging info.
> 
> In general, btrfs is not yet fully stable, and using it on the kind of 
> production system where such an ancient kernel and userspace might be 
> chosen for stability reasons is considered highly incompatible with that 
> sort of interest in stability at the cost of new features, because btrfs 
> itself isn't anywhere close to that level of stable.  So the general 
> recommendation is to choose one: either the still-stabilizing btrfs on a 
> more current system if you want btrfs, or something truly stable if you 
> really need that sort of years-outdated stability.
> 
> That said, while this list does tend to focus on mainline and the last 
> two release series of the current and LTS kernels (so ATM 4.6 and 4.5 
> for current, and 4.4 and 4.1 for LTS, not really much earlier), we 
> recognize that various distros do backporting and support much further 
> back.  But this list tracks mainline, not those distro kernels, and 
> specifically, we don't track what they've backported vs. what they 
> haven't.  So if you wish to use your distro's old kernels, that's fine, 
> but you're going to be better off going to them for support, because 
> they'll know what they've backported and what they haven't and are thus 
> in a better position to provide that support.
> 
> Meanwhile, I do recognize that you had something similar happen on a much 
> newer kernel as well, but that was on a different system, and you don't 
> have the details or logs left for that one, so that's not of much help 
> either.
> 
> Unless of course you can duplicate the behavior once again with a 
> reasonably current kernel within the two-release-series range, either 
> LTS or current, as specified above, and can provide the logs, etc., from 
> it...
> 






Re: btrfs ops hang indefinitely (process in D state)

2016-07-02 Thread Duncan
Eugene Crosser posted on Sat, 02 Jul 2016 12:49:53 +0300 as excerpted:

> Enter the second system. It is a rented physical server in a datacenter
> with two hard disks, joined into a single root btrfs (/dev/sd[ab]1 are
> swap partitions):
> 
> root@dehost:~# uname -a
> Linux dehost 3.13.0-91-generic [...]
> root@dehost:~# btrfs --version
> Btrfs v3.12
> root@dehost:~#

v3.12 userspace and v3.13 kernel are both ancient history in btrfs terms, 
far too old to provide anything useful in terms of debugging info.

In general, btrfs is not yet fully stable, and using it on the kind of 
production system where such an ancient kernel and userspace might be 
chosen for stability reasons is considered highly incompatible with that 
sort of interest in stability at the cost of new features, because btrfs 
itself isn't anywhere close to that level of stable.  So the general 
recommendation is to choose one: either the still-stabilizing btrfs on a 
more current system if you want btrfs, or something truly stable if you 
really need that sort of years-outdated stability.

That said, while this list does tend to focus on mainline and the last 
two release series of the current and LTS kernels (so ATM 4.6 and 4.5 
for current, and 4.4 and 4.1 for LTS, not really much earlier), we 
recognize that various distros do backporting and support much further 
back.  But this list tracks mainline, not those distro kernels, and 
specifically, we don't track what they've backported vs. what they 
haven't.  So if you wish to use your distro's old kernels, that's fine, 
but you're going to be better off going to them for support, because 
they'll know what they've backported and what they haven't and are thus 
in a better position to provide that support.

Meanwhile, I do recognize that you had something similar happen on a much 
newer kernel as well, but that was on a different system, and you don't 
have the details or logs left for that one, so that's not of much help 
either.

Unless of course you can duplicate the behavior once again with a 
reasonably current kernel within the two-release-series range, either 
LTS or current, as specified above, and can provide the logs, etc., from 
it...

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



btrfs ops hang indefinitely (process in D state)

2016-07-02 Thread Eugene Crosser
Hello,

This may be the same problem as "btrfs lockup".

I have had two systems using btrfs for several years. One is my home desktop; it
has root and /home on an ext4 fs on a PCI SSD, and "big stuff" on a btrfs spanning
two hard disks in a RAID1 configuration:

root@pccross:/export# uname -a
Linux pccross 4.7.0-rc2-custom #2 SMP Sat Jun 11 01:13:59 MSK 2016 x86_64 x86_64 x86_64 GNU/Linux # -- it was an earlier 4.x version when the problem happened
root@pccross:/export# btrfs --version
btrfs-progs v4.4
root@pccross:/export# btrfs fi show
Label: 'export'  uuid: c94c3ef6-394e-4441-8992-d702bdff
	Total devices 2 FS bytes used 1.26TiB
	devid    1 size 3.64TiB used 1.26TiB path /dev/sda
	devid    2 size 3.64TiB used 1.26TiB path /dev/sdb

root@pccross:/export# btrfs fi df /export
Data, RAID1: total=1.26TiB, used=1.25TiB
System, RAID1: total=32.00MiB, used=208.00KiB
Metadata, RAID1: total=5.00GiB, used=3.82GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

A month ago, I moved a directory containing a few GB from home (ext4) to btrfs
with the `mv` command. The command took some minutes and eventually finished
without error. After some hours, a cron job that uses files on btrfs did not
run. I logged in to investigate and realized that its process was in 'D'
state, and any command I tried that would use btrfs (ls, ...) would enter 'D'
state and stay there indefinitely. There was nothing interesting (that I
remember) in dmesg. Rebooting did not help; indeed, booting could not complete
because some of the startup jobs use files on btrfs, and they hung.

I rebooted without mounting btrfs and ran `btrfsck`. It found and fixed some
inconsistencies (no log, sorry), and after that I could mount it. Since then
everything has worked, except that the directory I moved disappeared
altogether (I had a backup, so I could restore it). No debugging material is
left, so this is just for background.

=
Enter the second system. It is a rented physical server in a datacenter with two
hard disks, joined into a single root btrfs (/dev/sd[ab]1 are swap partitions):

root@dehost:~# uname -a
Linux dehost 3.13.0-91-generic #138-Ubuntu SMP Fri Jun 24 17:00:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
root@dehost:~# btrfs --version
Btrfs v3.12
root@dehost:~# btrfs fi show
Label: none  uuid: 67a2708c-f039-4783-a699-6f6be0dac318
	Total devices 2 FS bytes used 442.58GiB
	devid    1 size 2.72TiB used 444.04GiB path /dev/sda2
	devid    2 size 2.72TiB used 444.03GiB path /dev/sdb2

Btrfs v3.12
root@dehost:~# btrfs fi df /
Data, RAID1: total=440.00GiB, used=439.51GiB
System, RAID1: total=32.00MiB, used=72.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, RAID1: total=4.00GiB, used=3.07GiB

A week ago, the system started to become unresponsive every day. The kernel
still works (it responds to ping), but no processes can start. Looking at the
logs after reboot, I noticed that activity stops some time after the start of
a backup cron job that covers a set of directories (/etc, /home, /var/mail and
some more). I disabled the backup job, and in the several days since then the
system has not hung.

=
My question to the developers: what can I do to (1) recover the filesystem
while it is mounted (I can boot a recovery netboot system and run `btrfs check`
as a last resort), and (2) provide useful debugging information to the
developers?
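A minimal sketch for point (2), assuming a Linux /proc filesystem: a commonly useful piece of data in a hang like this is which tasks are stuck in 'D' (uninterruptible sleep) state. As root, the kernel's SysRq `w` command (`echo w > /proc/sysrq-trigger`) dumps the stacks of such blocked tasks to dmesg. The C helpers below, `proc_stat_state` and `list_d_state_tasks` (hypothetical names, not part of btrfs-progs), only list the stuck tasks by reading /proc/<pid>/stat; they are not a btrfs diagnostic tool.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return the one-letter state field ('R', 'S', 'D', ...) from a line of
 * /proc/<pid>/stat, whose format is "pid (comm) state ...".  comm may
 * itself contain spaces or ')', so locate the LAST ')' instead of
 * splitting on whitespace.  Returns '?' for a malformed line. */
char proc_stat_state(const char *line)
{
    assert(line != NULL);
    const char *rparen = strrchr(line, ')');
    if (rparen == NULL || rparen[1] != ' ' || rparen[2] == '\0')
        return '?';
    return rparen[2];
}

/* Print the stat line of every task currently in 'D' (uninterruptible
 * sleep) state.  Returns how many were found, or -1 if /proc cannot be
 * opened. */
int list_d_state_tasks(FILE *out)
{
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(proc)) != NULL) {
        /* Only numeric entries are process directories. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char path[300], line[1024];
        snprintf(path, sizeof(path), "/proc/%s/stat", ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* the process may have exited meanwhile */
        if (fgets(line, sizeof(line), f) != NULL &&
            proc_stat_state(line) == 'D') {
            fputs(line, out);
            count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Calling `list_d_state_tasks(stdout)` from a small main() while the hang is in progress gives a list that can be posted together with the SysRq `w` output from dmesg.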

Thank you,

Eugene





stat() on btrfs reports the st_blocks with delay (data loss in archivers)

2016-07-02 Thread Pavel Raiskup
There are optimizations in archivers (tar, rsync, ...) that rely on up-to-date
st_blocks info.  For example, GNU tar has an optimization check [1] for
whether 'st_size' reports more data than 'st_blocks' can hold; if so, tar
considers the file sparse (and takes additional steps).
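A simplified C sketch of that kind of check, assuming Linux, where stat(2) counts st_blocks in 512-byte units. It is not a copy of the paxutils macro behind [1], and `looks_sparse` / `stat_looks_sparse` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* On Linux, st_blocks is counted in 512-byte units regardless of the
 * filesystem's own block size (see stat(2)). */
#define STAT_BLOCK_SIZE 512LL

/* Simplified archiver-style heuristic: a file "looks sparse" when the
 * space allocated to it is smaller than its apparent size rounded up
 * to whole 512-byte units. */
bool looks_sparse(long long st_size, long long st_blocks)
{
    assert(st_size >= 0 && st_blocks >= 0);
    long long needed = st_size / STAT_BLOCK_SIZE
                     + (st_size % STAT_BLOCK_SIZE != 0);
    return st_blocks < needed;
}

/* Convenience wrapper taking the result of stat()/fstat(). */
bool stat_looks_sparse(const struct stat *st)
{
    return looks_sparse((long long)st->st_size, (long long)st->st_blocks);
}
```

Given a large st_size while st_blocks is still 0, as reported below for not-yet-synced data, such a check returns true, so the archiver takes its sparse-file path for a file that actually holds data.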

It looks like btrfs doesn't report the correct value in 'st_blocks' until the
data are synced.  At the moment, the following happens:

a) some "tool" creates a sparse file
b) that tool does not sync explicitly and exits
c) tar is called immediately after that to archive the sparse file
d) tar concludes [2] that the file is completely sparse (because st_blocks is
   zero) and archives no data.  This is where the data loss happens.
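Steps a) to c) can be sketched from the tool's side in C, assuming POSIX I/O; `sparse_write_and_stat` is a hypothetical helper, not the reproducer from the Bugzilla report. It leaves a hole before a single 4 KiB write, then reads st_blocks once immediately (as an archiver started right after the tool exits would) and once after fsync():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create PATH with a hole of OFFSET bytes followed by 4 KiB of data,
 * then record st_blocks before and after fsync(), plus the final
 * st_size.  Returns 0 on success, -1 on any I/O error. */
int sparse_write_and_stat(const char *path, off_t offset,
                          long long *size, long long *blocks_before,
                          long long *blocks_after)
{
    char buf[4096];
    struct stat st;
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    /* Steps a)/b): write data past a hole, without syncing. */
    memset(buf, 'x', sizeof(buf));
    if (pwrite(fd, buf, sizeof(buf), offset) != (ssize_t)sizeof(buf))
        goto fail;

    /* Step c): what an archiver run immediately afterwards would see. */
    if (fstat(fd, &st) != 0)
        goto fail;
    *blocks_before = (long long)st.st_blocks;

    /* After an explicit sync the allocation should be accounted for. */
    if (fsync(fd) != 0 || fstat(fd, &st) != 0)
        goto fail;
    *blocks_after = (long long)st.st_blocks;
    *size = (long long)st.st_size;

    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Per this report, `*blocks_before` can be 0 on affected btrfs kernels even though the file holds data, while after the sync the value should reflect the written block.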

Because we already fixed 'btrfs' to report non-zero 'st_blocks' when the file
data is small and inlined (no real data blocks), I consider this a bug in
btrfs worth fixing too.

[1] http://git.savannah.gnu.org/cgit/paxutils.git/tree/lib/system.h?id=ec72abd9dd63bbff4534ec77e97b1a6cadfc3cf8#n392
[2] http://git.savannah.gnu.org/cgit/tar.git/tree/src/sparse.c?id=ac065c57fdc1788a2769fb119ed0c8146e1b9dd6#n273

Tested on kernel:
kernel-4.5.7-300.fc24.x86_64

Originally reported here, reproducer available there:
https://bugzilla.redhat.com/show_bug.cgi?id=1352061

Pavel



Re: btrfs/113: Assertion failure

2016-07-02 Thread Chandan Rajendra
On Friday, July 01, 2016 04:25:52 PM Josef Bacik wrote:
> On 07/01/2016 12:11 PM, Chandan Rajendra wrote:
> > Sorry, Forgot to add the mailing list to CC. Doing it now ...
> >
> >> While running btrfs/113, I see the following call trace,
> >>
> >> [  182.272009] BTRFS: assertion failed: !current->journal_info || flush != BTRFS_RESERVE_FLUSH_ALL, file: /home/chandan/repos/linux/fs/btrfs/extent-tree.c, line: 5131
> >> [  182.274010] [ cut here ]
> >> [  182.274685] kernel BUG at /home/chandan/repos/linux/fs/btrfs/ctree.h:3347!
> >> [  182.274982] invalid opcode:  [#1] SMP DEBUG_PAGEALLOC
> >> [  182.274982] Modules linked in:
> >> [  182.274982] CPU: 2 PID: 2911 Comm: xfs_io Not tainted 4.6.0-g5027553-dirty #29
> >> [  182.274982] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >> [  182.274982] task: 8818a4c3a400 ti: 8818aec4c000 task.ti: 8818aec4c000
> >> [  182.274982] RIP: 0010:[]  [] assfail+0x1a/0x1c
> >> [  182.274982] RSP: 0018:8818aec4f5d8  EFLAGS: 00010282
> >> [  182.274982] RAX: 0097 RBX: 0003 RCX: 8203dc18
> >> [  182.274982] RDX: 0001 RSI: 0286 RDI: 822c8ccc
> >> [  182.274982] RBP: 8818aec4f5d8 R08: fffe R09:
> >> [  182.274982] R10: 0005 R11: 028d R12: 8818b44ff000
> >> [  182.274982] R13: 0003 R14: 8818a54a01f0 R15: 8818a4f36150
> >> [  182.274982] FS:  7f54fbc6b700() GS:88193350() knlGS:
> >> [  182.274982] CS:  0010 DS:  ES:  CR0: 80050033
> >> [  182.274982] CR2: 0061eba0 CR3: 0018aec54000 CR4: 06e0
> >> [  182.274982] Stack:
> >> [  182.274982]  8818aec4fa70 8134a891 0002 0003
> >> [  182.274982]  8818b2502600 8818b44ff000
> >> [  182.274982]
> >> [  182.274982] Call Trace:
> >> [  182.274982]  [] __reserve_metadata_bytes+0xb1/0x1fe0
> >> [  182.274982]  [] ? lookup_address+0x23/0x30
> >> [  182.274982]  [] ? _lookup_address_cpa.isra.9+0x2d/0x30
> >> [  182.274982]  [] ? __change_page_attr_set_clr+0xeb/0xc80
> >> [  182.274982]  [] ? lookup_address+0x23/0x30
> >> [  182.274982]  [] ? get_alloc_profile+0x8a/0x1a0
> >> [  182.274982]  [] ? btrfs_get_alloc_profile+0x2b/0x30
> >> [  182.274982]  [] ? can_overcommit+0x9e/0x100
> >> [  182.274982]  [] ? __reserve_metadata_bytes+0xc88/0x1fe0
> >> [  182.274982]  [] ? __alloc_pages_nodemask+0x10d/0xc80
> >> [  182.274982]  [] ? _lookup_address_cpa.isra.9+0x2d/0x30
> >> [  182.274982]  [] ? __change_page_attr_set_clr+0xeb/0xc80
> >> [  182.274982]  [] ? lookup_address+0x23/0x30
> >> [  182.274982]  [] ? __slab_free+0x96/0x2b0
> >> [  182.274982]  [] ? __probe_kernel_read+0x39/0x90
> >> [  182.274982]  [] ? insert_state+0xc9/0x150
> >> [  182.274982]  [] ? add_delayed_ref_tail_merge+0x2e/0x350
> >> [  182.274982]  [] reserve_metadata_bytes+0x1f/0xe0
> >> [  182.274982]  [] btrfs_block_rsv_add+0x26/0x50
> >> [  182.274982]  [] ? free_extent_state+0x15/0x20
> >> [  182.274982]  [] btrfs_delalloc_reserve_metadata+0x13e/0x490
> >> [  182.274982]  [] btrfs_delalloc_reserve_space+0x2a/0x50
> >> [  182.274982]  [] btrfs_truncate_block+0x8a/0x430
> >> [  182.274982]  [] ? generic_bin_search.constprop.35+0x86/0x1e0
> >> [  182.274982]  [] truncate_inline_extent+0x157/0x260
> >> [  182.274982]  [] ? btrfs_search_slot+0x86c/0x990
> >> [  182.274982]  [] ? free_extent_map+0x4c/0xa0
> >> [  182.274982]  [] btrfs_truncate_inode_items+0xba7/0xdc0
> >> [  182.274982]  [] btrfs_truncate+0x168/0x280
> >> [  182.274982]  [] btrfs_setattr+0x214/0x320
> >> [  182.274982]  [] notify_change+0x1dc/0x380
> >> [  182.274982]  [] do_truncate+0x61/0xa0
> >> [  182.274982]  [] do_sys_ftruncate.constprop.17+0xf9/0x160
> >> [  182.274982]  [] SyS_ftruncate+0x9/0x10
> >> [  182.274982]  [] entry_SYSCALL_64_fastpath+0x13/0x8f
> >> [  182.274982] Code: 48 c7 c7 48 14 df 81 48 89 e5 e8 ac ac d3 ff 0f 0b 55 89 d1 31 c0 48 89 f2 48 89 fe 48 c7 c7 48 14 df 81 48 89 e5 e8 90 ac d3 ff <0f> 0b 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5 41 54 49 89 d4
> >> [  182.274982] RIP  [] assfail+0x1a/0x1c
> >> [  182.274982]  RSP 
> >> [  182.327207] ---[ end trace 44721e14eef0a6b2 ]---
> >>
> >>
> >> Basically btrfs_truncate() starts a transaction, btrfs_truncate_inode_items()
> >> encounters an inline extent and invokes btrfs_truncate_block().
> >> btrfs_truncate_block() tries to reserve delalloc (both data & metadata)
> >> space. While doing so it passes BTRFS_RESERVE_FLUSH_ALL as an argument.
> >> Since we already have a transaction running, we fail the following
> >> ASSERT()