Re: kernel crash after upgrading to 4.9

2017-01-06 Thread Chris Murphy
On Fri, Jan 6, 2017 at 8:15 AM, Imran Geriskovan wrote:
>>> I seem to have a similar issue to a subject in December:
>>> Subject: page allocation stall in kernel 4.9 when copying files from one
>>> btrfs hdd to another
>>> In my case, this is caused when rsync'ing large amounts of data over NFS
>>> to the server with the BTRFS file system.  This was not apparent in the
>>> previous kernel (4.7).
>
> As I browse through the latest series of btrfs corruption/crash reports
> I wonder which kernel version is safest to use.
> 4.7, 4.8 or 4.9 series?
>
> What are your experiences and recommendations?

I'm using 4.8.15. With 4.9 and 4.10 I'm seeing a regression where a
volume goes read-only for no apparent reason, even though neither
btrfs check nor scrub shows any problems.
http://www.spinics.net/lists/linux-btrfs/msg61817.html




-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: kernel crash after upgrading to 4.9

2017-01-06 Thread Imran Geriskovan
>> I seem to have a similar issue to a subject in December:
>> Subject: page allocation stall in kernel 4.9 when copying files from one
>> btrfs hdd to another
>> In my case, this is caused when rsync'ing large amounts of data over NFS
>> to the server with the BTRFS file system.  This was not apparent in the
>> previous kernel (4.7).

As I browse through the latest series of btrfs corruption/crash reports
I wonder which kernel version is safest to use.
4.7, 4.8 or 4.9 series?

What are your experiences and recommendations?


Re: kernel crash after upgrading to 4.9

2017-01-05 Thread Duncan
Duncan posted on Thu, 05 Jan 2017 09:23:35 + as excerpted:

> In his case the copying was from 7.2krpm to 5.6krpm drives, but not the
> reverse or when copying from slower to faster.

Ugh.  What I /meant/ was:

Slower to faster:     worked
Between same speeds:  worked
Faster to slower:     was broken

(I repeated the first case twice in different words instead of listing 
the second.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: kernel crash after upgrading to 4.9

2017-01-05 Thread Duncan
Matt McKinnon posted on Wed, 04 Jan 2017 10:25:17 -0500 as excerpted:

> Hi All,
> 
> I seem to have a similar issue to a subject in December:
> 
> Subject: page allocation stall in kernel 4.9 when copying files from one
> btrfs hdd to another
> 
> In my case, this is caused when rsync'ing large amounts of data over NFS
> to the server with the BTRFS file system.  This was not apparent in the
> previous kernel (4.7).
> 
> The poster mentioned some suggestions from Duncan here:
> 
> https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60083.html
> 
> But those are not visible in the thread.  What suggestions were given to
> help alleviate this pain?

In his case the copying was from 7.2krpm to 5.6krpm drives, but not the 
reverse or when copying from slower to faster.

I said that sounded very much like an earlier bug report to both this 
list and LKML, where Linus responded, suggesting twiddling the dirty_* 
writecache knobs...  Here's my earlier post there quoted (nearly) 
verbatim, including footnotes.  I don't know how much memory your system 
has but the below numbers for my 16 GB system should give you a 
reasonable idea for initial ballpark settings...

It's generally accepted wisdom among kernel devs and sysadmins[1] that 
the existing dirty* write-cache defaults are no longer appropriate and 
should be lowered.  They were set at a time when common system memories 
were measured in MiB, not the GiB of today.  But there's no agreement on 
precisely what the settings should be, and those who know about the 
problem have long since adjusted their own systems, so there's little 
practical pressure for change.  Combined with inertia, that means the 
defaults remain, even tho they're now generally agreed to be 
inappropriate. =:^(

These knobs can be tweaked in several ways.  For temporary 
experimentation, it's generally easiest to write (as root) updated values 
directly to the /proc/sys/vm/dirty_* files themselves.  Once you find 
values you are comfortable with, most distros have an existing sysctl 
config[2] that can be altered as appropriate, so the settings get 
reapplied at each boot.
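
For anyone who would rather script the temporary experimentation than 
echo values by hand, here is a minimal sketch.  The function names and the 
"base" parameter are hypothetical, introduced only for illustration; the 
only thing taken from the text above is that the knobs live as files 
under /proc/sys/vm/.  Writing there needs root, and values written this 
way do not survive a reboot.

```python
from pathlib import Path

# Knob names as they appear in the sysctl.conf excerpt in this message.
DIRTY_KNOBS = (
    "dirty_ratio",
    "dirty_bytes",
    "dirty_background_ratio",
    "dirty_background_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_knobs(base="/proc/sys/vm"):
    """Return {knob: int} for every knob file that exists under base."""
    values = {}
    for name in DIRTY_KNOBS:
        path = Path(base) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def write_dirty_knob(name, value, base="/proc/sys/vm"):
    """Write one knob (root needed on a real system, lost at reboot)."""
    if name not in DIRTY_KNOBS:
        raise ValueError(f"unexpected knob: {name}")
    (Path(base) / name).write_text(f"{value}\n")


if __name__ == "__main__":
    # Read-only by default: just show what the running kernel uses now.
    for knob, value in read_dirty_knobs().items():
        print(f"vm.{knob} = {value}")
```

The "base" argument exists so the helpers can be pointed at a scratch 
directory for a dry run before touching the live /proc files.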

Various articles with the details are easily googled so I'll be brief 
here, but here's the apropos settings and comments from my own
/etc/sysctl.conf and a brief explanation:

# write-cache, foreground/background flushing
# vm.dirty_ratio = 10 (% of RAM)
# make it 3% of 16G ~ half a gig
vm.dirty_ratio = 3
# vm.dirty_bytes = 0

# vm.dirty_background_ratio = 5 (% of RAM)
# make it 1% of 16G ~ 160 M
vm.dirty_background_ratio = 1
# vm.dirty_background_bytes = 0

# vm.dirty_expire_centisecs = 2999 (30 sec)
# vm.dirty_writeback_centisecs = 499 (5 sec)
# make it 10 sec
vm.dirty_writeback_centisecs = 1000


The *_bytes and *_ratio files configure the same thing in different ways, 
ratio being percentage of RAM, bytes being... bytes.  Set one or the 
other as you prefer and the other one will be automatically zeroed out.  
The vm.dirty_background_* settings control when the kernel starts lower 
priority flushing, while high priority vm.dirty_* (not background) 
settings control when the kernel forces threads trying to do further 
writes to wait until some currently in-flight writes are completed.
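
To compare the two forms, a ratio can be converted to an approximate 
byte count.  This is a rough sketch only: it treats the ratio as a 
percentage of total RAM, as the comments in the config above do, whereas 
the kernel applies it to the memory it considers available for dirtying, 
so real thresholds come out somewhat lower.  The function name is 
hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ratio_to_bytes(ratio_percent, ram_bytes):
    """Approximate byte threshold, taking the ratio as % of total RAM."""
    return ram_bytes * ratio_percent // 100


if __name__ == "__main__":
    ram = 16 * GIB  # the 16 GB system used as the example in this post
    for knob, ratio in (("vm.dirty_ratio", 3),
                        ("vm.dirty_background_ratio", 1)):
        mib = ratio_to_bytes(ratio, ram) / MIB
        print(f"{knob} = {ratio}% -> about {mib:.0f} MiB")
```

The resulting byte counts are what one would put in the matching *_bytes 
knob to get a similar threshold that does not scale with installed RAM.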

(Rereading this now, I seem to have been inaccurate on one detail.  I'm 
not a dev and definitely not a kernel dev, but from what I've read, once 
foreground writeback is triggered, the kernel charges the writeback to 
the threads doing the writing.  Time they'd otherwise spend dirtying 
even more memory is instead spent in IO-wait, writing out memory they've 
already dirtied.  That throttles them, ultimately slowing the rate at 
which they can dirty memory to the speed at which writeback is actually 
occurring.)

But those size thresholds only apply until the expiry time has passed.  
Once dirty data is older than that, writeback is forced regardless of 
size.  That's where the expiry setting (vm.dirty_expire_centisecs) comes 
in.

The problem is that memory has gotten bigger much faster than the speed 
of actually writing out to slow spinning rust has increased. (Fast SSDs 
have far fewer issues in this regard, tho slow flash like common USB thumb 
drives remain affected, indeed, sometimes even more so.)  Common random-
write spinning rust write speeds are 100 MiB/sec and may be as low as 30 
MiB/sec.  Meanwhile, the default 10% dirty_ratio, at 16 GiB memory size, 
approaches[3] 1.6 GiB, ~1600 MiB.  At 100 MiB/sec that's 16 seconds worth 
of writeback to clear.  At 30 MiB/sec, that's... well beyond the 30 
second expiry time!
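
The arithmetic in that paragraph can be checked with a few lines.  This 
is a back-of-the-envelope sketch: it uses the same simplification as the 
text (10% of the full 16 GiB, constant write speed), and the function 
name is hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
EXPIRE_SECONDS = 30  # the 30 second expiry time quoted above


def drain_seconds(dirty_bytes, write_mib_per_sec):
    """Seconds needed to write out dirty_bytes at a steady write speed."""
    return dirty_bytes / (write_mib_per_sec * MIB)


if __name__ == "__main__":
    dirty = 16 * GIB * 10 // 100  # 10% dirty_ratio on a 16 GiB system
    for speed in (100, 30):  # MiB/sec figures from the paragraph above
        secs = drain_seconds(dirty, speed)
        note = "beyond" if secs > EXPIRE_SECONDS else "within"
        print(f"{speed} MiB/sec: {secs:.0f} sec to drain, "
              f"{note} the {EXPIRE_SECONDS} sec expiry")
```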

To be clear, there's still a bug if the system crashes as a result -- the 
normal case should simply be a system that at worst doesn't respond for 
the writeback period, to be sure a problem in itself when that period 
exceeds double-digit seconds, but surely less of one than a total crash, 
as long as the system /does/ come back after perhaps half a minute or so.

Anyway, as you can see from the above 

kernel crash after upgrading to 4.9

2017-01-04 Thread Matt McKinnon

Hi All,

I seem to have a similar issue to a subject in December:

Subject: page allocation stall in kernel 4.9 when copying files from one 
btrfs hdd to another


In my case, this is caused when rsync'ing large amounts of data over NFS 
to the server with the BTRFS file system.  This was not apparent in the 
previous kernel (4.7).


The poster mentioned some suggestions from Duncan here:

https://mail-archive.com/linux-btrfs@vger.kernel.org/msg60083.html

But those are not visible in the thread.  What suggestions were given to 
help alleviate this pain?


-Matt