Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-03-14 Thread Michal Hocko
On Tue 06-03-18 20:28:59, Tetsuo Handa wrote:
> Laura Abbott wrote:
> > On 02/26/2018 06:28 AM, Michal Hocko wrote:
> > > On Fri 23-02-18 11:51:41, Laura Abbott wrote:
> > >> Hi,
> > >>
> > >> The Fedora arm-32 build VMs have a somewhat long standing problem
> > >> of hanging when running mkfs.ext4 with a bunch of processes stuck
> > >> in D state. This has been seen as far back as 4.13 but is still
> > >> present on 4.14:
> > >>
> > > [...]
> > >> This looks like everything is blocked on the writeback completing but
> > >> the writeback has been throttled. According to the infra team, this problem
> > >> is _not_ seen without LPAE (i.e. only 4G of RAM). I did see
> > >> https://patchwork.kernel.org/patch/10201593/ but that doesn't seem to
> > >> quite match since this seems to be completely stuck. Any suggestions to
> > >> narrow the problem down?
> > > 
> > > How much dirtyable memory does the system have? We do allow only lowmem
> > > to be dirtyable by default on 32b highmem systems. Maybe you have the
> > > lowmem mostly consumed by the kernel memory. Have you tried to enable
> > > highmem_is_dirtyable?
> > > 
> > 
> > Setting highmem_is_dirtyable did fix the problem. The infrastructure
> > people seemed satisfied enough with this (and are happy to have the
> > machines back).
> 
> That's good.
> 
> > I'll see if they are willing to run a few more tests
> > to get some more state information.
> 
> Well, I'm far from understanding what is happening in your case, but I'm
> interested in other threads which were trying to allocate memory. Therefore,
> I would appreciate it if they could capture SysRq-m + SysRq-t rather than SysRq-w (as described
> at http://akari.osdn.jp/capturing-kernel-messages.html ).
> 
> Code which assumes that kswapd can make progress can get stuck when kswapd
> is blocked somewhere. And wbt_wait() seems to change behavior based on
> current_is_kswapd(). If everyone is waiting for kswapd but kswapd cannot
> make progress, I worry that it leads to hangups like your case.

Tetsuo, could you stop this finally, pretty please? This is a
well known limitation of 32b architectures with more than 4G. The lowmem
can only handle 896MB of memory and that can be filled up with other
kernel allocations. Stalled writeback is _usually_ a result of only
little dirtyable memory which is left in the lowmem. We cannot simply
allow highmem to be dirtyable by default for the reasons explained in the
other email.

I can imagine that it is hard for you to grasp that not everything is
"silent hang during OOM" but there are other things going on in the VM.
-- 
Michal Hocko
SUSE Labs
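
[Editorial note: the 896MB figure above comes from the virtual address space
layout of 32-bit kernels, not from the amount of RAM installed. Assuming the
common 3G/1G user/kernel split and the usual ~128MB carve-out for
vmalloc/ioremap/fixmap (exact sizes vary by architecture and config), the
arithmetic can be sketched as:]

```python
# Sketch of where the ~896MB lowmem limit on 32-bit kernels comes from.
# Assumption: 3G/1G user/kernel split, ~128MB reserved out of the kernel's
# 1GB virtual window for vmalloc/ioremap/fixmap.
kernel_va_mb = 1024          # kernel's share of the 4GB virtual address space
vmalloc_reserve_mb = 128     # reserved for vmalloc/ioremap/fixmap mappings
lowmem_mb = kernel_va_mb - vmalloc_reserve_mb

print(lowmem_mb)  # 896
```

Everything above that boundary is highmem, which the kernel cannot keep
permanently mapped, hence the special-casing in the dirty accounting.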


Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-03-14 Thread Michal Hocko
On Mon 05-03-18 13:04:24, Laura Abbott wrote:
> On 02/26/2018 06:28 AM, Michal Hocko wrote:
> > On Fri 23-02-18 11:51:41, Laura Abbott wrote:
> > > Hi,
> > > 
> > > The Fedora arm-32 build VMs have a somewhat long standing problem
> > > of hanging when running mkfs.ext4 with a bunch of processes stuck
> > > in D state. This has been seen as far back as 4.13 but is still
> > > present on 4.14:
> > > 
> > [...]
> > > This looks like everything is blocked on the writeback completing but
> > > the writeback has been throttled. According to the infra team, this problem
> > > is _not_ seen without LPAE (i.e. only 4G of RAM). I did see
> > > https://patchwork.kernel.org/patch/10201593/ but that doesn't seem to
> > > quite match since this seems to be completely stuck. Any suggestions to
> > > narrow the problem down?
> > 
> > How much dirtyable memory does the system have? We do allow only lowmem
> > to be dirtyable by default on 32b highmem systems. Maybe you have the
> > lowmem mostly consumed by the kernel memory. Have you tried to enable
> > highmem_is_dirtyable?
> > 
> 
> Setting highmem_is_dirtyable did fix the problem. The infrastructure
> people seemed satisfied enough with this (and are happy to have the
> machines back). I'll see if they are willing to run a few more tests
> to get some more state information.

Please be aware that highmem_is_dirtyable is not for free. There are
some code paths which can only allocate from lowmem (e.g. block device
AFAIR) and those could fill up the whole lowmem without any throttling.
-- 
Michal Hocko
SUSE Labs
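
[Editorial note: for reference, the knob discussed above is the
`vm.highmem_is_dirtyable` sysctl, which exists only on 32-bit kernels built
with highmem support. A sketch of enabling it, assuming the conventional
sysctl drop-in path for persistence:]

```shell
# Enable highmem_is_dirtyable at runtime (32-bit highmem kernels only;
# the knob does not exist on 64-bit kernels). Requires root.
sysctl -w vm.highmem_is_dirtyable=1

# Persist across reboots via the usual sysctl.d drop-in convention:
echo 'vm.highmem_is_dirtyable = 1' > /etc/sysctl.d/99-highmem-dirtyable.conf
```

As Michal notes above, this trades the stalls for a risk of lowmem exhaustion
by lowmem-only allocators, so it is a workaround rather than a general fix.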


Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-03-06 Thread Tetsuo Handa
Laura Abbott wrote:
> On 02/26/2018 06:28 AM, Michal Hocko wrote:
> > On Fri 23-02-18 11:51:41, Laura Abbott wrote:
> >> Hi,
> >>
> >> The Fedora arm-32 build VMs have a somewhat long standing problem
> >> of hanging when running mkfs.ext4 with a bunch of processes stuck
> >> in D state. This has been seen as far back as 4.13 but is still
> >> present on 4.14:
> >>
> > [...]
> >> This looks like everything is blocked on the writeback completing but
> >> the writeback has been throttled. According to the infra team, this problem
> >> is _not_ seen without LPAE (i.e. only 4G of RAM). I did see
> >> https://patchwork.kernel.org/patch/10201593/ but that doesn't seem to
> >> quite match since this seems to be completely stuck. Any suggestions to
> >> narrow the problem down?
> > 
> > How much dirtyable memory does the system have? We do allow only lowmem
> > to be dirtyable by default on 32b highmem systems. Maybe you have the
> > lowmem mostly consumed by the kernel memory. Have you tried to enable
> > highmem_is_dirtyable?
> > 
> 
> Setting highmem_is_dirtyable did fix the problem. The infrastructure
> people seemed satisfied enough with this (and are happy to have the
> machines back).

That's good.

> I'll see if they are willing to run a few more tests
> to get some more state information.

Well, I'm far from understanding what is happening in your case, but I'm
interested in other threads which were trying to allocate memory. Therefore,
I would appreciate it if they could capture SysRq-m + SysRq-t rather than SysRq-w (as described
at http://akari.osdn.jp/capturing-kernel-messages.html ).

Code which assumes that kswapd can make progress can get stuck when kswapd
is blocked somewhere. And wbt_wait() seems to change behavior based on
current_is_kswapd(). If everyone is waiting for kswapd but kswapd cannot
make progress, I worry that it leads to hangups like your case.



Below is a totally different case which I got today, but it is an example of
how SysRq-m + SysRq-t can give us some clues.

Running below program on CPU 0 (using "taskset -c 0") on 4.16-rc4 against XFS
can trigger OOM lockups (hangup without being able to invoke the OOM killer).

--
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
	static char buffer[4096] = { };
	char *buf = NULL;
	unsigned long size;
	unsigned long i;
	for (i = 0; i < 1024; i++) {
		if (fork() == 0) {
			int fd;
			snprintf(buffer, sizeof(buffer), "/tmp/file.%u", getpid());
			fd = open(buffer, O_WRONLY | O_CREAT | O_APPEND, 0600);
			memset(buffer, 0, sizeof(buffer));
			sleep(1);
			while (write(fd, buffer, sizeof(buffer)) == sizeof(buffer));
			_exit(0);
		}
	}
	for (size = 1048576; size < 512UL * (1 << 30); size <<= 1) {
		char *cp = realloc(buf, size);
		if (!cp) {
			size >>= 1;
			break;
		}
		buf = cp;
	}
	sleep(2);
	/* Will cause OOM due to overcommit */
	for (i = 0; i < size; i += 4096)
		buf[i] = 0;
	return 0;
}
--

MM people love to dismiss this kind of problem as "It is a DoS attack", but
only one CPU out of 8 CPUs is occupied by this program, which means that other
threads (including kernel threads doing memory reclaim activities) are free to
use idle CPUs 1-7 as they need. Also, while CPU 0 was really busy processing
hundreds of threads doing direct reclaim, idle CPUs 1-7 should be able to invoke
the OOM killer shortly because there should already be little left to reclaim.
Also, writepending did not decrease (and no disk I/O was observed) during the
OOM lockup. Thus, I don't know whether this is just an overloaded system.

[  660.035957] Node 0 Normal free:17056kB min:17320kB low:21648kB high:25976kB active_anon:570132kB inactive_anon:13452kB active_file:15136kB inactive_file:13296kB unevictable:0kB writepending:42320kB present:1048576kB managed:951188kB mlocked:0kB kernel_stack:22448kB pagetables:37304kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  709.498421] Node 0 Normal free:16920kB min:17320kB low:21648kB high:25976kB active_anon:570132kB inactive_anon:13452kB active_file:19180kB inactive_file:17640kB unevictable:0kB writepending:42740kB present:1048576kB managed:951188kB mlocked:0kB kernel_stack:22400kB pagetables:37304kB bounce:0kB free_pcp:248kB local_pcp:0kB free_cma:0kB
[  751.290146] Node 0 Normal free:16920kB min:17320kB low:21648kB high:25976kB active_anon:570132kB inactive_anon:13452kB active_file:14556kB inactive_file:14452kB unevictable:0kB writepending:42740kB present:1048576kB managed:951188kB mlocked:0kB kernel_stack:22400kB pagetables:37304kB bounce:0kB free_pcp:248kB local_pcp:0kB 
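
[Editorial note: the key reading of these zone reports is that `free` stays
below the `min` watermark in every sample, which is why allocations keep
looping in direct reclaim instead of succeeding. A hypothetical parsing
sketch, with the helper name and sample line invented for illustration:]

```python
import re

def parse_zone_line(line):
    """Extract 'key:<value>kB' pairs from a show_mem-style zone report line."""
    return {k: int(v) for k, v in re.findall(r"(\w+):(\d+)kB", line)}

# Abbreviated copy of the first sample above.
sample = ("Node 0 Normal free:17056kB min:17320kB low:21648kB high:25976kB "
          "writepending:42320kB managed:951188kB")
zone = parse_zone_line(sample)

# free sits below the min watermark, so ordinary allocations cannot be
# satisfied from this zone and callers stay stuck in reclaim.
print(zone["free"] < zone["min"])  # True
```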

Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-03-05 Thread Laura Abbott

On 02/26/2018 06:28 AM, Michal Hocko wrote:
> On Fri 23-02-18 11:51:41, Laura Abbott wrote:
> > Hi,
> > 
> > The Fedora arm-32 build VMs have a somewhat long standing problem
> > of hanging when running mkfs.ext4 with a bunch of processes stuck
> > in D state. This has been seen as far back as 4.13 but is still
> > present on 4.14:
> > 
> [...]
> > This looks like everything is blocked on the writeback completing but
> > the writeback has been throttled. According to the infra team, this problem
> > is _not_ seen without LPAE (i.e. only 4G of RAM). I did see
> > https://patchwork.kernel.org/patch/10201593/ but that doesn't seem to
> > quite match since this seems to be completely stuck. Any suggestions to
> > narrow the problem down?
> 
> How much dirtyable memory does the system have? We do allow only lowmem
> to be dirtyable by default on 32b highmem systems. Maybe you have the
> lowmem mostly consumed by the kernel memory. Have you tried to enable
> highmem_is_dirtyable?

Setting highmem_is_dirtyable did fix the problem. The infrastructure
people seemed satisfied enough with this (and are happy to have the
machines back). I'll see if they are willing to run a few more tests
to get some more state information.

Thanks,
Laura


Re: Hangs in balance_dirty_pages with arm-32 LPAE + highmem

2018-02-26 Thread Michal Hocko
On Fri 23-02-18 11:51:41, Laura Abbott wrote:
> Hi,
> 
> The Fedora arm-32 build VMs have a somewhat long standing problem
> of hanging when running mkfs.ext4 with a bunch of processes stuck
> in D state. This has been seen as far back as 4.13 but is still
> present on 4.14:
> 
[...]
> This looks like everything is blocked on the writeback completing but
> the writeback has been throttled. According to the infra team, this problem
> is _not_ seen without LPAE (i.e. only 4G of RAM). I did see
> https://patchwork.kernel.org/patch/10201593/ but that doesn't seem to
> quite match since this seems to be completely stuck. Any suggestions to
> narrow the problem down?

How much dirtyable memory does the system have? We do allow only lowmem
to be dirtyable by default on 32b highmem systems. Maybe you have the
lowmem mostly consumed by the kernel memory. Have you tried to enable
highmem_is_dirtyable?
-- 
Michal Hocko
SUSE Labs