[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-11-19 Thread Mauricio Faria de Oliveira
This bug has been fixed on Ubuntu 14.04 ESM, linux kernel package version 3.13.0-175.226. Marking the LP bug as Fix Released / Trusty. ** Changed in: linux (Ubuntu Trusty) Status: Fix Committed => Fix Released -- You received this bug notification because you are a member of Ubuntu
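A minimal sketch for checking whether a Trusty machine already runs a kernel at or past the fixed version named above (3.13.0-175.226). It assumes that comparing the ABI number in the `uname -r` string is enough to decide this; `FIXED_ABI` and `trusty_kernel_has_fix` are hypothetical names used only for illustration.

```python
import re
from typing import Optional

# ABI number of the fixed package version quoted in the message (3.13.0-175.226).
FIXED_ABI = 175


def trusty_kernel_has_fix(release: str) -> Optional[bool]:
    """Return True/False for a 3.13.0-<ABI> release string, None for anything else.

    Assumption: a higher-or-equal ABI number on the 3.13.0 series implies
    the fix is present. Other kernel series are out of scope here.
    """
    match = re.match(r"^3\.13\.0-(\d+)(?:-|$)", release)
    if match is None:
        return None
    return int(match.group(1)) >= FIXED_ABI
```

On a live system the argument could come from `platform.release()`, which returns the same string as `uname -r`.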

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-11-18 Thread Mauricio Faria de Oliveira
Verification on Trusty ESM updates. $ uname -rv 3.13.0-175-generic #226-Ubuntu SMP Fri Nov 8 15:26:34 UTC 2019 The problem does not happen with the test-case. (kmod.c updated for the new kernel uname strings, same instruction addresses from test build apply.) Test case snippets: Migrated from

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-11-07 Thread Stefan Bader
** Changed in: linux (Ubuntu Trusty) Status: In Progress => Fix Committed -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1813018 Title: Kernel Oops - unable to handle kernel paging request;

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
** Tags added: sts

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
** Description changed: [Impact] * Users on NUMA systems (mostly servers) with NUMA balancing enabled (which is the default) might hit a crash/BUG() on a race condition if two simultaneous page faults of the same transparent hugepage go into the path for migration

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
Test-case snippet on original kernel on ESM (3.13.0-174.225) [ 1387.632017] cpu 4/pid 1920/task TWO :: change_prot_numa() :: address = 0x7f4560e0, end = 0x7f456100 [ 1389.640322] cpu 5/pid 1921/task ONE :: do_huge_pmd_numa_page() :: addr/mask = 0x7f4560e0, addr = 0x7f4560e0, pmd

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
Test-case snippet on original kernel (3.13.0-170.220) [ 330.980173] cpu 5/pid 2126/task ONE :: Stage 4. T1 wake up T2... it may BUG! <...> [ 331.975122] cpu 6/pid 2125/task TWO :: Stage 4. T2 sleep for 1s... BUG afterward? [ 332.980237] BUG: unable to handle kernel paging request at

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
Not quite -- using a pastebin (expire 1y): https://pastebin.ubuntu.com/p/pjMNxkJfJy/ ** Description changed: [Impact] * Users on NUMA systems (mostly servers) with NUMA balancing enabled (which is the default) might hit a crash/BUG() on a race condition if two simultaneous

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
** Attachment added: "kmod.c" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813018/+attachment/5301231/+files/kmod.c

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
Test-case snippet on modified kernel on ESM (3.13.0-174.225 + patch) The problem doesn't happen anymore. [ 169.972017] cpu 4/pid 1765/task TWO :: change_prot_numa() :: address = 0x7f534f80, end = 0x7f534fa0 [ 171.976313] cpu 5/pid 1766/task ONE :: do_huge_pmd_numa_page() ::

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
** Attachment added: "test.c" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1813018/+attachment/5301230/+files/test.c

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
Checking if this snippet is properly spaced in a LP comment text (instead of Description.) Task 1 / CPU 1 | Task 2 / CPU 2 do_huge_pmd_numa_page() do_huge_pmd_numa_page() - pmd_lock(). - trylock_page()

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
** Description changed: - Kernel oops occurs randomly every now and then, seemingly when running - memory-intensive processes (so far, it happened to me when using bowtie2 - or STAR). + [Impact] + + * Users on NUMA systems (mostly servers) with +NUMA balancing enabled (which is by default)

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-29 Thread Mauricio Faria de Oliveira
** Also affects: linux (Ubuntu Trusty) Importance: Undecided Status: New ** Changed in: linux (Ubuntu) Status: Confirmed => Invalid ** Changed in: linux (Ubuntu Trusty) Status: New => In Progress ** Changed in: linux (Ubuntu Trusty) Importance: Undecided => Medium **

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-21 Thread Mauricio Faria de Oliveira
@mluypaert Thanks for the details. Do you know whether the problem reproduces with any bowtie2 example that I could run for myself? I'm not familiar w/ it. Apparently there's a workaround for it, if you're willing to test: to disable NUMA balancing. This _might_ impact performance on some
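A minimal sketch for inspecting the NUMA balancing state before trying the workaround mentioned above. It assumes the kernel exposes the setting at `/proc/sys/kernel/numa_balancing` (the `kernel.numa_balancing` sysctl); the function name and the `path` parameter are hypothetical and exist only so the logic can be exercised against an ordinary file.

```python
from pathlib import Path
from typing import Optional

# Assumed location of the kernel.numa_balancing sysctl.
NUMA_BALANCING_PATH = "/proc/sys/kernel/numa_balancing"


def numa_balancing_enabled(path: str = NUMA_BALANCING_PATH) -> Optional[bool]:
    """Return True if NUMA balancing is on, False if off, None if unknown.

    None covers a missing or unreadable file (for example, a kernel built
    without NUMA balancing) and unexpected, non-numeric content.
    """
    try:
        text = Path(path).read_text().strip()
    except OSError:
        return None
    if not text.isdigit():
        return None
    return int(text) != 0
```

If this returns True, writing `0` to the same file as root (or running `sysctl -w kernel.numa_balancing=0`) should turn the feature off until the next reboot.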

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-21 Thread mluypaert
Hi @mfo, I still hit this problem on instances running the 3.13.0 kernel, but on other instances where I upgraded to the HWE kernel (derived from Ubuntu 16.04) I no longer have the issue. It's hard to reproduce, since the problem occurs only occasionally and randomly. Out of 626

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-10-20 Thread Mauricio Faria de Oliveira
Hi @mluypaert, Do you still hit this problem? Would you be able to provide the steps you use to run bowtie2/STAR that reproduce the problem? And/or be able to try a workaround that might prevent it from happening? Thank you, Mauricio

[Bug 1813018] Re: Kernel Oops - unable to handle kernel paging request; RIP is at wait_migrate_huge_page+0x51/0x70

2019-01-23 Thread mluypaert
apport information ** Tags added: apport-collected ec2-images trusty ** Description changed: Kernel oops occurs randomly every now and then, seemingly when running memory-intensive processes (so far, it happened to me when using bowtie2 or STAR). Running Ubuntu 14.04 LTS on AWS EC2