On Fri, Jul 20, 2018 at 2:35 PM, Andy Lutomirski wrote:
>
>> On Jul 16, 2018, at 6:05 AM, H.J. Lu wrote:
>>
>>> On Fri, Jul 13, 2018 at 7:08 PM, Andy Lutomirski
>>> wrote:
>>> I'm not at all convinced that this is the problem, but the series here
>>> will give a better diagnostic if the issue
> On Jul 16, 2018, at 6:05 AM, H.J. Lu wrote:
>
>> On Fri, Jul 13, 2018 at 7:08 PM, Andy Lutomirski wrote:
>> I'm not at all convinced that this is the problem, but the series here
>> will give a better diagnostic if the issue really is an IRQ stack
>> overflow:
>>
>>
On Fri, Jul 13, 2018 at 7:08 PM, Andy Lutomirski wrote:
> I'm not at all convinced that this is the problem, but the series here
> will give a better diagnostic if the issue really is an IRQ stack
> overflow:
>
>
I'm not at all convinced that this is the problem, but the series here
will give a better diagnostic if the issue really is an IRQ stack
overflow:
https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/guard_pages
(link currently broken. should work soon.)
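
As a rough sketch of the guard-page idea itself (an illustration only, not the series linked above; the function name and stack size below are made up): backing a special stack with vmalloc() leaves an unmapped guard page next to it, so an overflow takes an immediate, obvious page fault instead of silently corrupting whatever sits adjacent in the direct map.

#include <linux/vmalloc.h>

#define DEMO_IRQ_STACK_SIZE (16 * 1024)  /* illustrative size only */

/* Allocate a stack whose overflow faults instead of corrupting memory. */
static void *alloc_guarded_stack(void)
{
        /*
         * vmalloc() areas are separated by unmapped guard pages, so
         * running past the end of this allocation triggers a page
         * fault right away rather than scribbling over a neighbour.
         */
        return vmalloc(DEMO_IRQ_STACK_SIZE);
}
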
On Thu, Jul 12, 2018 at 7:44 AM, H.J. Lu wrote:
> On Wed, Jul 11, 2018 at 4:14 PM, Dave Hansen wrote:
>> On 07/11/2018 04:07 PM, Andy Lutomirski wrote:
>>> Could the cause be an overflow of the IRQ stack? I’ve been meaning
>>> to put guard pages on all the special stacks for a while. Let me see
On Wed, Jul 11, 2018 at 4:14 PM, Dave Hansen wrote:
> On 07/11/2018 04:07 PM, Andy Lutomirski wrote:
>> Could the cause be an overflow of the IRQ stack? I’ve been meaning
>> to put guard pages on all the special stacks for a while. Let me see
>> if I can do that in the next couple days.
>
> But
On 07/11/2018 04:07 PM, Andy Lutomirski wrote:
> Could the cause be an overflow of the IRQ stack? I’ve been meaning
> to put guard pages on all the special stacks for a while. Let me see
> if I can do that in the next couple days.
But what would that overflow into? Wouldn't it most likely be
> On Jul 11, 2018, at 11:31 AM, Dave Jones wrote:
>
>> On Wed, Jul 11, 2018 at 10:50:22AM -0700, Dave Hansen wrote:
>> On 07/11/2018 10:29 AM, H.J. Lu wrote:
I have seen it on machines with various amounts of cores and RAMs.
It triggers the fastest on 8 cores with 6GB RAM reliably.
On Wed, Jul 11, 2018 at 10:50:22AM -0700, Dave Hansen wrote:
> On 07/11/2018 10:29 AM, H.J. Lu wrote:
> >> I have seen it on machines with various amounts of cores and RAMs.
> >> It triggers the fastest on 8 cores with 6GB RAM reliably.
> > Here is the first kernel message.
>
> This looks
On 07/11/2018 10:29 AM, H.J. Lu wrote:
>> I have seen it on machines with various amounts of cores and RAMs.
>> It triggers the fastest on 8 cores with 6GB RAM reliably.
> Here is the first kernel message.
This looks like random corruption again. It's probably a bogus 'struct
page' that fails
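
For illustration, here is the kind of sanity check a bogus 'struct page' can fail (a hypothetical example of the general mechanism, not the specific check being referred to in the message above): with CONFIG_DEBUG_VM enabled, invariants like these turn a corrupted page pointer into an immediate, readable BUG report rather than a later, harder-to-trace crash.

#include <linux/mm.h>
#include <linux/mmdebug.h>

/*
 * Hypothetical illustration, not a check taken from the kernel source:
 * basic invariants a valid page pointer in a free path would normally
 * satisfy; a corrupted one typically trips them.
 */
static void demo_sanity_check_page(struct page *page)
{
        VM_BUG_ON_PAGE(page_ref_count(page) <= 0, page);
        VM_BUG_ON_PAGE(PageTail(page), page);
}
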
On Wed, Jul 11, 2018 at 10:43 AM, Dave Hansen wrote:
> On 07/11/2018 10:29 AM, H.J. Lu wrote:
>>> I have seen it on machines with various amounts of cores and RAMs.
>>> It triggers the fastest on 8 cores with 6GB RAM reliably.
>> Here is the first kernel message.
>
> Does it trigger better with
On 07/11/2018 10:29 AM, H.J. Lu wrote:
>> I have seen it on machines with various amounts of cores and RAMs.
>> It triggers the fastest on 8 cores with 6GB RAM reliably.
> Here is the first kernel message.
Does it trigger better with more RAM or less?
On Wed, Jul 11, 2018 at 10:29 AM, H.J. Lu wrote:
> On Wed, Jul 11, 2018 at 9:53 AM, H.J. Lu wrote:
>> On Wed, Jul 11, 2018 at 9:49 AM, Dave Hansen wrote:
>>> On 07/11/2018 09:29 AM, H.J. Lu wrote:
>> # It takes about 3 hours to bootstrap x86-64 GCC and 3 hours to run tests,
>> TIMEOUT=480
On Wed, Jul 11, 2018 at 9:49 AM, Dave Hansen wrote:
> On 07/11/2018 09:29 AM, H.J. Lu wrote:
# It takes about 3 hours to bootstrap x86-64 GCC and 3 hours to run tests,
TIMEOUT=480
# Run it every hour,
30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
--with-pic > /dev/null 2>&1
On 07/11/2018 09:29 AM, H.J. Lu wrote:
>>> # It takes about 3 hours to bootstrap x86-64 GCC and 3 hours to run tests,
>>> TIMEOUT=480
>>> # Run it every hour,
>>> 30 * * * * /export/gnu/import/git/gcc-test-x32/gcc-build -mx32
>>> --with-pic > /dev/null 2>&1
>> Oh, fun, one of those.
>>
>> How long
On Wed, Jul 11, 2018 at 9:24 AM, Dave Hansen wrote:
> On 07/11/2018 08:40 AM, H.J. Lu wrote:
>> This is a quad-core machine with HT and 6 GB RAM. The workload is
>> x32 GCC build and test with "make -j8". The bug is triggered during GCC
>> test after a couple hours. I have a script to set up
On 07/11/2018 08:40 AM, H.J. Lu wrote:
> This is a quad-core machine with HT and 6 GB RAM. The workload is
> x32 GCC build and test with "make -j8". The bug is triggered during GCC
> test after a couple hours. I have a script to set up my workload:
>
>
On Wed, Jul 11, 2018 at 8:13 AM, Dave Hansen wrote:
> On 07/11/2018 07:56 AM, H.J. Lu wrote:
>> On Mon, Jul 9, 2018 at 8:47 PM, Dave Hansen wrote:
>>> On 07/09/2018 07:14 PM, H.J. Lu wrote:
> I'd really want to see this reproduced without KASLR to make the oops
> easier to read. It
On 07/11/2018 07:56 AM, H.J. Lu wrote:
> On Mon, Jul 9, 2018 at 8:47 PM, Dave Hansen wrote:
>> On 07/09/2018 07:14 PM, H.J. Lu wrote:
I'd really want to see this reproduced without KASLR to make the oops
easier to read. It would also be handy to try your workload with all
the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
and see if it still triggers.
On 07/09/2018 07:14 PM, H.J. Lu wrote:
>> I'd really want to see this reproduced without KASLR to make the oops
>> easier to read. It would also be handy to try your workload with all
>> the pedantic debugging: KASAN, slab debugging, DEBUG_PAGE_ALLOC, etc...
>> and see if it still triggers.
> How
On Mon, Jul 9, 2018 at 5:44 PM, Dave Hansen wrote:
> ... cc'ing a few folks who I know have been looking at this code
> lately. The full oops is below if any of you want to take a look.
>
> OK, well, annotating the disassembly a bit:
>
>> (gdb) disass free_pages_and_swap_cache
>> Dump of
... cc'ing a few folks who I know have been looking at this code
lately. The full oops is below if any of you want to take a look.
OK, well, annotating the disassembly a bit:
> (gdb) disass free_pages_and_swap_cache
> Dump of assembler code for function free_pages_and_swap_cache:
>
On Mon, Jul 9, 2018 at 7:54 AM, Dave Hansen wrote:
> On 07/09/2018 06:19 AM, Lu, Hongjiu wrote:
>> On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of
>> them don't have any kernel messages. One has
>
> Hi H.J.,
>
> It'd be really handy if you could pastebin things like this, or
On 07/09/2018 06:19 AM, Lu, Hongjiu wrote:
> On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of
> them don't have any kernel messages. One has
Hi H.J.,
It'd be really handy if you could pastebin things like this, or attach a
text file with the oops. Your email wrapped the heck
On 3 x86-64 machines, kernel 4.17.4 locked up under heavy load. 2 of them don't
have any kernel messages. One has
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: general protection fault: [#1] SMP PTI
Jul 05 14:33:32 gnu-hsw-1.sc.intel.com kernel: Modules linked in: rpcsec_gss_krb5 nfsv4