Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-17 Thread Patrick McLean



On 2017-11-17 04:55 PM, Linus Torvalds wrote:
> On Fri, Nov 17, 2017 at 4:27 PM, Patrick McLean <chutz...@gentoo.org> wrote:
>>
>> I am still getting the crash at d9e12200852d, I figured I would
>> double-check the "good" and "bad" kernels before starting a full bisect.
> 
> .. but without GCC_PLUGIN_RANDSTRUCT it's solid?

Yes, without GCC_PLUGIN_RANDSTRUCT it's solid.

> Kees removed even the baseline "randomize pure function pointer
> structures", so at that commit, nothing should be randomized.
> 
> But maybe the plugin code itself ends up confusing gcc somehow?
> 
> Even when it doesn't actually do that "relayout_struct()" on the
> structure, it always does those TYPE_ATTRIBUTES() games.

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-17 Thread Patrick McLean

On 2017-11-17 01:26 PM, Kees Cook wrote:
> On Fri, Nov 17, 2017 at 11:03 AM, Patrick McLean <chutz...@gentoo.org> wrote:
>> On 2017-11-16 04:54 PM, Kees Cook wrote:
>>> On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutz...@gentoo.org> wrote:
>>>> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>>>>> Boris Lukashev points out that Patrick should probably check a newer
>>>>> version of gcc.
>>>>>
>>>>> I looked around, and in one of the emails, Patrick said:
>>>>>
>>>>>   "No changes, both the working and broken kernels were built with
>>>>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>>>>
>>>>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>>>>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>>>>
>>>>> It would probably be good to check if the problems persist with gcc
>>>>> 6.x or 7.x.. I have no idea which gcc version the randstruct people
>>>>> tend to use themselves.
>>>>
>>>> I just tested it with gcc 7.2, and was able to reproduce the NULL
>>>> pointer dereference, the backtrace looks slightly different this time.
>>>>
>>>> I will also test with binutils 2.29, though I doubt that will make any
>>>> difference.
>>>>
>>>>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 
>>>>> 0560
>>>>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>>>>> [   56.167249] PGD 0 P4D 0
>>>>> [   56.167860] Oops:  [#1] SMP
>>>>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
>>>>> xt_multiport xt_addrtype iptable_mangle iptable>
>>>>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G   O
>>>>> 4.14.0-git-kratos-1 #1
>>>>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>>>>> [   56.182729] task: 88040c412a00 task.stack: c90002c18000
>>>>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>>>>> [   56.184341] RSP: 0018:c90002c1bb28 EFLAGS: 00010202
>>>>> [   56.185143] RAX:  RBX: c90002c1bbf0 RCX: 
>>>>> 0020
>>>>> [   56.186085] RDX: 1801 RSI: 1801 RDI: 
>>>>> 
>>>>> [   56.187066] RBP: c90002c1bbc0 R08: ff00 R09: 
>>>>> 00ff
>>>>> [   56.188268] R10: 0038be3a R11: 880408b18258 R12: 
>>>>> 
>>>>> [   56.189336] R13: 88040c23ad00 R14: 88040b874000 R15: 
>>>>> c90002c1bbf0
>>>>> [   56.190444] FS:  () GS:88041fc0() 
>>>>> knlGS:
>>>>> [   56.191876] CS:  0010 DS:  ES:  CR0: 80050033
>>>>> [   56.192843] CR2: 0560 CR3: 01e0a002 CR4: 
>>>>> 001606f0
>>>>> [   56.193898] Call Trace:
>>>>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>>>>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>>>>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>>>>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>>>>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>>>>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>>>>> [   56.198968]  svc_process_common+0x475/0x640
>>>>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>>>>> [   56.200404]  svc_process+0xf2/0x1a0
>>>>> [   56.201079]  nfsd+0xe3/0x150
>>>>> [   56.201706]  kthread+0x117/0x130
>>>>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>>>>> [   56.203100]  ret_from_fork+0x25/0x30
>>>>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 
>>>>> 81 ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>>>>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: c90002c1bb28
>>>>> [   56.207110] CR2: 0560
>>>>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>>>>
>>>>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keesc...@chromium.org> wrote:
>>>>>>
>>>>>> I'll take a closer look at this and see if I can provide something to
>>>>>> narrow it down.
>>>

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-17 Thread Patrick McLean

On 2017-11-16 04:54 PM, Kees Cook wrote:
> On Mon, Nov 13, 2017 at 2:48 PM, Patrick McLean <chutz...@gentoo.org> wrote:
>> On 2017-11-11 09:31 AM, Linus Torvalds wrote:
>>> Boris Lukashev points out that Patrick should probably check a newer
>>> version of gcc.
>>>
>>> I looked around, and in one of the emails, Patrick said:
>>>
>>>   "No changes, both the working and broken kernels were built with
>>>    distro-provided gcc 5.4.0 and binutils 2.28.1"
>>>
>>> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
>>> it's a bug-fix release to a pretty old branch that is not exactly new.
>>>
>>> It would probably be good to check if the problems persist with gcc
>>> 6.x or 7.x.. I have no idea which gcc version the randstruct people
>>> tend to use themselves.
>>
>> I just tested it with gcc 7.2, and was able to reproduce the NULL
>> pointer dereference, the backtrace looks slightly different this time.
>>
>> I will also test with binutils 2.29, though I doubt that will make any
>> difference.
>>
>>> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 
>>> 0560
>>> [   56.166563] IP: vfs_statfs+0x7c/0xc0
>>> [   56.167249] PGD 0 P4D 0
>>> [   56.167860] Oops:  [#1] SMP
>>> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
>>> xt_multiport xt_addrtype iptable_mangle iptable>
>>> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G   O
>>> 4.14.0-git-kratos-1 #1
>>> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
>>> [   56.182729] task: 88040c412a00 task.stack: c90002c18000
>>> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
>>> [   56.184341] RSP: 0018:c90002c1bb28 EFLAGS: 00010202
>>> [   56.185143] RAX:  RBX: c90002c1bbf0 RCX: 
>>> 0020
>>> [   56.186085] RDX: 1801 RSI: 1801 RDI: 
>>> 
>>> [   56.187066] RBP: c90002c1bbc0 R08: ff00 R09: 
>>> 00ff
>>> [   56.188268] R10: 0038be3a R11: 880408b18258 R12: 
>>> 
>>> [   56.189336] R13: 88040c23ad00 R14: 88040b874000 R15: 
>>> c90002c1bbf0
>>> [   56.190444] FS:  () GS:88041fc0() 
>>> knlGS:
>>> [   56.191876] CS:  0010 DS:  ES:  CR0: 80050033
>>> [   56.192843] CR2: 0560 CR3: 01e0a002 CR4: 
>>> 001606f0
>>> [   56.193898] Call Trace:
>>> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
>>> [   56.195267]  ? generic_permission+0x12c/0x1a0
>>> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
>>> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
>>> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
>>> [   56.198268]  nfsd_dispatch+0xe8/0x220
>>> [   56.198968]  svc_process_common+0x475/0x640
>>> [   56.199696]  ? nfsd_destroy+0x60/0x60
>>> [   56.200404]  svc_process+0xf2/0x1a0
>>> [   56.201079]  nfsd+0xe3/0x150
>>> [   56.201706]  kthread+0x117/0x130
>>> [   56.202354]  ? kthread_create_on_node+0x40/0x40
>>> [   56.203100]  ret_from_fork+0x25/0x30
>>> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 
>>> ce 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
>>> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: c90002c1bb28
>>> [   56.207110] CR2: 0560
>>> [   56.207763] ---[ end trace d452986a80f64aaa ]---
>>
>>> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook <keesc...@chromium.org> wrote:
>>>>
>>>> I'll take a closer look at this and see if I can provide something to
>>>> narrow it down.
> 
> How reliable is this crash? The best idea I have to isolate it would
> be to bisect the additions of the __randomize_layout markings on
> various structures. I would start with the ones Al is most upset to
> see randomized. ;)

It's pretty reliable; once I get a bad seed, I can reproduce the crash
pretty quickly.
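
The bisection Kees suggests can be sketched as a halving search over the set of structures that carry the __randomize_layout marking. This is only an illustration: it assumes exactly one marked structure is responsible, the structure names are arbitrary picks, and `crashes` is a hypothetical stand-in for "build a kernel with only these structures randomized, boot it, and run the NFS workload".

```python
def find_culprit(structs, crashes):
    """Return the single structure whose randomization triggers the crash.

    structs: names of structures marked __randomize_layout (hypothetical list).
    crashes: callable taking a list of names and returning True if a kernel
             with only those structures randomized crashes (stand-in for a
             real build-and-test cycle).
    Assumes exactly one culprit exists in structs.
    """
    candidates = list(structs)
    if not candidates:
        raise ValueError("no candidate structures given")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if crashes(first_half):
            # The culprit is among the structures we left randomized.
            candidates = first_half
        else:
            # The first half is clean, so the culprit is in the rest.
            candidates = candidates[middle:]
    return candidates[0]
```

Each round costs one kernel build and test, so a few dozen marked structures need only five or six builds with a known-bad seed.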

> 
> All that said, I'd like to better understand the BIOS side of this a
> little better. In the first email in this thread, you showed two BUGs
> separated by a little time, which implies to me that the NULL deref
> and the BIOS no longer POSTing are separate (though seemingly related)
> issues. Have you had machines survive the BUG without blowing up the
> BIOS?

We had 3 machines die due to the BIOS issue (all of them pretty quickly
with the bad-se

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-13 Thread Patrick McLean

On 2017-11-11 09:31 AM, Linus Torvalds wrote:
> Boris Lukashev points out that Patrick should probably check a newer
> version of gcc.
> 
> I looked around, and in one of the emails, Patrick said:
> 
>   "No changes, both the working and broken kernels were built with
>    distro-provided gcc 5.4.0 and binutils 2.28.1"
> 
> and gcc-5.4.0 is certainly not very recent. It's not _ancient_, but
> it's a bug-fix release to a pretty old branch that is not exactly new.
> 
> It would probably be good to check if the problems persist with gcc
> 6.x or 7.x.. I have no idea which gcc version the randstruct people
> tend to use themselves.

I just tested it with gcc 7.2 and was able to reproduce the NULL
pointer dereference; the backtrace looks slightly different this time.

I will also test with binutils 2.29, though I doubt that will make any
difference.

> [   56.165181] BUG: unable to handle kernel NULL pointer dereference at 
> 0560
> [   56.166563] IP: vfs_statfs+0x7c/0xc0
> [   56.167249] PGD 0 P4D 0
> [   56.167860] Oops:  [#1] SMP
> [   56.176478] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
> xt_multiport xt_addrtype iptable_mangle iptable>
> [   56.180227] CPU: 0 PID: 3985 Comm: nfsd Tainted: G   O
> 4.14.0-git-kratos-1 #1
> [   56.181728] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   56.182729] task: 88040c412a00 task.stack: c90002c18000
> [   56.183629] RIP: 0010:vfs_statfs+0x7c/0xc0
> [   56.184341] RSP: 0018:c90002c1bb28 EFLAGS: 00010202
> [   56.185143] RAX:  RBX: c90002c1bbf0 RCX: 
> 0020
> [   56.186085] RDX: 1801 RSI: 1801 RDI: 
> 
> [   56.187066] RBP: c90002c1bbc0 R08: ff00 R09: 
> 00ff
> [   56.188268] R10: 0038be3a R11: 880408b18258 R12: 
> 
> [   56.189336] R13: 88040c23ad00 R14: 88040b874000 R15: 
> c90002c1bbf0
> [   56.190444] FS:  () GS:88041fc0() 
> knlGS:
> [   56.191876] CS:  0010 DS:  ES:  CR0: 80050033
> [   56.192843] CR2: 0560 CR3: 01e0a002 CR4: 
> 001606f0
> [   56.193898] Call Trace:
> [   56.194510]  nfsd4_encode_fattr+0x201/0x1f90
> [   56.195267]  ? generic_permission+0x12c/0x1a0
> [   56.196025]  nfsd4_encode_getattr+0x25/0x30
> [   56.196753]  nfsd4_encode_operation+0x98/0x1b0
> [   56.197526]  nfsd4_proc_compound+0x2a0/0x5e0
> [   56.198268]  nfsd_dispatch+0xe8/0x220
> [   56.198968]  svc_process_common+0x475/0x640
> [   56.199696]  ? nfsd_destroy+0x60/0x60
> [   56.200404]  svc_process+0xf2/0x1a0
> [   56.201079]  nfsd+0xe3/0x150
> [   56.201706]  kthread+0x117/0x130
> [   56.202354]  ? kthread_create_on_node+0x40/0x40
> [   56.203100]  ret_from_fork+0x25/0x30
> [   56.203774] Code: d6 89 d6 81 ce 00 04 00 00 f6 c1 08 0f 45 d6 89 d6 81 ce 
> 00 08 00 00 f6 c1 10 0f 45 d6 89 d6 81 ce>
> [   56.206289] RIP: vfs_statfs+0x7c/0xc0 RSP: c90002c1bb28
> [   56.207110] CR2: 0560
> [   56.207763] ---[ end trace d452986a80f64aaa ]---
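
A small, non-zero fault address like the one in this trace usually means a structure member was read through a NULL pointer, so the address is just the member's offset. The sketch below pulls the address out of a BUG line and checks whether it falls inside the first page. The sample line in the usage note is an assumed, zero-padded form of the one above (the archive appears to have dropped some digits), and the helper names are made up for illustration.

```python
import re

PAGE_SIZE = 4096  # x86-64 base page size

# Matches the hex address at the end of a "NULL pointer dereference at" line.
FAULT_RE = re.compile(r"NULL pointer dereference at\s+([0-9a-fA-F]+)")

def fault_offset(oops_line):
    """Return the faulting address as an int, or None if the line does not match."""
    match = FAULT_RE.search(oops_line)
    if match is None:
        return None
    return int(match.group(1), 16)

def looks_like_null_member_access(address):
    """True if the address is non-zero but still inside the first page."""
    return 0 < address < PAGE_SIZE
```

For example, `fault_offset("BUG: unable to handle kernel NULL pointer dereference at 0000000000000560")` gives 0x560, which is the offset to look for among the fields that vfs_statfs dereferences in a given build.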

> On Sat, Nov 11, 2017 at 8:13 AM, Kees Cook  wrote:
>>
>> I'll take a closer look at this and see if I can provide something to
>> narrow it down.

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-10 Thread Patrick McLean

On 2017-11-10 03:26 PM, Patrick McLean wrote:
> 
> 
> On 2017-11-10 10:42 AM, Linus Torvalds wrote:
>> On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutz...@gentoo.org> wrote:
>>>
>>> Something must have changed since 4.13.8 to trigger this though.
>>
>> Arnd pointed to some commits that might be relevant for the cp210x
>> module, but those are all already in 4.13.8, so if 4.13.8 really is
>> rock solid for you, I don't think that's it.
>>
>> I really don't see anything that looks even half-way suspicious in
>> that 4.13.8..11 range. But as mentioned, compiler interactions can be
>> _really_ subtle.
>>
>> And hey, it can be a real kernel bug too, that just happens to be
>> exposed by RANDSTRUCT, so a bisect really would be very nice.
> 
> I am working on bisecting the issue now, but I think I have some more
> evidence pointing to a compiler issue related to RANDSTRUCT. There are
> actually 3 issues that we have seen. Sometimes we get the null pointer
> deref in the initial message, sometimes we get the GPF, and sometimes we
> see an issue where the NFS clients see all files as root-owned
> directories. Any given kernel will always see the same issue, but after
> a "make mrproper" and recompile (with the same .config), the issue will
> often change. I suspect that all 3 of these problems are actually the
> same issue manifesting itself in different ways depending on what seed
> the RANDSTRUCT gcc plugin is using.
> 

Further update on this: using the same seed for RANDSTRUCT, I have
reproduced this issue on v4.13.0, so it does not seem to be recently
introduced. The older kernel apparently only worked for us because we
were lucky. We generally compile new kernels from a fresh tree, so
they never use the same seed.

In case someone wants to play with this, here are some interesting seeds
(in include/generated/randomize_layout_hash.h):

Produces a NULL pointer dereference (though I am not sure what the client
does to trigger it):
5970d6494d0f4236ec57147a46e700f4f501536236d96f6f68ea223e06a258bc

All files for nfsd4 clients appear as directories owned by root, no
matter the real owner (this happens for all clients we have tested):
3f158cd1014800ce5eb6c1f532ac64f2357fdb9a684096557d2fbb1d281f325e

This is the seed that was breaking motherboards (make sure you have a
way to flash the BIOS with this one):
3e32f2d1b4a3dde9f2fd95151386cd1d5bd6167597a0b868f6273aabfc5712dd

Finally, here is a seed that produces a kernel that does not exhibit any
problems we are aware of:
e8698c12137fcd1dcbff6d1ed97e5d766128447a27ce9f9d61e0cb8c05ad4d3b
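
Why a fixed value reproduces a fixed failure can be shown with a toy model: if the field order is derived only from the seed and the structure name, the same seed always yields the same layout, and a fresh tree with a new seed yields a different one. This is not the plugin's actual algorithm; the hashing scheme, the function name, and the field names are all assumptions made for illustration.

```python
import hashlib
import random

def shuffled_layout(fields, seed_hex, struct_name):
    """Return a field order that depends only on the seed and structure name.

    Toy stand-in for layout randomization: a SHA-256 digest of the inputs
    seeds a private PRNG, which then shuffles a copy of the field list.
    """
    digest = hashlib.sha256((seed_hex + ":" + struct_name).encode()).digest()
    rng = random.Random(digest)  # private generator, global state untouched
    order = list(fields)         # leave the caller's list unmodified
    rng.shuffle(order)
    return order
```

Calling it twice with the same value from the list above and the same structure name returns identical orders, which is the property that lets one "bad" value be handed to someone else for reproduction.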

>>
>> Because in the end, compiler bugs are very rare. They are particularly
>> annoying when they do happen, though, so they loom big in the mind of
>> people who have had to chase them down.
>>

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-10 Thread Patrick McLean

On 2017-11-10 10:42 AM, Linus Torvalds wrote:
> On Thu, Nov 9, 2017 at 5:58 PM, Patrick McLean <chutz...@gentoo.org> wrote:
>>
>> Something must have changed since 4.13.8 to trigger this though.
> 
> Arnd pointed to some commits that might be relevant for the cp210x
> module, but those are all already in 4.13.8, so if 4.13.8 really is
> rock solid for you, I don't think that's it.
> 
> I really don't see anything that looks even half-way suspicious in
> that 4.13.8..11 range. But as mentioned, compiler interactions can be
> _really_ subtle.
> 
> And hey, it can be a real kernel bug too, that just happens to be
> exposed by RANDSTRUCT, so a bisect really would be very nice.

I am working on bisecting the issue now, but I think I have some more
evidence pointing to a compiler issue related to RANDSTRUCT. There are
actually 3 issues that we have seen. Sometimes we get the null pointer
deref in the initial message, sometimes we get the GPF, and sometimes we
see an issue where the NFS clients see all files as root-owned
directories. Any given kernel will always see the same issue, but after
a "make mrproper" and recompile (with the same .config), the issue will
often change. I suspect that all 3 of these problems are actually the
same issue manifesting itself in different ways depending on what seed
the RANDSTRUCT gcc plugin is using.

> 
> Because in the end, compiler bugs are very rare. They are particularly
> annoying when they do happen, though, so they loom big in the mind of
> people who have had to chase them down.
>

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean

On 2017-11-09 12:04 PM, Linus Torvalds wrote:
> On Thu, Nov 9, 2017 at 11:51 AM, Patrick McLean <chutz...@gentoo.org> wrote:
>>
>> We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
>> CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
>> CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.
> 
> It might be worth just verifying without RANDSTRUCT in particular.
> 
> And most obviously: if there is some module or part of the kernel that
> got compiled with a different seed for the randstruct hashing, that
> will break in nasty nasty ways. Your out-of-kernel module is the
> obvious suspect for something like that, but honestly, it could be
> some missing build dependency, or simply a missing special case in the
> plugin itself a missing __no_randomize_layout or any number of things.
> 

We will check our fork against the in-kernel cp210x driver to make sure
we didn't miss anything, but it seems odd we would be hitting the issue
so consistently in the NFS code path, rather than somewhere in USB,
serial, or GPIO paths.

> So since you seem to be able to reproduce this _reasonably_ easily,
> it's definitely worth checking that it still reproduces even without
> the gcc plugins.

I haven't been able to reproduce it with RANDSTRUCT disabled (and
structleak enabled). I will keep trying for a little while more, but
evidence seems to be pointing to that.

Something must have changed since 4.13.8 to trigger this though. This
did not crop up at all until we tried 4.13.11, where we saw it pretty
quickly. We have a pretty large number of machines running 4.13.6 with
RANDSTRUCT enabled and running the same workload with many more
clients, and have not seen this bug at all.
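
To illustrate the "different seed" failure mode Linus describes above, here is a minimal sketch. The two struct definitions are hypothetical: they hold the same fields in different orders, as if one object file had been built with one layout and another object file with a different one. Neither is a real kernel structure, and read_mnt_via_b() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Layout as seen by the code that fills in the object (hypothetical). */
struct layout_a {
	void *mnt;
	void *dentry;
};

/* Same fields, different order: the layout a mismatched build would see. */
struct layout_b {
	void *dentry;
	void *mnt;
};

static_assert(sizeof(struct layout_a) == sizeof(struct layout_b),
	      "same size, only the field order differs");

/* Reads "mnt" from an object written with layout_a, using layout_b's idea
 * of where that field lives. memcpy keeps the illustration well-defined. */
static void *read_mnt_via_b(const struct layout_a *obj)
{
	struct layout_b view;

	memcpy(&view, obj, sizeof(view));
	return view.mnt;	/* actually returns what was stored in obj->dentry */
}
```

Nothing fails at compile or link time here; the reader silently gets a valid-looking pointer to the wrong thing, which is why this class of bug can surface far away from the code that is actually mismatched.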

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean



On 2017-11-09 11:51 AM, Patrick McLean wrote:
> On 2017-11-09 11:37 AM, Al Viro wrote:
>> On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
>>
>>>> Here is the BUG we are getting:
>>>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0230
>>>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
>>>
>>> The code disassembles to
>>
>>>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
>>
>>> that matters (and that traps) but I'm almost certain that it's the
>>> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
>>> when it then does
>>>
>>>  flags_by_sb(mnt->mnt_sb->s_flags);
>>>
>>> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
>>> NULL, because we wouldn't have gotten this far if it was.
>>>
>>
>> All instances of struct dentry are created by __d_alloc()[*], which assigns
>> ->d_sb (never to be modified afterwards) *and* dereferences the pointer
>> it has stored in ->d_sb before the created struct dentry becomes visible
>> to anyone else.  No struct dentry should ever be observed with NULL ->d_sb;
>> the only way to get that is memory corruption or looking at freed instance
>> after its memory has been reused for something else and zeroed.
>>
>> In other words, we should never observe a struct mount with NULL
>> ->mnt.mnt_sb - not without memory corruption or looking at freed instance.
>>
>> The pointer in that case should've come from exp->ex_path.mnt, exp being
>> the argument of nfsd4_encode_fattr().  Sure, it might have been a dangling
>> reference.  However, it looks a lot more like a memory corruptor *OR*
>> miscompiled kernel.
>>
>> What kind of load do the reproducer boxen have and how fast does that
>> bug trigger?  Would it be possible to slap something like
>>      if (unlikely(!exp->ex_path.mnt->mnt_sb)) {
>>              struct mount *m = real_mount(exp->ex_path.mnt);
>>              printk(KERN_ERR "mnt: %p\n", exp->ex_path.mnt);
>>              printk(KERN_ERR "name: [%s]\n", m->mnt_devname);
>>              printk(KERN_ERR "ns: [%p]\n", m->mnt_ns);
>>              printk(KERN_ERR "parent: [%p]\n", m->mnt_parent);
>>              WARN_ON(1);
>>              err = -EINVAL;
>>              goto out_nfserr;
>>      }
>> in the beginning of nfsd4_encode_fattr() (with include of ../mount.h added
>> in fs/nfsd/nfs4xdr.c) and see what will it catch?
>>
>> Both with and without randomized structs, if possible - I might be barking
>> at the wrong tree, but IMO the very first step in localizing that crap is
>> to find out whether it's toolchain-related or not.
> 

That condition did not seem to trigger, and I am getting a slightly
different crash message (GPF rather than null pointer dereference). Here
is the dump from the latest crash (with CONFIG_GCC_PLUGIN_STRUCTLEAK,
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL and CONFIG_GCC_PLUGIN_RANDSTRUCT
all enabled).

> [   36.834232] general protection fault:  [#1] SMP
> [   36.835168] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 nf_nat gkuart(O) usbserial x86_pkg_temp_thermal ie31200_edac tpm_tis ipmi_ssif tpm_tis_core ext4 mbcache jbd2 e1000e crc32c_intel
> [   36.839120] CPU: 1 PID: 3969 Comm: nfsd Tainted: G   O 4.14.0-rc8-git-kratos-1-00053-gd93d4ce103fd-dirty #1
> [   36.840883] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   36.841892] task: 88040a0b1c80 task.stack: c900027bc000
> [   36.842887] RIP: 0010:vfs_statfs+0x73/0xb0
> [   36.843728] RSP: 0018:c900027bfb30 EFLAGS: 00010202
> [   36.844687] RAX:  RBX: c900027bfbf8 RCX: 180d
> [   36.845891] RDX: 080d RSI: 0020 RDI: e2006d6574737973
> [   36.847075] RBP: c900027bfbc8 R08:  R09: 00ff
> [   36.848175] R10: 0038be3a R11: 88040b687578 R12: 
> [   36.849260] R13: 88040d7dc400 R14: 88040d38b000 R15: c900027bfbf8
> [   36.850347] FS:  () GS:88041fc4() knlGS:
> [   36.851891] CS:  0010 DS:  ES:  CR0: 80050033
> [   36.852873] CR2: 7f049228edc0 CR3: 01e0a004 CR4: 001606e0
> [   36.853942] Call Trace:
> [   36.854
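
A hedged observation on the register dump above. The RIP offset (vfs_statfs+0x73) matches the earlier oops, so it seems plausible, though not confirmed, that the faulting instruction is again the load through %rdi. The sketch below checks two things about the RDI value: whether it is a canonical address, and what its bytes look like as text. It assumes 48-bit virtual addressing (4-level paging); the helper names are made up for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr is canonical for 48-bit virtual addressing:
 * bits 63..47 must all be copies of bit 47. */
static int is_canonical48(uint64_t addr)
{
	uint64_t top = addr >> 47;	/* the 17 uppermost bits */

	return top == 0 || top == 0x1ffff;
}

/* Renders the 8 bytes of a register value in memory (little-endian) order,
 * replacing non-printable bytes with '.'. out must hold 9 chars. */
static void reg_to_ascii(uint64_t value, char *out)
{
	assert(out != NULL);
	for (int i = 0; i < 8; i++) {
		unsigned char byte = (value >> (8 * i)) & 0xff;

		out[i] = isprint(byte) ? (char)byte : '.';
	}
	out[8] = '\0';
}
```

For RDI = 0xe2006d6574737973, is_canonical48() returns 0, and an access through a non-canonical address raises a general protection fault rather than a page fault, which would explain the different crash message. reg_to_ascii() renders the same value as "system..", so the register appears to hold string data where a pointer was expected. That would be consistent with a field being read at the wrong offset, but it does not prove it.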

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean



On 2017-11-09 12:47 PM, J. Bruce Fields wrote:
> On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
>> Anyway, that cmovne noise makes it a bit hard to see the actual part
>> that matters (and that traps) but I'm almost certain that it's the
>> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
>> when it then does
>>
>>  flags_by_sb(mnt->mnt_sb->s_flags);
>>
>> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
>> NULL, because we wouldn't have gotten this far if it was.
>>
>> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
>> proper path. And the vfs_statfs() code itself hasn't changed in a
>> while.
>>
>> Which does seem to implicate nfsd as having passed in a bad path to
>> vfs_statfs(). But I'm not seeing any changes in nfsd either.
>>
>> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
>> range. There is a bunch of xfs changes, though. What's the underlying
>> filesystem that you are exporting?
>>
>> But bringing in Al Viro and Bruce Fields explicitly in case they see
>> something. And Darrick, just in case it might be xfs.
> 
> Looking at https://lkml.org/lkml/2017/11/8/1086 for the actual oops...
> 
> It doesn't remind me of any known issue.
> 
> So either I'm overlooking something or the bug's elsewhere.
> 
> It sounds like you're varying *only* the server version, so there's not
> much chance that this could be triggered by changes in client behavior?
> 

We are definitely only varying the kernel on the server, nothing on the
client side is changing. The client in this case is essentially an
embedded device that we do not have a whole lot of control over.

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean

On 2017-11-09 11:37 AM, Al Viro wrote:
> On Wed, Nov 08, 2017 at 06:40:22PM -0800, Linus Torvalds wrote:
> 
>>> Here is the BUG we are getting:
>>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0230
>>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
>>
>> The code disassembles to
> 
>>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
> 
>> that matters (and that traps) but I'm almost certain that it's the
>> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
>> when it then does
>>
>>  flags_by_sb(mnt->mnt_sb->s_flags);
>>
>> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
>> NULL, because we wouldn't have gotten this far if it was.
>>
> 
> All instances of struct dentry are created by __d_alloc()[*], which assigns
> ->d_sb (never to be modified afterwards) *and* dereferences the pointer
> it has stored in ->d_sb before the created struct dentry becomes visible
> to anyone else.  No struct dentry should ever be observed with NULL ->d_sb;
> the only way to get that is memory corruption or looking at freed instance
> after its memory has been reused for something else and zeroed.
> 
> In other words, we should never observe a struct mount with NULL
> ->mnt.mnt_sb - not without memory corruption or looking at freed instance.
> 
> The pointer in that case should've come from exp->ex_path.mnt, exp being
> the argument of nfsd4_encode_fattr().  Sure, it might have been a dangling
> reference.  However, it looks a lot more like a memory corruptor *OR*
> miscompiled kernel.
> 
> What kind of load do the reproducer boxen have and how fast does that
> bug trigger?  Would it be possible to slap something like
>       if (unlikely(!exp->ex_path.mnt->mnt_sb)) {
>               struct mount *m = real_mount(exp->ex_path.mnt);
>               printk(KERN_ERR "mnt: %p\n", exp->ex_path.mnt);
>               printk(KERN_ERR "name: [%s]\n", m->mnt_devname);
>               printk(KERN_ERR "ns: [%p]\n", m->mnt_ns);
>               printk(KERN_ERR "parent: [%p]\n", m->mnt_parent);
>               WARN_ON(1);
>               err = -EINVAL;
>               goto out_nfserr;
>       }
> in the beginning of nfsd4_encode_fattr() (with include of ../mount.h added
> in fs/nfsd/nfs4xdr.c) and see what will it catch?
> 
> Both with and without randomized structs, if possible - I might be barking
> at the wrong tree, but IMO the very first step in localizing that crap is
> to find out whether it's toolchain-related or not.

The reproducer boxen are not under particularly heavy load; they are
serving NFS to 1 or 2 clients (which are essentially embedded devices).
When the bug triggers, it usually triggers pretty fast and reliably, but
it seems to only trigger on some subset of bootups. Once it fails to
trigger, we seem to have to reboot to get it to trigger.

I should be able to have some results with that added in a few hours.
It's weirdly unreliable to reproduce this.

We do have CONFIG_GCC_PLUGIN_STRUCTLEAK and
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL enabled on these boxes as well as
CONFIG_GCC_PLUGIN_RANDSTRUCT as you pointed out before.
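
For readers following the quoted analysis: the reported fault address of 0x230 is what you get when a field at offset 0x230 is loaded through a NULL base pointer. The sketch below shows that arithmetic with a hypothetical stand-in structure. It assumes, as Linus infers, that 0x230 was the offset of s_flags in that particular build; the padding array exists only to place the field there and says nothing about the real struct super_block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the kernel's struct super_block. */
struct fake_super_block {
	unsigned char other_fields[0x230];	/* everything before s_flags */
	unsigned long s_flags;
};

static_assert(offsetof(struct fake_super_block, s_flags) == 0x230,
	      "padding places s_flags at the offset seen in the oops");

/* Address the CPU would try to read for "base->field". */
static uintptr_t fault_address(uintptr_t base, size_t field_offset)
{
	return base + field_offset;
}
```

With base 0 and offsetof(struct fake_super_block, s_flags), fault_address() gives 0x230, matching the oops. Note that with randomized layouts the offset can differ from build to build, so the same NULL pointer could fault at a different small address in another kernel.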

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean



On 2017-11-09 11:38 AM, Al Viro wrote:
> On Thu, Nov 09, 2017 at 11:34:19AM -0800, Patrick McLean wrote:
> 
>>> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
>>> range. There is a bunch of xfs changes, though. What's the underlying
>>> filesystem that you are exporting?
>>
>> It's an ext4 filesystem.
> 
> Had there been toolchain changes around the same period?
> 
No changes; both the working and broken kernels were built with
distro-provided gcc 5.4.0 and binutils 2.28.1.

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean



On 2017-11-08 06:40 PM, Linus Torvalds wrote:
> On Wed, Nov 8, 2017 at 4:43 PM, Patrick McLean <chutz...@gentoo.org> wrote:
>> As of 4.13.11 (and also with 4.14-rc) we have an issue where when
>> serving nfs4 sometimes we get the following BUG. When this bug happens,
>> it usually also causes the motherboard to no longer POST until we
>> externally re-flash the BIOS (using the BMC web interface). If a
>> motherboard does not have an external way to flash the BIOS, this would
>> brick the hardware.
> 
> That sounds like your BIOS is just broken.

All the dead boards were from the same vendor. We are going to try some
boards from another vendor today.

> 
> The kernel oops is probably just a trigger for that - possibly because
> you reboot with a particular state that breaks the BIOS.
> 
> Also, are you sure you really need to reflash the BIOS? It's actually
> fairly hard to overwrite the BIOS itself, but crashing with bad
> hardware state (where "bad" can just mean "unexpected by the BIOS")
> can cause the BIOS to not properly re-initialize things, and hang at
> boot.
> 
> So not booting cleanly from a warm reset is a reasonably common BIOS failure.
> 
> And yes, reflashing tends to force a full initialization and thus
> "fixes" things, but it may be a big hammer when a cold boot or just a
> "reset BIOS to safe defaults" might be sufficient.
> 
> In pretty much all cases this is a sign of a nasty BIOS problem,
> though, and you may want to look into a firmware update from the
> vendor for that.

We tried a cold power off (physically unplugging the machine from power)
and a CMOS reset, and neither helped. The only thing that actually
restored one of the dead boards was a reflash, which I did with
the latest code.

> 
> But on to the kernel side:
> 
>> Here is the BUG we are getting:
>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 0230
>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
> 
> The code disassembles to
> 
>    0:  83 c9 08             or     $0x8,%ecx
>    3:  40 f6 c6 04          test   $0x4,%sil
>    7:  0f 45 d1             cmovne %ecx,%edx
>    a:  89 d1                mov    %edx,%ecx
>    c:  80 cd 04             or     $0x4,%ch
>    f:  40 f6 c6 08          test   $0x8,%sil
>   13:  0f 45 d1             cmovne %ecx,%edx
>   16:  89 d1                mov    %edx,%ecx
>   18:  80 cd 08             or     $0x8,%ch
>   1b:  40 f6 c6 10          test   $0x10,%sil
>   1f:  0f 45 d1             cmovne %ecx,%edx
>   22:  89 d1                mov    %edx,%ecx
>   24:  80 cd 10             or     $0x10,%ch
>   27:  83 e6 20             and    $0x20,%esi
>   2a:* 48 8b b7 30 02 00 00 mov    0x230(%rdi),%rsi <-- trapping instruction
>   31:  0f 45 d1             cmovne %ecx,%edx
>   34:  83 ca 20             or     $0x20,%edx
>   37:  89 f1                mov    %esi,%ecx
>   39:  83 e1 10             and    $0x10,%ecx
>   3c:  89 cf                mov    %ecx,%edi
> 
> and all those odd cmovne and bit-ops are just the bit selection code
> in flags_by_mnt(), which is inlined through calculate_f_flags (which
> is _also_ inlined) into vfs_statfs().
> 
> Sadly, gcc makes a mess of it and actually generates code that looks
> like the original C. I would have hoped that gcc could have turned
> 
>     if (x & BIT)
>         y |= OTHER_BIT;
> 
> into
> 
>     y |= (x & BIT) shifted-by-the-bit-difference-between BIT/OTHER_BIT;
> 
> but that doesn't happen. We actually do it by hand in some other more
> critical places, but it's painful to do by hand (because the shift
> direction/amount is not trivial to do in C).
> 
> Anyway, that cmovne noise makes it a bit hard to see the actual part
> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
> 
>  flags_by_sb(mnt->mnt_sb->s_flags);
> 
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
> NULL, because we wouldn't have gotten this far if it was.
> 
> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
> proper path. And the vfs_statfs() code itself hasn't changed in a
> while.
> 
> Which does seem to implicate nfsd as having passed in a bad path to
> vfs_statfs(). But I'm not seeing any changes in nfsd either.
> 
> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
> range. There is a bunch of xfs changes, though. What's the underlying
> filesystem that you are exporting?

It's an ext4 filesystem.

> 
> But bringing in Al Viro and Bruce Fields explicitly in case they see
> something. And Darrick, just in case it might be xfs.
> 

Thanks
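
The branch-free bit translation Linus mentions in the quoted text can be sketched as follows. The flag values and function names are hypothetical and are not the kernel's mount or statfs constants. Multiplying or dividing by the ratio of two single-bit constants is one way to express "shift by the bit difference" without hand-computing the shift direction; with constant arguments a compiler can typically reduce it to a shift.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical source and destination flag bits (single bits each). */
#define SRC_A 0x0004u
#define DST_A 0x0400u	/* SRC_A moves left  */
#define SRC_B 0x0020u
#define DST_B 0x0002u	/* SRC_B moves right */

/* Move the single bit 'from' in x to the position of the single bit 'to'. */
static inline uint32_t move_bit(uint32_t x, uint32_t from, uint32_t to)
{
	if (from <= to)
		return (x & from) * (to / from);
	return (x & from) / (from / to);
}

/* The straightforward form: one test and one conditional OR per flag. */
static uint32_t translate_branchy(uint32_t x)
{
	uint32_t y = 0;

	if (x & SRC_A)
		y |= DST_A;
	if (x & SRC_B)
		y |= DST_B;
	return y;
}

/* The same mapping without per-flag conditionals. */
static uint32_t translate_branchless(uint32_t x)
{
	return move_bit(x, SRC_A, DST_A) | move_bit(x, SRC_B, DST_B);
}
```

Both functions compute the same result for every input; the second just makes the intended shift explicit so the generated code does not need a test and cmov per flag.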

Re: [nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-09 Thread Patrick McLean



On 2017-11-08 06:40 PM, Linus Torvalds wrote:
> On Wed, Nov 8, 2017 at 4:43 PM, Patrick McLean  wrote:
>> As of 4.13.11 (and also with 4.14-rc) we have an issue where when
>> serving nfs4 sometimes we get the following BUG. When this bug happens,
>> it usually also causes the motherboard to no longer POST until we
>> externally re-flash the BIOS (using the BMC web interface). If a
>> motherboard does not have an external way to flash the BIOS, this would
>> brick the hardware.
> 
> That sounds like your BIOS is just broken.

All the dead boards were from the same vendor. We are going to try some
boards from another vendor today.

> 
> The kernel oops is probably just a trigger for that - possibly because
> you reboot with a particular state that breaks the BIOS.
> 
> Also, are you sure you really need to reflash the BIOS? It's actually
> fairly hard to overwrite the BIOS itself, but crashing with bad
> hardware state (where "bad" can just mean "unexpected by the BIOS")
> can cause the BIOS to not properly re-initialize things, and hang at
> boot.
> 
> So not booting cleanly from a warm reset is a reasonably common BIOS failure.
> 
> And yes, reflashing tends to force a full initialization and thus
> "fixes" things, but it may be a big hammer when a cold boot or just a
> "reset BIOS to safe defaults" might be sufficient.
> 
> In pretty much all cases this is a sign of a nasty BIOS problem,
> though, and you may want to look into a firmware update from the
> vendor for that.

We tried a cold power off (physically unplugging the machine from power)
and a CMOS reset, and neither helped. The only thing that actually
restored one of the dead boards was a reflash. I did the reflash with
the latest code when I reflashed it.

> 
> But on to the kernel side:
> 
>> Here is the BUG we are getting:
>>> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 
>>> 0230
>>> [   58.963918] IP: vfs_statfs+0x73/0xb0
> 
> The code disassembles to
> 
>0: 83 c9 08  or $0x8,%ecx
>3: 40 f6 c6 04  test   $0x4,%sil
>7: 0f 45 d1  cmovne %ecx,%edx
>a: 89 d1mov%edx,%ecx
>c: 80 cd 04  or $0x4,%ch
>f: 40 f6 c6 08  test   $0x8,%sil
>   13: 0f 45 d1  cmovne %ecx,%edx
>   16: 89 d1mov%edx,%ecx
>   18: 80 cd 08  or $0x8,%ch
>   1b: 40 f6 c6 10  test   $0x10,%sil
>   1f: 0f 45 d1  cmovne %ecx,%edx
>   22: 89 d1mov%edx,%ecx
>   24: 80 cd 10  or $0x10,%ch
>   27: 83 e6 20  and$0x20,%esi
>   2a:* 48 8b b7 30 02 00 00 mov0x230(%rdi),%rsi <-- trapping instruction
>   31: 0f 45 d1  cmovne %ecx,%edx
>   34: 83 ca 20  or $0x20,%edx
>   37: 89 f1mov%esi,%ecx
>   39: 83 e1 10  and$0x10,%ecx
>   3c: 89 cfmov%ecx,%edi
> 
> and all those odd cmovne and bit-ops are just the bit selection code
> in flags_by_mnt(), which is inlined through calculate_f_flags (which
> is _also_ inlined) into vfs_statfs().
> 
> Sadly, gcc makes a mess of it and actually generates code that looks
> like the original C. I would have hoped that gcc could have turned
> 
>if (x & BIT)
> y |= OTHER_BIT;
> 
> into
> 
> y |= (x & BIT) shifted-by-the-bit-difference-between BIT/OTHER_BIT;
> 
> but that doesn't happen. We actually do it by hand in some other more
> critical places, but it's painful to do by hand (because the shift
> direction/amount is not trivial to do in C).
> 
> Anyway, that cmovne noise makes it a bit hard to see the actual part
> that matters (and that traps) but I'm almost certain that it's the
> "mnt->mnt_sb->s_flags" loading that is part of calculate_f_flags()
> when it then does
> 
>  flags_by_sb(mnt->mnt_sb->s_flags);
> 
> and I think mnt->mnt_sb is NULL. We know it's not 'mnt' itself that is
> NULL, because we wouldn't have gotten this far if it was.
> 
> Now, afaik, mnt->mnt_sb should never be NULL in the first place for a
> proper path. And the vfs_statfs() code itself hasn't changed in a
> while.
> 
> Which does seem to implicate nfsd as having passed in a bad path to
> vfs_statfs(). But I'm not seeing any changes in nfsd either.
> 
> In particular, there are *no* nfsd changes in that 4.13.8..4.13.11
> range. There is a bunch of xfs changes, though. What's the underlying
> filesystem that you are exporting?

It's an ext4 filesystem.

> 
> But bringing in Al Viro and Bruce Fields explicitly in case they see
> something. And Darrick, just in case it might be xfs.
> 

Thanks
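
For readers following the quoted remark about turning "if (x & BIT) y |= OTHER_BIT" into a shift, here is a minimal userspace sketch of that idea. The flag names and values are made up for illustration and are not taken from any kernel header; translate_bit() is a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only; not taken from any kernel header. */
#define SRC_RDONLY   0x0040u
#define SRC_NOATIME  0x0008u
#define DST_RDONLY   0x0001u
#define DST_NOATIME  0x0400u

/* Straightforward form: one test and one conditional OR per flag. */
static uint32_t translate_branchy(uint32_t x)
{
    uint32_t y = 0;

    if (x & SRC_RDONLY)
        y |= DST_RDONLY;
    if (x & SRC_NOATIME)
        y |= DST_NOATIME;
    return y;
}

/*
 * Branch-free form for one flag. Multiplying or dividing by the ratio
 * of the two masks is the same as shifting by the distance between
 * the bits; which of the two is needed depends on the direction.
 */
static uint32_t translate_bit(uint32_t x, uint32_t from, uint32_t to)
{
    /* Both masks must be non-zero single bits. */
    assert(from && !(from & (from - 1)));
    assert(to && !(to & (to - 1)));

    return from <= to ? (x & from) * (to / from)
                      : (x & from) / (from / to);
}

static uint32_t translate_shifted(uint32_t x)
{
    return translate_bit(x, SRC_RDONLY, DST_RDONLY) |
           translate_bit(x, SRC_NOATIME, DST_NOATIME);
}
```

With constant masks a compiler can usually fold the multiply or divide into a single shift, which is the "painful to do by hand" part: the direction and amount differ for every flag pair.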

[nfsd4] potentially hardware breaking regression in 4.14-rc and 4.13.11

2017-11-08 Thread Patrick McLean

As of 4.13.11 (and also with 4.14-rc), we sometimes hit the BUG shown
below while serving NFSv4. When this happens, the motherboard usually
no longer POSTs until we externally re-flash the BIOS (using the BMC
web interface). On a motherboard with no external way to flash the
BIOS, this would brick the hardware.

The issue was introduced somewhere between 4.13.8 and 4.13.11 in the
stable series 4.13 kernels. It seems to be much easier to trigger on
4.14 kernels than 4.13 kernels.

We are working on bisecting it, but it is slow going since it often
takes several reboots to trigger the issue.

The taint is caused by "gkuart", an out-of-kernel driver that is a
fork of the cp210x driver with GPIO lines added to it. We can provide
the source for this if needed.

When the BIOS gets broken, we see these messages in the shutdown logs:
> [ 2206.698884] kvm: exiting hardware virtualization
> [ 2206.700160] e1000e: EEE TX LPI TIMER: 00t
> [ 2206.743126] ACPI MEMORY or I/O RESET_REG.

Here is the BUG we are getting:
> [   58.962528] BUG: unable to handle kernel NULL pointer dereference at 
> 0230
> [   58.963918] IP: vfs_statfs+0x73/0xb0
> [   58.964597] PGD 0 P4D 0 
> [   58.965208] Oops:  [#1] SMP
> [   58.965847] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
> xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 
> nf_nat gkuart(O) usbserial x86_pkg_temp_thermal ipmi_ssif tpm_tis 
> tpm_tis_core ie31200_edac ext4 mbcache jbd2 e1000e crc32c_intel
> [   58.969163] CPU: 0 PID: 3970 Comm: nfsd Tainted: G   O
> 4.14.0-rc8-git-kratos-1-00012-gd6a2cf07f0c9 #1
> [   58.970693] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [   58.971685] task: 88040b286200 task.stack: c90002c94000
> [   58.972576] RIP: 0010:vfs_statfs+0x73/0xb0
> [   58.973329] RSP: 0018:c90002c97b30 EFLAGS: 00010202
> [   58.974188] RAX:  RBX: c90002c97bf8 RCX: 
> 1c00
> [   58.975253] RDX: 0c00 RSI: 0020 RDI: 
> 
> [   58.976213] RBP: c90002c97bc8 R08:  R09: 
> 00ff
> [   58.977161] R10: 0038be3a R11: 88040ec440c8 R12: 
> 88040c5ba000
> [   58.978107] R13: 88040a86e000 R14: 88040c5c1000 R15: 
> c90002c97bf8
> [   58.979051] FS:  () GS:88041fc0() 
> knlGS:
> [   58.980448] CS:  0010 DS:  ES:  CR0: 80050033
> [   58.981419] CR2: 0230 CR3: 01e0a002 CR4: 
> 001606f0
> [   58.982483] Call Trace:
> [   58.983108]  nfsd4_encode_fattr+0x1f3/0x2070
> [   58.983873]  ? find_inode_fast+0x52/0x90
> [   58.984587]  ? get_acl+0x17/0xf0
> [   58.985258]  ? generic_permission+0x122/0x1a0
> [   58.986019]  nfsd4_encode_getattr+0x25/0x30
> [   58.986746]  nfsd4_encode_operation+0x98/0x1a0
> [   58.987485]  nfsd4_proc_compound+0x3eb/0x5c0
> [   58.988206]  nfsd_dispatch+0xa8/0x230
> [   58.988891]  svc_process_common+0x347/0x640
> [   58.989619]  svc_process+0x100/0x1b0
> [   58.990334]  nfsd+0xe3/0x150
> [   58.990988]  kthread+0xfc/0x130
> [   58.991651]  ? nfsd_destroy+0x60/0x60
> [   58.992364]  ? kthread_create_on_node+0x40/0x40
> [   58.993153]  ret_from_fork+0x25/0x30
> [   58.993858] Code: d1 83 c9 08 40 f6 c6 04 0f 45 d1 89 d1 80 cd 04 40 f6 c6 
> 08 0f 45 d1 89 d1 80 cd 08 40 f6 c6 10 0f 45 d1 89 d1 80 cd 10 83 e6 20 <48> 
> 8b b7 30 02 00 00 0f 45 d1 83 ca 20 89 f1 83 e1 10 89 cf 83
> [   58.996592] RIP: vfs_statfs+0x73/0xb0 RSP: c90002c97b30
> [   58.997474] CR2: 0230
> [   58.998147] ---[ end trace c3a6e976d53aaa00 ]---
> [  107.669217] random: crng init done
> [  210.170059] BUG: unable to handle kernel NULL pointer dereference at 
> 0230
> [  210.176363] IP: vfs_statfs+0x73/0xb0
> [  210.177032] PGD 0 P4D 0
> [  210.177633] Oops:  [#2] SMP
> [  210.178286] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 
> xt_multiport xt_addrtype iptable_mangle iptable_raw iptable_nat nf_nat_ipv4 
> nf_nat gkuart(O) usbserial x86_pkg_temp_thermal ipmi_ssif tpm_tis 
> tpm_tis_core ie31200_edac ext4 mbcache jbd2 e1000e crc32c_intel
> [  210.192120] CPU: 0 PID: 3969 Comm: nfsd Tainted: G  DO
> 4.14.0-rc8-git-kratos-1-00012-gd6a2cf07f0c9 #1
> [  210.203168] Hardware name: TYAN S5510/S5510, BIOS V2.02 03/12/2013
> [  210.204140] task: 880409a7aa00 task.stack: c90002c8c000
> [  210.205168] RIP: 0010:vfs_statfs+0x73/0xb0
> [  210.205893] RSP: 0018:c90002c8fb30 EFLAGS: 00010202
> [  210.206708] RAX:  RBX: c90002c8fbf8 RCX: 
> 1c00
> [  210.218314] RDX: 0c00 RSI: 0020 RDI: 
> 
> [  210.219364] RBP: c90002c8fbc8 R08:  R09: 
> 00ff
> [  210.220426] R10: 0038be3a R11: 88040ec440c8 R12: 
> 88040c5b8000
> [  210.221455] R13: 88040a86e000 R14:
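
A note on reading the fault address in the oops above: CR2 is 0x230 and the trapping instruction in the Code: bytes (48 8b b7 30 02 00 00) is a load from 0x230(%rdi), which is what a field read through a NULL structure pointer looks like. The sketch below shows the arithmetic with a made-up structure; toy_super_block and its padding are assumptions chosen only so the field lands at 0x230, and say nothing about the real kernel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a large kernel structure. The padding is
 * chosen only so that s_flags lands at offset 0x230.
 */
struct toy_super_block {
    char     pad[0x230];
    uint64_t s_flags;
};

/* Address the CPU would try to read for ((struct toy_super_block *)base)->s_flags. */
static uintptr_t faulting_address(uintptr_t base)
{
    return base + offsetof(struct toy_super_block, s_flags);
}

/* Sanity check of the assumed layout. */
static void check_layout(void)
{
    assert(offsetof(struct toy_super_block, s_flags) == 0x230);
}
```

With a base of 0 the computed address equals the field offset, so a small non-zero CR2 value in a "NULL pointer dereference" report usually names the offset of the member being accessed rather than the pointer itself.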

Re: [tip:sched/core] mm/numa: Remove BUG_ON() in __handle_mm_fault()

2014-06-23 Thread Patrick McLean

Could this please be included in 3.14 and 3.15 stable as well?

On Thu, 8 May 2014 03:43:19 -0700
tip-bot for Rik van Riel wrote:

> Commit-ID:  107437febd495a50e2cd09c81bbaa84d30e57b07
> Gitweb:     http://git.kernel.org/tip/107437febd495a50e2cd09c81bbaa84d30e57b07
> Author:     Rik van Riel
> AuthorDate: Tue, 29 Apr 2014 15:36:15 -0400
> Committer:  Ingo Molnar
> CommitDate: Wed, 7 May 2014 13:33:48 +0200
> 
> mm/numa: Remove BUG_ON() in __handle_mm_fault()
> 
> Changing PTEs and PMDs to pte_numa & pmd_numa is done with the
> mmap_sem held for reading, which means a pmd can be instantiated
> and turned into a numa one while __handle_mm_fault() is examining
> the value of old_pmd.
> 
> If that happens, __handle_mm_fault() should just return and let
> the page fault retry, instead of throwing an oops. This is
> handled by the test for pmd_trans_huge(*pmd) below.
> 
> Signed-off-by: Rik van Riel 
> Reviewed-by: Naoya Horiguchi 
> Reported-by: Sunil Pandey 
> Signed-off-by: Peter Zijlstra 
> Cc: Andrew Morton 
> Cc: Johannes Weiner 
> Cc: Kirill A. Shutemov 
> Cc: Linus Torvalds 
> Cc: Mel Gorman 
> Cc: linux...@kvack.org
> Cc: lwood...@redhat.com
> Cc: dave.han...@intel.com
> Link: http://lkml.kernel.org/r/20140429153615.2d720...@annuminas.surriel.com
> Signed-off-by: Ingo Molnar
> ---
>  mm/memory.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index d0f0bef..9c2dc65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3900,9 +3900,6 @@ static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  		}
>  	}
>  
> -	/* THP should already have been handled */
> -	BUG_ON(pmd_numa(*pmd));
> -
>  	/*
>  	 * Use __pte_alloc instead of pte_alloc_map, because we can't
>  	 * run pte_offset_map on the pmd, if an huge pmd could
> --
> To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 06:00 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
>>> Very interesting.  Do you have anything mounted on the corresponding
>>> directories on server?  The picture looks like you are getting empty
>>> fhandles in readdir+ responses for exactly the same directories that happen
>>> to be mountpoints on client.  In any case, we shouldn't do that blind
>>> d_drop() - empty fhandles can happen.  The only remaining question is
>>> why do they happen on that set of entries.  From my reading of
>>> encode_entryplus_baggage() it looks like we have compose_entry_fh()
>>> failing for those entries and those entries alone.  One possible cause
>>> would be d_mountpoint(dchild) being true on server.  If it is true, we
>>> can declare the case closed; if not, I really wonder what's going on.
>>
>> Those directories do have the server's own copies of the said directories 
>> bind mounted at the moment in a separate mount namespace.
>>
>> Unmounting those directories on the server does appear to stop the WARN_ON 
>> from triggering.
> 
> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  			goto out;
>  		} else {
> -			d_drop(dentry);
> +			if (d_invalidate(dentry) != 0)
> +				goto out;
>  			dput(dentry);
>  		}
>  	}

Excellent, thanks. Is there any chance this will make it to 3.7? Also we might 
want to cc stable@ on this as well since it is a regression in 3.6.
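
For anyone reading the archive later, the difference between the old and new behaviour in the quoted patch can be illustrated with a toy model. Everything below is hypothetical: toy_dentry, toy_drop() and toy_invalidate() are simplified userspace stand-ins written for this note, not the kernel's dentry code, and the real d_invalidate() checks more conditions than this.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions are the kernel's. */
struct toy_dentry {
    bool hashed;      /* still reachable by name lookups */
    bool mountpoint;  /* something is mounted on top of it */
    int  refcount;    /* number of users holding a reference */
};

/* Unconditional: always removes the entry from the lookup table. */
static void toy_drop(struct toy_dentry *d)
{
    assert(d);
    d->hashed = false;
}

/*
 * Conservative: refuses to unhash an entry that is a mountpoint and
 * still in use, and tells the caller so.
 */
static int toy_invalidate(struct toy_dentry *d)
{
    assert(d);
    if (!d->hashed)
        return 0;
    if (d->refcount > 1 && d->mountpoint)
        return -EBUSY;
    d->hashed = false;
    return 0;
}
```

In this model a busy mountpoint such as /proc survives toy_invalidate() and stays reachable, whereas toy_drop() hides it from lookups even though a filesystem is still mounted there, which matches the symptom reported in this thread.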

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 05:36 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 04:57:19PM -0800, Patrick McLean wrote:
>>> Interesting...  Server-side that should've been produced by
>>> encode_entryplus_baggage(), which looks like failing compose_entry_fh()...
>>> which has explicit
>>> 	if (d_mountpoint(dchild))
>>> 		goto out;
>>> resulting in ENOENT on everything that's overmounted on server.
>>>
>>> Do you, by any chance, have the server really exporting its own root
>>> filesystem?  Another thing to check: have nfs_prime_dcache() print
>>> filename.name of everything that fails nfs_same_entry() and has
>>> zero entry->fh->size, regardless of d_invalidate() results.
>>
>> The server is running 3.6.6 and is just exporting a subdir of an xfs 
>> filesystem (which does not happen to be the root filesystem).
>>
>> The client is running as a KVM guest on the machine that is serving the NFS. 
>> I am reproducing this by booting the guest via an initramfs, and doing
>> "ls /" in single-user mode.
>>
>> I added a check that prints the filename.name of everything that fails 
>> nfs_same_file, and it appears to just be triggered by the same filesystems 
>> that
>> are triggering the WARN_ON, the relevant dmesg is below.
> 
> [the same /dev, /proc and /sys]
> 
>   Very interesting.  Do you have anything mounted on the corresponding
> directories on server?  The picture looks like you are getting empty
> fhandles in readdir+ responses for exactly the same directories that happen
> to be mountpoints on client.  In any case, we shouldn't do that blind
> d_drop() - empty fhandles can happen.  The only remaining question is
> why do they happen on that set of entries.  From my reading of
> encode_entryplus_baggage() it looks like we have compose_entry_fh()
> failing for those entries and those entries alone.  One possible cause
> would be d_mountpoint(dchild) being true on server.  If it is true, we
> can declare the case closed; if not, I really wonder what's going on.

Those directories do have the server's own copies of the said directories bind 
mounted at the moment in a separate mount namespace.

Unmounting those directories on the server does appear to stop the WARN_ON from 
triggering.

> Note that if the same fs is mounted elsewhere, d_mountpoint() would mean
> that something is mounted on top of that directory in _some_ instance;
> not necessary the exported one.  Can you slap printks on fs/nfsd/nfs3xdr.c
> compose_entry_fh() failure exits and see which one triggers server-side?

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 04:35 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 04:19:51PM -0800, Patrick McLean wrote:
>>>> [8.821584] FH(0)]
>>>> [8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 
>>>> 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
>>>> [8.821601] filename: proc
>>>
>>> *whoa*
>>>
>>> So we have zero entry->fh->size?  No wonder it doesn't match...  Which NFS
>>> version it is?  entry->fh->size is set by nfs[34]_decode_dirent().
>>
>> This is nfs v3 over TCP on Linus git at commit 
>> e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace.
> 
> So we have nfs3_decode_dirent(), stepping into
> 	/* In fact, a post_op_fh3: */
> 	p = xdr_inline_decode(xdr, 4);
> 	if (unlikely(p == NULL))
> 		goto out_overflow;
> 	if (*p != xdr_zero) {
> 		error = decode_nfs_fh3(xdr, entry->fh);
> 		if (unlikely(error)) {
> 			if (error == -E2BIG)
> 				goto out_truncated;
> 			return error;
> 		}
> 	} else
> 		zero_nfs_fh3(entry->fh);
> Interesting...  Server-side that should've been produced by
> encode_entryplus_baggage(), which looks like failing compose_entry_fh()...
> which has explicit
> 	if (d_mountpoint(dchild))
> 		goto out;
> resulting in ENOENT on everything that's overmounted on server.
> 
> Do you, by any chance, have the server really exporting its own root
> filesystem?  Another thing to check: have nfs_prime_dcache() print
> filename.name of everything that fails nfs_same_entry() and has
> zero entry->fh->size, regardless of d_invalidate() results.

The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem 
(which does not happen to be the root filesystem).

The client is running as a KVM guest on the machine that is serving the NFS. I 
am reproducing this by booting the guest via an initramfs, and doing
"ls /" in single-user mode.

I added a check that prints the filename.name of everything that fails 
nfs_same_file, and it appears to just be triggered by the same filesystems that
are triggering the WARN_ON, the relevant dmesg is below.

[9.495217] entry->fh->size is 0 on: proc
[9.495222] ------------[ cut here ]------------
[9.495230] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb()
[9.495232] Hardware name: Bochs
[9.495233] Modules linked in:
[9.495237] Pid: 655, comm: ls Not tainted 3.7.0-rc7+ #40
[9.495239] Call Trace:
[9.495247]  [] ? warn_slowpath_common+0x76/0x8a
[9.495250]  [] ? nfs_readdir_page_filler+0x1ef/0x3eb
[9.495254]  [] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.495257]  [] ? nfs_readdir_filler+0x1c/0x6b
[9.495263]  [] ? add_to_page_cache_lru+0x2c/0x36
[9.495266]  [] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.495269]  [] ? do_read_cache_page+0x7d/0x12b
[9.495274]  [] ? sys_ioctl+0x7a/0x7a
[9.495277]  [] ? read_cache_page+0x7/0x10
[9.495280]  [] ? nfs_readdir+0x12d/0x435
[9.495285]  [] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.495288]  [] ? sys_ioctl+0x7a/0x7a
[9.495291]  [] ? sys_ioctl+0x7a/0x7a
[9.495294]  [] ? vfs_readdir+0x6c/0xa7
[9.495298]  [] ? sys_getdents+0x7e/0xdc
[9.495302]  [] ? system_call_fastpath+0x16/0x1b
[9.495304] ---[ end trace e502c5d56c594e85 ]---
[9.495306] FH(0)]
[9.495308] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 
dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
[9.495323] filename: proc
[9.495330] entry->fh->size is 0 on: dev
[9.495332] ------------[ cut here ]------------
[9.495335] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb()
[9.495336] Hardware name: Bochs
[9.495337] Modules linked in:
[9.495340] Pid: 655, comm: ls Tainted: G        W    3.7.0-rc7+ #40
[9.495341] Call Trace:
[9.495345]  [] ? warn_slowpath_common+0x76/0x8a
[9.495348]  [] ? nfs_readdir_page_filler+0x1ef/0x3eb
[9.495351]  [] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.495354]  [] ? nfs_readdir_filler+0x1c/0x6b
[9.495358]  [] ? add_to_page_cache_lru+0x2c/0x36
[9.495361]  [] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.495364]  [] ? do_read_cache_page+0x7d/0x12b
[9.495368]  [] ? sys_ioctl+0x7a/0x7a
[9.495371]  [] ? read_cache_page+0x7/0x10
[9.495373]  [] ? nfs_readdir+0x12d/0x435
[9.495377]  [] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.495380]  [] ? sys_ioctl+0x7a/0x7a
[9.495383]  [] ? sys_ioctl+0x7a/0x7a
[9.495387]  [] ? vfs_readdir+0x6c/0xa7
[9.

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 03:43 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 02:53:13PM -0800, Patrick McLean wrote:
>> On 29/11/12 02:21 PM, Al Viro wrote:
>>> On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote:
>>>
>>>> I have a trivial reproducer and am happy to help debug in any way that
>>>> I can. That patch seems to fix the problem, and produces these
>>>> warnings in dmesg:
>>>
>>> OK...  So we have differing entry->fh and NFS_FH(dentry->d_inode).
>>> Something like
>>> static void dump_fh(const struct nfs_fh *fh)
>>> {
>>> 	int i;
>>> 	printk(KERN_INFO "FH(%d)", fh->size);
>>> 	for (i = 0; i < fh->size; i++)
>>> 		printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]);
>>> 	printk(KERN_CONT "]\n");
>>> }
>>> with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to
>>> that WARN_ON(1) would probably be interesting.  And probably would make
>>> sense to print filename->name as well, to see which files it is about.
> 
>> [8.821584] FH(0)]
>> [8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 
>> 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
>> [8.821601] filename: proc
> 
> *whoa*
> 
> So we have zero entry->fh->size?  No wonder it doesn't match...  Which NFS
> version it is?  entry->fh->size is set by nfs[34]_decode_dirent().

This is nfs v3 over TCP on Linus git at commit 
e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace.

> NFS folks: any ideas on best way to debug it?  The brute-force way would be
> to capture all NFS traffic with tcpdump and see what's going on, but that
> would be a lot of work...
> 
> Looks like we have READDIRPLUS attempted and succeeded, but fhandle was not
> given.  Result: nfs_prime_dcache() is doing blind d_drop() on perfectly
> valid dentries, no matter how busy.
> 
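
As background for the "fhandle was not given" observation: in NFSv3 the file handle in a READDIRPLUS entry is optional on the wire. It is encoded as a 4-byte XDR boolean, followed (only if the boolean is true) by a 4-byte length and the handle bytes padded to a multiple of four. A minimal userspace sketch of decoding that layout follows; toy_fh and decode_optional_fh() are hypothetical names for this note and are not the kernel's decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_FHSIZE 64   /* NFSv3 limits a file handle to 64 bytes */

/* Hypothetical userspace stand-in for a decoded file handle. */
struct toy_fh {
    uint32_t      size;
    unsigned char data[TOY_FHSIZE];
};

/* Read one big-endian 32-bit XDR word. */
static uint32_t xdr_word(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Decode an optional file handle from buf.
 * Returns the number of bytes consumed, or -1 on malformed input.
 * An absent handle is reported as fh->size == 0.
 */
static long decode_optional_fh(const unsigned char *buf, size_t len,
                               struct toy_fh *fh)
{
    size_t padded;
    uint32_t n;

    assert(buf && fh);
    memset(fh, 0, sizeof(*fh));

    if (len < 4)
        return -1;
    if (xdr_word(buf) == 0)         /* handle_follows == FALSE */
        return 4;
    if (len < 8)
        return -1;

    n = xdr_word(buf + 4);
    if (n > TOY_FHSIZE)
        return -1;
    padded = ((size_t)n + 3) & ~(size_t)3;
    if (len - 8 < padded)
        return -1;

    fh->size = n;
    memcpy(fh->data, buf + 8, n);
    return (long)(8 + padded);
}
```

A zero boolean is a valid reply, so a client has to treat a zero-size handle as "no information" rather than as a mismatch with the handle it already has cached.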

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 02:21 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote:
> 
>> I have a trivial reproducer and am happy to help debug in any way that
>> I can. That patch seems to fix the problem, and produces these
>> warnings in dmesg:
> 
> OK...  So we have differing entry->fh and NFS_FH(dentry->d_inode).  Something
> like
> static void dump_fh(const struct nfs_fh *fh)
> {
> 	int i;
> 	printk(KERN_INFO "FH(%d)", fh->size);
> 	for (i = 0; i < fh->size; i++)
> 		printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]);
> 	printk(KERN_CONT "]\n");
> }
> with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to
> that WARN_ON(1) would probably be interesting.  And probably would make
> sense to print filename->name as well, to see which files it is about.
> 

Here is the output of the first of the 3 times it hits the WARN_ON (I can 
include all 3 if desired), with the filename.name at the end:

[8.821503] ------------[ cut here ]------------
[8.821512] WARNING: at fs/nfs/dir.c:463 nfs_readdir_page_filler+0x1d0/0x3d2()
[8.821513] Hardware name: Bochs
[8.821515] Modules linked in:
[8.821519] Pid: 630, comm: bash Not tainted 3.7.0-rc7+ #36
[8.821520] Call Trace:
[8.821528]  [] ? warn_slowpath_common+0x76/0x8a
[8.821531]  [] ? nfs_readdir_page_filler+0x1d0/0x3d2
[8.821535]  [] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[8.821538]  [] ? nfs_readdir_filler+0x1c/0x6b
[8.821543]  [] ? add_to_page_cache_lru+0x2c/0x36
[8.821546]  [] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[8.821549]  [] ? do_read_cache_page+0x7d/0x12b
[8.821554]  [] ? sys_ioctl+0x7a/0x7a
[8.821557]  [] ? read_cache_page+0x7/0x10
[8.821560]  [] ? nfs_readdir+0x12d/0x435
[8.821564]  [] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[8.821568]  [] ? sys_ioctl+0x7a/0x7a
[8.821571]  [] ? sys_ioctl+0x7a/0x7a
[8.821574]  [] ? vfs_readdir+0x6c/0xa7
[8.821577]  [] ? sys_getdents+0x7e/0xdc
[8.821581]  [] ? system_call_fastpath+0x16/0x1b
[8.821583] ---[ end trace 89263124889205c1 ]---
[8.821584] FH(0)]
[8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 
dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
[8.821601] filename: proc
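
A small aside on the "FH(0)]" line above: the closing bracket has no opener because the quoted helper only prints '[' from inside the loop, so a zero-length handle never emits it. Below is a userspace sketch of a variant that always prints both brackets. toy_fh and format_fh() are hypothetical names for this note; it formats into a buffer with snprintf() instead of calling printk().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct nfs_fh. */
struct toy_fh {
    unsigned short size;
    unsigned char  data[64];
};

/*
 * Format a handle as "FH(<size>)[aa bb cc]" into buf.
 * The opening bracket is written unconditionally, so an empty handle
 * comes out as "FH(0)[]".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int format_fh(const struct toy_fh *fh, char *buf, size_t len)
{
    size_t off;
    int i, n;

    assert(fh && buf);
    assert(fh->size <= sizeof(fh->data));

    n = snprintf(buf, len, "FH(%d)[", fh->size);
    if (n < 0 || (size_t)n >= len)
        return -1;
    off = (size_t)n;

    for (i = 0; i < fh->size; i++) {
        n = snprintf(buf + off, len - off, "%s%02x",
                     i ? " " : "", fh->data[i]);
        if (n < 0 || (size_t)n >= len - off)
            return -1;
        off += (size_t)n;
    }

    n = snprintf(buf + off, len - off, "]");
    if (n < 0 || (size_t)n >= len - off)
        return -1;
    off += (size_t)n;

    return (int)off;
}
```

This only changes the cosmetics of the debug output; the empty handle itself is the interesting part of the log.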

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On Thu, Nov 29, 2012 at 1:33 PM, Al Viro  wrote:
> On Thu, Nov 29, 2012 at 11:16:59AM -0800, Patrick McLean wrote:
>> With 3.6-rc1 and up, when using a (dracut) initramfs with a read-only
>> nfs root, all accesses to /proc, /sys and /dev return EBUSY.
>
> See "[PATCH] Revert "__d_unalias() should refuse to move mountpoints"
> thread.  If you have a convenient reproducer, could you check if
> this fixes the breakage?  If so, we'll need to look into false negatives
> from nfs_same_file() in there...
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ce8cb92..55436f5 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  			goto out;
>  		} else {
> -			d_drop(dentry);
> +			if (d_invalidate(dentry) != 0) {
> +				WARN_ON(1);
> +				goto out;
> +			}
>  			dput(dentry);
>  		}
>  	}

I have a trivial reproducer and am happy to help debug in any way that
I can. That patch seems to fix the problem, and produces these
warnings in dmesg:

[3.306483] dracut: Switching root
[4.324378] systemd-udevd[552]: starting version 195
[9.254972] ------------[ cut here ]------------
[9.254981] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2()
[9.254983] Hardware name: Bochs
[9.254984] Modules linked in:
[9.254989] Pid: 676, comm: ls Not tainted 3.7.0-rc7+ #35
[9.254990] Call Trace:
[9.254999]  [] ? warn_slowpath_common+0x76/0x8a
[9.255002]  [] ? nfs_readdir_page_filler+0x1cc/0x3a2
[9.255005]  [] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.255009]  [] ? nfs_readdir_filler+0x1c/0x6b
[9.255014]  [] ? add_to_page_cache_lru+0x2c/0x36
[9.255017]  [] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.255020]  [] ? do_read_cache_page+0x7d/0x12b
[9.255025]  [] ? sys_ioctl+0x7a/0x7a
[9.255028]  [] ? read_cache_page+0x7/0x10
[9.255031]  [] ? nfs_readdir+0x12d/0x435
[9.255036]  [] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.255039]  [] ? sys_ioctl+0x7a/0x7a
[9.255042]  [] ? sys_ioctl+0x7a/0x7a
[9.255045]  [] ? vfs_readdir+0x6c/0xa7
[9.255049]  [] ? sys_getdents+0x7e/0xdc
[9.255053]  [] ? system_call_fastpath+0x16/0x1b
[9.255055] ---[ end trace 5e8b5f37fe752ab1 ]---
[9.255062] ------------[ cut here ]------------
[9.255065] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2()
[9.255066] Hardware name: Bochs
[9.255067] Modules linked in:
[9.255070] Pid: 676, comm: ls Tainted: G        W    3.7.0-rc7+ #35
[9.255071] Call Trace:
[9.255075]  [] ? warn_slowpath_common+0x76/0x8a
[9.255077]  [] ? nfs_readdir_page_filler+0x1cc/0x3a2
[9.255080]  [] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.255083]  [] ? nfs_readdir_filler+0x1c/0x6b
[9.255087]  [] ? add_to_page_cache_lru+0x2c/0x36
[9.255089]  [] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.255093]  [] ? do_read_cache_page+0x7d/0x12b
[9.255096]  [] ? sys_ioctl+0x7a/0x7a
[9.255099]  [] ? read_cache_page+0x7/0x10
[9.255102]  [] ? nfs_readdir+0x12d/0x435
[9.255105]  [] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.255109]  [] ? sys_ioctl+0x7a/0x7a
[9.255112]  [] ? sys_ioctl+0x7a/0x7a
[9.255115]  [] ? vfs_readdir+0x6c/0xa7
[9.255118]  [] ? sys_getdents+0x7e/0xdc
[9.255121]  [] ? system_call_fastpath+0x16/0x1b
[9.255122] ---[ end trace 5e8b5f37fe752ab2 ]---
[9.255133] ------------[ cut here ]------------
[9.255135] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2()
[9.255136] Hardware name: Bochs
[9.255137] Modules linked in:
[9.255140] Pid: 676, comm: ls Tainted: G        W    3.7.0-rc7+ #35
[9.255141] Call Trace:
[9.255144]  [] ? warn_slowpath_common+0x76/0x8a
[9.255147]  [] ? nfs_readdir_page_filler+0x1cc/0x3a2
[9.255150]  [] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.255153]  [] ? nfs_readdir_filler+0x1c/0x6b
[9.255157]  [] ? add_to_page_cache_lru+0x2c/0x36
[9.255159]  [] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.255162]  [] ? do_read_cache_page+0x7d/0x12b
[9.255166]  [] ? sys_ioctl+0x7a/0x7a
[9.255169]  [] ? read_cache_page+0x7/0x10
[9.255171]  [] ? nfs_readdir+0x12d/0x435
[9.255175]  [] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.255178]  [] ? sys_ioctl+0x7a/0x7a
[9.255181]  [] ? sys_ioctl+0x7a/0x7a
[9.255184]  [] ? vfs_readdir+0x6c/0xa7
[9.255188]  [] ? sys_getdents+0x7e/0xdc
[9.255190]  [] ? system_call_fastpath+0x16/0x1b
[9.255192] ---[ end trace 5e8b5f37fe752ab3 ]---

Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

With 3.6-rc1 and up, when using a (dracut) initramfs with a read-only
nfs root, all accesses to /proc, /sys and /dev return EBUSY.

Bisecting finds this commit as where this was introduced:

> ee3efa91e240f513898050ef305a49a653c8ed90 is the first bad commit
> commit ee3efa91e240f513898050ef305a49a653c8ed90
> Author: Al Viro 
> Date:   Fri Jun 8 15:59:33 2012 -0400
>
> __d_unalias() should refuse to move mountpoints
>
> Signed-off-by: Al Viro 
>
> :040000 040000 1d6ecde959d3f8252b33f4adff3c4bf1e67f2b92 992ec34563b90fb349957418f76d4673c1af4ab6 M fs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On Thu, Nov 29, 2012 at 1:33 PM, Al Viro <v...@zeniv.linux.org.uk> wrote:
> On Thu, Nov 29, 2012 at 11:16:59AM -0800, Patrick McLean wrote:
>> With 3.6-rc1 and up, when using a (dracut) initramfs with a read-only
>> nfs root, all accesses to /proc, /sys and /dev return EBUSY.
>
> See the "[PATCH] Revert "__d_unalias() should refuse to move mountpoints""
> thread.  If you have a convenient reproducer, could you check if
> this fixes the breakage?  If so, we'll need to look into false negatives
> from nfs_same_file() in there...
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index ce8cb92..55436f5 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,10 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  			goto out;
>  		} else {
> -			d_drop(dentry);
> +			if (d_invalidate(dentry) != 0) {
> +				WARN_ON(1);
> +				goto out;
> +			}
>  			dput(dentry);
>  		}
>  	}

I have a trivial reproducer and am happy to help debug in any way that
I can. That patch seems to fix the problem, and produces these
warnings in dmesg:

[3.306483] dracut: Switching root
[4.324378] systemd-udevd[552]: starting version 195
[9.254972] ------------[ cut here ]------------
[9.254981] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2()
[9.254983] Hardware name: Bochs
[9.254984] Modules linked in:
[9.254989] Pid: 676, comm: ls Not tainted 3.7.0-rc7+ #35
[9.254990] Call Trace:
[9.254999]  [8108534c] ? warn_slowpath_common+0x76/0x8a
[9.255002]  [8117de91] ? nfs_readdir_page_filler+0x1cc/0x3a2
[9.255005]  [8117e683] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.255009]  [8117e70c] ? nfs_readdir_filler+0x1c/0x6b
[9.255014]  [810dca9a] ? add_to_page_cache_lru+0x2c/0x36
[9.255017]  [8117e6f0] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.255020]  [810dcbe3] ? do_read_cache_page+0x7d/0x12b
[9.255025]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.255028]  [810dccc6] ? read_cache_page+0x7/0x10
[9.255031]  [8117e888] ? nfs_readdir+0x12d/0x435
[9.255036]  [8118e653] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.255039]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.255042]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.255045]  [811277b3] ? vfs_readdir+0x6c/0xa7
[9.255049]  [811278da] ? sys_getdents+0x7e/0xdc
[9.255053]  [814ac769] ? system_call_fastpath+0x16/0x1b
[9.255055] ---[ end trace 5e8b5f37fe752ab1 ]---
[9.255062] ------------[ cut here ]------------
[9.255065] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2()
[9.255066] Hardware name: Bochs
[9.255067] Modules linked in:
[9.255070] Pid: 676, comm: ls Tainted: G        W    3.7.0-rc7+ #35
[9.255071] Call Trace:
[9.255075]  [8108534c] ? warn_slowpath_common+0x76/0x8a
[9.255077]  [8117de91] ? nfs_readdir_page_filler+0x1cc/0x3a2
[9.255080]  [8117e683] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.255083]  [8117e70c] ? nfs_readdir_filler+0x1c/0x6b
[9.255087]  [810dca9a] ? add_to_page_cache_lru+0x2c/0x36
[9.255089]  [8117e6f0] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.255093]  [810dcbe3] ? do_read_cache_page+0x7d/0x12b
[9.255096]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.255099]  [810dccc6] ? read_cache_page+0x7/0x10
[9.255102]  [8117e888] ? nfs_readdir+0x12d/0x435
[9.255105]  [8118e653] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.255109]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.255112]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.255115]  [811277b3] ? vfs_readdir+0x6c/0xa7
[9.255118]  [811278da] ? sys_getdents+0x7e/0xdc
[9.255121]  [814ac769] ? system_call_fastpath+0x16/0x1b
[9.255122] ---[ end trace 5e8b5f37fe752ab2 ]---
[9.255133] ------------[ cut here ]------------
[9.255135] WARNING: at fs/nfs/dir.c:454 nfs_readdir_page_filler+0x1cc/0x3a2()
[9.255136] Hardware name: Bochs
[9.255137] Modules linked in:
[9.255140] Pid: 676, comm: ls Tainted: G        W    3.7.0-rc7+ #35
[9.255141] Call Trace:
[9.255144]  [8108534c] ? warn_slowpath_common+0x76/0x8a
[9.255147]  [8117de91] ? nfs_readdir_page_filler+0x1cc/0x3a2
[9.255150]  [8117e683] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.255153]  [8117e70c] ? nfs_readdir_filler+0x1c/0x6b
[9.255157]  [810dca9a] ? add_to_page_cache_lru+0x2c/0x36
[9.255159]  [8117e6f0] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.255162]  [810dcbe3] ? do_read_cache_page+0x7d/0x12b
[9.255166]  [811274f8

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 02:21 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote:
>
>> I have a trivial reproducer and am happy to help debug in any way that
>> I can. That patch seems to fix the problem, and produces these
>> warnings in dmesg:
>
> OK...  So we have differing entry->fh and NFS_FH(dentry->d_inode).  Something
> like
> static void dump_fh(const struct nfs_fh *fh)
> {
> 	int i;
> 	printk(KERN_INFO "FH(%d)", fh->size);
> 	for (i = 0; i < fh->size; i++)
> 		printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]);
> 	printk(KERN_CONT "]\n");
> }
> with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to
> that WARN_ON(1) would probably be interesting.  And probably would make
> sense to print filename->name as well, to see which files it is about.
>

Here is the output of the first of the 3 times it hits the WARN_ON (I can 
include all 3 if desired), with the filename.name at the end:

[8.821503] ------------[ cut here ]------------
[8.821512] WARNING: at fs/nfs/dir.c:463 nfs_readdir_page_filler+0x1d0/0x3d2()
[8.821513] Hardware name: Bochs
[8.821515] Modules linked in:
[8.821519] Pid: 630, comm: bash Not tainted 3.7.0-rc7+ #36
[8.821520] Call Trace:
[8.821528]  [8108534c] ? warn_slowpath_common+0x76/0x8a
[8.821531]  [8117de95] ? nfs_readdir_page_filler+0x1d0/0x3d2
[8.821535]  [8117e6b3] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[8.821538]  [8117e73c] ? nfs_readdir_filler+0x1c/0x6b
[8.821543]  [810dca9a] ? add_to_page_cache_lru+0x2c/0x36
[8.821546]  [8117e720] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[8.821549]  [810dcbe3] ? do_read_cache_page+0x7d/0x12b
[8.821554]  [811274f8] ? sys_ioctl+0x7a/0x7a
[8.821557]  [810dccc6] ? read_cache_page+0x7/0x10
[8.821560]  [8117e8b8] ? nfs_readdir+0x12d/0x435
[8.821564]  [8118e683] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[8.821568]  [811274f8] ? sys_ioctl+0x7a/0x7a
[8.821571]  [811274f8] ? sys_ioctl+0x7a/0x7a
[8.821574]  [811277b3] ? vfs_readdir+0x6c/0xa7
[8.821577]  [811278da] ? sys_getdents+0x7e/0xdc
[8.821581]  [814ac7e9] ? system_call_fastpath+0x16/0x1b
[8.821583] ---[ end trace 89263124889205c1 ]---
[8.821584] FH(0)]
[8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
[8.821601] filename: proc
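
As a rough illustration of why the dump reads "FH(0)]" for the empty handle, here is a user-space sketch of the same formatting logic. The names struct toy_fh and format_fh() are hypothetical stand-ins (not kernel code), and the 128-byte data array is an assumption made only so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space stand-in for the kernel's struct nfs_fh. */
struct toy_fh {
	unsigned short size;
	unsigned char data[128];
};

/*
 * Format a handle like the quoted dump_fh(): "FH(<size>)", then '[' before
 * the first byte, ' ' before every later byte, and a closing ']'.
 * Returns the length of the string, or -1 if buf is too small.
 */
static int format_fh(const struct toy_fh *fh, char *buf, size_t len)
{
	size_t off;
	int i, n;

	assert(fh != NULL && buf != NULL);

	n = snprintf(buf, len, "FH(%d)", fh->size);
	if (n < 0 || (size_t)n >= len)
		return -1;
	off = (size_t)n;

	for (i = 0; i < fh->size; i++) {
		n = snprintf(buf + off, len - off, "%c%02x",
			     i ? ' ' : '[', (unsigned)fh->data[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	/* ']' is printed unconditionally; '[' only comes with the first byte. */
	n = snprintf(buf + off, len - off, "]");
	if (n < 0 || (size_t)n >= len - off)
		return -1;
	return (int)(off + (size_t)n);
}
```

With a zero-length handle the loop body never runs, so the opening bracket is never emitted, which matches the first FH line in the dmesg output above.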

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 03:43 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 02:53:13PM -0800, Patrick McLean wrote:
>> On 29/11/12 02:21 PM, Al Viro wrote:
>>> On Thu, Nov 29, 2012 at 02:06:22PM -0800, Patrick McLean wrote:
>>>
>>>> I have a trivial reproducer and am happy to help debug in any way that
>>>> I can. That patch seems to fix the problem, and produces these
>>>> warnings in dmesg:
>>>
>>> OK...  So we have differing entry->fh and NFS_FH(dentry->d_inode).
>>> Something like
>>> static void dump_fh(const struct nfs_fh *fh)
>>> {
>>> 	int i;
>>> 	printk(KERN_INFO "FH(%d)", fh->size);
>>> 	for (i = 0; i < fh->size; i++)
>>> 		printk(KERN_CONT "%c%02x", i ? ' ' : '[', fh->data[i]);
>>> 	printk(KERN_CONT "]\n");
>>> }
>>> with dump_fh(entry->fh); dump_fh(NFS_FH(dentry->d_inode)); added next to
>>> that WARN_ON(1) would probably be interesting.  And probably would make
>>> sense to print filename->name as well, to see which files it is about.
>>
>> [8.821584] FH(0)]
>> [8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
>> [8.821601] filename: proc
>
> *whoa*
>
> So we have zero entry->fh->size?  No wonder it doesn't match...  Which NFS
> version it is?  entry->fh->size is set by nfs[34]_decode_dirent().

This is nfs v3 over TCP on Linus git at commit
e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace.

> NFS folks: any ideas on best way to debug it?  The brute-force way would be
> to capture all NFS traffic with tcpdump and see what's going on, but that
> would be a lot of work...
>
> Looks like we have READDIRPLUS attempted and succeeded, but fhandle was not
> given.  Result: nfs_prime_dcache() is doing blind d_drop() on perfectly
> valid dentries, no matter how busy.
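
To make the "no wonder it doesn't match" point concrete, here is a small user-space sketch of a size-then-bytes handle comparison. It is a toy model: struct toy_fh, toy_compare_fh() and toy_same_file() are hypothetical names, and the comparison rule (sizes must be equal, then the bytes must be equal) is an assumption about how such a check would typically be written, not a copy of the kernel's code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a file handle: a length plus raw bytes. */
struct toy_fh {
	unsigned short size;
	unsigned char data[128];
};

/* Nonzero when the handles differ, 0 when they match (memcmp-style). */
static int toy_compare_fh(const struct toy_fh *a, const struct toy_fh *b)
{
	assert(a != NULL && b != NULL);
	assert(a->size <= sizeof(a->data) && b->size <= sizeof(b->data));
	return a->size != b->size || memcmp(a->data, b->data, a->size) != 0;
}

/* 1 when the readdir entry and the cached inode name the same file. */
static int toy_same_file(const struct toy_fh *from_readdir,
			 const struct toy_fh *cached)
{
	return toy_compare_fh(from_readdir, cached) == 0;
}
```

Under this model an entry that arrives with a zero-length handle can never compare equal to a cached 36-byte handle, so every such entry takes the "different file" branch even though nothing on the server changed.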
 

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 04:35 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 04:19:51PM -0800, Patrick McLean wrote:
>>>> [8.821584] FH(0)]
>>>> [8.821586] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
>>>> [8.821601] filename: proc
>>>
>>> *whoa*
>>>
>>> So we have zero entry->fh->size?  No wonder it doesn't match...  Which NFS
>>> version it is?  entry->fh->size is set by nfs[34]_decode_dirent().
>>
>> This is nfs v3 over TCP on Linus git at commit
>> e9296e89b85604862bd9ec2d54dc43edad775c0d with nfs-utils-1.2.6 userspace.
>
> So we have nfs3_decode_dirent(), stepping into
> 		/* In fact, a post_op_fh3: */
> 		p = xdr_inline_decode(xdr, 4);
> 		if (unlikely(p == NULL))
> 			goto out_overflow;
> 		if (*p != xdr_zero) {
> 			error = decode_nfs_fh3(xdr, entry->fh);
> 			if (unlikely(error)) {
> 				if (error == -E2BIG)
> 					goto out_truncated;
> 				return error;
> 			}
> 		} else
> 			zero_nfs_fh3(entry->fh);
> Interesting...  Server-side that should've been produced by
> encode_entryplus_baggage(), which looks like failing compose_entry_fh()...
> which has explicit
> 	if (d_mountpoint(dchild))
> 		goto out;
> resulting in ENOENT on everything that's overmounted on server.
>
> Do you, by any chance, have the server really exporting its own root
> filesystem?  Another thing to check: have nfs_prime_dcache() print
> filename.name of everything that fails nfs_same_entry() and has
> zero entry->fh->size, regardless of d_invalidate() results.
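
For readers unfamiliar with the wire format being decoded in the quoted snippet, here is a minimal user-space sketch of parsing an NFSv3 post_op_fh3: a 4-byte boolean, followed (only when true) by a length word and the handle bytes padded to a 4-byte boundary. The function and type names are hypothetical, and the error handling is simplified compared with the kernel's decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NFS3_FHSIZE 64	/* NFSv3 caps file handles at 64 bytes */

/* Hypothetical decoded handle; size == 0 means "server sent no handle". */
struct toy_fh3 {
	uint32_t size;
	unsigned char data[TOY_NFS3_FHSIZE];
};

/* Read one big-endian 32-bit XDR word; -1 if the buffer is too short. */
static int xdr_get_u32(const unsigned char **p, const unsigned char *end,
		       uint32_t *out)
{
	if ((size_t)(end - *p) < 4)
		return -1;
	*out = ((uint32_t)(*p)[0] << 24) | ((uint32_t)(*p)[1] << 16) |
	       ((uint32_t)(*p)[2] << 8) | (uint32_t)(*p)[3];
	*p += 4;
	return 0;
}

/* Decode a post_op_fh3 at *p, advancing *p. Returns 0 on success, -1 on error. */
static int toy_decode_post_op_fh3(const unsigned char **p,
				  const unsigned char *end,
				  struct toy_fh3 *fh)
{
	uint32_t follows, len, padded;

	assert(p != NULL && *p != NULL && end != NULL && fh != NULL);

	memset(fh, 0, sizeof(*fh));	/* start from the "no handle" state */
	if (xdr_get_u32(p, end, &follows) != 0)
		return -1;
	if (!follows)
		return 0;		/* handle_follows == FALSE: leave size 0 */

	if (xdr_get_u32(p, end, &len) != 0 || len > TOY_NFS3_FHSIZE)
		return -1;
	padded = (len + 3) & ~(uint32_t)3;	/* XDR opaque data is 4-byte padded */
	if ((size_t)(end - *p) < padded)
		return -1;
	memcpy(fh->data, *p, len);
	fh->size = len;
	*p += padded;
	return 0;
}
```

The case that matters for this thread is the early return: a reply whose discriminator word is zero decodes successfully but leaves the handle empty, which is exactly the FH(0) seen in the client's dmesg.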

The server is running 3.6.6 and is just exporting a subdir of an xfs filesystem 
(which does not happen to be the root filesystem).

The client is running as a KVM guest on the machine that is serving the NFS. I
am reproducing this by booting the guest via an initramfs and running
ls / in single-user mode.

I added a check that prints the filename.name of everything that fails
nfs_same_file, and it appears to be triggered only by the same filesystems that
are triggering the WARN_ON; the relevant dmesg is below.

[9.495217] entry->fh->size is 0 on: proc
[9.495222] ------------[ cut here ]------------
[9.495230] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb()
[9.495232] Hardware name: Bochs
[9.495233] Modules linked in:
[9.495237] Pid: 655, comm: ls Not tainted 3.7.0-rc7+ #40
[9.495239] Call Trace:
[9.495247]  [8108534c] ? warn_slowpath_common+0x76/0x8a
[9.495250]  [8117deb4] ? nfs_readdir_page_filler+0x1ef/0x3eb
[9.495254]  [8117e6cc] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.495257]  [8117e755] ? nfs_readdir_filler+0x1c/0x6b
[9.495263]  [810dca9a] ? add_to_page_cache_lru+0x2c/0x36
[9.495266]  [8117e739] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.495269]  [810dcbe3] ? do_read_cache_page+0x7d/0x12b
[9.495274]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.495277]  [810dccc6] ? read_cache_page+0x7/0x10
[9.495280]  [8117e8d1] ? nfs_readdir+0x12d/0x435
[9.495285]  [8118e69b] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.495288]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.495291]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.495294]  [811277b3] ? vfs_readdir+0x6c/0xa7
[9.495298]  [811278da] ? sys_getdents+0x7e/0xdc
[9.495302]  [814ac829] ? system_call_fastpath+0x16/0x1b
[9.495304] ---[ end trace e502c5d56c594e85 ]---
[9.495306] FH(0)]
[9.495308] FH(36)[01 00 07 01 89 00 00 00 00 00 00 00 e1 21 fe c4 9e 38 44 dc bf 1b d5 95 d6 76 d6 d9 a7 3c 1b 80 33 38 e3 62]
[9.495323] filename: proc
[9.495330] entry->fh->size is 0 on: dev
[9.495332] ------------[ cut here ]------------
[9.495335] WARNING: at fs/nfs/dir.c:464 nfs_readdir_page_filler+0x1ef/0x3eb()
[9.495336] Hardware name: Bochs
[9.495337] Modules linked in:
[9.495340] Pid: 655, comm: ls Tainted: G        W    3.7.0-rc7+ #40
[9.495341] Call Trace:
[9.495345]  [8108534c] ? warn_slowpath_common+0x76/0x8a
[9.495348]  [8117deb4] ? nfs_readdir_page_filler+0x1ef/0x3eb
[9.495351]  [8117e6cc] ? nfs_readdir_xdr_to_array+0x1c0/0x22d
[9.495354]  [8117e755] ? nfs_readdir_filler+0x1c/0x6b
[9.495358]  [810dca9a] ? add_to_page_cache_lru+0x2c/0x36
[9.495361]  [8117e739] ? nfs_readdir_xdr_to_array+0x22d/0x22d
[9.495364]  [810dcbe3] ? do_read_cache_page+0x7d/0x12b
[9.495368]  [811274f8] ? sys_ioctl+0x7a/0x7a
[9.495371]  [810dccc6] ? read_cache_page+0x7/0x10
[9.495373]  [8117e8d1] ? nfs_readdir+0x12d/0x435
[9.495377]  [8118e69b] ? nfs3_xdr_dec_create3res+0xc5/0xc5
[9.495380

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 05:36 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 04:57:19PM -0800, Patrick McLean wrote:
>>> Interesting...  Server-side that should've been produced by
>>> encode_entryplus_baggage(), which looks like failing compose_entry_fh()...
>>> which has explicit
>>> 	if (d_mountpoint(dchild))
>>> 		goto out;
>>> resulting in ENOENT on everything that's overmounted on server.
>>>
>>> Do you, by any chance, have the server really exporting its own root
>>> filesystem?  Another thing to check: have nfs_prime_dcache() print
>>> filename.name of everything that fails nfs_same_entry() and has
>>> zero entry->fh->size, regardless of d_invalidate() results.
>>
>> The server is running 3.6.6 and is just exporting a subdir of an xfs
>> filesystem (which does not happen to be the root filesystem).
>>
>> The client is running as a KVM guest on the machine that is serving the NFS.
>> I am reproducing this by booting the guest via an initramfs, and doing
>> ls / in single user mode.
>>
>> I added a check that prints the filename.name of everything that fails
>> nfs_same_file, and it appears to just be triggered by the same filesystems
>> that are triggering the WARN_ON, the relevant dmesg is below.
>
> [the same /dev, /proc and /sys]
>
> 	Very interesting.  Do you have anything mounted on the corresponding
> directories on server?  The picture looks like you are getting empty
> fhandles in readdir+ response for exactly the same directories that happen
> to be mountpoints on client.  In any case, we shouldn't do that blind
> d_drop() - empty fhandles can happen.  The only remaining question is
> why do they happen on that set of entries.  From my reading of
> encode_entryplus_baggage() it looks like we have compose_entry_fh()
> failing for those entries and those entries alone.  One possible cause
> would be d_mountpoint(dchild) being true on server.  If it is true, we
> can declare the case closed; if not, I really wonder what's going on.

Those directories do have the server's own copies of the said directories bind
mounted at the moment in a separate mount namespace.

Unmounting those directories on the server does appear to stop the WARN_ON from
triggering.

> Note that if the same fs is mounted elsewhere, d_mountpoint() would mean
> that something is mounted on top of that directory in _some_ instance;
> not necessarily the exported one.  Can you slap printks on fs/nfsd/nfs3xdr.c
> compose_entry_fh() failure exits and see which one triggers server-side?
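
The server-side behaviour described above can be sketched as a toy model: the "is a mountpoint" property lives on the directory entry itself, so a mount in any namespace makes the server omit the handle for that entry. Everything here is hypothetical (struct toy_dentry, the mounts counter, toy_encode_entry_fh()); the real server also omits the attributes in this case, which the sketch does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical directory entry; 'mounts' counts mounts in ALL namespaces. */
struct toy_dentry {
	const char *name;
	int mounts;
};

static int toy_d_mountpoint(const struct toy_dentry *d)
{
	return d->mounts > 0;
}

/* Store a 32-bit value in big-endian (XDR) byte order. */
static void put_u32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}

/*
 * Encode the handle part of a readdirplus entry into 'out'.
 * Returns the number of bytes written, or 0 on error.
 */
static size_t toy_encode_entry_fh(const struct toy_dentry *child,
				  const unsigned char *fh, uint32_t fh_len,
				  unsigned char *out, size_t out_len)
{
	uint32_t padded;

	assert(child != NULL && fh != NULL && out != NULL);

	if (toy_d_mountpoint(child)) {
		/* Something is mounted here somewhere: send "no handle". */
		if (out_len < 4)
			return 0;
		put_u32(out, 0);
		return 4;
	}

	if (fh_len > 64)
		return 0;
	padded = (fh_len + 3) & ~(uint32_t)3;
	if (out_len < 8 + (size_t)padded)
		return 0;
	put_u32(out, 1);		/* a handle follows */
	put_u32(out + 4, fh_len);
	memset(out + 8, 0, padded);	/* zero the XDR padding */
	memcpy(out + 8, fh, fh_len);
	return 8 + (size_t)padded;
}
```

In this model, dropping the mount count to zero (the equivalent of unmounting the bind mounts on the server) is enough to make the handle reappear in the reply, which is consistent with the WARN_ON no longer triggering after the unmount.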

Re: Regression with initramfs and nfsroot (appears to be in the dcache)

2012-11-29 Thread Patrick McLean

On 29/11/12 06:00 PM, Al Viro wrote:
> On Thu, Nov 29, 2012 at 05:54:02PM -0800, Patrick McLean wrote:
>>> Very interesting.  Do you have anything mounted on the corresponding
>>> directories on server?  The picture looks like you are getting empty
>>> fhandles in readdir+ response for exactly the same directories that happen
>>> to be mountpoints on client.  In any case, we shouldn't do that blind
>>> d_drop() - empty fhandles can happen.  The only remaining question is
>>> why do they happen on that set of entries.  From my reading of
>>> encode_entryplus_baggage() it looks like we have compose_entry_fh()
>>> failing for those entries and those entries alone.  One possible cause
>>> would be d_mountpoint(dchild) being true on server.  If it is true, we
>>> can declare the case closed; if not, I really wonder what's going on.
>>
>> Those directories do have the server's own copies of the said directories
>> bind mounted at the moment in a separate mount namespace.
>>
>> Unmounting those directories on the server does appear to stop the WARN_ON
>> from triggering.
>
> OK, that settles it.  WARN_ON() and printks in the area can be dropped;
> the right fix is below.  However, there's a similar place in cifs that
> also needs to be dealt with and I really, really wonder why the hell do
> we do d_drop() in nfs_revalidate_lookup().  It's not relevant in this
> bug, but I would like to understand what's wrong with simply returning
> 0 from ->d_revalidate() and letting the caller (in fs/namei.c) take care
> of unhashing, etc. itself.  Would make have_submounts() in there pointless
> as well - we could just return 0 and let d_invalidate() take care of the
> checks...  Trond?
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -450,7 +450,8 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
>  			nfs_refresh_inode(dentry->d_inode, entry->fattr);
>  			goto out;
>  		} else {
> -			d_drop(dentry);
> +			if (d_invalidate(dentry) != 0)
> +				goto out;
>  			dput(dentry);
>  		}
>  	}
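
The difference between the two calls in the patch can be shown with a deliberately simplified user-space model. The struct and functions below are hypothetical; in particular, treating "is a mountpoint" as the only busy condition and returning -EBUSY on refusal are assumptions made for illustration, since the real d_invalidate() performs more checks than this.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified dentry state. */
struct toy_dentry {
	int hashed;		/* 1 while lookups can still find it */
	int is_mountpoint;	/* 1 if something is mounted on top of it */
};

/* Unconditional: unhashes the entry no matter how busy it is. */
static void toy_d_drop(struct toy_dentry *d)
{
	assert(d != NULL);
	d->hashed = 0;
}

/* Conditional: refuses (nonzero return) when the entry is in use. */
static int toy_d_invalidate(struct toy_dentry *d)
{
	assert(d != NULL);
	if (!d->hashed)
		return 0;		/* already gone, nothing to do */
	if (d->is_mountpoint)
		return -EBUSY;		/* leave busy entries alone */
	d->hashed = 0;
	return 0;
}
```

In this model the old code path corresponds to calling toy_d_drop() on /proc, /sys and /dev, which hides live mountpoints from lookups; the patched path calls toy_d_invalidate(), gets a nonzero result for those entries and skips them.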

Excellent, thanks. Is there any chance this will make it to 3.7? Also we might 
want to cc stable@ on this as well since it is a regression in 3.6.

BUG: at fs/inotify.c:182 set_dentry_child_flags()

2007-02-27 Thread Patrick McLean

I have this message in the dmesg on a mail server running on an xfs
filesystem. It appears to have happened at some point when nfsd was
restarted, but I can't seem to convince it to reproduce.

The machine is running Gentoo's 2.6.20 kernel.

BUG: at fs/inotify.c:182 set_dentry_child_flags()

Call Trace:
 [802b7e2a] set_dentry_child_flags+0xd4/0x132
 [802b7eef] remove_watch_no_event+0x67/0x76
 [802b7f16] inotify_remove_watch_locked+0x18/0x3b
 [802b8004] inotify_rm_wd+0x7e/0xa1
 [802b8519] sys_inotify_rm_watch+0x46/0x62
 [80253e6e] system_call+0x7e/0x83


sorry about that (Old email address)

2001-04-05 Thread Patrick McLean


Sorry about that; the last email had an old address on it. This address should
work for replies/cc's:

[EMAIL PROTECTED]


lockup/crash in 2.4.3 (kernel BUG at exit.c:458!)

2001-04-05 Thread Patrick McLean


After I installed 2.4.3, my system started hanging, seemingly at random, about
once a day. It hung at least 3 times, but it looks like there are entries in my
syslog for only 2 of those times. My system is an AMD Thunderbird 1GHz with
256MB RAM and a VIA KT133A chipset (Abit KT7A). If there's any other info you
need, let me know, and please CC me on any responses, as I'm not subscribed to
the list.

I'm not exactly up on kernel internals, so I can't really provide any info about
what could have caused it.

Here's what showed up in my syslog:

Apr  1 08:05:00 chutz kernel: Unable to handle kernel paging request at virtual 
address 1030
Apr  1 08:05:00 chutz kernel:  printing eip:
Apr  1 08:05:00 chutz kernel: c01343a6
Apr  1 08:05:00 chutz kernel: *pde = 
Apr  1 08:05:00 chutz kernel: Oops: 0002
Apr  1 08:05:00 chutz kernel: CPU:0
Apr  1 08:05:00 chutz kernel: EIP:0010:[try_to_free_buffers+150/816]
Apr  1 08:05:00 chutz kernel: EFLAGS: 00210206
Apr  1 08:05:00 chutz kernel: eax: 1000   ebx: c9fb19c0   ecx:    edx: 
cfd5d740
Apr  1 08:05:00 chutz kernel: esi: c9fb19c0   edi: c9fb19c0   ebp:    esp: 
c1479f54
Apr  1 08:05:00 chutz kernel: ds: 0018   es: 0018   ss: 0018
Apr  1 08:05:00 chutz kernel: Process bdflush (pid: 5, stackpage=c1479000)
Apr  1 08:05:00 chutz kernel: Stack:  0003 c1479f88 ceff87c0 0008 
 0001 00200213
Apr  1 08:05:00 chutz kernel:0001 01de c012ad43  c1073490 
 0007 c0129e27
Apr  1 08:05:00 chutz kernel:c1073490   0004  
0001 2f99 
Apr  1 08:05:00 chutz kernel: Call Trace: [free_shortage+35/208] 
[page_launder+871/2208] [bdflush+140/288] [init+0/384] [init+0/384] 
[kernel_thread+38/48] [bdflush+0/288]
Apr  1 08:05:00 chutz kernel:
Apr  1 08:05:00 chutz kernel: Code: 89 50 30 8b 53 30 8b 03 89 02 c7 43 30 00 00 00 00 
8b 53 24
Apr  1 08:05:00 chutz kernel: kernel BUG at exit.c:458!
Apr  1 08:05:00 chutz kernel: invalid operand: 
Apr  1 08:05:00 chutz kernel: CPU:0
Apr  1 08:05:00 chutz kernel: EIP:0010:[do_exit+512/528]
Apr  1 08:05:00 chutz kernel: EFLAGS: 00010282
Apr  1 08:05:00 chutz kernel: eax: 001a   ebx:    ecx: 0001   edx: 
c0256308
Apr  1 08:05:00 chutz kernel: esi: c1478000   edi: 000b   ebp: c0218880   esp: 
c1479e40
Apr  1 08:05:00 chutz kernel: ds: 0018   es: 0018   ss: 0018
Apr  1 08:05:00 chutz kernel: Process bdflush (pid: 5, stackpage=c1479000)
Apr  1 08:05:00 chutz kernel: Stack: c0219c05 c0219c9c 01ca c0218880 c01077a9 
c02145e1 c021472d 
Apr  1 08:05:00 chutz kernel:0002 1030 c0110858 000b c1479f20 
0002  c1478000
Apr  1 08:05:00 chutz kernel:c1478000 c147200c 0047 00030001 c02cb620 
c1473000 0001 c02cb7b4
Apr  1 08:05:00 chutz kernel: Call Trace: [die+57/80] [do_page_fault+824/1056] 
[__make_request+311/1680] [__make_request+616/1680] [__make_request+640/1680] 
[ide_do_request+675/752] [do_page_fault+0/1056]
Apr  1 08:05:00 chutz kernel:[error_code+52/60] [try_to_free_buffers+150/816] 
[free_shortage+35/208] [page_launder+871/2208] [bdflush+140/288] [init+0/384] 
[init+0/384] [kernel_thread+38/48]
Apr  1 08:05:00 chutz kernel:[bdflush+0/288]
Apr  1 08:05:00 chutz kernel:
Apr  1 08:05:00 chutz kernel: Code: 0f 0b 83 c4 0c e9 57 fe ff ff 8d b6 00 00 00 00 55 
57 56 53
Apr  1 08:05:00 chutz kernel: kernel BUG at exit.c:458!
Apr  1 08:05:00 chutz kernel: invalid operand: 
Apr  1 08:05:00 chutz kernel: CPU:0
Apr  1 08:05:00 chutz kernel: EIP:0010:[do_exit+512/528]
Apr  1 08:05:00 chutz kernel: EFLAGS: 00013282
Apr  1 08:05:00 chutz kernel: eax: 001a   ebx:    ecx: 0001   edx: 
c0256308
Apr  1 08:05:00 chutz kernel: esi: c1478000   edi: 000b   ebp: c0218880   esp: 
c1479d18
Apr  1 08:05:00 chutz kernel: ds: 0018   es: 0018   ss: 0018
Apr  1 08:05:00 chutz kernel: Process bdflush (pid: 5, stackpage=c1479000)
Apr  1 08:05:00 chutz kernel: Stack: c0219c05 c0219c9c 01ca c1479e0c  
c0107ab0 c0218880 c1479e0c
Apr  1 08:05:00 chutz kernel: c0107ab0 c0107b62 000b c024ab80 
3000  c1479d64
Apr  1 08:05:00 chutz kernel: c1479d6c  c1479d74  
c024ab80 2000 00343538
Apr  1 08:05:00 chutz kernel: Call Trace: [do_invalid_op+0/192] [do_invalid_op+0/192] 
[do_invalid_op+178/192] [do_exit+512/528] [do_notify_parent+197/224] 
[vsprintf+908/960] [error_code+52/60]
Apr  1 08:05:00 chutz kernel:[do_exit+512/528] [die+57/80] 
[do_page_fault+824/1056] [__make_request+311/1680] [__make_request+616/1680] 
[__make_request+640/1680] [ide_do_request+675/752] [do_page_fault+0/1056]
Apr  1 08:05:00 chutz kernel:[error_code+52/60] [try_to_free_buffers+150/816] 
[free_shortage+35/208] [page_launder+871/2208] [bdflush+140/288] [init+0/384] 
[init+0/384] [kernel_thread+38/48]
Apr  1 08:05:00 chutz kernel:[bdflush+0/288]
Apr  1 08:05:00
